Managing Test Data and Stress
Testing Your SQL Applications
       Speaker: Joel Champagne



     San Francisco SQL Server User Group
              January 12, 2011




        Mark Ginnebaugh, User Group Leader
               www.bayareasql.org
Tonight’s Speaker

   Joel Champagne
     Developer  of large enterprise applications for 20 years
     Focus on data, in particular using the Microsoft stack (SS*S)
      and .NET
     Involved in all stages of application life-cycle, from
      envisioning through implementation
     Areas of Interest:
             Tool development work
             Developer productivity

    Tonight’s Topic: Managing Test Data and Stress Testing
     Your SQL Applications
Upcoming Training
• Upcoming full‐day training (minimal cost, target is 
  late Feb or March 2011):
  www.codexframework.com/training
   –   SQL Source Control
   –   Stress/Volume Testing
   –   SQL Unit Testing
   –   SQL‐Hero – more details (www.codexframework.com)

• Email Joel:  joelc@codexframework.com
• Twitter:  @sqlheroguy
What I want to cover…
•   The why & how
•   Large data volumes – benefits, practical looks
•   Specific examples, in detail
•   Obfuscation of existing data
•   Load testing
•   Both MS and non‐MS options
•   Let’s keep it interactive
In the beginning…
• … of the development process
  – We can know characteristics of entities
  – We can know ways to optimize (e.g. indexes)
  – We can have good intentions
• Ultimately, the little details matter:
  – Style counts! – not always shortest or most 
    elegant performs best
  – SQL can seem like an art instead of a science 
    sometimes
Problem is…
• How can we know what we’ve got is going to:
   – Perform well, not just as we develop, but years from 
     now, in production
   – Perform well if reality changes
   – Actually behave as expected with lots of data
• And, we’d like to:
   – Work with semi‐realistic data, even before users have 
     had a chance to do a lot of interaction with the app
   – In some cases may want to work with a “well known” 
     data set, to support repeatable unit & system testing
Solutions
• VS 2010
  – Database project ‐> Data Generation Plan
  – Premium/Ultimate
  – Custom generator extensibility
• SQL‐Hero
  – Generate data option
• Others
  – Custom developed (scripting, bcp, PowerShell, 
    etc.)
Things to consider…
• “Realism”
  – Cardinality
  – Use of NULL
  – Foreign key lookups
  – Implied rules (sequence number example)
  – … essentially rules at both column and table level 
    (constraints)
  – Names, addresses, etc. – pros, cons
• Deterministic vs. True Random
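
The "realism" points above can be sketched in a few lines of Python. This is only an illustration, not how VS 2010 or SQL‐Hero generate data: the customer/order shape, the column names and the generate_rows function are all hypothetical. It shows a seeded generator (deterministic rather than true random), a small fixed set of cities (cardinality), a NULL ratio, foreign keys picked from existing parents, and a per-customer sequence number as an implied rule.

```python
import random


def generate_rows(n_customers, n_orders, null_ratio=0.2, distinct_cities=5, seed=42):
    """Build hypothetical customer and order rows with controlled realism."""
    rng = random.Random(seed)  # fixed seed: the same data on every run
    cities = [f"City{i}" for i in range(distinct_cities)]  # cardinality knob

    customers = []
    for cid in range(1, n_customers + 1):
        # Use of NULL: a share of customers has no phone number on record.
        if rng.random() < null_ratio:
            phone = None
        else:
            phone = f"555-{rng.randrange(10000):04d}"
        customers.append({"id": cid, "city": rng.choice(cities), "phone": phone})

    orders = []
    next_seq = {}  # implied rule: each customer's orders are numbered 1, 2, 3, ...
    for oid in range(1, n_orders + 1):
        # Foreign key lookup: always point at a customer that really exists.
        cust_id = rng.choice(customers)["id"]
        next_seq[cust_id] = next_seq.get(cust_id, 0) + 1
        orders.append({"id": oid, "customer_id": cust_id, "seq": next_seq[cust_id]})

    return customers, orders
```

Calling generate_rows(50, 200) twice returns identical data, which is what makes a "well known" data set repeatable. Passing seed=None would let Python seed the generator from the operating system instead, giving different data on each run.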
Understanding reality…
Cardinality:
Scenario #1
• New development effort, empty database
• We have a search screen we’ve written –
  seems like it’s fast but not a lot of data
• … what about 3 years from now?
  – 12,000,000+ Customers
  – 26,000,000+ Orders
  – 16,500,000 Addresses
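
A small sketch of why volume matters for a search screen. It uses SQLite from the Python standard library purely as a stand-in for SQL Server, with a far smaller row count than the numbers above. The customer table, the index name and both helper functions are hypothetical. The point is only that the query plan, not the feel of the screen on an empty database, tells you what will happen at scale.

```python
import random
import sqlite3
import time


def build_customers(conn, n_rows, seed=1):
    """Create and fill a hypothetical customer table with seeded random names."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT)")
    rows = ((i, f"Name{rng.randrange(n_rows)}") for i in range(n_rows))
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    conn.commit()


def plan_and_time(conn, name):
    """Return (query plan text, elapsed seconds) for the search query."""
    sql = "SELECT id FROM customer WHERE last_name = ?"
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (name,)))
    start = time.perf_counter()
    conn.execute(sql, (name,)).fetchall()
    return plan, time.perf_counter() - start


conn = sqlite3.connect(":memory:")
build_customers(conn, 100_000)

before = plan_and_time(conn, "Name123")  # no index yet: full table scan
conn.execute("CREATE INDEX ix_customer_last_name ON customer (last_name)")
after = plan_and_time(conn, "Name123")   # same query, now an index search

print("before:", before)
print("after: ", after)
```

On SQL Server the equivalent check would be the actual execution plan and I/O statistics rather than EXPLAIN QUERY PLAN, but the workflow is the same: generate volume first, then look at the plan.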
Options #1
• VS 2010
• SQL‐Hero

• Data Model
• Demos
Scenario #2
• Let’s take an end‐to‐end look at a “real example” 
  from a customer…
  – Team structure
  – “The strategy”
       • A fourth database, user participation
  –   Design doc from BA
  –   The process of creating data and testing
  –   Tuning efforts
  –   Re‐testing
  –   Conclusions…
Scenario #3
• Large database, on‐going development work, 
  post‐implementation
• As we try to modify or fix bugs, some issues rely 
  on production‐quality data
• Option:  Copy prod to dev/QA
• Problems:
   – Security
   – Coordinating with on‐going dev work
• Another scenario:  just looking to add more data 
  to an existing DB
Options #3
• VS 2010
  – Data Transformation
• SQL‐Hero
  – “Scramble” existing data option


• Demos
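
One way to picture "scrambling" existing data, as a rough Python sketch. This is not the algorithm used by SQL‐Hero or by VS 2010: the function names, the row shape and the salt value are assumptions made for illustration. Shuffling a column across rows keeps the real value distribution and the NULL positions but breaks the link between a value and its original row. Hashing replaces a value with a stable, meaningless token.

```python
import hashlib
import random


def scramble_column(rows, column, seed=7):
    """Shuffle one column's non-NULL values across rows; NULLs stay where they are."""
    rng = random.Random(seed)  # seeded so the scramble is repeatable
    positions = [i for i, row in enumerate(rows) if row[column] is not None]
    values = [rows[i][column] for i in positions]
    rng.shuffle(values)

    scrambled = [dict(row) for row in rows]  # copy: the input rows are not modified
    for i, value in zip(positions, values):
        scrambled[i][column] = value
    return scrambled


def mask_email(email, salt="dev-only-salt"):
    """Replace an email with a stable token; the same input gives the same output."""
    if email is None:
        return None
    digest = hashlib.sha256((salt + email).encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.com"
```

Key columns are left alone here, so foreign keys still resolve after the scramble. Note that a shuffle can leave a value on its original row by chance, and with a known salt a hash of a guessable value can be recomputed, so neither technique alone settles the security problem listed above.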
Issue here:  Assumes inherently that should reseed…

Conclusions…
‐ Data generation plans are very extensible (due to custom generators and such)
but do have limitations:  prepare to invest some time in getting them “just right”
Scenario #4
• Actual “stress testing”, being high concurrent 
  load
• Need to understand the results of high load
  – Slowness
  – Often times, blocking
  – Sometimes deadlocks we didn’t anticipate
  – Have seen this lead to on‐going monitoring and tuning 
    efforts
Options #4
• SQLIO
   – "SQLIOStress creates separate data and log files to simulate the I/O 
     patterns that SQL Server will generate to its data file (.mdf) and its log file 
     (.ldf). SQLIOStress does not use the SQL Server engine to perform the 
     stress activity so it can be used to exercise a computer before you install 
     SQL Server." (From SQLIOStress Readme.doc)
• SQL Profiler
   – Can collect a workload and “replay”
• VS 2010
   – Extensive Load Testing support – Test rigs, multiple agents possible
   – Invocation of test cases in .NET – pros and cons
• SQL‐Hero
   – Executing script with concurrency option – generate high load using “real” 
     trace – simple, visualization for results ‐ successful
   – Data visualization of collected trace information (often the key in analysis)
   – Data visualization of trace information, over longer timeframes
   – Production monitoring
   – Screen shots:  daily life examples
   – Future:  Template to build web test code to invoke from VS2010 test rig
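
The idea of running the same work at high concurrency and then looking at the results can be sketched generically in Python. This is not how any of the tools above work internally: run_load and percentile are hypothetical helpers, and the work callable stands in for whatever executes one statement or script against the database. Latencies and errors are collected separately, because under real load the interesting results are often the failures (timeouts, deadlock victims) rather than the averages.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(work, workers=8, calls_per_worker=25):
    """Call work() from several threads at once; return (latencies, errors)."""

    def worker(_worker_number):
        latencies, errors = [], []
        for _ in range(calls_per_worker):
            start = time.perf_counter()
            try:
                work()
            except Exception as exc:  # e.g. a timeout or deadlock error from a driver
                errors.append(repr(exc))
            else:
                latencies.append(time.perf_counter() - start)
        return latencies, errors

    all_latencies, all_errors = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for latencies, errors in pool.map(worker, range(workers)):
            all_latencies.extend(latencies)
            all_errors.extend(errors)
    return all_latencies, all_errors


def percentile(values, pct):
    """Simple nearest-rank style percentile; values must be non-empty."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]
```

A usage sketch: run_load(lambda: time.sleep(0.01), workers=4) returns 100 latencies and no errors, and percentile(latencies, 95) gives a tail figure that tends to expose blocking better than a mean does. In practice work would open its own connection per thread, since most database drivers do not allow sharing one connection across threads.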
Putting it together…
• Build process
  – Build DB from SC ‐> Populate Fully ‐> Use
     • Advantage:  you know it will match the source of truth
     • Disadvantage:  longer‐term testing, relying on existing data
  – Scripting / automation
  – Microsoft guidance doc:
    http://vsdatabaseguide.codeplex.com
  – Hybrid options common
• Another important element:  Unit testing
  – Knowing if something becomes broken, early
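
A minimal sketch of the "build, populate, use" flow combined with a unit test, again using SQLite from the Python standard library as a stand-in for SQL Server. The schema, the known data set and the customer_totals query are all hypothetical. In a real build the schema script would come from source control rather than from a string in the test file.

```python
import sqlite3
import unittest

# Stand-in for the schema script that a real build would pull from source control.
SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    total REAL NOT NULL
);
"""

# A small "well known" data set, so expected results can be written down exactly.
KNOWN_DATA = """
INSERT INTO customer VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);
"""


def customer_totals(conn):
    """Query under test: total order value per customer, ordered by name."""
    sql = (
        "SELECT c.name, SUM(o.total) FROM customer c "
        "JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    )
    return conn.execute(sql).fetchall()


class CustomerTotalsTest(unittest.TestCase):
    def setUp(self):
        # Build the database and populate it fresh for every test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)
        self.conn.executescript(KNOWN_DATA)

    def tearDown(self):
        self.conn.close()

    def test_totals_match_known_data(self):
        self.assertEqual(
            customer_totals(self.conn),
            [("Alpha", 100.0), ("Beta", 40.0)],
        )
```

Saved as a module, this runs with python -m unittest. Because every test rebuilds from the schema and the known data, a change that breaks the query shows up on the next build rather than later in QA.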
To learn more or inquire about speaking opportunities, please contact:

Mark Ginnebaugh, User Group Leader mark@designmind.com

