Query DIFF Utility Comparing the data


  • Show Access query.
  • Refer to file: documentation.htm Note that CF debug info shows password, even though form does not.
  • Oracle passwords are case-sensitive in SQL*PLUS (interactive SQL), but not in ColdFusion. System tables in Oracle are not visible (by default) to non-owners; access can probably be GRANTed. MS-SQL shows system tables without GRANTing. The MS-SQL query shown is not very efficient, but the extra joins have been left in so that a whole-schema query (without a tablename restriction) will show only user tables of interest.
  • Query DIFF Utility Comparing the data

    1. 1. Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão [email_address]
    2. 2. “Why would you want to do that?” <ul><li>Specifically – verify that data transferred from one system to another is identical before deleting it at the source. </li></ul><ul><li>Generally – have a quality-assurance tool to analyze data differences: </li></ul><ul><ul><li>Verify expected matches </li></ul></ul><ul><ul><li>Report on expected differences </li></ul></ul><ul><ul><li>Identify surprise differences and matches </li></ul></ul>
    3. 3. Applications <ul><li>Find Differences in 2 similar tables </li></ul><ul><ul><li>See how a table has changed over the week </li></ul></ul><ul><ul><li>QA for data entry effort (before and after) </li></ul></ul><ul><ul><li>Verify expected changes due to code mods </li></ul></ul><ul><li>Compare the column names from MS-SQL and Oracle schemas </li></ul><ul><ul><li>Or compare length of text columns </li></ul></ul><ul><li>Analyze whether duplicate keys are duplicate rows (heavy hand on the “submit” button?) </li></ul><ul><li>See the results of modifying a query </li></ul><ul><ul><li>Inner join vs outer join </li></ul></ul><ul><ul><li>Null predicate </li></ul></ul>
    4. 4. More Applications: QA for Ported Systems or Regression Testing [Diagram; box labels: dbms, old system update, updated dbms, data conversion routine, ported dbms, new system update, updated ported dbms, ported updated dbms. Closing question: Difference?]
    5. 5. Reinventing the wheel? <ul><li>http://devex.macromedia.com/developer/gallery/index.cfm </li></ul><ul><ul><li>Cf_Venn : Remember Venn diagrams from school? This Custom Tag compares two lists, performing any or all of these operations: Union, Intersection, and Differences. </li></ul></ul><ul><ul><li>TDataSourceCompare : Compares all tables, columns, datatypes of 2 MS SQL database objects that are registered as datasources on your server. Used for checking your Production database and Development database to make sure your columns all match before a critical launch. DB1 First Datasource name. DB2 Second Datasource name. </li></ul></ul><ul><ul><li>CF_List_Compare is a custom tag that will compare the values of two given lists (List 1, and List 2) </li></ul></ul>
    6. 6. Reinventing the wheel? August 7, 2002 SQL Script for Comparing the Contents of Two Tables Here's some SQL code from Eli Leiba that will allow you to compare two tables -- say, table A and table B -- to determine if their content is the same. Assuming that A and B have the same structure, here's how it works. First, from set theory, recall that: If (|A| = |B|) && (|A U B| = |A|) ====>>> A = B, where |A| = NUMBER of rows in A and |B| = NUMBER of rows in B. http://www.databasejournal.com/features/mssql/article.php/1441271
    7. 7. Reinventing the wheel? http://www.databasejournal.com/features/mssql/article.php/1441271 (continued) Here's the SQL code (with T-SQL syntax):
        declare @cnt1 int
        declare @cnt2 int
        declare @cnt3 int
        declare @res bit
        select @cnt1 = count(*) from A
        select @cnt2 = count(*) from B
        select @cnt3 = count('x') from (select * from A UNION select * from B) as t
        if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)
        begin
            set @res = 1
            print 'A = B'
        end
        else
        begin
            set @res = 0
            print 'A <> B'
        end
        go
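The count test quoted above can be tried without a SQL Server instance. What follows is a minimal sketch in Python with an in-memory SQLite database, not the article's T-SQL; the function name `tables_equal` and the sample tables A, B, and C are assumptions made for illustration.

```python
import sqlite3

def tables_equal(conn, table_a, table_b):
    """Apply the count test: |A| = |B| and |A UNION B| = |A|."""
    cur = conn.cursor()
    # Table names cannot be bound as parameters, so pass trusted names only.
    count_a = cur.execute(f"SELECT COUNT(*) FROM {table_a}").fetchone()[0]
    count_b = cur.execute(f"SELECT COUNT(*) FROM {table_b}").fetchone()[0]
    count_union = cur.execute(
        f"SELECT COUNT(*) FROM "
        f"(SELECT * FROM {table_a} UNION SELECT * FROM {table_b})"
    ).fetchone()[0]
    return count_a == count_b == count_union

# Hypothetical sample data: B holds the same rows as A, C differs in one value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, name TEXT);
    CREATE TABLE B (id INTEGER, name TEXT);
    CREATE TABLE C (id INTEGER, name TEXT);
    INSERT INTO A VALUES (1, 'x'), (2, 'y');
    INSERT INTO B VALUES (2, 'y'), (1, 'x');
    INSERT INTO C VALUES (1, 'x'), (2, 'z');
""")

print(tables_equal(conn, "A", "B"))
print(tables_equal(conn, "A", "C"))
```

The test only answers yes or no; it does not say which rows or columns differ. Note also that UNION removes duplicate rows, so a table that contains duplicates fails the test even against an identical copy.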
    8. 8. Previous implementation – UNION/JOIN query in Access <ul><li>Query is messy: </li></ul><ul><ul><li>query for each column name </li></ul></ul><ul><ul><li>query for extra rows in each of the 2 tables </li></ul></ul><ul><ul><li>extra predicates for nulls </li></ul></ul><ul><ul><li>extra predicates to flag duplicate rows </li></ul></ul><ul><ul><ul><li>(should table not have a primary key) </li></ul></ul></ul><ul><li>Access permits union across dissimilar columns (different data types) </li></ul><ul><li>Access DOES permit inner join and union across linked tables from different DBMS, but I have not tested this. </li></ul>
    9. 9. Previous implementation – UNIVAC’s Symbolic Stream Generator <ul><li>Generates source code (to be compiled and executed) according to parameters </li></ul><ul><li>Old technology (no longer available) </li></ul><ul><li>Required a lot of parameters </li></ul><ul><ul><li>One spec per column (location and type) </li></ul></ul><ul><li>Compared 2 files – very limited </li></ul><ul><ul><li>A query is much more flexible </li></ul></ul>
    10. 10. General ColdFusion Approach <ul><li>Define 2 queries </li></ul><ul><ul><li>With the same column names </li></ul></ul><ul><ul><ul><li>rename with AS, if needed </li></ul></ul></ul><ul><ul><li>With a common key (possibly multi-column) </li></ul></ul><ul><ul><li>Sort on the common key </li></ul></ul><ul><li>Merge queries on the common key </li></ul><ul><ul><li>Report extra keys </li></ul></ul><ul><ul><li>Report data differences when keys match </li></ul></ul><ul><li>Feedback for user </li></ul><ul><ul><li>Column names </li></ul></ul><ul><ul><li>Notification of errors </li></ul></ul>
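The merge step on this slide can be sketched as a classic sorted-merge comparison. The code below is Python rather than ColdFusion and is not the utility's own source; `diff_sorted`, the report tuples, and the sample rows are hypothetical. It assumes both inputs share the same column names, are already sorted on the key, and contain no duplicate keys.

```python
def diff_sorted(rows1, rows2, key_cols):
    """Merge two lists of dict rows that are both sorted on key_cols.

    Returns a list of (kind, key, detail) tuples, where kind is
    'extra in 1', 'extra in 2', or 'differs'.
    """
    def key(row):
        return tuple(row[c] for c in key_cols)

    report = []
    i = j = 0
    while i < len(rows1) and j < len(rows2):
        k1, k2 = key(rows1[i]), key(rows2[j])
        if k1 < k2:
            # Key exists only on side 1.
            report.append(("extra in 1", k1, None))
            i += 1
        elif k1 > k2:
            # Key exists only on side 2.
            report.append(("extra in 2", k2, None))
            j += 1
        else:
            # Keys match: compare every column by name.
            changed = {
                c: (rows1[i][c], rows2[j][c])
                for c in rows1[i]
                if rows1[i][c] != rows2[j][c]
            }
            if changed:
                report.append(("differs", k1, changed))
            i += 1
            j += 1
    # Whatever remains on either side has no partner.
    report.extend(("extra in 1", key(r), None) for r in rows1[i:])
    report.extend(("extra in 2", key(r), None) for r in rows2[j:])
    return report

old = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 4, "name": "Dee"}]
new = [{"id": 2, "name": "Bobby"}, {"id": 3, "name": "Cy"}, {"id": 4, "name": "Dee"}]
for entry in diff_sorted(old, new, ["id"]):
    print(entry)
```

The merge makes a single pass over each input, which is why both queries have to be sorted on the common key first; if the two sides are sorted differently, matching keys can slip past each other and be reported as extras on both sides.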
    11. 11. Parameters in General-Purpose Differencing <ul><li>In a FORM, prompt for: </li></ul><ul><ul><li>Datasource </li></ul></ul><ul><ul><li>Table specification (query) </li></ul></ul><ul><ul><li>Key specification </li></ul></ul><ul><ul><li>Sort order </li></ul></ul><ul><ul><li>Login parameters (username/password) </li></ul></ul>
    12. 12. Demos
    13. 13. Challenges in General-Purpose Programming <ul><li>Column name(s) not known for SELECT * </li></ul><ul><li>Number of columns not known </li></ul><ul><li>Number of key columns not known </li></ul><ul><li>Column name syntax in ORDER BY clause differs across databases </li></ul>
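The first two challenges (unknown column names and unknown column count for SELECT *) come down to reading the result set's metadata at run time. As a rough Python analogue, not ColdFusion, the DB-API `cursor.description` attribute plays that role; the helper `fetch_with_columns` and the `emp` table are assumptions for illustration.

```python
import sqlite3

def fetch_with_columns(conn, sql):
    """Run a query and return (column_names, rows as dicts keyed by name)."""
    cur = conn.execute(sql)
    # Each entry of cursor.description is a tuple whose first item is the column name.
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    return columns, rows

# Hypothetical table; the caller never names its columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER, name TEXT, dept TEXT);
    INSERT INTO emp VALUES (1, 'Ann', 'QA');
""")

columns, rows = fetch_with_columns(conn, "SELECT * FROM emp ORDER BY id")
print(columns)
print(rows[0]["dept"])
```

Once the names are known, the comparison can loop over them by position, so neither the column names nor their number has to be known in advance.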
    14. 14. ColdFusion essential features <ul><li>CFQUERY’s returned variable: </li></ul><ul><ul><li>queryname.ColumnList </li></ul></ul><ul><ul><ul><li>ListGetAt(ISQLquery1.ColumnList,ii) </li></ul></ul></ul><ul><li>CFQUERY column indexing: </li></ul><ul><ul><li>queryname.columnname[index] </li></ul></ul><ul><li>Evaluate function: </li></ul><ul><ul><ul><li>Evaluate('queryname.columnname') </li></ul></ul></ul><ul><ul><ul><li>Evaluate("ISQLquery1."&ListGetAt(ISQLquery1.ColumnList,ii)&"["&q1&"]") </li></ul></ul></ul>
    15. 15. Unexpected problems <ul><li>False matches </li></ul><ul><ul><li>SQL Case-sensitive vs CF case-insensitive </li></ul></ul><ul><ul><li>ColdFusion null same as empty string </li></ul></ul><ul><ul><ul><li>(unless query uses COALESCE to give a value to nulls) </li></ul></ul></ul><ul><li>False differences </li></ul><ul><ul><li>Extra rows on both sides (“ships passing in the night”) </li></ul></ul><ul><ul><li>Trailing blanks (do not count in SQL, do in CF) </li></ul></ul><ul><li>Sort sequences differ </li></ul><ul><ul><li>Across database products </li></ul></ul><ul><ul><li>Across database configurations of same product </li></ul></ul><ul><ul><li>ColdFusion has its own sort sequence </li></ul></ul><ul><li>SQL query special characters hard to store </li></ul><ul><ul><li>need URLEncodedFormat </li></ul></ul>
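The false matches and false differences listed on this slide all come from the two sides applying different comparison rules. One way to see the effect is to make each rule an explicit switch. This is a Python sketch with hypothetical names (`values_match` and its flags), not the utility's ColdFusion code.

```python
def values_match(a, b, ignore_case=False, null_is_empty=False, strip_trailing=False):
    """Compare two column values under explicitly chosen rules."""
    def norm(v):
        if v is None and null_is_empty:
            v = ""  # treat NULL like an empty string
        if isinstance(v, str):
            if strip_trailing:
                v = v.rstrip(" ")  # ignore trailing blanks
            if ignore_case:
                v = v.upper()  # case-insensitive comparison
        return v
    return norm(a) == norm(b)

# Strict rules: each pair is a difference.
print(values_match("Smith", "SMITH"))
print(values_match(None, ""))
print(values_match("abc  ", "abc"))

# Relaxed rules: the same pairs become matches.
print(values_match("Smith", "SMITH", ignore_case=True))
print(values_match(None, "", null_is_empty=True))
print(values_match("abc  ", "abc", strip_trailing=True))
```

Whichever rules are chosen, applying the same ones to both the key comparison and the data comparison keeps the report consistent.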
    16. 16. More Demos
    17. 17. Workarounds to sort problems <ul><li>Sort by UPPER(columnname) </li></ul><ul><li>Replace known special characters with something that sorts the same </li></ul><ul><ul><li>UPPER(REPLACE(columnname,'_',' ')) </li></ul></ul><ul><li>Re-sort in ColdFusion </li></ul><ul><ul><li>via CF5 query-on-query </li></ul></ul>
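The idea behind these workarounds is to sort both result sets on the same normalized key, so that neither database's collation matters. A minimal Python sketch, mirroring UPPER(REPLACE(columnname,'_',' ')); the names `sort_key` and `resort` and the sample values are hypothetical.

```python
def sort_key(value):
    """Normalize a key value: underscore becomes a space, then uppercase."""
    return str(value).replace("_", " ").upper()

def resort(rows, key_cols):
    """Re-sort dict rows on the normalized key columns."""
    return sorted(rows, key=lambda r: tuple(sort_key(r[c]) for c in key_cols))

names = ["b_c", "B a", "a_z", "Ab"]

# Plain code-point order puts all uppercase letters before lowercase ones.
print(sorted(names))

# Normalized order ignores case and treats '_' like a space.
print(sorted(names, key=sort_key))

rows = [{"code": n} for n in names]
print([r["code"] for r in resort(rows, ["code"])])
```

If the rows are re-sorted this way, the merge presumably has to compare keys under the same normalization, otherwise the sort order and the comparison order disagree.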
    18. 18. Additional Problems <ul><li>In CFFORM, my own JavaScript inadvertently disables CFINPUT validation </li></ul><ul><ul><li>onSubmit="return SubmitDiffForm(_CF_this);" </li></ul></ul><ul><li>Column names with special characters won’t “evaluate” (I warn and abort) </li></ul><ul><li>My JavaScript function to uppercase character sort columns was too confusing (now disabled) </li></ul>
    19. 19. Additional Implementation Notes <ul><li><CFSETTING EnableCFOUTPUTONLY="Yes"> </li></ul><ul><ul><li>To limit extraneous white space (lots of it) </li></ul></ul><ul><li>Hidden radio button trick – form variable is always present and has values "Y,N" or "N" </li></ul><ul><li>thisTag.generatedContent gives “playback” </li></ul><ul><li>Empty username/password attributes mean “use defaults” (instead of ODBC error) </li></ul><ul><li>PreserveSingleQuotes needed in CFQUERY </li></ul><ul><li>Sandbox security may limit DIFFing </li></ul><ul><li>Written/tested in CF 4.5; enhanced with optional CF5 features </li></ul>
    20. 20. Security concerns <ul><li>This utility must be protected; </li></ul><ul><ul><li>Otherwise, it’s a hacker’s dream </li></ul></ul><ul><ul><li>NOTE: I use a session variable to hold a login </li></ul></ul><ul><li>Security can be disabled </li></ul><ul><ul><li>(in application.cfm, set BypassSecurity="Yes") </li></ul></ul><ul><ul><li>OK, iff protection already exists </li></ul></ul><ul><li>Security must be configured </li></ul><ul><ul><li>Use an existing username/password to login </li></ul></ul><ul><ul><ul><li>Avoids password management </li></ul></ul></ul><ul><ul><ul><li>Provide parameters in LoginParam.cfm </li></ul></ul></ul><ul><li>Database passwords for queries are omitted in stored DIFF specification </li></ul>
    21. 21. Feature Creep (extra functionality) <ul><li>JavaScript function cascades merge key to sort key input boxes (less data entry) </li></ul><ul><li>Option to save entire DIFF specification in a file </li></ul><ul><li>Batch execution (can be scheduled in CF admin) </li></ul><ul><li>Email the differences report </li></ul><ul><li>Re-sort (CF5 query-on-query) to avoid sort sequence variance </li></ul>
    22. 22. Future Enhancements? <ul><li>Case-sensitive search </li></ul><ul><li>Parameters to limit output </li></ul><ul><li>Numeric tolerances in a match </li></ul><ul><li>Encrypted passwords for datasources </li></ul><ul><li>CFScript to speed it up </li></ul>