Visual tools for database queries and analysis


Published in: Technology, Education
  • Patient data is collected by medical facilities across the state of Kentucky. Abstractors read paper and electronic records and code the data as a cancer abstract according to standards. Abstracting is performed using the KCR’s custom CPDMS.NET reporting system. The abstract is transmitted over the internet and stored in the registry database.
  • Goals: turn the KCR’s data into something a computer can process and analyze quickly; create the tools for analysis; develop useful ways to present the results of analysis; present the information in a user-friendly manner.
  • Many valuable statistics and trends are hidden in the registry database. Retrieving this information is an arduous task, especially for those without knowledge of SQL.
  • When this information can be analyzed and visualized, research experts may uncover life-saving discoveries, advancing the understanding of cancer and the development of new models and modes of intervention in malignant processes. The aim is to take this old mine of information and simplify it visually and numerically. It is hoped that this may help advance the understanding of cancer and, in turn, help science fight one of its biggest battles: to better treat and prevent disease.
  • The Query Builder tool aims to solve the aforementioned problems by providing a visual interface for constructing database queries without the need to understand the underlying structure of the database or write formal SQL expressions.
      1) Provide access to important registry database objects, including Patient, Case, and Therapy information.
      2) Provide a list of important attributes/fields associated with each object.
      3) Allow search criteria to be entered with minimal effort and no knowledge of the SQL language.
      4) Show descriptive database field values where appropriate, in addition to or in lieu of coded values.
         a. Display an appropriate input field for different data types like dates, numbers, and lists.
      5) Allow the user to construct arbitrarily complex searches by adding as many criteria as needed to the query.
      6) Support a set of Boolean operators (AND, OR, XOR, NOT) so search criteria can be joined in various ways.
      7) Allow searches to be saved for later use.
  • Direct interaction with the database system involves the structured query language (SQL) used by most relational database systems. This includes operations like reading, adding, removing, and modifying data stored by the system. Although this language is readable by humans, a user needs a specific understanding of the syntax and structure of an SQL statement to “talk” to the database system and find what he or she is looking for. This can be cumbersome at best, and nearly impossible for those without much experience with programming languages or similar, especially when one tries to describe a very specific data set. Several factors contribute to the disparity between a database language like SQL and a natural language such as English, each related to the way a computer stores and processes information in digital form:
      – Encoding of each attribute: encoding helps reduce the database storage space required and increases performance. The trade-off is that any SQL statement describing such a record must use the coded version of the attribute data rather than a natural textual description. For example, a person’s assigned treatment could be encoded as No Treatment=0, Treatment=1, Surveillance=2.
      – Normalization of the data: to avoid duplicating information and wasting storage space, records are often split across multiple tables and associated with one another.
  • Each condition of the query can be entered with several mouse clicks. The conditions may be joined with Boolean operators (AND, OR, etc.). Encoded values are shown with descriptive translations. The Query Builder shows a data-type-sensitive input for each variable. This separates researchers from the data encoding.
  • A syntax tree is generated from the query and stored in serialized form for later use. Once the user is satisfied with the query, it can be given a title and saved for analysis.
  • Queries are saved indefinitely for each user account. Metadata showing the last-modified and last-used times is displayed. Study groups can be copied, deleted, edited, or created from this interface.
  • Log-rank test: compares the survival distributions of two samples. It is a nonparametric test that can be used with censored data, and it is used frequently in clinical trial applications.
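
The comparison described in the last note can be sketched in a few lines of Python. This is a minimal textbook version of the two-sample log-rank test, not the code used in CPDMS.NET; the function name `log_rank` and the input layout (parallel lists of follow-up times and event flags) are assumptions made for illustration.

```python
from math import erfc, sqrt


def log_rank(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test (illustrative sketch).

    events: 1 = death observed at that time, 0 = censored at that time.
    Returns (chi_square, p_value) for 1 degree of freedom.
    """
    sample = [(t, e, 0) for t, e in zip(times_a, events_a)]
    sample += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in sample if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = [(ti, e, g) for ti, e, g in sample if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d_a = sum(1 for ti, e, g in at_risk if ti == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Made-up follow-up times in months; a 0 flag marks a censored observation.
stat, p = log_rank([6, 7, 10, 15, 19], [1, 1, 0, 1, 1],
                   [25, 28, 32, 34, 40], [1, 0, 1, 1, 0])
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Censored subjects still count toward the at-risk totals until their last follow-up time, which is why the test suits registry data where many patients are alive at the end of the study window.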

    1. Visual Tools for Queries and Display of Quantitative Information in a Cancer Research Database
       JESSE STEWART and JERZY W. JAROMCZYK
       Department of Computer Science, University of Kentucky, Lexington, KY
    2. The Kentucky Cancer Registry
       • The Markey Cancer Center has the singular mission to eliminate the morbidity and mortality of cancer
       • Since its founding, the Markey Cancer Center and the UK Chandler Hospital have served 2,000 to 2,200 new patients a year; the center is one of the few institutions nationwide that address both clinical care and cancer research
       • The KCR’s case count exceeded 30,000 annually as of 2009
       • The KCR houses a wealth of historical data for hundreds of cancer variants, associated treatments, and their relative success across the state of Kentucky
    3. Data Collection
       Patient Events → Abstracting (CPDMS.NET) → Internet (HTTPS) → Registry DB (MySQL)
    4. Cancer Abstracts
       • A cancer abstract contains up to 240 different elements, ranging from patient demographics to staging information to therapy history
       • KCR alone stores tens of thousands of unique abstracts
       • Each abstract is created by a registrar, a professional trained to understand cancer data standards, formats, and coding rules
    5. Accelerating Cancer Research
       Develop Queries → Visualize Data Sets → Discover Important Correlations
    6. Registry Databases and Research
       Valuable Information
       • Survival Trends
       • Incidence Rates
       • Behavioral and Geographical Correlation
       Challenges in Research
       • Coded Data
       • SQL
       • Complex DB Schemas
       • Access Control
       • Visualization
    7. Software Solutions
       • Define Queries (Data Sets)
         – Intuitive: no programming required
         – Flexible: allow any data set to be explored
         – Accessible: visual cross-browser application
         – Re-use: save, modify, and combine data sets
       • Data Analysis and Visualization
         – Context-specific diagrams
         – Compare data sets singly or side by side
         – Customizable appearance
    8. The Query Builder
       • Presents a high-level abstraction of the Registry Database
       • Patient, Case, and Therapy data variables are easily recognizable and categorized
       • Separates the user from the actual database structure and coded information
         – Example: Treatment is encoded as No Treatment=0, Treatment=1, Surveillance=2
    9. The Query Builder
       • Translates a question about cancer data into SQL (Structured Query Language), which can be understood by the computer system
       • Parses and stores the query for modification and reuse later
    10. Example Query
        • Patients diagnosed between Jan 1, 2005 and Dec 31, 2008
        • Patients diagnosed in Kentucky
        • Patients treated with immunotherapy
        • SQL may be complex:
            SELECT *
            FROM case_data, case_tx
            WHERE case_tx.hospkey = case_data.hospkey
              AND case_tx.patkey = case_data.patkey
              AND case_data.incomplete = 0
              AND case_data.diagdate >= 20050101
              AND case_data.diagdate <= 20081231
              AND case_tx.txtype = 'I'
              AND case_data.diagstate = 'KY';
    11. Interface Design
        • To make writing a query like the previous example simple, the Query Builder must provide intuitive controls permitting a user to define each query component
        • Variable names and coded values should be descriptive and easy to locate
        • Conditions should be combined in a natural way with Boolean operators
        • A tree-like layout was chosen to represent queries
    12. Query Builder in Action
    13. Custom UI Controls
        • For each variable, DB schema information is used to display a customized UI control, e.g.:
          – Dates: date fields or ranges
          – Discrete variables: drop-down list or multiselect
          – Variables with many values: autofill field
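
The mapping from schema information to a control type can be pictured with a small Python sketch. Everything here is hypothetical: the `Column` record, the type labels, and the `AUTOFILL_THRESHOLD` cutoff are assumptions chosen for illustration, since the slides do not show how the schema is represented or where the line between a drop-down and an autofill field is drawn.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical description of one registry variable."""
    name: str
    data_type: str           # assumed labels: "date", "number", "coded"
    distinct_values: int = 0  # how many coded values the variable can take


# Assumed cutoff above which a drop-down becomes unwieldy.
AUTOFILL_THRESHOLD = 25


def choose_control(column: Column) -> str:
    """Pick a UI control name from the column's schema information."""
    if column.data_type == "date":
        return "date-range"
    if column.data_type == "coded":
        if column.distinct_values > AUTOFILL_THRESHOLD:
            return "autofill"
        return "drop-down"
    return "text"


for col in (Column("diagdate", "date"),
            Column("txtype", "coded", 8),
            Column("diagcounty", "coded", 120)):
    print(col.name, "->", choose_control(col))
```

Keeping the decision in one function means a new data type only needs one extra branch, and the interface code never has to inspect the schema directly.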
    14. Syntax Tree
    15. Internal Representation
        • The program maintains an abstract syntax tree for the query as it is created
        • Captures the essential structure of the query but omits SQL-specific syntax
        • This data structure serves as an intermediary between the interface and the database system
        • Permits two code-generation targets: JSON and SQL
    16. Serialization and Storage
        • Once created by the user, each query may be saved for future analysis or manipulation
        • The program stores the AST for the query as a JavaScript object, which can then be serialized into JSON (JavaScript Object Notation) and stored
        • Deserialization and conversion to SQL are performed later for analysis
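
The two code-generation targets can be illustrated with a short Python sketch (the original tool holds the tree as a JavaScript object; Python is used here only for illustration). The node layout, the `CODES` translation table, and the whitelists are assumptions, not the Query Builder's actual format. Field names are taken from the example query on slide 10, and `%s` is the placeholder style used by common MySQL drivers for Python.

```python
import json

# Hypothetical translation table from descriptive labels to stored codes.
CODES = {"case_tx.txtype": {"Immunotherapy": "I"}}

# Whitelists: only known fields and operators may reach the SQL text.
ALLOWED_FIELDS = {"case_data.diagdate", "case_data.diagstate", "case_tx.txtype"}
ALLOWED_COMPARISONS = {"=", "<>", "<", "<=", ">", ">="}
ALLOWED_CONNECTIVES = {"AND", "OR", "XOR"}


def condition(field, op, value):
    """Leaf node: one search criterion."""
    return {"type": "condition", "field": field, "op": op, "value": value}


def group(operator, *children):
    """Inner node: criteria joined by a Boolean operator."""
    return {"type": "group", "operator": operator, "children": list(children)}


def to_sql(node, params):
    """Generate a parameterized WHERE fragment; values are appended to params."""
    if node["type"] == "condition":
        if node["field"] not in ALLOWED_FIELDS or node["op"] not in ALLOWED_COMPARISONS:
            raise ValueError("unknown field or operator")
        # Replace a descriptive label with its stored code when one exists.
        value = CODES.get(node["field"], {}).get(node["value"], node["value"])
        params.append(value)
        return f'{node["field"]} {node["op"]} %s'
    if node["operator"] == "NOT":
        return "NOT (" + to_sql(node["children"][0], params) + ")"
    if node["operator"] not in ALLOWED_CONNECTIVES:
        raise ValueError("unknown Boolean operator")
    parts = [to_sql(child, params) for child in node["children"]]
    return "(" + f' {node["operator"]} '.join(parts) + ")"


query = group(
    "AND",
    condition("case_data.diagdate", ">=", 20050101),
    condition("case_data.diagdate", "<=", 20081231),
    condition("case_data.diagstate", "=", "KY"),
    condition("case_tx.txtype", "=", "Immunotherapy"),
)

stored = json.dumps(query)              # target 1: JSON for storage
params = []
where = to_sql(json.loads(stored), params)  # target 2: SQL for analysis
print(where)
print(params)
```

Because the tree carries no SQL syntax of its own, the same stored JSON can be reloaded into the visual editor or turned into SQL, and values travel as bound parameters rather than being spliced into the query text.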
    17. Query Management
    18. Query Storage
        • Queries are often referred to as ‘study groups’ by researchers
        • The serialized queries and associated metadata are stored in a database table, study_groups:
            id | Name | Query | User | LastModified | LastUsed
        • A MySQL database was chosen for convenience, since registry data is stored using this system
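
A rough sketch of how such a table might be used is given below. It relies on Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. The column names follow the slide, but the column types, the timestamp format, and the helper names `save_study_group` and `load_study_group` are assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL registry server
conn.execute("""
    CREATE TABLE study_groups (
        id INTEGER PRIMARY KEY,
        "Name" TEXT NOT NULL,
        "Query" TEXT NOT NULL,      -- serialized (JSON) syntax tree
        "User" TEXT NOT NULL,
        "LastModified" TEXT,
        "LastUsed" TEXT
    )
""")


def _now():
    return datetime.now(timezone.utc).isoformat()


def save_study_group(conn, name, query_tree, user):
    """Serialize the query tree and store it with its metadata."""
    cur = conn.execute(
        'INSERT INTO study_groups ("Name", "Query", "User", "LastModified") '
        "VALUES (?, ?, ?, ?)",
        (name, json.dumps(query_tree), user, _now()),
    )
    conn.commit()
    return cur.lastrowid


def load_study_group(conn, group_id):
    """Return the deserialized query tree and record when it was last used."""
    conn.execute('UPDATE study_groups SET "LastUsed" = ? WHERE id = ?',
                 (_now(), group_id))
    conn.commit()
    row = conn.execute('SELECT "Query" FROM study_groups WHERE id = ?',
                       (group_id,)).fetchone()
    return json.loads(row[0]) if row else None


tree = {"type": "condition", "field": "case_data.diagstate", "op": "=", "value": "KY"}
group_id = save_study_group(conn, "Kentucky cases", tree, "jdoe")
print(load_study_group(conn, group_id))
```

Storing the tree as text keeps the table schema independent of how complex a query becomes; only the application needs to understand the tree's structure.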
    19. Visualization Tools
        – Scaled Venn Diagrams
          • User can quickly ascertain the relative size of data sets and their relationship to one another
        – Bar and Histogram Charts
          • Flexible view of variable distribution for different sets
        – Survival Trends
          • View and compare survival rates over time
        – Statistics
          • Common descriptive statistics
          • Comparison with chi-square, log-rank, t-, and z-tests
    20. Venn Diagrams
        • Venn diagrams show logical relationships between a number of sets
        • They are a subset of Euler diagrams in which every possible intersection of the sets must be displayed
        • Can quickly convey how data sets overlap and relate to one another
    21. Area Proportionality
        • Area-proportional Venn diagrams show the relative size of data sets and their intersections
        • Very useful for rapid exploration of data sets such as cancer data
        • Although typical Venn diagrams often display 3 sets, area-proportional diagrams cannot always be drawn with circles for more than 2 sets [1]
        • The vast majority of research needs involve comparison of two data sets
    22. Drawing to Scale
        Circle-intersection problem:
          Triangle(C1,C2,A) = Triangle(C1,C2,B)
          Triangle(C1,C2,A) + Triangle(C1,C2,B) + Lens = Sector(C1,A,B) + Sector(C2,A,B)
          Lens = Sector(C1,A,B) + Sector(C2,A,B) - 2*Triangle(C1,C2,A)
    23. Drawing to Scale
          Lens = Sector(C1,A,B) + Sector(C2,A,B) - 2*Triangle(C1,C2,A)
        • By applying the formulas for the area of a circular sector and of a triangle, we arrive at an equation for the distance between the circles’ centers (equation shown on the original slide)
        • The value must be approximated; the implementation uses the bisection root-finding method
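
The scaling step can be sketched in Python as follows. Since the slide's own equation is not preserved in this transcript, the sketch assumes the standard closed form for the area of a circle-circle intersection (two sector areas minus the kite formed by the two triangles), and it bisects on the center distance until the lens area matches the requested overlap. The function names and the fixed iteration count are choices made for this example, not details of the original implementation.

```python
from math import acos, pi, sqrt


def lens_area(r1, r2, d):
    """Area shared by two circles of radii r1, r2 whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0                      # circles do not overlap
    if d <= abs(r1 - r2):
        return pi * min(r1, r2) ** 2    # smaller circle lies inside the larger

    def clamp(x):
        # Guard against rounding pushing the cosine just outside [-1, 1].
        return max(-1.0, min(1.0, x))

    sector1 = r1 * r1 * acos(clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    sector2 = r2 * r2 * acos(clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    # Twice the area of Triangle(C1, C2, A), from Heron-style factors.
    kite = 0.5 * sqrt(max(0.0, (-d + r1 + r2) * (d + r1 - r2)
                               * (d - r1 + r2) * (d + r1 + r2)))
    return sector1 + sector2 - kite


def center_distance(r1, r2, target_area, iterations=100):
    """Bisection: find d such that lens_area(r1, r2, d) equals target_area."""
    if not 0 <= target_area <= pi * min(r1, r2) ** 2:
        raise ValueError("overlap cannot exceed the smaller circle")
    lo, hi = abs(r1 - r2), r1 + r2      # full containment .. just touching
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > target_area:
            lo = mid                    # too much overlap: move centers apart
        else:
            hi = mid
    return (lo + hi) / 2


# Made-up data set sizes; circle areas are set equal to the counts.
size_a, size_b, size_both = 3000, 1800, 600
r_a, r_b = sqrt(size_a / pi), sqrt(size_b / pi)
d = center_distance(r_a, r_b, size_both)
print(f"radii {r_a:.2f} and {r_b:.2f}, centers {d:.2f} apart")
```

Bisection works here because the lens area decreases steadily as the centers move apart, so each step halves the interval that must contain the answer.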
    24. Visualization: Venn Diagrams
    25. Visualization: Venn Diagrams
    26. Reports
        • Several customizable reports were implemented to further leverage the query builder’s utility
        • Each is implemented in PHP and produces an SQL query using the saved criteria and the settings selected by the user for the report
    27. Data List Tool
    28. Cross-Tab Analysis
    29. Graph Settings Interface
    30. Visualization: Histogram
    31. Visualization: Survival Trends
    32. Chi-square Analysis
    33. Censored Life Table
    34. Success
        • The Visual Query Builder and Data Analysis tools have become an integral part of CPDMS.NET, the online abstracting system developed at the KCR
        • Over 5000 study groups have been created by users of the system
        • Features have been added and improved in response to feedback from researchers and registrars (cancer data professionals)
        • Future developments may include:
          – A wider array of statistical tests
          – Functions to analyze more than two data sets at once
    35. References
        • The Kentucky Cancer Registry: A History
        • F. Ruskey and M. Weston, A Survey of Venn Diagrams
        • S. Chow and F. Ruskey, Drawing Area-Proportional Venn and Euler Diagrams
        • Circle-Circle Intersection Problem
    36. Acknowledgements
        • Eric Durbin, Kentucky Cancer Registry
        • Dr. Jerzy Jaromczyk, UK Computer Science
    37. Software