Mapping and Integration of Multiple Forms into Relational Databases



MAPPING & INTEGRATING MULTIPLE FORMS INTO A DATABASE
Yuan An, Ritu Khare, Il-Yeol Song, Xiaohua Hu

Background

Using forms as the front-end interface to a back-end database is a standard way to collect data. Fig. 1 shows a scenario in the healthcare domain.

[Fig. 1: A "Patient Information" form (Date, Patient Name, Gender: M/F, DOB, HPI, Vital Signs: Height, Weight, BP) mapped to a database with the tables PatientInformation (piId, Date, Patient, HPI, VitalSign), Patient (pId, Name, Gender, DOB), Gender (gId, options; rows 001 Male, 002 Female) and Vital Sign (vId, Height, Weight, BP).]

Motivation and Focus

In the quest for database usability, several DIY and WYSIWYG approaches enable non-technical users to design forms. Such approaches (e.g. FormAssembly) automatically translate forms into databases while shielding the users from technical details. They support neither database evolution driven by changing user requirements nor multiple users managing a common database.

While many techniques exist to forward engineer a single form into an individual back-end database, mapping multiple forms to an existing structured database remains unexplored. This work addresses the problem of automatically mapping multiple (possibly overlapping) forms to an existing structured database.

Challenges in Mapping Forms to Databases

  • How to automatically understand a user-created form and extract semantic relationships among form elements?
  • How to automatically map the semantic model extracted from a form to the existing database?
  • How to automatically evolve the existing database with desired properties, and what are these properties?

[Fig. 2: A new form representing a new (or evolved) user requirement. The form is titled "Healthy Living Program" and contains Date, Patient (Name, DOB) and Social Activities (Smokes, Alcohol, Hours Watching TV, Hours Exercise).]

The FormMapper System

[Fig. 3: The FormMapper System has two components: (1) Tree Extraction and (2) Form Integration. The Tree Extraction Component (Layered Hidden Markov Models (HMMs), Parent Child Association Rules) turns the Input Form into a Semantic Form Tree (w.r.t. the input form). The Form Mapping and Integration Component (Initial Correspondence Generation and Validation, Merging Algorithm, Database Birthing Algorithm) takes the tree and the DATABASE and produces the NEW DB.]

Desirable Characteristics of Database

  • Completeness
  • Correctness
  • Compactness
  • Normalization (3NF)
  • Optimization (minimize potential NULL values and the number of database elements)

Key Techniques

  • Hierarchical representation of forms as form trees
  • Hidden Markov Models for form information extraction
  • Sophisticated matching techniques for deriving mapping correspondences between tree and database
  • Form tree patterns and DB design principles to translate a form tree into an equivalent database (see Fig. 4)
  • Quantitative metric (quality tuning factor) to facilitate the decision of merging (or not merging) two mapped tables

[Fig. 4: Some Form Tree to Database Mapping Patterns: a) Textbox Pattern, b) Radiobutton Pattern, c) Checkbox Pattern, d) Category-Subcategory Pattern.]

Empirical Study in Healthcare

Datasets
  • 16 highly complex data-entry forms from 3 healthcare institutions
  • Average of 57 form elements per form

Benchmarks
  • 16 Gold Standard Trees prepared using a DIY form design tool
  • Two sets of 3 Gold Standard Databases prepared by 2 database experts, each with at least 10 years of experience

Tree Extraction Component
  • Expectation Maximization Algorithm on 52 clinical forms
  • Viterbi Algorithm for decoding
  • 5 parent child association rules
  • Accuracy: 96.93%
  • Duration: 0.07 sec per form

Form Integration Component
  • Indexing using Lucene
  • Quality tuning factor = 0.5
  • Duration: 3 sec per form

[Fig. 5: Scale of the evolved Databases. Three bar charts (Database 1, Database 2, Database 3) comparing FormMapper, Gold 1 and Gold 2 on the number of Tables, Columns, Values and Foreign Keys.]

[Fig. 6: Comparison of Tables. Two pie charts (FormMapper Vs Gold 1, FormMapper Vs Gold 2) with the slices Perfect Match, Positive Mismatch and Negative Mismatch. The slice values 6%, 20%, 28%, 40%, 52% and 54% appear in the charts.]

FormMapper Vs Gold DB

On average, 87% of the database tables are either identical or superior (positive mismatch) to the gold database tables based on the defined database characteristics.

Inferior cases (negative mismatch) are mostly due to missing correspondences (caused by extraction inaccuracies) and imprecisely derived cardinalities among category/subcategory pairs in forms.

Implications

  • High potential to replace the human experts.
  • As more forms are mapped, the database grows automatically in a principled manner.
  • It is challenging to automate the aspects of mapping that rely on human understanding of domain semantics.

Work in Progress

  • Leverage ontology and controlled vocabularies to handle semantic heterogeneity.
  • More sophisticated correspondence generation and validation techniques.
  • Consider more complicated merging situations (e.g. a table corresponds to a column).

CVDI is a collaboration between the University of Louisiana at Lafayette & Drexel University
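
The form-tree-to-database patterns named in Fig. 4 can be illustrated with a minimal Python sketch. It is not the FormMapper implementation: the `FormNode` class, the `map_tree` function, the `_id` naming scheme and the junction table used for checkboxes are all assumptions made for illustration. Only the general shape (textboxes become columns, radiobutton options become rows of a lookup table, subcategories become separate tables referenced from the parent) follows the Fig. 1 scenario.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FormNode:
    """Hypothetical form-tree node; kind is one of
    'category', 'textbox', 'radiobutton', 'checkbox'."""
    label: str
    kind: str
    options: List[str] = field(default_factory=list)
    children: List["FormNode"] = field(default_factory=list)


@dataclass
class Schema:
    tables: Dict[str, List[str]] = field(default_factory=dict)       # table -> columns
    lookup_rows: Dict[str, List[str]] = field(default_factory=dict)  # table -> option values


def ident(label: str) -> str:
    """Turn a form label into a simple identifier."""
    return "".join(ch if ch.isalnum() else "_" for ch in label.strip()).strip("_")


def map_tree(node: FormNode, schema: Schema) -> Schema:
    """Map a category node and its subtree to tables (assumed patterns)."""
    name = ident(node.label)
    cols = [f"{name}_id"]               # surrogate key for the category table
    schema.tables[name] = cols
    for child in node.children:
        c = ident(child.label)
        if child.kind == "textbox":
            cols.append(c)              # textbox pattern: one column
        elif child.kind == "radiobutton":
            # radiobutton pattern: lookup table plus a single reference
            schema.tables[c] = [f"{c}_id", "option"]
            schema.lookup_rows[c] = list(child.options)
            cols.append(f"{c}_id")
        elif child.kind == "checkbox":
            # checkbox pattern (assumption): lookup table plus junction table,
            # because several options may be selected at once
            schema.tables[c] = [f"{c}_id", "option"]
            schema.lookup_rows[c] = list(child.options)
            schema.tables[f"{name}_{c}"] = [f"{name}_id", f"{c}_id"]
        elif child.kind == "category":
            # category-subcategory pattern: own table, referenced by the parent
            map_tree(child, schema)
            cols.append(f"{c}_id")
        else:
            raise ValueError(f"unknown node kind: {child.kind}")
    return schema


# The "Patient Information" form of Fig. 1, written out as a form tree.
form = FormNode("Patient Information", "category", children=[
    FormNode("Date", "textbox"),
    FormNode("Patient", "category", children=[
        FormNode("Name", "textbox"),
        FormNode("Gender", "radiobutton", options=["Male", "Female"]),
        FormNode("DOB", "textbox"),
    ]),
    FormNode("HPI", "textbox"),
    FormNode("Vital Signs", "category", children=[
        FormNode("Height", "textbox"),
        FormNode("Weight", "textbox"),
        FormNode("BP", "textbox"),
    ]),
])

schema = map_tree(form, Schema())
for table, columns in schema.tables.items():
    print(table, columns)
```

The sketch creates a fresh table for every category. It does not look for correspondences with an existing database, merge overlapping tables, or handle repeated labels, which are the parts the poster attributes to the Form Mapping and Integration Component.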