DATABASE AND DATA ENTRY
Presented by: Mukesh Jaiswal and Somya Verma
ICRI, Dehradun.
Clinical Database
• A database is a method of organizing and analyzing information.
• A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images.
Cont…
• In computing, databases are sometimes classified according to their organizational approach. The most prevalent approach is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented database is one whose data is defined in object classes and subclasses, congruent with those of an object-oriented program.
Cont…
• The main objective of database design is to capture and store clinical data accurately.
• The essential features of good design are ease of data capture, efficient creation of analysis datasets, and accommodation of source data transfer formats.
Why use a database?
• Organize and analyze information in different ways
  – Sorting
  – Grouping
  – Querying
  – Reporting
  – Exporting for statistical analysis
• Computerized database
  – Speed
  – Quality control
  – Precision
  – Automation of repetitive tasks
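The operations listed above (sorting, grouping, querying, reporting, exporting) can be sketched with Python's built-in `sqlite3` and `csv` modules. This is an illustration only. The `labs` table, the glucose values and the 6.0 threshold are hypothetical and are not taken from any study.

```python
import csv
import io
import sqlite3

# Hypothetical study data: subject ID, site, and one lab value per subject.
rows = [
    ("S001", "Site A", 5.4),
    ("S002", "Site B", 6.1),
    ("S003", "Site A", 4.9),
    ("S004", "Site B", 7.2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labs (subject_id TEXT PRIMARY KEY, site TEXT, glucose REAL)")
conn.executemany("INSERT INTO labs VALUES (?, ?, ?)", rows)

# Sorting: all subjects ordered by glucose value, highest first.
sorted_rows = conn.execute(
    "SELECT subject_id, glucose FROM labs ORDER BY glucose DESC"
).fetchall()

# Grouping + reporting: number of subjects and mean glucose per site.
by_site = conn.execute(
    "SELECT site, COUNT(*), AVG(glucose) FROM labs GROUP BY site ORDER BY site"
).fetchall()

# Querying: subjects above a hypothetical threshold of 6.0.
high = [r[0] for r in conn.execute(
    "SELECT subject_id FROM labs WHERE glucose > ? ORDER BY subject_id", (6.0,)
)]

# Exporting: write the table as CSV text that a statistical package can read.
buffer = io.StringIO()
writer = csv.writer(buffer)
cursor = conn.execute("SELECT subject_id, site, glucose FROM labs ORDER BY subject_id")
writer.writerow([d[0] for d in cursor.description])  # header row from column names
writer.writerows(cursor)
csv_text = buffer.getvalue()

print(sorted_rows)  # highest glucose first
print(high)         # ['S002', 'S004']
```

Each operation is a single query, so it can be re-run unchanged whenever new records are added. That is the "automate repetitive tasks" point on the slide.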
Databases versus Excel
• Excel has some limited capabilities to sort data, but its primary function is to create financial spreadsheets
  – Can create “what if” scenarios to determine financial consequences
  – Can be used for small, limited research data sets and simple lists
  – Not multi-user: only one person can work on the file at a time
• Databases are designed to collect, sort, and manipulate data
  – Databases can process large amounts of data, usually limited only by hardware constraints
  – The structure is the same for each record in a table
  – Data quality control features help ensure that valid data is entered
  – A relational database allows many tables to be linked
  – Databases are multi-user because the data can reside on a server that multiple people can access at the same time
  – Many databases offer web interfaces, eliminating the need for each user to have a copy of the program on their computer
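One kind of "data quality control feature" is a constraint declared in the table definition. The database then rejects invalid values at entry time, whoever is typing. The sketch below uses SQLite `CHECK`, `NOT NULL` and `PRIMARY KEY` constraints through Python's `sqlite3` module. The `vitals` table, the allowed codes and the numeric ranges are hypothetical examples, not protocol values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Constraints are declared once in the table definition and enforced on
# every insert/update. The ranges below are illustrative only.
conn.execute("""
    CREATE TABLE vitals (
        subject_id TEXT NOT NULL,
        visit      INTEGER NOT NULL CHECK (visit BETWEEN 1 AND 10),
        sex        TEXT NOT NULL CHECK (sex IN ('M', 'F')),
        systolic   INTEGER CHECK (systolic BETWEEN 60 AND 260),
        PRIMARY KEY (subject_id, visit)
    )
""")

def try_insert(row):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO vitals VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert(("S001", 1, "M", 128))         # valid record
bad_sex = try_insert(("S002", 1, "X", 120))    # sex outside the allowed codes
bad_bp = try_insert(("S003", 1, "F", 1200))    # likely typo: 1200 instead of 120
duplicate = try_insert(("S001", 1, "M", 130))  # same subject and visit entered twice
```

A spreadsheet cell would accept all four rows. In this table only the first one is stored.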
Cont…
• Many databases offer audit functions required by certain regulatory agencies
    • Track the date a record was created and modified
    • Track original and changed values
    • Require the user to give a reason for the change
• Databases are more suitable for importing data from multiple sources
    • More robust in connecting to different data sources
    • Imports of different data types into different tables can be linked via common identifiers such as subject ID
    • Merging multiple data sources into Excel so that the rows line up properly in a flat-file format can be a challenge
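The three audit functions above can be sketched in application code: record creation and modification dates, keep the original and changed values, and require a reason. This is a minimal illustration using Python's `sqlite3`. The `demographics` and `audit_log` tables and the `create_record`/`update_weight` helpers are hypothetical names. Commercial clinical systems implement audit trails in their own ways.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demographics (
        subject_id  TEXT PRIMARY KEY,
        weight_kg   REAL,
        created_at  TEXT NOT NULL,
        modified_at TEXT
    );
    CREATE TABLE audit_log (
        subject_id TEXT, field TEXT, old_value TEXT, new_value TEXT,
        changed_by TEXT, reason TEXT, changed_at TEXT
    );
""")

def now():
    return datetime.now(timezone.utc).isoformat()

def create_record(subject_id, weight_kg):
    with conn:  # tracks the date the record was created
        conn.execute("INSERT INTO demographics VALUES (?, ?, ?, NULL)",
                     (subject_id, weight_kg, now()))

def update_weight(subject_id, new_weight, user, reason):
    """Change a value only if a reason is given; keep the old value in the log."""
    if not reason or not reason.strip():
        raise ValueError("a reason for the change is required")
    row = conn.execute(
        "SELECT weight_kg FROM demographics WHERE subject_id = ?", (subject_id,)
    ).fetchone()
    if row is None:
        raise KeyError(subject_id)
    stamp = now()
    with conn:  # the update and its audit entry commit together or not at all
        conn.execute(
            "UPDATE demographics SET weight_kg = ?, modified_at = ? WHERE subject_id = ?",
            (new_weight, stamp, subject_id))
        conn.execute(
            "INSERT INTO audit_log VALUES (?, 'weight_kg', ?, ?, ?, ?, ?)",
            (subject_id, str(row[0]), str(new_weight), user, reason, stamp))

create_record("S001", 27.0)  # keyed as 27 instead of 72
update_weight("S001", 72.0, "data_editor", "Transcription error; CRF shows 72 kg")

try:
    update_weight("S001", 70.0, "data_editor", "")  # no reason given
    rejected = False
except ValueError:
    rejected = True
```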
How is a database organized?
• One or more tables
• Tables store records
  – Patient identifiers
  – Demographics and history
  – Test results
  – Etc.
• A record is a collection of fields
  – Patient identifiers
    • Name, DOB, address, … are stored in separate fields
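The table / record / field hierarchy can be shown with a single SQLite table. Each row is one record, and each column is one field, so name, date of birth and address are stored separately. The `patients` table and the person in it are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read fields by name

# One table; each row is a record; each column is a field.
conn.execute("""
    CREATE TABLE patients (
        subject_id TEXT PRIMARY KEY,
        last_name  TEXT,
        first_name TEXT,
        dob        TEXT,  -- ISO date (YYYY-MM-DD) so dates sort correctly
        address    TEXT
    )
""")
conn.execute(
    "INSERT INTO patients VALUES (?, ?, ?, ?, ?)",
    ("S001", "Doe", "Jane", "1970-05-21", "12 Example Street"),
)

record = conn.execute("SELECT * FROM patients WHERE subject_id = 'S001'").fetchone()
print(record["last_name"], record["dob"])  # Doe 1970-05-21

# Because DOB is its own field, it can be queried directly.
born_before_1980 = [r["subject_id"] for r in conn.execute(
    "SELECT subject_id FROM patients WHERE dob < '1980-01-01'")]
```

Had name, DOB and address been typed into one free-text field, the date query above would not be possible without first parsing the text.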
Differences between a clinical and research database
• Clinical database
  – Form or report oriented, so data is displayed for clinical decision making
  – Emphasis on displaying or reporting individual data rather than accumulating multiple records
• Research database
  – Table oriented, so that data is accumulated for eventual export to a statistical package for data analysis and reporting
  – Less emphasis on individual records
Types of Databases
• Flat-File: Flat-file databases are ideal for small amounts of data that need to be human-readable or edited by hand. Essentially, they are made up of a set of strings in one or more files that can be parsed to get the information they store. They are great for storing simple lists and data values, but can get complicated when you try to replicate more complex data structures.
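A flat file is just text that has to be parsed. The sketch below reads a small comma-separated file with Python's `csv` module. The file contents and the file name `subjects.csv` are hypothetical. An in-memory string stands in for the file so the example runs as-is.

```python
import csv
import io

# A flat file: plain text, one record per line, fields separated by commas.
# On disk this would be opened with open("subjects.csv", newline="").
flat_file = io.StringIO(
    "subject_id,sex,age\n"
    "S001,M,54\n"
    "S002,F,61\n"
)

records = list(csv.DictReader(flat_file))
ages = [int(r["age"]) for r in records]  # every value is read back as a string
print(records[0])             # {'subject_id': 'S001', 'sex': 'M', 'age': '54'}
print(sum(ages) / len(ages))  # 57.5
```

The format stays simple only while there is one row per subject. If each subject had several lab results, every line would have to repeat the subject's sex and age. This is the duplication that the relational model avoids.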
Cont…
• Relational: Relational databases, such as MySQL, Microsoft SQL Server and Oracle, have a much more logical structure in the way they store data. Tables can be used to represent real-world objects, with each field acting like an attribute.
• One major advantage of the relational model is that, if a database is designed well, no data should be duplicated, which helps to maintain database integrity. This can also represent a huge saving in file size, which is important when dealing with large volumes of data.
Cont…
• Relational databases also have "built-in" functions that help them retrieve, sort and edit the data in many different ways. These functions save developers from having to filter the results themselves, which can considerably speed up the development and production of web applications.
Advantages of a Relational Database
• Elimination of Multiple Value Data: a relational database allows relationships to be created for subordinate data (for example, a table for laboratory testing and another table for clinical findings would each contain multiple subjects, but the subject demographic information is maintained in a separate table).
• Avoiding Update Anomalies: since data is stored in only one place, it is easy to update (there are no other copies to remember to update).
• Avoiding Data Entry Anomalies: as with updates, since data is stored in only one place, it needs to be inserted in only one place.
• Avoiding Data Deletion Anomalies: again, since data is stored in only one place, it is deleted only once.
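The laboratory / clinical findings / demographics example above can be sketched in SQLite. Demographics are stored once. The lab and findings tables refer to them by `subject_id`, the same common identifier used to link imported data. A correction made in one place is seen by every joined query. Table names, column names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    -- Demographics are stored once per subject ...
    CREATE TABLE demographics (subject_id TEXT PRIMARY KEY, sex TEXT, site TEXT);
    -- ... and subordinate tables refer to them by subject_id.
    CREATE TABLE labs (subject_id TEXT REFERENCES demographics(subject_id),
                       visit INTEGER, hemoglobin REAL);
    CREATE TABLE findings (subject_id TEXT REFERENCES demographics(subject_id),
                           visit INTEGER, finding TEXT);

    INSERT INTO demographics VALUES ('S001', 'F', 'Site A');
    INSERT INTO labs VALUES ('S001', 1, 12.9), ('S001', 2, 13.4);
    INSERT INTO findings VALUES ('S001', 1, 'mild rash');
""")

# Update anomaly avoided: the site is corrected in exactly one row ...
conn.execute("UPDATE demographics SET site = 'Site B' WHERE subject_id = 'S001'")

# ... and every joined lab result reflects the change immediately.
joined = conn.execute("""
    SELECT l.subject_id, d.sex, d.site, l.visit, l.hemoglobin
    FROM labs AS l JOIN demographics AS d ON d.subject_id = l.subject_id
    ORDER BY l.visit
""").fetchall()
print(joined)
# [('S001', 'F', 'Site B', 1, 12.9), ('S001', 'F', 'Site B', 2, 13.4)]
```

In a single flat table, the site would appear on every lab and findings row, and each copy would need to be corrected separately.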
Advantages of a database
• Collection of data in a centralized location
• Controls redundant data
• Data stored so as to appear to users in one location
  – Data can be stored in multiple tables and come from multiple sources
  – A relational database brings it all together
Database Design Considerations
• What to collect
  – What questions are to be answered?
  – Think of the data tables in your future publications
    • Focus on the key data elements rather than collecting as much as possible
• What statistical package will be used
  – Format of the data file to which the data will be exported
    • Allowable characters
    • Format for certain analyses
      – For example, gender can be recorded in the database as M or F, but the statistical package may require 0 and 1
    • Length of data field labels
    • Long or wide format
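The M/F versus 0/1 example above is a recoding step applied at export time. Below is a minimal sketch. The mapping `SEX_CODES` is a hypothetical choice; the actual codes depend on the statistical package and the analysis plan. The function rejects unexpected values instead of silently guessing.

```python
# Hypothetical mapping from database codes to numeric codes a statistical
# package might require; the real codes come from the analysis plan.
SEX_CODES = {"M": 0, "F": 1}

def recode_sex(values):
    """Map 'M'/'F' to 0/1; raise on anything unexpected instead of guessing."""
    recoded = []
    for v in values:
        key = v.strip().upper()  # tolerate stray spaces and lower case
        if key not in SEX_CODES:
            raise ValueError(f"unexpected sex code: {v!r}")
        recoded.append(SEX_CODES[key])
    return recoded

print(recode_sex(["M", "F", "f "]))  # [0, 1, 1]
```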
Long versus Wide Format
• Long: each year is represented as its own observation (record)
• Wide: each family is one record, and each year is a field within that record
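Converting between the two layouts is a pivot. The sketch below turns long rows (one row per family per year) into wide rows (one row per family, one field per year) using plain Python. The `income` variable and its values are hypothetical. This simple version assumes at most one row per family and year. A duplicate row would silently overwrite the earlier value.

```python
from collections import defaultdict

# Long format: one row per family per year (hypothetical income data).
long_rows = [
    {"family": 1, "year": 2001, "income": 40000},
    {"family": 1, "year": 2002, "income": 42000},
    {"family": 2, "year": 2001, "income": 35000},
    {"family": 2, "year": 2002, "income": 36500},
]

def long_to_wide(rows, id_col, time_col, value_col):
    """Pivot long rows into one record per id with one field per time point."""
    wide = defaultdict(dict)
    for r in rows:
        wide[r[id_col]][f"{value_col}_{r[time_col]}"] = r[value_col]
    return [{id_col: k, **v} for k, v in sorted(wide.items())]

wide_rows = long_to_wide(long_rows, "family", "year", "income")
print(wide_rows)
# [{'family': 1, 'income_2001': 40000, 'income_2002': 42000},
#  {'family': 2, 'income_2001': 35000, 'income_2002': 36500}]
```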
Quality Control of Data Before Study
• Collect only needed variables
• Select appropriate computer hardware and software
• Plan analyses with dummy tabulations
• Develop study forms
  – Precode responses
  – Format boxes for data entry
  – Label each page with date, time, ID
  – Consider scan technology
What needs to be in the research database?
• Research variables directly related to the hypotheses being tested: YES
• Clinical measures used for screening: MAYBE
  ◦ Blood work, ECG, medical history
• Administrative data: NO
  ◦ Contact information
  ◦ Scheduling
What Do You Do With the Data?
• Ongoing monitoring
• Safety/adverse event reporting
• IRB reports/sponsor reports
• FDA reports
• Early analysis/late analysis
Data Entry
• Refers to the process of transferring data from the paper CRF to the database.
• This is also referred to as transcribing the data.
• Data entry results in the creation of electronic data that corresponds to the CRF data.
• Once the data is entered into the database, it is reviewed and validated by the data editor.
• Data entry can be done by double entry or by single entry.
Double Entry
• This involves entry of the same CRF page by two independent data entry operators.
• The first operator keys the data into the database. Later, a second, independent operator keys in the same data.
• If the two entries differ, a pop-up box appears, prompting the second operator either to key in what they see on the CRF or to accept what the first operator entered.
Cont…
• Another option is to have a third person review the discrepancies and resolve them.
• Thus, double data entry serves as a quality check on the data entered into the database.
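The comparison behind double entry can be sketched as a field-by-field diff of the two keyed records. A reviewer must then decide every discrepancy: the second operator or a third person, after checking the paper CRF. Real systems usually flag the mismatch while the second operator is typing. This batch version and the helper names `compare_entries` and `resolve` are hypothetical simplifications.

```python
def compare_entries(first, second):
    """Return {field: (first_value, second_value)} for fields that disagree."""
    fields = sorted(set(first) | set(second))
    return {
        f: (first.get(f), second.get(f))
        for f in fields
        if first.get(f) != second.get(f)
    }

def resolve(first, second, decisions):
    """Build the final record; every discrepancy needs a reviewer decision."""
    discrepancies = compare_entries(first, second)
    missing = set(discrepancies) - set(decisions)
    if missing:
        raise ValueError(f"unresolved discrepancies: {sorted(missing)}")
    final = dict(first)
    final.update({f: decisions[f] for f in discrepancies})
    return final

first = {"subject_id": "S001", "visit": "1", "systolic": "128", "pulse": "72"}
second = {"subject_id": "S001", "visit": "1", "systolic": "123", "pulse": "72"}

print(compare_entries(first, second))  # {'systolic': ('128', '123')}
final = resolve(first, second, {"systolic": "128"})  # value confirmed on the CRF
```

Values are compared as keyed strings, so "128" and "128.0" would count as a discrepancy. This is usually what double entry is meant to catch.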
Cont…
• The system allowed design of data entry forms that satisfied the needs of our clinicians, biostatisticians, and administrative staff.
• The system drastically reduced the time required to enter patient exam, demographic, and laboratory measurement data onto the study database, and provided tools for verifying that the data were scanned accurately.
• The system improved both the quality of patient care and the integrity of clinical patient data, allowing clinicians to quickly and easily retrieve patient records, and permitted our biostatisticians to generate periodic recruitment monitoring, patient safety, protocol adherence, and data quality assurance reports in a timely fashion.
Single Entry
• This involves entry by a single data entry operator.
• This process is used when sufficient, extensive checks are built into the database to detect errors that the data entry operator might miss.
• Single data entry is used extensively in EDC (electronic data capture) and RDC (remote data capture) systems.
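The "checks built into the database" that make single entry acceptable are often called edit checks. Each one returns a query message instead of silently accepting a doubtful value. The sketch below shows three kinds: required fields, a range check and a cross-field consistency check. The field names, the 18 to 75 age range and the `edit_checks` function are hypothetical examples, not rules from any protocol or EDC product.

```python
from datetime import date

def edit_checks(record):
    """Return a list of query messages; an empty list means the record passed.
    Field names and ranges are hypothetical examples."""
    queries = []
    # Required-field checks.
    for field in ("subject_id", "visit_date", "dob"):
        if not record.get(field):
            queries.append(f"{field}: required field is missing")
    # Range check.
    age = record.get("age")
    if age is not None and not (18 <= age <= 75):
        queries.append(f"age: {age} outside eligibility range 18-75")
    # Cross-field consistency check.
    dob, visit = record.get("dob"), record.get("visit_date")
    if dob and visit and visit <= dob:
        queries.append("visit_date: must be after date of birth")
    return queries

bad = {"subject_id": "S001", "dob": date(1980, 3, 1),
       "visit_date": date(1979, 3, 1), "age": 16}
good = {"subject_id": "S002", "dob": date(1970, 1, 1),
        "visit_date": date(2020, 1, 1), "age": 50}

print(edit_checks(bad))
# ['age: 16 outside eligibility range 18-75', 'visit_date: must be after date of birth']
print(edit_checks(good))  # []
```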
Cont…
• Thus, single data entry at the site eliminates the need for data entry personnel within the data management unit.
• Once the data is keyed directly at the site, it is ready to be reviewed, edited and validated by the data editor.
Cont…
Data entry can be of two types:
• Data entry is done locally in the site database and transmitted periodically to the central database via the internet or a dial-up line. Sometimes the data is sent using other electronic media, such as a CD or floppy disk, or as an email attachment.
• Data entry is done online, directly into the central database via the internet. These systems are usually web-based, and the data is available in real time for review.
Rules for Data Entry
• Each variable has a field in the dataset
• Categorical and nominal values require a number or string code
• Continuous values are entered directly
• Missing values must be coded differently from any real response
  – Common formats are “99” or bullets “·”
  – “Don’t know” is a response: do not leave it blank
  – “0” is not the same as missing
• Coding instructions should be on the form
• Avoid open-ended questions
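The rules above can be sketched as a codebook for one categorical question. The code for "missing" (99) lies outside the range of real answers. "Don't know" has its own code. A real answer of 0 stays distinct from missing. The education categories, the code 98 for "don't know" and the `decode` helper are hypothetical.

```python
# Hypothetical codebook for one question; 99 is reserved for "missing" and
# 98 for "don't know", both outside the range of real answers.
EDUCATION_CODES = {0: "none", 1: "primary", 2: "secondary", 3: "graduate",
                   98: "don't know"}
MISSING = 99

def decode(value):
    """Translate an entered code; keep missing distinct from a real 0 answer."""
    if value == MISSING:
        return None  # missing: excluded from analysis, never counted as 0
    if value not in EDUCATION_CODES:
        raise ValueError(f"invalid code: {value}")
    return EDUCATION_CODES[value]

entered = [0, 2, 99, 98, 3]
decoded = [decode(v) for v in entered]
answered = [d for d in decoded if d is not None]  # "don't know" counts as a response
print(decoded)  # ['none', 'secondary', None, "don't know", 'graduate']
```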
Avoid open-ended questions
• Enter the subject’s gender: ___________________
• Enter the subject’s level of education: __________
Closed-Ended Question
What is the subject’s sex? (Check one)
☐ Male
☐ Female