Data management issues are integral to all core areas of Responsible Conduct of Research
(RCR) instruction, and everyone involved in research-related activities should be aware of these issues to
conduct and support research responsibly. What constitutes research data can often be discipline-
specific, but in general “research data” refers to information collected, stored, and processed in a
systematic manner to meet the objectives of a particular research project.
Data can be collected manually or electronically, and can be quantitative or qualitative. Data can be
represented as numerical figures, text, images, audio/video, etc. Data sources can be human or animal
subjects, field notes, journals, laboratory specimens, observations, etc. Different disciplines can have
different notions of what constitutes data and how it should be managed. The underlying
issue, however, is how to manage research data responsibly.
Data management issues encompass all stages of research from conceptualization of a project to the
archiving and disposal of research materials. Those involved in research can face integrity issues in each
stage, and therefore, should be prepared to address the issues that may arise. For the purpose of this
module, data-related integrity issues have been organized under six areas of data management, among
them data selection, data collection, data handling, data analysis, and data reporting and publishing.
Along with the mentioned topics, research conceptualization and training of research staff can have a
significant impact on the integrity of research data. Research staff includes not only those who are directly
involved in conducting the research activities but also those who provide support for such activities, and
as a result, can have an impact on the integrity of the research effort. Proper conceptualization of a
research project and the use of appropriate research methods along with adequate training of all those
staff directly and indirectly involved in the research project helps ensure data integrity. Those who are new
to a research area or to particular research methods can unintentionally make mistakes or misuse
research methods that can impact the integrity of research data along with every other aspect of a
project. But issues related to conceptualization of research and research methods are difficult to cover in
the core modules of RCR instruction due to the project-specific nature of such issues.
Research staff should be trained not only on the research methods but also on the relevant standards
and regulations. For example, data standards can play an important role in data management in some
disciplines such as geography (example, federal spatial data standards, http://www.fgdc.gov/), while
federal and state guidelines on data collection on animal or human subjects can have a significant impact
on data integrity in other disciplines such as biology or health sciences. Research staff should also be
aware of the open-ended nature of some basic research which can at times conflict with the regulations
imposed by state, federal, and other bodies for the purpose of protecting research subjects, and be
prepared to deal with the potential conflicts associated with such exploration for the purpose of advancing
science. Therefore, training and supervising research staff adequately on the necessary research
methods, data standards, institutional policies and regulations, and sponsors’ requirements relevant to
the research project is essential to prepare them to make better decisions that ensure research integrity.
Another aspect of data integrity that is becoming increasingly important relates to the use of technology in
research projects for data collection, storage, analysis, archival, etc. These technologies include
electronic instruments or hand-held devices for collecting data, computer systems for storing and sharing
data, and software for analyzing data. But the use of technology can create additional integrity concerns
that researchers must be prepared to deal with and act responsibly. Adequate training of research staff in
the application and implications of technology used in a project can help to prevent technology-related integrity problems.
In summary, researchers should be familiar with the contextual nature of data and the six areas of data
management mentioned earlier to have a better understanding of data integrity issues. Adequate project
planning, training and supervision of research staff, understanding of standards and regulations, and
knowledge of the implications of technology used can prevent or reduce data-related integrity violations in
research. Finally, researchers should recognize the overlapping nature of data management issues with
all the other core areas of RCR instruction and be prepared to deal with data integrity issues in a
professionally responsible manner at all stages of research projects.
Data selection is defined as the process of determining the appropriate data type and source, as well
as suitable instruments to collect data. Data selection precedes the actual practice of data collection.
This definition distinguishes data selection from selective data reporting (selectively excluding data that
is not supportive of a research hypothesis) and interactive/active data selection (using collected data for
monitoring activities/events, or conducting secondary data analyses). The process of selecting suitable
data for a research project can impact data integrity.
The primary objective of data selection is the determination of appropriate data type, source, and
instrument(s) that allow investigators to adequately answer research questions. This determination is
often discipline-specific and is primarily driven by the nature of the investigation, existing literature,
and accessibility to necessary data sources.
Integrity issues can arise when the decisions to select ‘appropriate’ data to collect are based primarily on
cost and convenience considerations rather than the ability of data to adequately answer research
questions. Certainly, cost and convenience are valid factors in the decision-making process. However,
researchers should assess the degree to which these factors might compromise the integrity of the research.
Considerations/issues in data selection
There are a number of issues that researchers should be aware of when selecting data. These include:
1. the appropriate type and sources of data which permit investigators to adequately answer the
stated research questions,
2. suitable procedures to obtain a representative sample, and
3. the proper instruments to collect data. There should be compatibility between the type/source of
data and the mechanisms to collect it; it is difficult to extricate the selection of the type/source of
data from the instruments used to collect the data.
Types/Sources of Data
Depending on the discipline, data types and sources can be represented in a variety of ways. The two
primary data types are quantitative (represented as numerical figures - interval and ratio level
measurements), and qualitative (text, images, audio/video, etc.). Although scientific disciplines differ in
their preference for one type over another, some investigators utilize information from both quantitative
and qualitative data with the expectation of developing a richer understanding of a targeted phenomenon. Data
sources can include field notes, journals, laboratory notes/specimens, or direct observations of humans or animals.
Interactions between data type and source are not infrequent. Researchers collect information from
human beings that can be qualitative (ex. observing child rearing practices) or quantitative (recording
biochemical markers, anthropometric measurements). Determining appropriate data is discipline-specific
and is primarily driven by the nature of the investigation, existing literature, and accessibility to data sources.
Questions that need to be addressed when selecting data type and source include:
1. What is (are) the research question(s)?
2. What is the scope of the investigation? (This defines the parameters of any study. Selected data
should not extend beyond the scope of the study).
3. What has the literature (previous research) determined to be the most appropriate data to collect?
4. What type of data should be considered: quantitative, qualitative, or a composite of both?
Methodological Procedures to Obtain a Representative Sample
The goal of sampling is to select a data source that is representative of the entire data universe of
interest. Depending on discipline, samples can be drawn from human or animal populations, laboratory
specimens, observations, or historical documents. Failure to ensure representativeness may
introduce bias, and thus compromise data integrity.
It is one thing to have a sampling methodology designed for representativeness and yet another thing for
the data sample to actually be representative. Thus, data sample representativeness should be tested
and/or verified before use of those data.
Potential biases limit the ability to draw inferences to larger populations. A partial list of biases could
include sex, age, race, height, or geographical locale.
A variety of sampling procedures are available to reduce the likelihood of drawing a biased sample, and
some of them are listed below:
1. Simple random sampling
2. Stratified sampling
3. Cluster sampling
4. Systematic sampling
These methods of sampling seek to ensure representativeness of the entire population by
incorporating an element of ‘randomness’ into the selection procedure, and thus provide a greater ability
to generalize findings to the targeted population. These methods contrast sharply with
the ‘convenience’ sample where little or no attempt is made to ensure representativeness.
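The four probability sampling procedures listed above can be sketched with Python's standard library. This is an illustrative sketch only: the population, the two strata, the cluster size, and the sample sizes are all invented for demonstration.

```python
import random

random.seed(42)  # fixed seed for a reproducible demonstration

population = list(range(1000))  # e.g., a frame of 1000 subject IDs

# 1. Simple random sampling: every member has an equal chance.
simple = random.sample(population, 50)

# 2. Stratified sampling: sample proportionally within each stratum
#    (hypothetical strata of 400 and 600 members).
strata = {"stratum_a": population[:400], "stratum_b": population[400:]}
stratified = []
for name, members in strata.items():
    k = round(50 * len(members) / len(population))
    stratified.extend(random.sample(members, k))

# 3. Cluster sampling: randomly choose whole clusters (e.g., sites),
#    then take every member of the chosen clusters.
clusters = [population[i:i + 100] for i in range(0, 1000, 100)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [m for c in chosen_clusters for m in c]

# 4. Systematic sampling: every k-th member after a random start.
k = len(population) // 50
start = random.randrange(k)
systematic = population[start::k]

print(len(simple), len(stratified), len(cluster_sample), len(systematic))
```

Each procedure injects randomness at a different point (individual draws, draws within strata, draws of whole clusters, or a random starting offset), which is what distinguishes all four from a convenience sample.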
The random sampling procedures common in quantitative research contrast with the predominant type of
sampling conducted in qualitative research. Since investigators may be focusing on a small number of
cases, sampling procedures are often purposive or theoretical rather than random. According to
Savenye and Robinson (2004), “For the study to be valid, the reader should be able to believe that a
representative sample of involved individuals was observed. The ‘multiple realities’ of any cultural context
should be represented.”
Each strategy has its appropriate application for specific scenarios (the reader is advised to review
research methodology textbooks for detailed information on each sampling procedure). Selection bias
can occur when failing to implement a selected sampling procedure properly. The resulting
non-representative sample may exhibit disproportionate numbers of participants sharing characteristics (e.g.,
race, gender, age, geographic locale) that could interact with main effect variables (Skodol, Bender, 2003;
Robinson, Woerner, Pollak, Lerner, 1996; Maynard, Selker, Beshansky, Griffith, Schmid, Califf,
D’Agostino, Laks, Lee, Wagner, 1995; Fourcroy, 1994; Gurwitz, Col, Avorn, 1992). Use of homogenous
samples in clinical trials may limit the ability of researchers to generalize findings to a broader population
(Sharpe, 2002; Dowd, Recker, Heaney, 2000; Johnson, 1990). The issues of sampling procedures apply
to both quantitative and qualitative research areas.
Savenye and Robinson (2004) contrast this approach with qualitative researchers’ tendency to interpret
results of an investigation or draw conclusions based on specific details of a particular study, rather than
in terms of generalizability to other situations and settings. While findings from a case study cannot be
generalized, this data may be used to develop research questions later to be investigated in an
experiment (Savenye, Robinson, 2004).
Selection of Proper Instrument
Potential for compromising data integrity also exists in the selection of instruments to measure targeted
data. Typically, researchers are familiar with the range of instruments that are conventionally used in a
specialized field of study. Challenges occur when researchers fail to keep abreast of critiques of existing
instruments or diagnostic tests (Goehring, Perrier, Morabia, 2004; Walter, Irwig, Glasziou, 1999; Khan,
Khan, Nwosu, Arnott, Chien, 1999). Furthermore, researchers may:
1. be unaware of the development of more refined instruments
2. use instruments that have not been field-tested, calibrated, validated, or measured for reliability
3. apply instruments to populations for which they were not originally intended
Questions that should be addressed in the selection of instruments include:
1. How was data collected in the past?
2. Is (are) the instrument(s) appropriate for the type of data sought?
3. Will the instrument(s) be adequate to collect all necessary data to the degree needed?
4. Is the instrument current, properly field-tested, calibrated, validated, and reliable?
5. Is the instrument appropriate for collecting data from a different source than originally
intended? Should the instrument be modified?
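Question 4 above asks whether an instrument is properly validated and reliable. For multi-item instruments such as questionnaires, one common internal-consistency check is Cronbach's alpha. The sketch below implements the standard formula with the Python standard library; the three-item scale and the five respondents' scores are invented for illustration.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    items: one list of scores per item, aligned across the same
    respondents (items[i][j] is respondent j's score on item i).
    """
    k = len(items)
    item_vars = sum(pvariance(item) for item in items)      # sum of item variances
    totals = [sum(scores) for scores in zip(*items)]        # per-respondent totals
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical three-item scale, five respondents (rows are items).
scores = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```

Values near or above 0.7 are conventionally taken to indicate acceptable reliability, though the appropriate threshold is discipline-specific.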
Attention to the data selection process is crucial in supporting the research steps that follow. Despite
efforts to maintain strict adherence to data collection protocols, selection of fitting statistical analyses,
accurate data reporting, and an unbiased write-up, scientific findings will have questionable value if the
data selection process is flawed.
Dowd, R., Recker, R.R., Heaney, R.P. (2000). Study subjects and ordinary patients. Osteoporos Int.
Fourcroy, J.L. (1994). Women and the development of drugs: why can’t a woman be more like a man?
Ann N Y Acad Sci, 736:174-95.
Goehring, C., Perrier, A., Morabia, A. (2004). Spectrum Bias: a quantitative and graphical analysis of the
variability of medical diagnostic test performance. Statistics in Medicine, 23(1):125-35.
Gurwitz, J.H., Col, N.F., Avorn, J. (1992). The exclusion of the elderly and women from clinical trials in
acute myocardial infarction. JAMA, 268(11): 1417-22.
Hartt, J., Waller, G. (2002). Child abuse, dissociation, and core beliefs in bulimic disorders. Child Abuse
Negl. 26(9): 923-38.
Khan, K.S., Khan, S.F., Nwosu, C.R., Arnott, N., Chien, P.F. (1999). Misleading authors’ inferences in
obstetric diagnostic test literature. American Journal of Obstetrics and Gynecology, 181(1): 112-5.
Maynard, C., Selker, H.P., Beshansky, J.R., Griffith, J.L., Schmid, C.H., Califf, R.M., D’Agostino, R.B.,
Laks, M.M., Lee, K.L., Wagner, G.S., et al. (1995). The exclusions of women from clinical trials of
thrombolytic therapy: implications for developing the thrombolytic predictive instrument database.
Medical Decision Making, 15(1): 38-43.
Robinson, D., Woerner, M.G., Pollack, S., Lerner, G. (1996). Subject selection bias in clinical trials: data from a
multicenter schizophrenia treatment center. Journal of Clinical Psychopharmacology, 16(2): 170-6.
Sharpe, N. (2002). Clinical trials and the real world: selection bias and generalisability of trial results.
Cardiovascular Drugs and Therapy, 16(1): 75-7.
Walter, S.D., Irwig, L., Glasziou, P.P. (1999). Meta-analysis of diagnostic tests with imperfect reference
standards. J Clin Epidemiol., 52(10): 943-51.
Whitney, C.W., Lind, B.K., Wahl, P.W. (1998). Quality assurance and quality control in longitudinal
studies. Epidemiologic Reviews, 20(1): 71-80.
Data collection is the process of gathering and measuring information on variables of interest, in
an established systematic fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes. The data collection component of research is common to all fields of
study including physical and social sciences, humanities, business, etc. While methods vary by discipline,
the emphasis on ensuring accurate and honest collection remains the same.
The importance of ensuring accurate and appropriate data collection
Regardless of the field of study or preference for defining data (quantitative, qualitative), accurate data
collection is essential to maintaining the integrity of research. Both the selection of appropriate data
collection instruments (existing, modified, or newly developed) and clearly delineated instructions for their
correct use reduce the likelihood of errors occurring.
Consequences from improperly collected data include
inability to answer research questions accurately
inability to repeat and validate the study
distorted findings resulting in wasted resources
misleading other researchers to pursue fruitless avenues of investigation
compromising decisions for public policy
causing harm to human participants and animal subjects
While the degree of impact from faulty data collection may vary by discipline and the nature of
investigation, there is the potential to cause disproportionate harm when these research results are used
to support public policy recommendations.
Issues related to maintaining integrity of data collection:
The primary rationale for preserving data integrity is to support the detection of errors in the data
collection process, whether they are made intentionally (deliberate falsifications) or not (systematic or random errors).
Most, Craddick, Crawford, Redican, Rhodes, Rukenbrod, and Laws (2003) describe ‘quality assurance’
and ‘quality control’ as two approaches that can preserve data integrity and ensure the
scientific validity of study results. Each approach is implemented at different points in the research
timeline (Whitney, Lind, Wahl, 1998):
1. Quality assurance - activities that take place before data collection begins
2. Quality control - activities that take place during and after data collection
Since quality assurance precedes data collection, its main focus is 'prevention' (i.e., forestalling
problems with data collection). Prevention is the most cost-effective activity to ensure the integrity of data
collection. This proactive measure is best demonstrated by the standardization of protocol developed in
a comprehensive and detailed procedures manual for data collection. Poorly written manuals increase the
risk of failing to identify problems and errors early in the research endeavor. These failures may be
demonstrated in a number of ways:
Uncertainty about the timing, methods, and identity of person(s) responsible for reviewing data
Partial listing of items to be collected
Vague description of data collection instruments to be used in lieu of rigorous step-by-step
instructions on administering tests
Failure to identify specific content and strategies for training or retraining staff members
responsible for data collection
Obscure instructions for using, making adjustments to, and calibrating data collection equipment
No identified mechanism to document changes in procedures that may evolve over the course of
the investigation.
An important component of quality assurance is developing a rigorous and detailed recruitment and
training plan. Implicit in training is the need to effectively communicate the value of accurate data
collection to trainees (Knatterud, Rockhold, George, Barton, Davis, Fairweather, Honohan, Mowery,
O'Neill, 1998). The training aspect is particularly important to address the potential problem of staff who
may unintentionally deviate from the original protocol. This phenomenon, known as ‘drift’, should be
corrected with additional training, a provision that should be specified in the procedures manual.
Given the range of qualitative research strategies (non-participant/ participant observation, interview,
archival, field study, ethnography, content analysis, oral history, biography, unobtrusive research) it is
difficult to make generalized statements about how one should establish a research protocol in order to
facilitate quality assurance. Certainly, researchers conducting non-participant/participant observation may
have only the broadest research questions to guide the initial research efforts. Since the researcher is the
main measurement device in such a study, there are often few or no other data collection instruments.
Indeed, instruments may need to be developed on the spot to accommodate unanticipated findings.
While quality control activities (detection/monitoring and action) occur during and after data collection, the
details should be carefully documented in the procedures manual. A clearly defined communication
structure is a necessary pre-condition for establishing monitoring systems. There should not be any
uncertainty about the flow of information between principal investigators and staff members following the
detection of errors in data collection. A poorly developed communication structure encourages lax
monitoring and limits opportunities for detecting errors.
Detection or monitoring can take the form of direct staff observation during site visits, conference calls, or
regular and frequent reviews of data reports to identify inconsistencies, extreme values or invalid codes.
While site visits may not be appropriate for all disciplines, failure to regularly audit records, whether
quantitative or qualitative, will make it difficult for investigators to verify that data collection is proceeding
according to procedures established in the manual. In addition, if the structure of communication is not
clearly delineated in the procedures manual, transmission of any change in procedures to staff members
can be compromised.
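The kind of routine data-report review described above (scanning for inconsistencies, extreme values, or invalid codes) is often automated. The sketch below is a minimal, hypothetical illustration: the field names, valid ranges, and code lists are invented, and a real protocol would draw its rules from the procedures manual.

```python
# Hypothetical validation rules, as a procedures manual might specify them.
RULES = {
    "age": {"kind": "range", "min": 18, "max": 90},
    "sex": {"kind": "codes", "valid": {"M", "F"}},
    "sbp": {"kind": "range", "min": 70, "max": 250},  # systolic blood pressure
}

def review(record):
    """Return a list of (field, problem) pairs for one data record."""
    problems = []
    for field, rule in RULES.items():
        value = record.get(field)
        if value is None:
            problems.append((field, "missing"))
        elif rule["kind"] == "range" and not rule["min"] <= value <= rule["max"]:
            problems.append((field, f"out of range: {value}"))
        elif rule["kind"] == "codes" and value not in rule["valid"]:
            problems.append((field, f"invalid code: {value}"))
    return problems

records = [
    {"age": 34, "sex": "F", "sbp": 118},   # clean record
    {"age": 17, "sex": "X", "sbp": 305},   # three violations
    {"age": 52, "sbp": 121},               # missing sex code
]
for i, rec in enumerate(records):
    for field, problem in review(rec):
        print(f"record {i}: {field}: {problem}")
```

Flagged records would then feed the communication structure described above, so that principal investigators and staff can act on errors promptly.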
Quality control also identifies the required responses, or ‘actions’ necessary to correct faulty data
collection practices and also minimize future occurrences. These actions are less likely to occur if data
collection procedures are vaguely written and the necessary steps to minimize recurrence are not
implemented through feedback and education (Knatterud, et al, 1998).
Examples of data collection problems that require prompt action include:
errors in individual data items
violation of protocol
problems with individual staff or site performance
fraud or scientific misconduct
In the social/behavioral sciences where primary data collection involves human subjects, researchers are
taught to incorporate one or more secondary measures that can be used to verify the quality of
information being collected from the human subject. For example, a researcher conducting a survey
might be interested in gaining better insight into the occurrence of risky behaviors among young adults
as well as the social conditions that increase the likelihood and frequency of these risky behaviors.
To verify data quality, respondents might be queried about the same information but asked at different
points of the survey and in a number of different ways. Measures of ‘social desirability’ might also be
used to gauge the honesty of responses. There are two points that need to be raised here: 1)
cross-checks within the data collection process and 2) data quality being as much an observation-level
issue as it is a complete data set issue. Thus, data quality should be addressed for each individual
measurement, for each individual observation, and for the entire data set.
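The cross-check idea above can be sketched in a few lines: ask the same item at two points in the survey and flag respondents whose answers diverge. The item names, responses, and tolerance below are hypothetical.

```python
def consistency_flags(responses, item_a, item_b, tolerance=0):
    """Flag respondents whose answers to two versions of the same
    question differ by more than `tolerance`."""
    flagged = []
    for rid, answers in responses.items():
        if abs(answers[item_a] - answers[item_b]) > tolerance:
            flagged.append(rid)
    return flagged

# Invented responses: q12 and q47 ask about the same risky behavior.
responses = {
    "r001": {"q12": 3, "q47": 3},
    "r002": {"q12": 5, "q47": 1},  # inconsistent answers
    "r003": {"q12": 2, "q47": 3},  # within tolerance
}
print(consistency_flags(responses, "q12", "q47", tolerance=1))
```

This operates at the observation level; the range and code checks described earlier operate at the measurement and data-set levels, so together they address all three levels of data quality.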
Each field of study has its preferred set of data collection instruments. The hallmark of laboratory
sciences is the meticulous documentation of the lab notebook while social sciences such as sociology
and cultural anthropology may prefer the use of detailed field notes. Regardless of the discipline,
comprehensive documentation of the collection process before, during and after the activity is essential to
preserving data integrity.
Knatterud, G.L., Rockhold, F.W., George, S.L., Barton, F.B., Davis, C.E., Fairweather, W.R., Honohan, T.,
Mowery, R, O’Neill, R. (1998). Guidelines for quality assurance in multicenter trials: a position
paper. Controlled Clinical Trials, 19:477-493.
Most, M.M., Craddick, S., Crawford, S., Redican, S., Rhodes, D., Rukenbrod, F., Laws, R. (2003).
Dietary quality assurance processes of the DASH-Sodium controlled diet study. Journal of the American
Dietetic Association, 103(10): 1339-1346.
Whitney, C.W., Lind, B.K., Wahl, P.W. (1998). Quality assurance and quality control in longitudinal
studies. Epidemiologic Reviews, 20(1): 71-80.
Data handling is the process of ensuring that research data is stored, archived, or disposed of in a
safe and secure manner during and after the conclusion of a research project. This includes the
development of policies and procedures to manage data handled electronically as well as through
non-electronic means.
Data handling is important in ensuring the integrity of research data since it addresses concerns related
to confidentiality, security, and preservation/retention of research data. Proper planning for data handling
can also result in efficient and economical storage, retrieval, and disposal of data. In the case of data
handled electronically, data integrity is a primary concern to ensure that recorded data is not altered,
erased, lost or accessed by unauthorized users.
Data handling issues encompass both electronic as well as non-electronic systems, such as paper files,
journals, and laboratory notebooks. Electronic systems include computer workstations and laptops,
personal digital assistants (PDAs), storage media such as videotape, diskette, CD, DVD, memory cards,
and other electronic instrumentation. These systems may be used for storing, archiving, sharing, and
disposing of data, and therefore, require adequate planning at the start of a research project so that
issues related to data integrity can be analyzed and addressed early on.
Considerations/issues in data handling
Issues that should be considered in ensuring integrity of data handled include the following:
Type of data handled and its impact on the environment (especially if it is stored on toxic media).
Type of media containing data and its storage capacity, handling and storage requirements,
reliability, longevity (in the case of degradable medium), retrieval effectiveness, and ease of
upgrade to newer media.
Data handling responsibilities/privileges, that is, who can handle which portion of data, at what
point during the project, for what purpose, etc.
Data handling procedures that describe how long the data should be kept, and when, how, and
who should handle data for storage, sharing, archival, retrieval and disposal purposes.
Deciding how long research data should be kept may depend on the nature of the project, sponsoring
agency’s guidelines, ongoing interest in or need for the data, cost of maintaining the data in the long run,
and other relevant considerations. Under current Health and Human Services requirements, research
records must be maintained for at least three years after the last expenditure report. Federal regulations
or institutional guidelines may require that data be retained for longer periods.
In the case of data stored electronically, the potential for altering, erasing, losing, or unauthorized access
is high. Several years of valuable research data can be compromised or lost, as happened in April 2001,
when an intruder broke into a server used by a group of University of Washington graduate students and
deleted the entire file system (UoW website, 2003). Although some aspects of protection from these
threats are the responsibility of IT professionals, researchers are ultimately responsible for ensuring the
security of their data.
In the “Data Management Guidelines Issued by British Medical Research Council” published on the ORI
website (2003), it states that:
“If the data are recorded electronically, the data should be regularly backed up on disc; a hard copy
should be made of particularly important data; relevant software must be retained to ensure future
access, and special attention should be given to guaranteeing the security of electronic data” (ORI website, 2003).
Creating a secure environment for electronic data usually involves all members of a project, which can
include an IT Manager, system administrator, support personnel, and several end-users. Some issues to
consider when handling data electronically include the following:
Protect systems and individual files with logins and passwords
Manage access rights (for example, system administrators not involved in the project could have
limited access rights)
Regularly update virus protection to reduce the vulnerability of data
Limit physical access to equipment and storage media (for example, a stand-alone computer may be
more secure for storing data than a networked computer)
Accurate data removal from old hardware and certification that the data was removed
Ensure data recoverability in case of emergencies
Regularly update electronic storage media to avoid outdated storage/retrieval devices
Backup multiple copies in secured multiple locations
Encrypt files when wireless devices are used, and keep track of wireless connectivity to prevent
accidental file sharing
Record date and time when a piece of electronic data was originally recorded to prevent
alteration or manipulation at a future date
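The last two points above (recoverability and tamper-evident timestamps) can be supported by registering a cryptographic checksum and a recording time for each data file when it is first written, so later alteration can be detected. The sketch below uses the standard library; the file name is hypothetical and a temporary file stands in for a real data file.

```python
import hashlib
import os
import tempfile
import time

def register(path, ledger):
    """Record the file's SHA-256 digest and the time of registration."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    ledger[path] = {"sha256": digest, "recorded": time.time()}

def verify(path, ledger):
    """Return True if the file still matches its registered digest."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return ledger[path]["sha256"] == digest

ledger = {}  # in practice this ledger would itself be kept secure
with tempfile.TemporaryDirectory() as d:
    data_file = os.path.join(d, "raw_data.csv")
    with open(data_file, "w") as f:
        f.write("subject,value\nS1,4.2\n")
    register(data_file, ledger)
    assert verify(data_file, ledger)       # untouched file passes
    with open(data_file, "a") as f:
        f.write("S2,9.9\n")                # simulated alteration
    assert not verify(data_file, ledger)   # alteration is detected
```

A scheme like this only detects changes; preventing them still depends on the access controls, backups, and physical security listed above.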
In the article entitled “Preventing data theft”, Lynn Greiner quotes Paul Hyde, CEO of Kasten Chase (a
company that develops high-assurance data security systems):
“It's important to have a level of security that is adequate if the machine is stolen. Everyone who is in the
position where they could be separated from the device needs security. I think the best way to look at it is
to look at the criticality of what you're doing, (and) of its importance to the business environment. You
have to determine what the value of the information is, and match up security accordingly” (Greiner, 2002).
One of the key issues to consider in storing or archiving data manually or electronically is “configuration
management.” This involves keeping track of data on different media or format during different stages of
the project by different users. For example, in a research effort raw data could be recorded in a laboratory
notebook, then transferred to an electronic data file for analysis, which could result in output data. The
output data then could be converted to plots or graphs. Configuration management will involve keeping
track of all these and upgrading the data to newer media or formats as necessary during the life of a
particular project. Effective configuration management will not only ensure data integrity but also simplify
the use of data.
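The notebook-to-file-to-output-to-plot example above can be supported by a simple provenance log that records each stage of the data, its format, and who handled it. This is a minimal sketch; the stage names and handler names are hypothetical, and a real system would also record media upgrades and file locations.

```python
from datetime import datetime, timezone

def log_stage(history, stage, fmt, handler):
    """Append one configuration-management record for a data stage."""
    history.append({
        "stage": stage,
        "format": fmt,
        "handler": handler,
        "when": datetime.now(timezone.utc).isoformat(),
    })

history = []  # one record per stage in the data's life cycle
log_stage(history, "raw data", "laboratory notebook", "J. Smith")      # hypothetical names
log_stage(history, "transcribed data", "electronic data file", "J. Smith")
log_stage(history, "analysis output", "results file", "A. Jones")
log_stage(history, "figures", "plots/graphs", "A. Jones")

for entry in history:
    print(f'{entry["stage"]:18} {entry["format"]:22} {entry["handler"]}')
```

Keeping such a trail makes it possible to trace any figure back through the output data and electronic file to the original notebook entry.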
Disposing of research data requires adequate plans, procedures, and impact analysis to ensure that the
appropriate data is discarded in a safe and secure manner. Retaining data on paper files and electronic
media when not needed after a project is over can lead to unauthorized access to confidential data. The
likelihood of this is especially high when principal investigators retire, leave the project, or die without
establishing proper data management procedures on which data should be kept, disposed of, or shared.
Disposing of data containing confidential information on human subjects or national security requires
additional care to ensure that the information cannot be reconstructed from the disposed media. When
disposing of data stored electronically on computer disks, the disks have to be erased several times and
certified so that data cannot be recovered from them. Some federal and state agencies have guidelines on
how many times a computer disk should be erased to ensure the disk is free of recoverable data. In the
case of data stored on film or other toxic media, care should be taken to ensure that the disposal process
does not pollute the environment.
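The multi-pass erasure described above can be illustrated as follows. This is only a sketch: on modern journaling file systems and solid-state drives, overwriting in place does not guarantee unrecoverability, and agency-approved tools, certified services, or physical destruction may be required; the file name and pass count below are hypothetical.

```python
import os
import tempfile

def overwrite_and_delete(path, passes=3):
    """Overwrite a file's contents with random bytes `passes` times,
    then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))  # random fill on each pass
            f.flush()
            os.fsync(f.fileno())       # force the write to disk
    os.remove(path)

with tempfile.TemporaryDirectory() as d:
    secret = os.path.join(d, "subject_records.dat")
    with open(secret, "wb") as f:
        f.write(b"confidential subject data")
    overwrite_and_delete(secret, passes=3)
    disposed = not os.path.exists(secret)

print("disposed:", disposed)
```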
Research organizations often contract with commercial data disposal companies to dispose of data stored
on non-electronic media such as laboratory notebooks, paper files, etc., and it is the responsibility of the
research organization to ensure that the commercial company disposes of the data in a safe and
non-recoverable manner.
Data handling requires adequate planning, development of procedures, and training and supervision of
research staff to ensure that data is stored, archived, or disposed of in a safe and secure manner that
preserves the integrity of research data as well as simplifies data management.
Data Analysis is the process of systematically applying statistical and/or logical techniques to
describe and illustrate, condense and recap, and evaluate data. According to Shamoo and Resnik (2003),
various analytic procedures “provide a way of drawing inductive inferences from data and distinguishing
the signal (the phenomenon of interest) from the noise (statistical fluctuations) present in the data”.
While data analysis in qualitative research can include statistical procedures, many times analysis
becomes an ongoing iterative process where data is continuously collected and analyzed almost
simultaneously. Indeed, researchers generally analyze for patterns in observations through the entire
data collection phase (Savenye, Robinson, 2004). The form of the analysis is determined by the specific
qualitative approach taken (field study, ethnography, content analysis, oral history,
biography, unobtrusive research) and the form of the data (field notes, documents, audiotape, videotape).
An essential component of ensuring data integrity is the accurate and appropriate analysis of research
findings. Improper statistical analyses distort scientific findings, mislead casual readers (Shepard, 2002),
and may negatively influence the public perception of research. Integrity issues are just as relevant to the
analysis of non-statistical data.
Considerations/issues in data analysis
There are a number of issues that researchers should be cognizant of with respect to data analysis.
Having the necessary skills to analyze
Concurrently selecting data collection methods and appropriate analysis
Drawing unbiased inference
Inappropriate subgroup analysis
Following acceptable norms for disciplines
Determining statistical significance
Lack of clearly defined and objective outcome measurements
Providing honest and accurate analysis
Manner of presenting data
Data recording method
Partitioning ‘text’ when analyzing qualitative data
Training of staff conducting analyses
Reliability and Validity
Extent of analysis
Having the necessary skills to analyze
A tacit assumption of investigators is that they have received training sufficient to demonstrate a high
standard of research practice. Unintentional ‘scientific misconduct' is likely the result of poor instruction
and follow-up. A number of studies suggest this may be the case more often than believed (Nowak, 1994;
Silverman, Manson, 2003). For example, Sica found that adequate training of physicians in medical
schools in the proper design, implementation and evaluation of clinical trials is “abysmally small” (Sica,
cited in Nowak, 1994). Indeed, a single course in biostatistics is the most that is usually offered
(Christopher Williams, cited in Nowak, 1994).
A common practice of investigators is to defer the selection of analytic procedure to a research team
‘statistician’. Ideally, investigators should have substantially more than a basic understanding of the
rationale for selecting one method of analysis over another. This can allow investigators to better
supervise staff who conduct the data analyses and to make informed decisions.
Concurrently selecting data collection methods and appropriate analysis
While methods of analysis may differ by scientific discipline, the optimal stage for determining appropriate
analytic procedures occurs early in the research process and should not be an afterthought. According to
Smeeton and Goda (2003), “Statistical advice should be obtained at the stage of initial planning of an
investigation so that, for example, the method of sampling and design of questionnaire are appropriate”.
Drawing unbiased inference
The chief aim of analysis is to distinguish whether an observed event reflects a true effect or a false one.
Any bias occurring in the collection of the data, or in the selection of the method of analysis, will
increase the likelihood of drawing a biased inference. Bias can occur when recruitment of study
participants falls below the minimum number required to demonstrate statistical power, or when a
sufficient follow-up period needed to demonstrate an effect is not maintained (Altman, 2001).
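The link between recruitment shortfalls and statistical power can be made concrete with a standard normal-approximation sketch (a hypothetical illustration, not drawn from the cited sources; the numbers are for demonstration only):

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison for a
    standardized effect size (Cohen's d), via the normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                  # e.g. ~1.96 for alpha = 0.05
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return z.cdf(noncentrality - z_crit)

# With a medium effect (d = 0.5), 64 participants per group give roughly 80%
# power; recruiting only 20 per group drops power to roughly 35%.
well_powered = approx_power(64, 0.5)
under_powered = approx_power(20, 0.5)
```

Falling below the planned recruitment target therefore raises the chance that a real effect goes undetected, or that an apparent effect is a biased one.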
Inappropriate subgroup analysis
When failing to demonstrate statistically different levels between treatment groups, investigators may
resort to breaking down the analysis into smaller and smaller subgroups in order to find a difference.
Although this practice may not inherently be unethical, these analyses should be proposed before
beginning the study, even if the intent is exploratory in nature. If the study is exploratory in nature, the
investigator should make this explicit so that readers understand that the research is more of a hunting
expedition than primarily theory driven. Although a researcher may not have a theory-based
hypothesis for testing relationships between previously untested variables, a theory will have to be
developed to explain an unanticipated finding. Indeed, in exploratory science there are no a priori
hypotheses, and therefore no hypothesis tests. Although theories can often drive the processes
used in the investigation of qualitative studies, many times patterns of behavior or occurrences derived
from analyzed data can result in developing new theoretical frameworks rather than being determined a
priori (Savenye, Robinson, 2004).
It is conceivable that multiple statistical tests could yield a significant finding by chance alone rather than
reflecting a true effect. Integrity is compromised if the investigator only reports tests with significant
findings, and neglects to mention a large number of tests failing to reach significance. While access to
computer-based statistical packages can facilitate application of increasingly complex analytic
procedures, inappropriate uses of these packages can result in abuses as well.
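The point about chance findings can be demonstrated with a short simulation (an illustrative sketch; it relies on the fact that, under the null hypothesis, a valid test's p-value is uniformly distributed):

```python
import random

def count_false_positives(n_tests: int, alpha: float = 0.05, seed: int = 1) -> int:
    """Simulate many comparisons in which NO true effect exists and count
    how many reach 'significance' by chance alone."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        p = rng.random()        # null p-values are uniform on [0, 1]
        hits += p < alpha
    return hits

# Across 100 tests of true nulls, about 5 will look 'significant' at alpha = 0.05.
chance_hits = count_false_positives(100)
```

Reporting only the handful of "hits" from such a batch, while omitting the failed tests, is exactly the selective reporting the text warns against.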
Following acceptable norms for disciplines
Every field of study has developed its accepted practices for data analysis. Resnik (2000) states that it is
prudent for investigators to follow these accepted norms. Resnik further states that the norms are
“…based on two factors:
(1) the nature of the variables used (i.e., quantitative, comparative, or qualitative), and
(2) assumptions about the population from which the data are drawn (i.e., random distribution,
independence, sample size, etc.)”.
If one uses unconventional norms, it is crucial to clearly state that this is being done, and to show how this
new and possibly unaccepted method of analysis is being used, as well as how it differs from other more
traditional methods. For example, Schroder, Carey, and Vanable (2003) juxtapose their identification of
new and powerful data-analytic solutions developed for count data in the area of HIV contraction risk with
a discussion of the limitations of commonly applied methods.
While the conventional practice is to establish a standard of acceptability for statistical significance, with
certain disciplines, it may also be appropriate to discuss whether attaining statistical significance has a
true practical meaning, i.e., ‘clinical significance’. Jeans (1992) defines ‘clinical significance’ as “the
potential for research findings to make a real and important difference to clients or clinical practice, to
health status or to any other problem identified as a relevant priority for the discipline”.
Kendall and Grove (1988) define clinical significance in terms of what happens when “… troubled and
disordered clients are now, after treatment, not distinguishable from a meaningful and representative non-
disturbed reference group”. Thompson and Noferi (2002) suggest that readers of counseling literature
should expect authors to report either practical or clinical significance indices, or both, within their
research reports. Shepard (2002) questions why some authors fail to point out that the magnitude of
observed changes may be too small to have any clinical or practical significance: “sometimes, a supposed
change may be described in some detail, but the investigator fails to disclose that the trend is not
statistically significant”.
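The gap between statistical and clinical significance can be shown with a small sketch (the summary function and numbers are mine, not from the cited authors): with a large enough sample, a clinically trivial difference still produces a tiny p-value.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_summary(mean1: float, mean2: float, sd: float, n_per_group: int):
    """Return (z_statistic, p_value, cohens_d) for a two-group comparison
    assuming a common standard deviation. Illustrative sketch only."""
    se = sd * sqrt(2 / n_per_group)
    z_stat = (mean1 - mean2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    cohens_d = (mean1 - mean2) / sd       # effect size, independent of n
    return z_stat, p_value, cohens_d

# A half-point difference on a 100-point scale (d = 0.05) becomes
# 'statistically significant' at n = 10,000 per group, yet the effect
# size shows it is almost certainly too small to matter clinically.
z_stat, p_value, cohens_d = two_sample_summary(100.5, 100.0, sd=10.0, n_per_group=10000)
```

Reporting an effect size alongside the p-value, as Thompson and Noferi suggest, lets readers judge practical meaning for themselves.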
Lack of clearly defined and objective outcome measurements
No amount of statistical analysis, regardless of its level of sophistication, will correct poorly defined
objective outcome measurements. Whether done unintentionally or by design, this practice increases the
likelihood of clouding the interpretation of findings, thus potentially misleading readers.
Providing honest and accurate analysis
The basis for this issue is the urgency of reducing the likelihood of statistical error. Common challenges
include the exclusion of outliers, filling in missing data, altering or otherwise changing data, data mining,
and developing graphical representations of the data (Shamoo, Resnik, 2003).
Manner of presenting data
At times investigators may enhance the impression of a significant finding by determining how to
present derived data (as opposed to data in its raw form), which portion of the data is shown, why, how
and to whom (Shamoo, Resnik, 2003). Nowak (1994) notes that even experts do not agree in
distinguishing between analyzing and massaging data. Shamoo (1989) recommends that investigators
maintain a sufficient and accurate paper trail of how data was manipulated for future review.
Environmental/contextual issues
The integrity of data analysis can be compromised by the environment or context in which data was
collected, i.e., face-to-face interviews vs. focus groups. The interaction occurring within a dyadic
relationship (interviewer-interviewee) differs from the group dynamic occurring within a focus group
because of the number of participants, and how they react to each other’s responses. Since the data
collection process could be influenced by the environment/context, researchers should take this into
account when conducting data analysis.
Data recording method
Analyses could also be influenced by the method in which data was recorded. For example, research
events could be documented by:
a. recording audio and/or video and transcribing later
b. either a researcher or self-administered survey
c. either closed ended survey or open ended survey
d. preparing ethnographic field notes from a participant/observer
e. requesting that participants themselves take notes, compile and submit them to researchers.
While each methodology employed has its rationale and advantages, issues of objectivity and subjectivity
may be raised when the data is analyzed.
Partitioning the text
During content analysis, staff researchers or ‘raters’ may use inconsistent strategies in analyzing text
material. Some ‘raters’ may analyze comments as a whole while others may prefer to dissect text material
by separating words, phrases, clauses, sentences or groups of sentences. Every effort should be made to
reduce or eliminate inconsistencies between “raters” so that data integrity is not compromised.
Training of Staff conducting analyses
A major challenge to data integrity could occur with the unmonitored supervision of inductive techniques.
Content analysis requires raters to assign topics to text material (comments). The threat to integrity may
arise when raters have received inconsistent training or bring different prior training experiences. Such
experience may affect how raters perceive the material or even perceive the nature of the analyses to be
conducted. Thus, one rater could assign topics or codes to material in a way that differs significantly from
another rater. Strategies to address this would include clearly stating a list of analysis procedures in the
protocol manual, consistent training, and routine monitoring of raters.
Reliability and Validity
Researchers performing either quantitative or qualitative analyses should be aware of
challenges to reliability and validity. For example, in the area of content analysis, Gottschalk (1995)
identifies three factors that can affect the reliability of analyzed data:
stability, or the tendency for coders to consistently re-code the same data in the same way over
a period of time
reproducibility, or the tendency for a group of coders to classify category membership in the
same way
accuracy, or the extent to which the classification of a text corresponds to a standard or norm
The potential for compromising data integrity arises when researchers cannot consistently demonstrate
stability, reproducibility, or accuracy of data analysis.
According to Gottschalk (1995), the validity of a content analysis study refers to the correspondence of the
categories (the classification that raters’ assigned to text content) to the conclusions, and the
generalizability of results to a theory (did the categories support the study’s conclusion, and was the
finding adequately robust to support or be applied to a selected theoretical rationale?).
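Reproducibility between coders is commonly quantified with a chance-corrected agreement index. Below is a minimal sketch of Cohen's kappa for two raters (an illustration of the general idea, not part of Gottschalk's text; the category labels are invented):

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: agreement between two raters' category assignments,
    corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # chance agreement: product of each rater's marginal category frequencies
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Identical codings give kappa = 1; agreement no better than chance gives 0.
perfect = cohens_kappa(["x", "y", "x", "y"], ["x", "y", "x", "y"])
```

Routinely computing such an index during rater training is one way to demonstrate the stability and reproducibility the text calls for.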
Extent of analysis
Upon coding text material for content analysis, raters must classify each code into an appropriate
category of a cross-reference matrix. Relying on computer software to determine a frequency or word
count can lead to inaccuracies. “One may obtain an accurate count of that word's occurrence and
frequency, but not have an accurate accounting of the meaning inherent in each particular usage”
(Gottschalk, 1995). Further analyses might be appropriate to discover the dimensionality of the data set
or to identify new meaningful underlying variables.
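Gottschalk's caution about word counts is easy to reproduce: a frequency count is exact but sense-blind. In this invented example the count of “bank” is accurate, while the three usages mean different things:

```python
import re
from collections import Counter

# Invented sample text: "bank" appears three times with different meanings.
text = ("The patient sat by the river bank. "
        "The bank denied the loan. "
        "The bank of monitors displayed the data.")

words = re.findall(r"[a-z]+", text.lower())
freq = Counter(words)

# The count is exact, but it cannot tell a riverbank from a financial
# institution from a row of equipment; that still requires human coding.
bank_count = freq["bank"]
```

Software-derived counts therefore support, but cannot replace, the raters' classification of meaning.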
Whether statistical or non-statistical methods of analyses are used, researchers should be aware of the
potential for compromising data integrity. While statistical analysis is typically performed on quantitative
data, there are numerous analytic procedures specifically designed for qualitative material including
content, thematic, and ethnographic analysis. Regardless of whether one studies quantitative or
qualitative phenomena, researchers use a variety of tools to analyze data in order to test hypotheses,
discern patterns of behavior, and ultimately answer research questions. Failure to understand or
acknowledge data analysis issues presented can compromise data integrity.
Gottschalk, L. A. (1995). Content analysis of verbal behavior: New findings and clinical applications.
Hillside, NJ: Lawrence Erlbaum Associates, Inc.
Jeans, M. E. (1992). Clinical significance of research: A growing concern. Canadian Journal of Nursing
Research, 24, 1-4.
Lefort, S. (1993). The statistical versus clinical significance debate. Image, 25, 57-62.
Kendall, P. C., & Grove, W. (1988). Normative comparisons in therapy outcome. Behavioral Assessment,
Nowak, R. (1994). Problems in clinical trials go far beyond misconduct. Science. 264(5165): 1538-41.
Resnik, D. (2000). Statistics, ethics, and research: an agenda for educations and reform. Accountability in
Research. 8: 163-88
Schroder, K.E., Carey, M.P., Vanable, P.A. (2003). Methodological challenges in research on sexual risk
behavior: I. Item content, scaling, and data analytic options. Ann Behav Med, 26(2): 76-103.
Shamoo, A.E., Resnik, B.R. (2003). Responsible Conduct of Research. Oxford University Press.
Shamoo, A.E. (1989). Principles of Research Data Audit. Gordon and Breach, New York.
Shepard, R.J. (2002). Ethics in exercise science research. Sports Med, 32 (3): 169-183.
Silverman, S., Manson, M. (2003). Research on teaching in physical education doctoral dissertations: a
detailed investigation of focus, method, and analysis. Journal of Teaching in Physical Education, 22(3):
Smeeton, N., Goda, D. (2003). Conducting and presenting social work research: some basic statistical
considerations. Br J Soc Work, 33: 567-573.
Thompson, B., Noferi, G. (2002). Statistical, practical, clinical: How many types of significance should be
considered in counseling research? Journal of Counseling & Development, 80(4):64-71.
Data publication and reporting is the process of preparing and disseminating research
findings to the scientific community. Scholarly disciplines can only advance through dissemination and
review of research findings at professional meetings and publications in discipline-related journals. The
tacit assumption in publishing is one of trust between the author(s) and readers regarding the accuracy
and truthfulness of any submission.
The practice of ensuring research integrity is relevant at all stages of research investigation, from early
conceptualization, design, implementation, to analysis. This practice also extends to the stage of
documenting and preparing results for publication. In this process, researchers may experience many
more challenges to preserving research integrity.
Considerations/issues in data reporting and publishing
There are often factors in research settings that can result in compromises to data integrity. These factors
may create conditions in which the goal of conducting research in as objective a manner as possible is
challenged. They can be categorized as either external or internal factors, as follows:
Lack of formal mentoring
Lack of penalties
Little chance of getting caught
Bad examples from mentors (Price, Drake, Islam, 2001)
Individual ego or vanity
Personal financial gain
Psychiatric illness (Weed, 1998)
Importance of accurate and honest data reporting
Investigators demonstrating lapses of integrity while engaged in data reporting and publishing can have a
negative influence on the direction of future research efforts, threaten to compromise the credibility of a
particular field of study, and may ultimately risk the well-being and safety of the public in general, as well
as research subjects in particular.
Sources of guidance promoting good data reporting practices and publishing include faculty advisors who
carefully instruct graduate students, departmental chairpersons mentoring researchers new to the field,
regular review of published university policies, existing codes of professional ethics, or established
government rules and regulations. Deficiencies in training or a lack of awareness of existing policies,
codes, or rules may increase the likelihood of a deviation from the acceptable standards of practice in
reporting and publishing.
Listed below are some issues related to integrity of data reporting and publication:
Misrepresentation
Due to problems in data collection, researchers may omit data that is not supportive of the research
hypothesis. Alternately, data may be fabricated if the data collection process was somehow interrupted or
data was lost, and the researchers believe the invented data would have been similar to what was
anticipated. In either case, the true scope of the data findings remains hidden from readers, who are
unable to accurately assess the validity of the findings.
Plagiarism
Plagiarism is the act of taking credit for ideas or data that rightfully belong to others. Related to this is the
theft of ideas from grants and drafts of papers that a researcher has reviewed. This harms the
researcher(s) from whom the idea(s) or data were appropriated without being properly acknowledged.
Selectivity of reporting / failure to report all pertinent data
This is the practice of only using data that supports one’s research hypothesis and ignoring or omitting
data that does not. A related practice is inaccurate reporting of missing data points. As explained under
“Misrepresentation” earlier, the true scope of the data findings remains hidden from readers who are
unable to accurately assess the validity of the findings.
Failure to disclose conflicts of interest
Editors, reviewers, or readers who are not aware of possible conflicts of interest (financial and otherwise)
may not have an opportunity to adequately assess the validity of research findings without being aware of
possible undue influences from the sponsors of an investigation. These conflicts may compromise
researchers’ credibility in their fields.
Publication bias / neglecting negative results
Since the vast majority of research findings submitted to professional journals tend to be ‘positive’ in
nature, the literature in most scientific fields demonstrates a bias against negative results. This in part
reflects the reluctance of journal editors to publish articles with negative findings. Thus, researchers are
less willing to report findings that fail to demonstrate an intended effect or yield an expected result. The
value of these publications could be substantial in that other investigators would not needlessly pursue a
fruitless path of investigation.
Analysis of data by several methods to find a significant result
This is also known as ‘milking’ or ‘dredging’ the data and involves researchers utilizing a variety of
statistical tests in the hopes of yielding a significant result. The proper procedure would be to base the
selection of tests a priori on a theory or theoretical framework, rather than trying one test after another
until something reaches significance. Other related statistical issues include reporting percentages rather
than absolute numbers to mask a small sample size, reporting differences that do not reach statistical
significance as suggesting that a certain trend exists, reporting no difference when statistical power is
inadequate, and failure to include the total number of eligible participants. The importance of this last
point is that without it readers are unable to determine whether a high non-response rate might
compromise the representativeness of respondents.
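One common guard against the data-dredging problem described above is to adjust the significance threshold for the number of tests performed. A minimal Bonferroni sketch (the p-values are invented for illustration):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return one flag per test: does it survive a Bonferroni correction?
    Each p-value is compared against alpha divided by the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Ten tests: unadjusted, 0.03 and 0.008 look 'significant' at alpha = 0.05;
# corrected, the threshold is 0.05 / 10 = 0.005 and only 0.004 survives.
p_vals = [0.03, 0.20, 0.47, 0.008, 0.61, 0.15, 0.33, 0.72, 0.09, 0.004]
flags = bonferroni_significant(p_vals)
```

Disclosing how many tests were run, and whether any correction was applied, lets readers judge the findings on their merits.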
Inadequate evaluation of prior research
This refers to an insufficient review of available literature that presents an incomplete picture of the
current status of a particular research area. A critique of the included citations may lack the required
depth of analysis and fail to justify the need for proposed research.
Ignoring citations or prior work that challenge stated conclusions or call current findings into question
Selective inclusion of citations that minimize threats to the justification for the present study can
compromise the integrity of the study. Whether done intentionally or not, omissions can have the
untoward consequence of providing support for an author’s position.
Misleading discussion of observations
This may result from using inappropriate statistical tests, neglecting negative results, omitting missing
data points, failing to report actual numbers of eligible subjects, using inappropriate graph labels or
terminology, and data dredging. These can result in readers becoming less able to objectively critique the
reported findings.
Reporting conclusions that are not supported
Faulty data collection, inappropriate analyses, gaps in logic, and unexplained deviation from
conventionally accepted methods of interpretation can result in conclusions that are not valid. Readers
cannot assess the validity of the conclusions for themselves unless all the necessary information is
provided.
This can occur when the distinctions and differences in findings between reports are negligible and the
focus is publishing for quantity versus quality. A related practice is the submission of duplicate
publications to journals from different disciplines or in different languages, the expectation being that
investigators would not read journals from different fields of study or in other languages. Literature
reviews or meta-analyses that are
conducted may lead to an inaccurate assessment of findings from a particular research area due to
duplicate publications of the same study in different journals.
Just Attribution of Authorship
Publication disputes generally fall into four categories (Ritter, 2001):
1. a researcher is listed as an author but did not have a chance to review or approve the manuscript
2. a researcher is promised first authorship when the project is completed, but the principal
investigator adds the work of someone else, who then becomes first author
3. a researcher claims first authorship on the basis of the amount of work he or she did when not
given that recognition, and
4. after leaving a laboratory, a researcher does not receive credit in an article that includes his or
her work. Related to this is the submission of manuscripts not seen and reviewed by all the listed
co-authors of a publication.
A fair and equitable understanding of each author’s contribution to published research provides clear
credit and acknowledgement for advancing a field of study.
Inappropriate use of terminology without precise definitions
A potential barrier to successful cross-disciplinary investigations is the use of field-specific terminology.
Encouraging the use of precise definitions can reduce confusion and promote understanding of research
findings.
Inflation of research results for the media
This involves providing statements for public and not professional consumption that are insufficiently
supported by data for the purpose of publishing un-reviewed or untested results in a non-scientific or non-
scholarly magazine/media. Premature reporting of results that turn out to be unsubstantiated may
compromise the credibility of a particular field.
Publishing in peer-reviewed journals or presenting in scholarly meetings is the primary mechanism for
investigators to disseminate their findings to the research community. This community relies on author(s)
to report the events of a study honestly and accurately. All researchers should be aware of the issues that
compromise the integrity of data reporting and publishing. Ensuring integrity is essential to promoting the
credibility of all fields of study.
Marco, C.A., Larkin, G.L. (2000). Research ethics: ethical issues of data reporting and the quest for
authenticity. Academic Emergency Medicine, 7 (6): 691-694.
Price, J.H., Drake, J.A., Islam, R. (2001). Selected ethical issues in research and publication: perceptions
of health education faculty. Health Education & Behavior, 28 (1): 51-64.
Ritter, S.K. (2001). Publication ethics: rights and wrongs: Balancing obligations and interests surrounding
dissemination of research is an arduous task. Chemical & Engineering News, 79 (46): 24-31.
Weed, D.L. (1998). Preventing scientific misconduct. American Journal of Public Health, 88 (1): 125-
Data ownership refers to both the possession of and responsibility for information. Ownership
implies power as well as control. The control of information includes not just the ability to access, create,
modify, package, derive benefit from, sell or remove data, but also the right to assign these access
privileges to others (Loshin, 2002).
Implicit in having control over access to data is the ability to share data with colleagues to promote
advancement in a field of investigation (the notable exception to the unqualified sharing of data would be
research involving human subjects). Scofield (1998) suggests replacing the term ‘ownership’ with
‘stewardship’, “because it implies a broader responsibility where the user must consider the
consequences of making changes over ‘his’ data”.
According to Garner (1999), individuals having intellectual property have rights to control intangible
objects that are products of human intellect. The range of these products encompasses the fields of art,
industry, and science. Research data is recognized as a form of intellectual property and subject to
protection by U.S. law.
Importance of data ownership:
According to Loshin (2002), data has intrinsic value as well as having added value as a byproduct of
information processing, “at the core, the degree of ownership (and by corollary, the degree of
responsibility) is driven by the value that each interested party derives from the use of that information”.
The general consensus of science emphasizes the principle of openness (Panel Sci. Responsib. Conduct
Res. 1992). Thus, sharing data has a number of benefits to society in general and protecting the integrity
of scientific data in particular. The Committee on National Statistics’ 1985 report on sharing data
(Fienberg, Martin, Straf, 1985) noted that sharing data reinforces open scientific inquiry, encourages a
diversity of analyses and conclusions, and permits:
1. reanalyses to verify or refute reported results
2. alternative analyses to refine results
3. analyses to check if the results are robust to varying assumptions
The cost and benefits of data sharing should be viewed in ethical, institutional, legal, and professional
dimensions. Researchers should clarify at the beginning of a project if data can or cannot be shared,
under what circumstances, by and with whom, and for what purposes.
Considerations/issues in data ownership
Researchers should have a full understanding of various issues related to data ownership to be able to
make better decisions regarding it. These issues include the paradigm of ownership, data hoarding, data
ownership policies, balance of obligations, and technology. Each of these issues gives rise to a number of
considerations that impact decisions concerning data ownership.
Paradigm of Ownership – Loshin (2002) alludes to the complexity of ownership issues by identifying the
range of possible paradigms used to claim data ownership. These claims are based on the type and
degree of contribution involved in the research endeavor. Loshin (2002) identifies a list of parties laying a
potential claim to data:
Creator – The party that creates or generates the data
Consumer – The party that uses the data owns the data
Compiler – The entity that selects and compiles information from different information sources
Enterprise – All data that enters the enterprise or is created within the enterprise is completely
owned by the enterprise
Funder – The party that commissions the data creation claims ownership
Decoder – In environments where information is “locked” inside particular encoded formats, the
party that can unlock the information becomes an owner of that information
Packager – The party that collects information for a particular use and adds value through
formatting the information for a particular market or set of consumers
Reader as owner – The value of any data that can be read is subsumed by the reader and,
therefore, the reader gains value through adding that information to an information repository
Subject as owner – The subject of the data claims ownership of that data, mostly in reaction to
another party claiming ownership of the same data
Purchaser/Licenser as owner – The individual or organization that buys or licenses data may
stake a claim to ownership
Data Hoarding – This practice is considered antithetical to the general norms of science emphasizing the principle of
openness. Factors influencing the decision to withhold access to data could include (Sieber, 1989):
(a) proprietary, economic, or security concerns
(b) the cost and time required to document data
(c) the burden of providing all the materials needed to understand or extend the research
(d) technical obstacles to sharing computer-readable data
(e) concerns about the qualifications of data requesters
(f) personal motives to withhold data
(g) costs to the borrowers
(h) costs to funders
Data Ownership Policies
Institutional policies lacking specificity, supervision, and formal documentation can increase the risk of
compromising data integrity. Before research is initiated, it is important to delineate the rights, obligations,
expectations, and roles played by all interested parties. Compromises to data integrity can occur when
investigators are unaware of existing data ownership policies or fail to clearly describe rights and obligations regarding data ownership. Listed below are some relationships between interested parties that warrant the establishment of data ownership policies:
Between academic institution and industry (public/private sector) – This refers to the sharing
of potential benefits resulting from research conducted by academic staff but funded by corporate
sponsors. The failure to clearly delineate data ownership issues early in public/private
relationships has created controversy concerning the rights of academic institutions and those of
industry sponsors (Foote, 2003).
Between academic institution and research staff – According to Steneck (2003), research
funding is awarded to research institutions and not individual investigators. As recipients of funds,
these institutions have responsibilities for overseeing a number of activities including budgets,
regulatory compliance, and the management of data. Steneck (2003) notes “To assure that they
are able to meet these responsibilities, research institutions claim ownership rights over data
collected with funds given to the institution. This means that researchers cannot automatically
assume that they can take their data with them if they move to another institution. The research
institution that received the funds may have rights and obligations to retain control over the data”.
Fishbein (1991) recommended that institutions clearly state their policies regarding ownership of
data, and present guidelines for such a policy.
Collaboration between research colleagues – This applies to collaborative efforts both within and between institutions. Whether collaborations are between faculty peers, students, or staff, all parties should have a clear understanding of who will determine how the data will be distributed and shared (if applicable), even before the data are collected.
Between authors and journals - To reduce the likelihood of copyright infringement, some
publishers require a copyright assignment to the journal at the time of submission of a
manuscript. Authors should be aware of the implications of such copyright assignments and
clarify the policies involved.
Balance of obligations
Investigators must learn to negotiate the delicate balance between the willingness to share data in order to facilitate scientific progress and the obligation to employer/sponsor, collaborators, and students to preserve and protect data (Last, 2003). Signed nondisclosure agreements between investigators and their corporate sponsors can prevent data from being published or shared with colleagues. In other cases, such as research involving human participants, data sharing may not be permitted for confidentiality reasons.
Advances in technology have enabled investigators to explore new avenues of research, enhance
productivity, and use data in ways unimagined before. However, careless application of new technologies
has the potential to create a slew of unanticipated data ownership problems that can compromise
research integrity. The following examples highlight data ownership issues resulting from the careless
application of technology:
Computer – The use of computer technology has permitted rapid access to many forms of
computer-generated data (Veronesi, 1999). This is particularly the case in the medical profession
where patient medical record data is becoming increasingly computerized. While this process
facilitates data access for health care professionals for diagnostic and research purposes, unauthorized interception and disclosure of medical information can compromise patients’ right to privacy. While the primary justification for collecting medical data is to benefit the patient, Cios and Moore (2002) question whether medical data has a special status based on its applicability to all people.
Genetics – Due to advances in technology, investigators of the Human Genome Project have
opportunities to make significant contributions by addressing previously untreatable diseases and
other human conditions. However, the status of genetic material and genetic information remains
unclear (de Witte & Welie, 1997). Wiesenthal and Wiener (1996) discuss the conflict between the individual’s right to privacy and the need for societal protection. The critical issues that investigators need to be aware of include the ownership of genetic data, confidentiality rights to such information, and legislation to control genetic testing and its applications (Wiesenthal and Wiener, 1996).
The data ownership issues discussed above serve to highlight potential challenges to preserving data integrity.
While the ideal is to promote scientific openness, there are situations where it may not be appropriate
(especially in the case of human participants) to share data. The key is for researchers to understand the various issues affecting the ownership and sharing of their research data and to make decisions that promote scientific inquiry while protecting the interests of the parties involved.
References
Cios, K. J., & Moore, G. W. (2002). Uniqueness of medical data mining. Artificial Intelligence in Medicine, 26(1-2): 1-24.
de Witte, J. I., & Welie, J. V. (1997). The status of genetic material and genetic information in The Netherlands. Social Science & Medicine, 45(1): 45-9.
Fienberg, S. E., Martin, M. E., & Straf, M. L. (1985). Sharing Research Data. Washington, DC: National Academy Press.
Fishbein, E. A. (1991). Ownership of research data. Academic Medicine, 66(3): 129-33.
Foote, M. (2003). Review of current authorship guidelines and the controversy regarding publication of clinical data. Biotechnology Annual Review, 9: 303-13.
Garner, B. A. (1999). Black’s Law Dictionary, 7th edition. St. Paul, MN: West Group.
Last, R. L. (2003). Sandbox ethics in science: sharing of data and materials in plant biology. Plant Physiology, 132(1): 17-8.
Loshin, D. (2002). Knowledge Integrity: Data Ownership (online), retrieved June 8.
Panel on Scientific Responsibility and the Conduct of Research (1992). Responsible Science: Ensuring the Integrity of the Research Process, Vol. 1. Committee on Science, Engineering, and Public Policy. Washington, DC: National Academy Press.
Scofield, M. (1998). Issues of Data Ownership (online), retrieved June 10.
Shamoo, A. E., & Resnik, D. B. (2002). Intellectual Property. In Responsible Conduct of Research. New York: Oxford University Press.
Sieber, J. E. (1989). Sharing scientific data I: new problems for IRBs. IRB: A Review of Human Subjects Research, 11(6): 4-7.
Steneck, N. H. (2003). ORI Introduction to the Responsible Conduct of Research. Department of Health and Human Services.
Veronesi, J. F. (1999). Ethical issues in computerized medical records. Critical Care Nursing Quarterly, 22(3): 75-80.
Wiesenthal, D. L., & Wiener, N. I. (1996). Privacy and the Human Genome Project. Ethics & Behavior.