La Trobe University Library partnered with our Health Sciences academics to procure datasets from two Victorian regional health service providers in 2014/15, and from these created a publicly available Healthy Communities data collection for research purposes.
5. La Trobe University
Research Data Management
• La Trobe University working
party drawn from various business
departments
• Most universities grappling with
the same issues
• Managing research data can be
complex & messy
• There are always exceptions to the
norm
Sam Searle, Content and Discovery Services, Griffith University
6.
What are research data?
“It is not possible to apply a uniform definition of research data across all
disciplines. Research data may be numerical, textual, audio-visual,
digital or physical, depending on the discipline and the nature of the
research.”
Source: University of Sydney Research Data Management Policy 2014 http://sydney.edu.au/policies/showdoc.aspx?recnum=PDOC2013/337
7.
Why now?
• Good practice
• Funder requirements
• Compliance
• Increased citation
The Australian Code for the Responsible Conduct of
Research states:
Policies are required that address the ownership
of research materials and data, their storage,
their retention beyond the end of the project,
and appropriate access to them by the research
community.
8.
General issues to manage when dealing with research data…
• Research data can be digital or analogue
• Dealing with sensitive data
• Copyright & Licensing
• Not all research data will be suitable for Open Access
• Issues relating to validation/review of research data (established principles &
criteria exist for journal articles & theses)
• Data Management plans
9.
Population Health data collection
for the City of Greater Bendigo
http://dx.doi.org/10.4225/22/55BAE9DBD9670
10.
Some key issues/terms
• Research
• Ethics
• Privacy
• Confidentiality
• Consent
• Risk
• Harm
• Identifier
• Databank
• De-identified data
• CC-BY
11.
ANDS Major Open Data Collections (MODC) project
• The project was funded by the Australian National Data Service as part of its
Open Data Collections program to support partner institutions in making available an
internationally significant open research data collection.
12.
ANDS Major Open Data Collections (MODC) project
• La Trobe University Library partnered with the Building Healthy Communities
Research Focus Area (RFA) to trial processes to develop a Healthy Communities
Data Collection.
• Funder requirements:
• There is a demonstrated research need for the data beyond the scope of the
institution
• The data should be well described and well linked, with richly connected data
collections and sub-collections
• The open data collection must be discoverable through appropriate means including
Research Data Australia as well as institutional, international and discipline specific
portals
13.
The Bendigo Health Population data collection
• Health researchers have a major interest
in accessing clinical data to support
research to inform and improve
population health and health services.
• Cardiovascular disease is the leading
cause of death in Australia, being
responsible for 33.7 per cent of all deaths
in 2008 (ABS 2010). Furthermore,
cardiovascular disease is the second
leading cause of the disease burden (18
per cent of the total burden) (Begg et al.
2007).
• Curate & identify connections
between disparate data sets.
14.
Secondary research data from regional health service providers
• Bendigo Health: 48,000 patient
records relating to Circulatory
system diseases (ICD code range of
I00–I99) for patients over 40 years
of age
• Loddon Mallee Murray Medicare
Local: 245,000 patient records from
General Practices within the
Loddon/Murray/Mallee catchment
area for a broad range of health
issues
• Other sources: ABS, AIHW
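The Bendigo Health selection criteria above (ICD codes in the range I00–I99, patients over 40) can be illustrated with a minimal Python sketch. The record layout (`age` and `icd10` fields) and the function names are assumptions made for illustration; the slides do not describe how the extract was actually produced.

```python
import re

# Assumed code shape: one letter, two digits, optional decimal part (e.g. "I21.0").
_ICD10 = re.compile(r"^([A-Z])(\d{2})(?:\.\d{1,2})?$")


def is_circulatory(code: str) -> bool:
    """True when the code falls in I00-I99 (diseases of the circulatory system)."""
    match = _ICD10.match(code.strip().upper())
    # Any well-formed code starting with "I" has a two-digit category in 00-99.
    return match is not None and match.group(1) == "I"


def select_records(records, min_age=40):
    """Keep records for patients older than min_age with a circulatory diagnosis.

    Each record is a dict with hypothetical keys "age" and "icd10".
    """
    return [
        r for r in records
        if r["age"] > min_age and is_circulatory(r["icd10"])
    ]
```

In this sketch a record for a 62-year-old with code `I25.1` is kept, while a 40-year-old with `I10` or a 70-year-old with `E11` is not.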
[Diagram labels: MODC data collection; BH; LMMML; Other data. Caption: Curate & identify connections between disparate data sets.]
15.
Key issues for project team
• No precedent for publication of secondary data
(methodology, workflow, supporting templates)
• Handling sensitive health data carries obligations: systems
and processes are required to ensure the security and
integrity of the data
• Collection, use & disclosure of data involving human
subjects are subject to ethics approval
• The O in MODC is for Open: third-party material is to be
licensed under conditions that support re-use & re-purposing
16.
Some project housekeeping: systems & templates to develop
• Schemas / Data dictionary
• Data sample
• Data extraction plan
• Ethics
• Secure storage / Access restricted by roles
• License agreement
• Metadata
• Research data management plan
• Sensitive data
17.
Stage one: data acquisition
• Preliminary meetings
• Positive response from both health services
• “Define area of research interest and we can
extract and supply the data”
• La Trobe Human Research Ethics Committee
approved
• More meetings with Health IT Manager, La Trobe
Epidemiologist and Library repository staff
18.
Stage one: define area of interest
Bendigo Health (BH) CIO required detailed scope of intention
& data planning for the project, including:
• Clear statement of intent – including inputs/outputs
• Area of interest (scope)
• Any internal resources required
• Roadmap & Timeline
• May require approval from BH Human Research Ethics
Committee (HREC)
19.
Stage one: Human Research Ethics
• The primary role of Bendigo Health’s Human
Research Ethics Committee is to protect the welfare
and rights of participants in research.
• The HREC reviews research proposals and makes judgments
on whether the risks of the research are justified by the
potential benefits.
• Research must meet the requirements for ethical research
in the National Statement on Ethical Conduct in
Human Research (2007)
• Also familiarize yourself with the relevant
Commonwealth and State legislation to ensure your
project complies with human research and privacy
laws
21.
What is human research?
• Human research is conducted with or about people, who may be involved through:
• Taking part in surveys, interviews or focus groups
• Undergoing psychological, physiological or medical testing or treatment
• Being observed by researchers
• Researchers having access to their personal documents or other materials
• The collection and use of their body organs, tissues or fluids (eg skin,
blood, urine, saliva, hair, bones, tumour and other biopsy specimens) or
their exhaled breath
• Access to their information as part of an existing published or unpublished
source or database
22.
National Statement on
Ethical Conduct in
Human Research, 2007
(Updated May 2015)
• Two themes must always be
considered in human research:
the risks and benefits of
research, and participants’
consent.
• The National Statement allows for
different levels of ethical review
of research, reflecting the
difference in degree of risk
involved.
23.
NS 2.1: Risk and Benefit
• The expression low risk research describes research in which the only
foreseeable risk is one of discomfort.
• The expression negligible risk research describes research in which there is no
foreseeable risk of harm or discomfort; and any foreseeable risk is no more than
inconvenience.
GUIDELINE 2.1.2: Risks to
research participants are
ethically acceptable only if they
are justified by the potential
benefits of the research.
24.
NS 2.2: General requirements for Consent
• Consent to participate in research must be
voluntary and based on sufficient information
and adequate understanding of both the
proposed research and the implications of
participation in it.
• Depending upon the circumstances of an
individual project it may be justifiable to
employ an opt-out approach or a waiver of the
requirement for consent, rather than seeking
explicit consent.
GUIDELINE 2.2.1
The guiding principle for
researchers is that a person’s
decision to participate in
research is to be voluntary, and
based on sufficient information
and adequate understanding of
both the proposed research
and the implications of
participation in it.
GUIDELINE 2.2.2
Participation that is voluntary
and based on sufficient
information requires an
adequate understanding of the
purpose, methods, demands,
risks and potential benefits of
the research.
25.
NS 3.2 Databanks
• The National Statement defines databanks as
“[A] systematic collection of data … If data are
being collected, aggregated and stored with a
view to use for future related or as yet
unspecified research, this may involve ‘banking’
the participants’ data.”
• The term databanks includes databases.
• Types of research that commonly make use of
databanks include epidemiology, pathology,
genetics and social sciences.
GUIDELINE 3.2.1
When planning a databank,
researchers should clearly
describe how their research
data will be collected, stored,
used and disclosed, and outline
how that process conforms to
this National Statement,
particularly the requirements
for consent set out in
paragraphs 2.2.14 to 2.2.18.
GUIDELINE 3.2.3
Researchers’ use of data from
databanks must comply with
conditions specified by the
providers of the data; in
particular, any conditions on
the identifiability of the data
(see paragraphs 2.2.14 to
2.2.18).
26.
NS 5 (Processes of research governance and ethical review)
Institutional responsibilities
• Research involving no more than low risk
can be exempted from review
• Institutions may choose to exempt from
ethical review research that:
a) is negligible risk research (as defined
in paragraph 2.1.7); and
b) involves the use of existing collections
of data or records that contain only
non-identifiable data about human
beings.
• Research exempted from ethical review must
still meet the requirements of the National
Statement and be ethically acceptable.
GUIDELINE 5.1.8
Research that carries only
negligible risk (see paragraph
2.1.7) and meets the
requirements of paragraphs
5.1.22 and 5.1.23 may be
exempted from ethical review.
28.
Data identifiability
• Individually identifiable data: where the identity of
a specific individual can reasonably be ascertained.
Examples of identifiers include the individual’s name,
image, date of birth or address
• Re-identifiable data: from which identifiers have
been removed and replaced by a code, but it remains
possible to re-identify a specific individual by, for
example, using the code or linking different data sets
• Non-identifiable data: data that have never been
labelled with individual identifiers, or from which
identifiers have been permanently removed, so that no
specific individual can be identified
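The difference between the three categories can be sketched in Python. This is an illustration only: the field names, the `pseudonymise` and `strip_codes` helpers, and the use of random hex codes are assumptions, not the project's documented method.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the identifier with a random code.

    Returns (coded_records, key). While the key exists, the coded records
    are re-identifiable: the code can be looked up to recover the identifier.
    """
    key = {}
    coded = []
    for record in records:
        code = secrets.token_hex(8)
        key[code] = record[id_field]
        rest = {k: v for k, v in record.items() if k != id_field}
        coded.append({"code": code, **rest})
    return coded, key


def strip_codes(coded_records):
    """Permanently drop the code so records can no longer be looked up in a key."""
    return [{k: v for k, v in r.items() if k != "code"} for r in coded_records]
```

Dropping the code is necessary but may not be sufficient: as the definition of re-identifiable data notes, linking the remaining fields with other data sets can still single out an individual.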
29.
Legal requirements
• Every project will involve the collection, use or disclosure of some piece of
information.
• Researchers should review ALL Privacy Principles in the relevant legislation to
ensure that their project is fully compliant with all aspects of the law.
• Researchers are responsible for identifying the relevant Act and guidelines
under which an application for approval of a project is made.
• If more than one Act (or set of guidelines) applies, all relevant legislative
requirements will need to be met, including the obtaining of any necessary
approvals from a Human Research Ethics Committee. The statutory guidelines
referred to above are not identical, as they must reflect the various statutes
under which they are made and any different requirements must be adhered to.
30.
Victorian Laws
• In Victoria there is a requirement to comply with legislation relevant to human
research involving information privacy (Information Privacy Act 2000) and
health information (Health Records Act 2001).
• The Health Records Act 2001 (Victoria) applies to all health information handled
by the Victorian public sector and private sector. There are eleven Health
Privacy Principles (HPPs). HPP 1 and 2 govern the collection, use and
disclosure of health information, including for the purposes of research.
• The Information Privacy Act 2000 (Victoria) regulates the responsible collection
and handling of personal information (which includes “sensitive information”
but excludes health information) by organisations in the Victorian public sector,
including universities. It sets out ten Information Privacy Principles (IPPs). IPPs 1,
2 and 10 deal with the collection, use and disclosure of this information for
the purposes of research.
31.
Commonwealth Law
• The Privacy Act 1988 (Cth) outlines thirteen Australian
Privacy Principles, which establish requirements for the
collection, storage, use and disclosure of personal
information and health information.
• Sections 16A and 16B of the Privacy Act set out certain
circumstances in which it is permissible to collect, use and
disclose personal information and health information for
the purposes of research.
32.
Definitions by law
• Collection: an organisation or individual collects information if it gathers,
acquires or obtains information from any source and by any means, whether
that information has been requested or not. Questionnaires, surveys, interviews,
focus groups and requests for information held in databases, data sets or
institutional records are all examples of how information may be collected.
• Use: an organisation or individual uses information if it handles the information
in any way. Use of information includes any form of quantitative or qualitative
analysis and any inclusion of the information in any form of publication.
• Disclosure: an organisation or individual discloses information when it releases
information to other organisations or individuals (that is, outside of those who
collected the information in the first instance).
33.
Step 2: Data cleansing and merging
• Data preparation and linkage:
• Filter / Screen fields (eg: Pensioners, ATSI)
• Aggregation / Band fields (eg: DOB)
• BH and ABS data were joined on SLA (‘Statistical Local Area’) codes
• ML and ABS data were joined on SLA + ML (Medicare Local) codes
• How much data in total?
• ML - 221,268 patient records
• BH - 40,237 patient records
• Other tables include Measurement, Medication and Diagnosis.
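The preparation steps listed above (screening fields, banding dates of birth, joining on SLA codes) can be sketched in plain Python. All field names (`dob`, `sla_code`, `pensioner`, `atsi`), the ten-year band width and the choice to drop screened fields entirely are assumptions for illustration; the slides do not specify how the project implemented them.

```python
from datetime import date


def age_band(dob: date, on: date, width: int = 10) -> str:
    """Replace an exact date of birth with a coarse age band, e.g. "60-69"."""
    # Subtract one year if the birthday has not yet occurred in the year of `on`.
    age = on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def prepare(patients, area_stats, on, drop_fields=("pensioner", "atsi")):
    """Screen sensitive fields, band the DOB, and join area statistics on SLA code."""
    stats_by_sla = {row["sla_code"]: row for row in area_stats}
    prepared = []
    for patient in patients:
        # Drop screened fields and the exact date of birth.
        record = {
            k: v for k, v in patient.items()
            if k not in drop_fields and k != "dob"
        }
        record["age_band"] = age_band(patient["dob"], on)
        # Left join: patients with no matching SLA row keep their own fields only.
        area = stats_by_sla.get(patient["sla_code"], {})
        record.update({k: v for k, v in area.items() if k != "sla_code"})
        prepared.append(record)
    return prepared
```

For example, a patient born on 15 June 1950 falls in the band `60-69` as at 1 January 2015, and the output record carries that band instead of the date of birth.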
34.
Step 3: Deposit to La Trobe repository
• Supported by data dictionary & reusable format
• Metadata created to describe the collection and distribute it through the La Trobe repository to:
• Research Data Australia
• National Library’s TROVE service
• DataCite
• Google
35.
http://hdl.handle.net/1959.9/319746
LTU Research Online repository
Title: Population Health data collection for the City of Greater Bendigo.
Keywords: Health informatics; Epidemiology; Heart disease; Circulatory system disease; Health data analysis
Description: This data collection contains de-identified clinical health service utilisation data from Bendigo Health and the General
Practitioners Practices associated with the Loddon Mallee Murray Medicare Local. The collection also includes associated population
health data from the ABS, AIHW and the Municipal Health Plans. Health researchers have a major interest in how clinical data can be
used to monitor population health and health care in rural and regional Australia through analysing a broad range of factors shown to
impact the health of different populations. The Population Health data collection provides students, managers, clinicians and
researchers the opportunity to use clinical data in the study of population health, including the analysis of health risk factors, disease
trends and health care utilisation and outcomes.
37.
Funders supporting the re-use and re-purposing of open
research data
The Australian Research Council (ARC) Open Access Policy:
• “Any publications arising from an ARC supported research Project must be
deposited into an open access institutional repository within a twelve (12)
month period from the date of publication.”
http://www.arc.gov.au/arc-open-access-policy
38.
Accessing and Using Publicly Funded Data for Health Research
The National Health and Medical Research Council has drafted a framework of
principles for researchers and data custodians to consider when requests or
applications are made for access to existing health and health-related datasets for
research purposes.
1. Research use of publicly held health and health-related data should be
maximised
2. Data custodians should recognise their responsibilities and accountabilities
when providing access to data for research.
3. Researchers should recognise their responsibilities and accountabilities when
accessing and using publicly held health and health related datasets
39.
Lessons learnt
• No such thing as a free lunch: Open access projects still require investments of
time, money and expertise
• Relationships: Bendigo hospital, Loddon Mallee Murray Medicare Local, ANDS
• New model for releasing secondary data: little precedent for open
publication of data alone
• Technical: Disparate data from different proprietary technical systems
• Managing risk: dealing with sensitive health data under an open access model
40.
Key terms (National Statement on Ethical Conduct
in Human Research 2007)
• Research: Includes at least investigation undertaken to gain knowledge
and understanding or to train researchers
• Ethics: The concepts of right and wrong, justice and injustice, virtue and
vice, good and bad, and activities to which these concepts apply
• Privacy: A domain within which individuals and groups are entitled to be
free from the scrutiny of others
• Confidentiality: The obligation of people not to use private information –
whether private because of its content or the context of its communication -
for any purpose other than that for which it was given to them
• Consent: A person’s or group’s agreement, based on adequate knowledge
and understanding of relevant material, to participate in research
41.
Key terms (National Statement on Ethical Conduct
in Human Research 2007)
• Risk: The function of the magnitude of a harm and the probability
that it will occur
• Harm: that which adversely affects the interests or welfare of an
individual or a group. Harm includes physical harm, anxiety, pain,
psychological disturbance, devaluation of personal worth and social
disadvantage
• Identifier: Details attached to data, such as name and/or contact
information, that identify an individual
• Databank: A systematic collection of data, whether individually
identifiable, re-identifiable or non-identifiable
• De-identified data: The NS avoids the term as its meaning is unclear.
• CC-BY: Creative Commons Attribution licence
42.
Resources
• ANDS ‘Publishing and Sharing Sensitive Data’ -
http://ands.org.au/guides/sensitivedata.html
• ANDS ‘Ethics, consent and data sharing’ - http://ands.org.au/guides/ethics-working-level.html
• How to confidentialise data: the basic principles, National Statistical Service -
http://www.nss.gov.au/nss/home.nsf/pages/Confidentiality+-+How+to+confidentialise+data:+the+basic+principles
• The National Statement on Ethical Conduct in Human Research (2007) -
http://www.nhmrc.gov.au/guidelines-publications/e72
• [DRAFT] Principles for Accessing and Using Publicly-Funded Data for Health Research -
http://consultations.nhmrc.gov.au/public_consultations/funded-data
• The Australian Code for the Responsible Conduct of Research -
http://www.nhmrc.gov.au/_files_nhmrc/publications/attachments/r39.pdf