BIG DATA
BIG OPPORTUNITIES OR
BIG TROUBLE?
Kathy Partin, Office of the VP for Research,
Dept. of Biomedical Sciences
Shea Swauger, Libraries
What is Big Data?
• Volume
• Variety
• Velocity
• Too Big to Email
• Veracity
• Variability
• Visualization
• Value
The Data Lifecycle
• Proposal
• Infrastructure
• Acquisition/Generation
• Management
• Dissemination
• Preservation
Proposal
• Grant Funding Requirements
• Data Management Plan
http://lib.colostate.edu/repository/nsf
https://dmptool.org
Infrastructure
• Where do you store it?
• How do you move it?
• How do you analyze it? (HPC?)
+ Ultra High Speed Research LAN
+ College or Department Servers
+ Bioinformatics & other Clusters
http://istec.colostate.edu/activities/hpc/
Data Acquisition/Generation
Reuse Existing
• Where to find it?
• How to understand/use it?
• Do you trust it?
• Create your own data
Metadata + README files
Data Provenance
Privacy, Security, Proprietary, Dual Use Research of Concern
Data Management
• Access/Permissions
• File Naming
• Metadata
• Organization
• Collaboration
• Version Control
• Fixity/Integrity
http://lib.colostate.edu/services/data-management
Dissemination
Where to share your data?
• Institutional Repository
• Discipline Specific Repository
How to cite your data?
• Permanent identifier (DOI, Handle, PURL, etc.)
• Citation standards
http://lib.colostate.edu/services/data-management/citing-data
Data Preservation
• Media Obsolescence
• Software Obsolescence
• Bit Rot
• Back-ups
• Checksums
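One way to act on the checksum and bit-rot bullets is to record a digest for every file when the data are deposited and to re-check those digests later. The sketch below is a minimal illustration under stated assumptions, not a CSU procedure: the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical, and SHA-256 is simply one commonly used digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing or no longer match."""
    problems = []
    for rel_path, expected in manifest.items():
        target = data_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A manifest built at deposit time can be stored alongside the back-ups; an empty list from `changed_files` on a later run suggests the files are still bit-for-bit identical.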
Public Outcry Regarding Data Integrity
• “Why Most Published Research Findings are False”, Ioannidis, 2005
• “Update of the Stroke Therapy Academic Industry Roundtable Preclinical Recommendations,” Fisher et al., 2009
• “Science Publishing: The Trouble with Retractions,” Van Noorden, 2011
• “Believe it or not: how much can we rely on published data on potential drug targets?” Prinz et al., 2011
• “Misconduct Accounts for the Majority of Retracted Scientific Publications,” Fang et al., 2012
• “Drug Development: Raise standards for Preclinical Cancer Research,” Begley & Ellis, 2012
http://i97.photobucket.com/albums/l217/Shockwave_73/angry-mob-at-frankenstein-castle_zps364a2714.jpg
Integrity - Reliability - Translation
• “Power Failure: why small sample size undermines the reliability of neuroscience,” Button et al., 2013
• “Challenges in Translating Academic Research into Therapeutic Advancement,” Matos et al., 2013 (epilepsy)
• “Reproducibility,” McNutt, 2014
• “NIH plans to enhance reproducibility,” Collins & Tabak, 2014
• “Reproducibility: Fraud is not the big problem,” Gunn, 2014
• Taxpayers are wasting their investment because the integrity of basic research is flawed, not due to intentional misconduct but to unintentional mismanagement.
Research Misconduct
1. Fabrication, falsification, plagiarism, or other practices that seriously deviate from those that are commonly accepted within the relevant scientific/academic community for proposing, conducting, reviewing or reporting research; that
2. Has been committed intentionally, knowingly or recklessly; and, that
3. Has been proven by a preponderance of the evidence (more likely than not)
Misconduct does not include honest error or honest differences in interpretations or judgments of data.
Reporting Concerns
• All employees and individuals associated with CSU should report observed, suspected or apparent Research Misconduct to their Department Head, Dean, the RIO and/or the Vice President for Research.
• If an individual is unsure whether a suspected incident falls within the definition of scientific misconduct, a call may be placed to one of these individuals to discuss the suspected misconduct informally.
http://reportinghotline.colostate.edu/
Research Integrity Officer
› Primary contact for departments and deans with questions about potential misconduct issues
› Represents CSU with the PHS Office of Research Integrity (ORI), NSF, USDA, etc.
› Manages the CSU MIS process to meet institutional, state and federal standards
› Kathy.Partin@colostate.edu
External Pressure to Fix or Be Fixed
• Issues with data reliability have brought external pressure on the scientific community
• From Congress
• President's Council of Advisors on Science and Technology (PCAST) – “Improving Scientific Reproducibility in an Age of International Competition and Big Data”, 2014
http://www.tvworldwide.com/events/pcast/140131/
• From the popular press and “watch dog” websites/blogs
• The Economist – “Unreliable research: Trouble at the Lab”, 2013
• NYT – “New truths that only one can see”, 2014
• RetractionWatch.com
The Gap Between Applied & Basic Research
The two opposite and contrary forces of data
Innovation: Dynamic, agile, discovery, exploration, optimization, creative, outside-the-box, anti-dogmatic (pre pre-clinical study)
Reliability: Reproducible, robust, translatable to bedside, rigid, immutable, non-optimized, boring (preclinical or clinical study)
What needs to change?
• Funding agencies need to raise the bar for data acquisition
• Publishers need to raise the bar for data quality
• Academic institutions need to reassess how success is defined
• Academic institutions need to provide their faculty with the right tools and training to do it right
• Faculty need to pass this down to their trainees
External Changes
• NIH appears to be
• Developing a new training module on good experimental design to disseminate
• Developing a data checklist for grant proposals
• DDI: Data Discovery Index
• New biosketch format to reduce the focus on numbers of publications and increase the focus on impact of publications
• Considering blinded review of grant proposals
• Science Exchange Reproducibility Initiative
DDI
“In summary, a Data Discovery Index (DDI) emphasizes development of an adaptable, scalable system through active community engagement that would serve as an index to large biomedical datasets.”
Rather than a traditional “catalog,” the DDI concept stresses discoverability, access, and citability.
This is an index of raw data, which has rarely seen the light of day in academic research before.
Publishers
• Preventing plagiarism with iThenticate
• Preventing Fabrication/Falsification with new data checklists
• Abolishing word limits on methods sections
Six Common Experimental Failings
1. Poor experimental design
2. Poor reagents
3. Poor analysis
4. Failure to reject hypothesis after observing discordant, valid experimental results
5. Deliberate bias in selecting positive rather than negative results to report, publish, cite, and fund
6. Failure to follow through when wondering “Why is this result NOT what I expected?”
Statistics & General Methods
1. How was the sample size chosen to ensure adequate power to detect a pre-specified effect size?
2. Describe inclusion/exclusion criteria if samples, subjects or animals were excluded from the analysis. Were the criteria pre-established?
3. If a method of randomization was used to determine how samples/subjects/animals were allocated to experimental groups and processed, describe it.
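Checklist items 1 and 3 above can be made concrete with a short sketch. It assumes a two-sided comparison of two group means, uses the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)² per group for a standardized effect size d, and deals shuffled subjects into groups in turn. The function names, defaults, and the dealing scheme are illustrative assumptions, not a method prescribed by the checklist; an exact t-based calculation usually gives a slightly larger n.

```python
import math
import random
from statistics import NormalDist


def sample_size_per_group(effect_size: float,
                          alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-sided,
    two-group comparison of means with standardized effect size d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def randomize(subject_ids: list[str], groups: list[str], seed: int) -> dict[str, str]:
    """Shuffle subjects with a recorded seed, then deal them into groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: groups[i % len(groups)] for i, sid in enumerate(shuffled)}


# Example: plan for a medium effect (d = 0.5), then allocate 12 subjects.
n_needed = sample_size_per_group(0.5)
allocation = randomize([f"S{i:02d}" for i in range(12)], ["control", "treatment"], seed=2014)
```

Recording the seed alongside the allocation lets someone else regenerate the same assignment, which is one way to "describe it" as item 3 asks.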
Statistics & General Methods (continued)
4. If the investigator was blinded to the group allocation during the experiment and/or when assessing the outcome, state the extent of blinding.
5. For every figure, are statistical tests justified as appropriate? Do the data meet the assumptions of the tests (e.g., normal distribution)?
a) Is there an estimate of variation within each group of data?
b) Is the variance similar between the groups that are being statistically compared?
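Questions (a) and (b) can be checked quickly before choosing a test. The sketch below reports a per-group estimate of variation and the ratio of the largest to the smallest group variance. The function names are hypothetical, and the ratio is only a descriptive screen: a formal check of equal variances (Levene's test, for example) would be a separate step.

```python
import math
from statistics import mean, stdev, variance


def group_variation(groups: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Per-group n, mean, sample standard deviation, and sample variance."""
    return {
        name: {
            "n": len(values),
            "mean": mean(values),
            "sd": stdev(values),
            "variance": variance(values),
        }
        for name, values in groups.items()
    }


def variance_ratio(groups: dict[str, list[float]]) -> float:
    """Largest sample variance divided by the smallest (1.0 means equal)."""
    variances = [variance(values) for values in groups.values()]
    smallest = min(variances)
    if smallest == 0:
        return math.inf  # one group has no spread at all
    return max(variances) / smallest
```

A ratio far from 1 is a prompt to look more closely at the data or to prefer a test that does not assume equal variances.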
Good Laboratory Practice for Data
A Attributable (who made the entry)
L Legible
C Contemporaneous/Complete
O Original
A Accurate
http://www.paduiblog.com/pa-dui/why-forensic-science-testing-for-dui-bac-determination-is-the-silly-sister-of-analytical-science-good-laboratory-practices/
Data Notebooks – Another Vulnerability
• Binders
• Electronic Notebooks
• Software documentation
• Field notes
• Images
• Algorithms
Data Corrections & Amendments
• Errors, additions, and modifications should be identified by crossing out the original data with a single line (do not obscure the initial data) and initialing, dating and providing a reason for the change.
• Missing or obscured data/pages are often interpreted as intentional obfuscation of data
• Absence is interpreted as guilt
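For electronic records, the single-line cross-out rule can be approximated by never overwriting a value without keeping the original, the corrector's initials, a date, and a reason. The sketch below is one hypothetical way to model that; the `Entry` and `Correction` classes are illustrative assumptions, not the behavior of any particular electronic notebook product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Correction:
    """One amendment: the struck-out value stays readable."""
    old_value: str
    new_value: str
    initials: str
    reason: str
    timestamp: str


@dataclass
class Entry:
    """A notebook entry whose corrections are appended, never erased."""
    value: str
    author: str
    history: list[Correction] = field(default_factory=list)

    def correct(self, new_value: str, initials: str, reason: str) -> None:
        """Replace the current value while preserving the original."""
        if not reason.strip():
            raise ValueError("a correction needs a stated reason")
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append(
            Correction(self.value, new_value, initials, reason, stamp)
        )
        self.value = new_value
```

Calling `entry.correct("14.2 mg", "AB", "transposed digits")` updates the value and leaves the earlier reading in `entry.history`, so nothing is missing or obscured.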
Data Forensics
• Numbers
• Images
• Hardware/Software
Numbers
Images
Hardware
Software
Questions?

BIG DATA BIG OPPORTUNITIES OR BIG TROUBLE

  • 1. BIG DATA BIG OPPORTUNITIES OR BIG TROUBLE? Kathy Partin, Office of the VP for Research, Dept. of Biomedical Sciences Shea Swauger, Libraries
  • 2. What is Big Data? • Volume • Variety • Velocity • Too Big to Email • Veracity • Variability • Visualization • Value
  • 3. The Data Lifecycle • Proposal • Infrastructure • Acquisition/Generation • Management • Dissemination • Preservation
  • 4. Proposal • Grant Funding Requirements • Data Management Plan http://lib.colostate.edu/repository/nsf https://dmptool.org
  • 5. Infrastructure • Where do you store it? • How do you move it? • How do you analyze it? (HPC?) + Ultra High Speed Research LAN + College or Department Servers + Bioinformatics & other Clusters http://istec.colostate.edu/activities/hpc/
  • 6. Data Acquisition/Generation Reuse Existing • Where to find it? • How to understand/use it? • Do you trust it? • Create your own data Metadata + README files Data Provenance Privacy, Security, Proprietary, Dual Use Research of Concern
  • 7. Data Management • Access/Permissions • File Naming • Metadata • Organization • Collaboration • Version Control • Fixity/Integrity http://lib.colostate.edu/services/data-management
  • 8. Dissemination Where to share your data? • Institutional Repository • Discipline Specific Repository How to cite your data? • Permanent identifier (doi, handle, PURL, etc.) • Citation standards http://lib.colostate.edu/services/data-management/citing-data
  • 9. Data Preservation • Media Obsolescence • Software Obsolescence • Bit Rot • Back-ups • Checksums
  • 10. Public Outcry Regarding Data Integrity • “Why Most Published Research Findings are False”, Ioannidis, 2005 • “Update of the Stroke Therapy Academic Industry Roundtable Preclinical Recommendations,” Fisher et al., 2009 • “Science Publishing: The Trouble with Retractions,” Van Noorden, 2011 • “Believe it or not: how much can we rely on published data on potential drug targets?” Prinz et al., 2011 • “Misconduct Accounts for the Majority of Retracted Scientific Publications,” Fang et al., 2012 • “Drug Development: Raise standards for Preclinical Cancer Research,” Begley & Ellis, 2012 http://i97.photobucket.com/albums/l217/Shockwave_73/angry- mob-at-frankenstein-castle_zps364a2714.jpg
  • 11. Integrity - Reliability - Translation • “Power Failure: why small sample size undermines the reliability of neuroscience”, Button et al., 2013 • “Challenges in Translating Academic Research into Therapeutic Advancement,” Matos et al., 2013 (epilepsy) • “Reproducibility,” McNutt, 2014 • “NIH plans to enhance reproducibility,” Collins & Tabak, 2014 • “Reproducibility: Fraud is not the big problem,” – Gunn, 2014 • Taxpayers are wasting their investment because the integrity of basic research is flawed, not due to intentional misconduct but to unintentional mismanagement.
  • 12. Research Misconduct 1. Fabrication, falsification, plagiarism, or other practices that seriously deviate from those that are commonly accepted within the relevant scientific/academic community for proposing, conducting, reviewing or reporting research; that 2. Has been committed intentionally, knowingly or recklessly; and, that 3. Has been proven by a preponderance of the evidence (more likely than not) Misconduct does not include honest error or honest differences in interpretations or judgments of data.
  • 13. Reporting Concerns • All employees and individuals associated with CSU should report observed, suspected or apparent Research Misconduct to their Department Head, Dean, the RIO and/or the Vice President for Research. • If an individual is unsure whether a suspected incident falls within the definition of scientific misconduct, a call may be placed to one of these individuals to discuss the suspected misconduct informally. http://reportinghotline.colostate.edu/
  • 14. Research Integrity Officer › Primary contact for departments and deans with questions about potential misconduct issues › Represents CSU with the PHS Office of Research Integrity (ORI), NSF, USDA, etc › Manages the CSU MIS process to meet institutional, state and federal standards › Kathy.Partin@colostate.edu
  • 15. External Pressure to Fix or Be Fixed • Issues with data reliability have brought external pressure on the scientific community • From Congress • Presidential Council of Advisors on Science and Technology (PCAST) – “Improving Scientific Reproducibility in an Age of International Competition and Big Data” , 2014 http://www.tvworldwide.com/events/pcast/140131/ • From the popular press and “watch dog” websites/blogs • The Economist - “Unreliable research: Trouble at the Lab”, 2013 • NYT– “New truths that only one can see”, 2014 • RetractionWatch.com
  • 16. The Gap Between Applied & Basic Research
      The two opposite and contrary forces of data:
      • Innovation: dynamic, agile, discovery, exploration, optimization, creative, outside-the-box, anti-dogmatic (pre-preclinical study)
      • Reliability: reproducible, robust, translatable to bedside, rigid, immutable, non-optimized, boring (preclinical or clinical study)
  • 17. What needs to change?
      • Funding agencies need to raise the bar for data acquisition
      • Publishers need to raise the bar for data quality
      • Academic institutions need to reassess how success is defined
      • Academic institutions need to provide their faculty with the right tools and training to do it right
      • Faculty need to pass this down to their trainees
  • 18. External Changes
      • NIH appears to be
          • Developing a new training module on good experimental design to disseminate
          • Developing a data checklist for grant proposals
          • Developing the DDI (Data Discovery Index)
          • Adopting a new biosketch format to reduce the focus on numbers of publications and increase the focus on impact of publications
          • Considering blinded review of grant proposals
      • Science Exchange Reproducibility Initiative
  • 19. DDI
      “In summary, a Data Discovery Index (DDI) emphasizes development of an adaptable, scalable system through active community engagement that would serve as an index to large biomedical datasets.”
      Rather than acting as a traditional “catalog,” the DDI concept stresses discoverability, access, and citability. It points to raw data, which rarely saw the light of day in academic research before.
  • 20. Publishers
      • Preventing plagiarism with iThenticate
      • Preventing fabrication/falsification with new data checklists
      • Abolishing word limits on methods sections
  • 21. Six Common Experimental Failings
      1. Poor experimental design
      2. Poor reagents
      3. Poor analysis
      4. Failure to reject hypothesis after observing discordant, valid experimental results
      5. Deliberate bias in selecting positive rather than negative results to report, publish, cite, and fund
      6. Failure to follow through when wondering “Why is this result NOT what I expected?”
  • 22. Statistics & General Methods
      1. How was the sample size chosen to ensure adequate power to detect a pre-specified effect size?
      2. Describe inclusion/exclusion criteria if samples, subjects or animals were excluded from the analysis. Were the criteria pre-established?
      3. If a method of randomization was used to determine how samples/subjects/animals were allocated to experimental groups and processed, describe it.
  • 23. Statistics & General Methods
      4. If the investigator was blinded to the group allocation during the experiment and/or when assessing the outcome, state the extent of blinding.
      5. For every figure, are statistical tests justified as appropriate? Do the data meet the assumptions of the tests (e.g., normal distribution)?
          a) Is there an estimate of variation within each group of data?
          b) Is the variance similar between the groups that are being statistically compared?
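Several of the checklist questions above (sample size and power, randomization, within-group variation) can be answered with a few lines of code that are kept alongside the data. The following is a minimal sketch, not a method taken from the slides: it assumes a two-sided comparison of two group means, uses a normal approximation for the sample size, and all function names, the default seed, and the example values are hypothetical. A statistician should confirm the right approach for a real study.

```python
import math
import random
import statistics
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) / d)^2,
    where d is the pre-specified standardized effect size.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def randomize_allocation(subject_ids, groups=("control", "treatment"), seed=2014):
    """Shuffle subjects with a recorded seed, then deal them into groups in turn.

    Recording the seed lets the allocation be described and reproduced later.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: groups[i % len(groups)] for i, sid in enumerate(shuffled)}


def variance_summary(data_by_group):
    """Return the sample variance of each group and the largest/smallest ratio.

    A ratio far from 1 suggests the equal-variance assumption deserves a
    closer look. Assumes every group has at least two values and a
    non-zero variance.
    """
    variances = {name: statistics.variance(values)
                 for name, values in data_by_group.items()}
    ratio = max(variances.values()) / min(variances.values())
    return variances, ratio


if __name__ == "__main__":
    print(sample_size_per_group(0.5))
    print(randomize_allocation(["s1", "s2", "s3", "s4"]))
    print(variance_summary({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]}))
```

Saving the chosen effect size, alpha, power, and randomization seed with the raw data makes it possible to state later exactly how the sample size and group allocation were determined.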
  • 24.
  • 25. Good Laboratory Practice for Data
      A  Attributable (who made the entry)
      L  Legible
      C  Contemporaneous/Complete
      O  Original
      A  Accurate
      http://www.paduiblog.com/pa-dui/why-forensic-science-testing-for-dui-bac-determination-is-the-silly-sister-of-analytical-science-good-laboratory-practices/
  • 26. Data Notebooks – Another Vulnerability
      • Binders
      • Electronic Notebooks
      • Software documentation
      • Field notes
      • Images
      • Algorithms
  • 27. Data Corrections & Amendments
      • Errors, additions, and modifications should be identified by crossing out the original data with a single line (do not obscure the initial data) and initialing, dating and providing a reason for the change.
      • Missing or obscured data/pages are often interpreted as intentional obfuscation of data
      • Absence is interpreted as guilt
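For electronic records, the same single-line cross-out rule can be approximated by never overwriting a value without logging who changed it, when, and why. The sketch below is one possible illustration under that assumption; the class names, fields, and example entry are hypothetical and are not taken from any particular electronic notebook product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Amendment:
    """One correction: the crossed-out value stays readable."""
    old_value: str
    new_value: str
    initials: str
    reason: str
    timestamp: str


@dataclass
class DataEntry:
    """A recorded value plus an append-only history of corrections."""
    label: str
    value: str
    amendments: list = field(default_factory=list)

    def amend(self, new_value, initials, reason):
        # Refuse corrections that cannot be attributed or explained.
        if not initials or not reason:
            raise ValueError("an amendment needs initials and a reason")
        stamp = datetime.now(timezone.utc).isoformat()
        self.amendments.append(
            Amendment(self.value, new_value, initials, reason, stamp))
        self.value = new_value

    def original_value(self):
        # The first logged old_value is what was originally recorded.
        return self.amendments[0].old_value if self.amendments else self.value


if __name__ == "__main__":
    entry = DataEntry("sample 12 weight (g)", "23.1")
    entry.amend("21.3", "AB", "digits transposed during transcription")
    print(entry.value, entry.original_value(), len(entry.amendments))
```

Because the history only grows, the original entry and every later change remain visible, which is the electronic counterpart of not obscuring the initial data.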
  • 28. Data Forensics
      • Numbers
      • Images
      • Hardware/Software

Editor's Notes

  1. Academic institutions are under more scrutiny than ever because of public outcry over the perceived lack of data integrity. THIS WILL HAVE AN IMPACT ON FUNDING!
  2. Most people suspect intentional misconduct – altering the data record in favor of your hypothesis. Actually, most studies suggest that unintentional mismanagement is a more likely culprit.
  3. Research Misconduct definition – a key discriminator is intentionality.
  4. If you have concerns about data integrity, you can be an anonymous whistleblower.
  5. Or, you can contact me with a “hypothetical” scenario and I will protect your identity.
  6. Let’s take a closer look at unintentional problems with data. We need to self-correct and we need to expect greater external scrutiny.
  7. As we strive to move toward applied solutions instead of pure research, think about the tension between pure discovery and its application.
  8. So, if we are ready to keep our side of the street clean, what do we need to do?
  9. Sponsors will add requirements regarding data integrity. You need help with boilerplate language in your grant to demonstrate your approach to this issue.
  10. Uploading your raw data! I never dreamed of doing this in the past. Of course, if you have protected data (human subjects), there are more hoops to jump through.
  11. The Libraries can hook you up with iThenticate. Expect that when you submit a peer-reviewed article, it will be run through iThenticate.
  12. Is this Greek to you? YOU NEED HELP!!