Michael M. Hoffman
Princess Margaret Cancer Centre
Department of Medical Biophysics
Department of Computer Science
University of Toronto
http://hoffmanlab.org/
Twitter: @michaelhoffman
Data challenges for researchers
Who I am
• Scientist at Princess Margaret Cancer Centre/Assistant Professor at University of Toronto
• Previously part of Encyclopedia of DNA Elements (ENCODE) Project
• Develop computational methods for big genomic data
View of an analysis pipeline
Source data
Intermediate files
Data products
Publications
Challenges in data acquisition
Showstoppers
• Data available “on request”
• Data available on application or agreement
Timewasters
• Data in inappropriate format
• Data in a different format than I need
• Data doesn’t comply with its format specification
More challenges in data acquisition
Annoyances
• Transferring
• Storing
• Staleness
• Deletion
• Organization
• Discovery
Challenges in data distribution
• Permanence
• Job changes
• Embargo pre-publication
• Space
• Waiting for approval
• Enabling acquisition by external services
• Graphical-only interfaces
• Ongoing costs
Challenges in intermediate files
• Poor organization
• Big
• Don’t always need them, sometimes do
• Sometimes need someone else’s intermediate files
• Should be reproducible given source data and pipeline but often aren’t
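One way to notice when intermediate files stop being reproducible is to record a checksum for each file and compare after re-running the pipeline. The following is only a minimal sketch of that idea, not a method from the talk; the function names `sha256sum` and `changed_files` and the manifest layout (file name mapped to hex digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest, directory):
    """List files that are missing or whose digest differs from the manifest.

    `manifest` is a hypothetical dict mapping file name to the digest
    recorded when the intermediate files were first produced.
    """
    directory = Path(directory)
    changed = []
    for name, recorded in sorted(manifest.items()):
        path = directory / name
        if not path.exists() or sha256sum(path) != recorded:
            changed.append(name)
    return changed
```

An empty result after re-running the pipeline on the same source data suggests the intermediate files were regenerated byte for byte; any listed name points at a step that is not deterministic or whose inputs changed.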
My dream solution
Policy: Data must be deposited in an archive and available at publication time
Technical: Trivially simple multi-level data caching
Economic: Central archival space should cost the researcher less than keeping their own copy
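The "multi-level data caching" idea could look roughly like the sketch below: look for a file in the fastest storage first, fall back to slower levels, and only go to the central archive when every level misses. This is an illustration under assumptions, not a design from the talk; `cached_path`, the `levels` list of directories, and the `fetch_from_archive` callable are all hypothetical names.

```python
import shutil
from pathlib import Path


def cached_path(name, levels, fetch_from_archive):
    """Return a path to `name` in the fastest cache level.

    `levels` lists cache directories, fastest first (for example local
    scratch, then shared lab storage). `fetch_from_archive(name, dest)` is
    a hypothetical callable that downloads the file from a central archive
    and writes it to `dest`.
    """
    paths = [Path(level) / name for level in levels]
    # Find the fastest level that already holds the file.
    hit = next((i for i, p in enumerate(paths) if p.exists()), None)
    if hit is None:
        # Miss at every level: fetch into the slowest level.
        hit = len(paths) - 1
        paths[hit].parent.mkdir(parents=True, exist_ok=True)
        fetch_from_archive(name, paths[hit])
    # Copy the file into every faster level so later lookups are cheaper.
    for p in reversed(paths[:hit]):
        p.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(paths[hit], p)
    return paths[0]
```

With this arrangement a deleted scratch copy is restored from lab storage without touching the archive, which is the property that makes local deletion cheap.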

 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 

Data challenges for researchers

  • 1. Michael M. Hoffman Princess Margaret Cancer Centre Department of Medical Biophysics Department of Computer Science University of Toronto http://hoffmanlab.org/ Twitter: @michaelhoffman Data challenges for researchers
  • 2. Who I am
      • Scientist at Princess Margaret Cancer Centre/Asst Professor at University of Toronto
      • Previously part of Encyclopedia of DNA Elements (ENCODE) Project
      • Develop computational methods for big genomic data
  • 3. View of an analysis pipeline
      • Source data
      • Intermediate files
      • Data products
      • Publications
  • 4.
  • 5.
  • 6. Challenges in data acquisition
      Showstoppers
      • Data available “on request”
      • Data available on application or agreement
      Timewasters
      • Data in inappropriate format
      • Data in different format than I need
      • Data doesn’t comply with format specification
  • 7. More challenges in data acquisition
      Annoyances
      • Transferring
      • Storing
      • Staleness
      • Deletion
      • Organization
      • Discovery
  • 8. Challenges in data distribution
      • Permanence
      • Job changes
      • Embargo pre-publication
      • Space
      • Waiting for approval
      • Enabling acquisition by external services
      • Graphical-only interfaces
      • Ongoing costs
  • 9. Challenges in intermediate files
      • Poor organization
      • Big
      • Don’t always need them, sometimes do
      • Sometimes need someone else’s intermediate files
      • Should be reproducible given source data and pipeline but often isn’t
  • 10. My dream solution
      Policy: Data must be deposited in archive and available at publication time
      Technical: Trivially simple multi-level data caching
      Economic: Central archival space should cost researcher less than keeping their own copy
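
The "multi-level data caching" wished for in slide 10 is not specified in the slides. One possible reading, sketched here under assumptions, is a lookup that checks a fast local cache first, then a slower shared cache, and only then the central archive, copying the file into the faster levels on the way back. The function name `cached_path`, the flat file naming, and the `fake_archive` stand-in are all hypothetical and do not come from the talk.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def cached_path(name: str, cache_dirs: Sequence[Path],
                fetch_from_archive: Callable[[str, Path], None]) -> Path:
    """Return a path to `name` in the fastest cache, filling caches as needed.

    `cache_dirs` is ordered fastest first and must contain at least one
    directory. `fetch_from_archive(name, dest)` is a caller-supplied
    (hypothetical) function that writes the archived file to `dest`.
    """
    cache_dirs = [Path(d) for d in cache_dirs]
    hit_level = None
    for level, cache_dir in enumerate(cache_dirs):
        if (cache_dir / name).exists():
            hit_level = level
            break
    if hit_level is None:
        # Miss at every level: pull from the archive into the slowest cache.
        hit_level = len(cache_dirs) - 1
        cache_dirs[hit_level].mkdir(parents=True, exist_ok=True)
        fetch_from_archive(name, cache_dirs[hit_level] / name)
    # Copy into every faster level so the next lookup hits the first cache.
    source = cache_dirs[hit_level] / name
    for cache_dir in reversed(cache_dirs[:hit_level]):
        cache_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, cache_dir / name)
    return cache_dirs[0] / name


calls = []


def fake_archive(name: str, dest: Path) -> None:
    # Stand-in for a real download from an archive; records each call.
    calls.append(name)
    dest.write_text("contents of " + name)


with tempfile.TemporaryDirectory() as tmp:
    local, shared = Path(tmp) / "local", Path(tmp) / "shared"
    first = cached_path("signal.bed", [local, shared], fake_archive)
    second = cached_path("signal.bed", [local, shared], fake_archive)
    # The second lookup is served from the local cache, so the archive
    # is contacted only once.
    print(first == second, len(calls))
```

A real version would also need to deal with the staleness and deletion problems listed in slide 7, for example by comparing checksums against the archive; this sketch ignores them.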

Editor's Notes

  1. ENCODE: 12,000 assays, and many multiples of that in number of datasets. Guessing about 2-20 GB of accessioned data per assay, so in the hundreds of terabytes to single-digit petabytes
  2. Most evaluation of researchers relies primarily on the publications, and that is primarily what a lot of researchers are interested in
  3. Wastes of time and money; some of this should be fixed at publication gating. “Advanced file copying”
  4. Most have to do with local copies
  5. Want to avoid “solutions” that are like Canadian Common CV but for data science