Research Infrastructures H3ABioNet case study/Nicola Mulder

Presented during the AOSP ICT Infrastructure meeting on 14 May 2018, Pretoria, SA.

Published in: Data & Analytics

  1. 1. Research Infrastructures H3ABioNet case study Prof Nicola Mulder H3ABioNet PI Head of Computational Biology University of Cape Town
  2. 2. Outline • Introduction to H3Africa and H3ABioNet • H3Africa data • Data sharing policy • Building infrastructure • Computing infrastructure • Human capacity • Data harmonization & curation • Facilitating data access
  3. 3. H3Africa: Human Heredity & Health in Africa • H3Africa Vision: “To facilitate an Africa-based contemporary research approach to the study of genomics and environmental determinants of common diseases with the goal of improving the health of African populations” • Funding: NIH, Wellcome Trust/AESA
  4. 4. The H3Africa Consortium • 14 Collaborative Centers • 13 Research Projects • 3 Pilot Biorepositories • 8 Ethics Grants • Bioinformatics Network: H3ABioNet • 4 Global Health Bioinformatics Training Programs
  5. 5. H3ABioNet Informatics network • H3ABioNet is a Pan African Informatics Network, to provide bioinformatics infrastructure and support for the H3Africa consortium • Round 1: 34 partners in 14 African countries • Round 2: 28 partners in 17 countries • Activities: • Infrastructure • User support • Research • Training www.h3abionet.org
  6. 6. H3Africa data (Phase I) • Phenotype data (associated with genotype data) – Demographic information – Anthropometric data – Disease and health related phenotype data • Genetic Variation data human and pathogen – Sequence data (whole genome, exome, targeted) • Genotyping chip array data – ~55,000 samples to be run on an H3Africa African custom chip • Microbiome sequence data – Patient/sample phenotypes – Non-human 16S rRNA sequence data for microbiome – Non-human full genome sequence data for microbiome – Possible human sequence contamination • Biospecimens to be deposited at the H3Africa biorepositories Image credits: National Human Genome Research Institute (https://www.genome.gov/imagegallery/)
  7. 7. Why share data? • New era of open science • Enables reproducible science • Increases visibility and credibility of data generators • Additional publications and citations • New research questions can be asked of data • New discoveries made of relevance to participants • Increasing sample size • Increases value of the data • Funder requirement
  8. 8. Limits to sharing human genetic data • Data can be stored indefinitely; biobank specimens can be stored for up to 20 years: secondary use, rapid innovation with ‘omics technologies • Blood sample collection and visits to clinics are associated with disease and treatment, even for a healthy control • Ethics consent: in H3Africa some projects have broad consent, some used tiered consent or specific consent • History of vulnerable populations, low education levels and exploitation • Anonymized, but risk of identification • Ethical considerations: informed consent, participant identification, stigmatisation, benefit sharing
  9. 9. Human genetic data privacy • Age & Sex • Country of birth • Current residence • Native language • Ethno-linguistic/tribal affiliation • Country of birth of father and mother • Native language of father and mother • Ethno-linguistic/tribal affiliation of mother and father • Height • Weight • Current medications • Smoking history • Alcohol history Image credits: National Human Genome Research Institute (https://www.genome.gov/imagegallery/) • Combination of phenotype and genetic data makes it possible to identify different populations and individuals – restricted access
  10. 10. H3Africa Data Sharing Access and Release Policy • Balance between ensuring adequate safeguards to protect participants and not being a barrier for scientists to advance research • Maximizing the availability of research data, in a timely and responsible manner • Protecting the rights and privacy of human subjects who participated in research studies • Recognizing the scientific contribution of researchers who generated the data • Considering the nature and ethics of the research proposed in establishing the timely release of data, and mechanisms of data sharing • Promoting deposition of genomic data in existing community data repositories whenever possible
  11. 11. H3Africa DSAR policy • For genomic and phenotype data: • Submit to H3Africa archive • 9 months to submit to public repository • 12 month publication embargo • In EGA access controlled by DBAC • Timeline: • Research site: data generation • 2 months, research site: QC of genomic & phenotypic data • 9 months, H3ABioNet: genomic & phenotypic data stored • 12 months, EGA: genomic & phenotypic data available through DBAC with publication embargo • Long term, EGA: genomic & phenotypic data available through DBAC without publication embargo • Total: 23 months
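
The stage durations on slide 11 can be turned into calendar milestones. The following is a minimal sketch, not H3Africa tooling: it assumes the three stages (2 months QC, 9 months in the H3Africa archive, 12 months embargo in EGA) run back to back from the end of data generation, which is one reading of the slide. The function and key names are hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (e.g. 31 January + 1 month -> 28 or 29 February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Stage durations in months, as listed on the slide.
QC_MONTHS = 2
ARCHIVE_MONTHS = 9
EMBARGO_MONTHS = 12


def dsar_milestones(data_generated: date) -> dict:
    """Compute milestone dates, assuming the stages are consecutive."""
    qc_done = add_months(data_generated, QC_MONTHS)
    ega_release = add_months(qc_done, ARCHIVE_MONTHS)
    embargo_end = add_months(ega_release, EMBARGO_MONTHS)
    return {
        "qc_done": qc_done,
        "ega_release": ega_release,
        "embargo_end": embargo_end,
    }


if __name__ == "__main__":
    for name, when in dsar_milestones(date(2018, 5, 14)).items():
        print(f"{name}: {when.isoformat()}")
```

Under this reading the embargo ends 2 + 9 + 12 = 23 months after data generation, which matches the total given on the slide.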
  12. 12. Data and Biospecimen Access Committee • Review and approve requests for data and/or biospecimens • Biospecimens: • For the first 3 years, access outside H3Africa only for those collaborating in Africa • Use info on availability in biobanks • Data generated must be submitted to EGA • Scientific review/funding available • Data • DBAC will ensure requestor has expertise and resources • Scientific review • Evaluation criteria • Scientific merit • Institutional capacity for the research • Potential for publication or translation, e.g. new therapies
  13. 13. Data access agreement • H3Africa not liable for use of data • Only use data for agreed purpose • Maintain data confidentiality • Make sure data is secure • Acknowledge source of data • Submit annual reports • Project put onto website • Access is granted for 1 year
  14. 14. What is required for sharing data? • Consent from participants: varying consent within a study is difficult • Robust data sharing model with implementation strategy for data access, transfer, etc. • Access agreements and MoUs • Infrastructure for • Data transfer • Data storage & compute • Training • Data curation and harmonization
  15. 15. Infrastructure development & support • Node server purchases • Sys Admin “How to” documents • Access to HPC, Cloud (Docker containers) • Internet connectivity measurement: NetMap • Data transfer: Globus Online, testing vs Aspera • Data storage • Training in IT, data management and general bioinformatics use • H3ABioNet combined equipment: 512 cores, 2384 GB RAM, 120 TB storage
  16. 16. Building human capacity for genomics data management • Need to train • Bioinformaticians • Data scientists • Bioinformatics users • Medical professionals Specialised courses, shadow teams, internships ISCB EMBL-EBI training team
  17. 17. Training Approaches Face to face Workshops Train-the-Trainer Internships Live Online Training Hackathons/Data Jamborees Access to training materials
  18. 18. Harmonizing H3Africa data
  19. 19. Harmonizing H3Africa data Mapping biobank data to OMIABIS ontology Mapping CRFs to ontologies, e.g. phenotype or disease ontology Mapping genomics data to Experimental Factor ontology PHWG has developed set of core phenotypes, standard CRF Mapping ethics consent info to Data Use ontology
  20. 20. Harmonizing H3Africa data Mapping biobank data to OMIABIS ontology Mapping CRFs to ontologies, e.g. phenotype or disease ontology Mapping genomics data to Experimental Factor ontology PHWG has developed set of core phenotypes, standard CRF Mapping ethics consent info to Data Use ontology Biorepositories Archive & EGA Catalogue
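
Mapping site-specific case report form (CRF) fields onto a shared core set, as described on slides 19 and 20, can be sketched as below. Everything here is illustrative: the site names, column names, core fields and the `EXAMPLE:` term identifiers are placeholders, not real ontology terms or the PHWG core phenotype set.

```python
# Hypothetical core fields, each tied to a placeholder term identifier.
# In practice these would be terms from a phenotype or disease ontology.
CORE_FIELDS = {
    "sex": "EXAMPLE:0001",
    "height_cm": "EXAMPLE:0002",
    "weight_kg": "EXAMPLE:0003",
}

# Hypothetical per-site mapping from local CRF column names to core fields.
SITE_TO_CORE = {
    "site_a": {"Gender": "sex", "Ht": "height_cm", "Wt": "weight_kg"},
    "site_b": {"sex": "sex", "height": "height_cm"},
}


def harmonize(site: str, record: dict) -> tuple[dict, list]:
    """Map one site record onto the core fields.

    Returns the harmonized record and the list of local columns
    that have no mapping yet and still need curation.
    """
    mapping = SITE_TO_CORE[site]
    harmonized = {}
    unmapped = []
    for column, value in record.items():
        core_name = mapping.get(column)
        if core_name is None:
            unmapped.append(column)
        else:
            harmonized[core_name] = {
                "value": value,
                "term": CORE_FIELDS[core_name],
            }
    return harmonized, unmapped


if __name__ == "__main__":
    row = {"Gender": "F", "Ht": 160, "Notes": "fasting sample"}
    print(harmonize("site_a", row))
```

Returning the unmapped columns separately keeps the curation backlog visible rather than silently dropping fields.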
  21. 21. Making data FAIR • Findable, Accessible, Interoperable, and Re-usable https://www.force11.org/group/fairgroup/fairprinciples • To be Findable: identifier, metadata, indexed • To be Accessible: find by identifier, clear rules for access and authentication • To be Interoperable: standardized and cross- referenced • To be Reusable: licensed, metadata with provenance, standards
  22. 22. Making data FAIR • Findable, Accessible, Interoperable, and Re-usable https://www.force11.org/group/fairgroup/fairprinciples • To be Findable: identifier, metadata, indexed • To be Accessible: find by identifier, clear rules for access and authentication • To be Interoperable: standardized and cross- referenced • To be Reusable: licensed, metadata with provenance, standards
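
The FAIR checklist on these slides can be expressed as a simple completeness check over a dataset's metadata record. This is a rough sketch only: the field names are hypothetical and chosen to mirror the slide's bullet points, not taken from any FAIR assessment standard.

```python
# Hypothetical metadata fields grouped by the FAIR principle they support,
# following the bullet points on the slide.
FAIR_REQUIREMENTS = {
    "findable": ["identifier", "description", "indexed_in"],
    "accessible": ["access_policy"],
    "interoperable": ["standards", "cross_references"],
    "reusable": ["licence", "provenance"],
}


def fair_gaps(record: dict) -> dict:
    """Return, per principle, the required fields that are missing or empty."""
    gaps = {}
    for principle, fields in FAIR_REQUIREMENTS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


if __name__ == "__main__":
    dataset = {
        "identifier": "dataset-001",  # placeholder, not a real accession
        "description": "Genotype and phenotype data",
        "access_policy": "Controlled access via an access committee",
        "licence": "",
    }
    print(fair_gaps(dataset))
```

A record with no gaps returns an empty dictionary, so the result can be used directly as a pass/fail signal before submission.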
  23. 23. H3Africa Data Archive • Assist H3Africa projects as data coordination center: • Transfer • Validate • Store • Submit to EGA • Obtain EGA accessions for publications • 0.5 petabytes storage size including offsite replication • Local EGA feasibility?
  24. 24. Data and biospecimen catalogue
  25. 25. Beacons …a simple public web service … designed merely to accept a query of the form "Do you have any genomes with an 'A' at position 100,735 on chromosome 3" (or similar data) and responds with one of "Yes" or "No." genomicsandhealth.org • Advantages • Locally hosted • Minimal information (yes/no for a given allele) • Protection against “scraping” https://goo.gl/Bkd0dx
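
The yes/no behaviour quoted on slide 25 can be illustrated with a toy in-memory lookup. This is a sketch of the idea only, not the GA4GH Beacon API: the class name, method name and data layout are assumptions made for the example.

```python
class ToyBeacon:
    """Answers only whether a given allele has been observed at a position."""

    def __init__(self, observed_variants):
        # Each variant is a (chromosome, position, allele) triple.
        self._variants = {
            (str(chrom), int(pos), allele.upper())
            for chrom, pos, allele in observed_variants
        }

    def has_allele(self, chromosome, position, allele) -> bool:
        """Return True or False; no genotypes or sample details are exposed."""
        key = (str(chromosome), int(position), allele.upper())
        return key in self._variants


if __name__ == "__main__":
    beacon = ToyBeacon([("3", 100735, "A"), ("7", 55242, "T")])
    # "Do you have any genomes with an 'A' at position 100,735 on chromosome 3?"
    print("Yes" if beacon.has_allele("3", 100735, "A") else "No")
    print("Yes" if beacon.has_allele("3", 100735, "C") else "No")
```

Because the response is a single boolean per query, the hosting site keeps the underlying records local, which is the "minimal information" advantage listed on the slide.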
  26. 26. Summary • H3Africa is the largest collection of human biomedical data in Africa to date • Human data is sensitive and needs to be shared while protecting participants and researchers • Need to build infrastructure for sharing: • harmonized/curated metadata • storage and transfer facilities • human capacity (skills) • Need to provide access tools: web interface, public repositories, database • Trying to promote open science: user groups, sessions
  27. 27. Acknowledgements The H3ABioNet Consortium Funding: NIH Common Fund, NHGRI grants: U41HG006941, U24HG006941 H3ABioNet team at CBIO: • Sumir Panji • Gerrit Botha • Ayton Meintjes • Suresh Maslamoney • Vicky Nembaware • Ziyaad Parker • Kim Gurwitz • Mamana Mbiyavanga • Katherine Johnston Slides: Sumir Panji, Michelle Skelton
