This document discusses open data in bioinformatics and the infrastructure needed to achieve sustainable development goals. It summarizes the exponential growth in biological data from advances like high-throughput sequencing platforms. The H3Africa initiative aims to apply genomics research to improve African health by supporting projects across 27 countries. The H3Africa Bioinformatics Network is developing capacity to archive and analyze the genomic and phenotypic data being collected from over 75,000 research participants to understand disease susceptibility in African populations.
Open Data in Bioinformatics and Required Infrastructure towards achieving the SDGs/Samar Kassim
1. Open Data in Bioinformatics and Required Infrastructure towards achieving the SDGs
www.h3abionet.org
9th BioVisionAlexandria Conference, Alexandria, Egypt 2018
Prof. Samar Kassim
samar_kassim@med.asu.edu.eg
2. Introduction
• The major technological advances in molecular biology are the sophistication, diversity, scale and decreasing cost of the data being generated, i.e. by high-throughput platforms
• First human genome sequence:
– Throughput: 2.8 million bases per 24 hours on AB3730xl sequencers
– 13 years to sequence 3 billion bases at 10x coverage
– Cost: ~500 million USD (lower-bound estimate)
• Next (now) generation sequencing:
– Throughput: 1 million bases per second
– ~10 hours to sequence 3 billion bases at 10x coverage
– Cost: ~4,000 USD per genome
https://www.genome.gov/sequencingcosts/
http://en.wikipedia.org/wiki/File:Historic_cost_of_sequencing_a_human_genome.svg (Author: Ben Moore)
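The throughput and cost figures on the Introduction slide can be sanity-checked with a little arithmetic. The sketch below is only a back-of-envelope check under stated assumptions: it treats "2.8 million bases per 24 hours" as the output of a single AB3730xl instrument (the slide does not say how many instruments were used), and the variable names are illustrative.

```python
# Back-of-envelope check of the slide's throughput and cost figures.
GENOME_BASES = 3_000_000_000   # 3 billion bases (from the slide)
COVERAGE = 10                  # 10x coverage (from the slide)
total_bases = GENOME_BASES * COVERAGE

# Next-generation sequencing: 1 million bases per second (from the slide).
ngs_bases_per_second = 1_000_000
ngs_hours = total_bases / ngs_bases_per_second / 3600

# First human genome: 2.8 million bases per 24 hours.
# ASSUMPTION: this is the rate of ONE instrument running continuously.
sanger_bases_per_day = 2_800_000
single_instrument_years = total_bases / sanger_bases_per_day / 365.25

# Cost comparison: ~500 million USD versus ~4,000 USD per genome (from the slide).
cost_ratio = 500_000_000 / 4_000

print(f"NGS time for 10x coverage: {ngs_hours:.1f} hours")
print(f"Single-instrument Sanger time: {single_instrument_years:.1f} years")
print(f"Cost reduction factor: {cost_ratio:,.0f}x")
```

Under these assumptions the next-generation estimate comes out at a little over 8 hours, consistent with the slide's "~10 hours". The single-instrument estimate is longer than the quoted 13 years, which suggests that the 13-year figure reflects more than one instrument working in parallel; the slide itself does not state this.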
3. Data driven biological science - bioinformatics
• Decreasing data generation costs shifted biological sciences to a data driven science with bioinformatics playing a major role
Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, et al. (2015) Big Data: Astronomical or Genomical?. PLOS Biology 13(7): e1002195. https://doi.org/10.1371/journal.pbio.1002195
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
4. Genomics and Africa - H3Africa
• “The Human Heredity and Health in Africa (H3Africa) Initiative aims to facilitate a contemporary research approach to the study of genomics and environmental determinants of common diseases with the goal of improving the health of African populations.” (http://h3africa.org/)
• “The vision of H3Africa is to create and support a pan-continental network of laboratories that will be equipped to apply leading-edge research to the study of the complex interplay between environmental and genetic factors which determines disease susceptibility and drug responses in African populations.” (http://h3africa.org/about/vision)
5. H3Africa Phase I overview
• 25 research projects in Africa
• > 500 investigators
• Covers 27 African countries
• Up to 75,000 research participants
• > USD 76 million invested in phase 1
8 Collaborative Centers
7 Research Projects
3 Biorepositories
6 Ethics Grants
The H3Africa Consortium
Bioinformatics Network
http://h3africa.org/consortium/projects
6. H3Africa Bioinformatics Network (H3ABioNet)
• Pan African Bioinformatics Network to develop bioinformatics capacity in Africa and support the H3Africa research projects
• 28 nodes in 17 African countries
• PI: Prof. Nicky Mulder, CBIO-UCT
• Education, infrastructure, research
• Archive African genomics data
7. H3Africa data being collected (Phase I)
• Phenotype data (associated with genotype data)
– Demographic information
– Anthropometric data
– Disease and health related phenotype data
• Genetic variation data, human and pathogen
– Sequence data (whole genome, exome, targeted)
• Genotyping chip array data
– ~55,000 samples to be run on an H3Africa African custom chip
• Microbiome sequence data
– Patient/sample phenotypes
– Non-human 16S rRNA sequence data for microbiome
– Non-human full genome sequence data for microbiome
– Possible human sequence contamination
• Biospecimens to be deposited at the H3Africa biorepositories
Image credits: National Human Genome Research Institute (https://www.genome.gov/imagegallery/)
8. Lack of repository for African Genomics data
• 1,759 datasets with the query “African” – none in Africa
https://discover.repositive.io/
9. H3Africa Data Archive
• Assist H3Africa projects as data coordination center:
– Transfer
– Validate
– Store
– Submit to EGA
– Obtain EGA accessions for publications
• 0.5 petabytes storage size including offsite replication
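The slide does not say how the archive validates transferred files. One common approach is to compare a checksum computed after transfer with the checksum supplied by the submitting project. The sketch below assumes SHA-256 and uses hypothetical function names; it is an illustration, not the archive's actual procedure.

```python
import hashlib


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_transfer(path, expected_hex):
    """True if the received file matches the checksum supplied by the sender.

    `expected_hex` is assumed to be a hex string sent alongside the data
    (hypothetical convention); the comparison ignores letter case.
    """
    return sha256sum(path) == expected_hex.strip().lower()
```

Reading in chunks matters here because sequence files are often far larger than available memory.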
10. H3Africa Catalogue
• Online catalogue with metadata to search and apply for datasets and biospecimens (under development)
11. Human genetic data privacy
• H3Africa rich source of metadata (phenotypes)
(1) Age
(2) Sex
(3) Country of birth
(4) Current residence
(5) Native language
(6) Ethno-linguistic/tribal affiliation
(7) Country of birth of father and mother
(8) Native language of father and mother
(9) Ethno-linguistic/tribal affiliation of mother and father
(10) Height
(11) Weight
(12) Current medications
(13) Smoking history
(14) Alcohol history
Image credits: National Human Genome Research Institute (https://www.genome.gov/imagegallery/)
• Combination of phenotype and genetic data makes it possible to identify different populations and individuals – restricted access
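The re-identification risk noted on this slide can be made concrete by counting how many participants share each combination of metadata values, which is the idea behind k-anonymity. The sketch below is a minimal illustration; the records and field names are hypothetical and are not taken from H3Africa data.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values.

    A result of 1 means at least one participant is unique on those fields.
    Returns 0 for an empty record list.
    """
    groups = Counter(tuple(rec[field] for field in quasi_identifiers) for rec in records)
    return min(groups.values(), default=0)


# Hypothetical participants, for illustration only.
records = [
    {"age_band": "30-39", "sex": "F", "native_language": "Twi"},
    {"age_band": "30-39", "sex": "F", "native_language": "Ewe"},
    {"age_band": "30-39", "sex": "M", "native_language": "Twi"},
    {"age_band": "30-39", "sex": "M", "native_language": "Twi"},
]

k_coarse = smallest_group(records, ["age_band", "sex"])
k_fine = smallest_group(records, ["age_band", "sex", "native_language"])
```

In this toy data, adding one more field (native language) is enough to make some participants unique, which is why richer metadata tends to call for controlled access.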
12. Sharing of research data and outputs
• Funders’ data sharing policies
“The Wellcome Trust is committed to ensuring that the outputs of the research it funds, including research data, are managed and used in ways that maximise public benefit. Making research data widely available to the research community in a timely and responsible manner ensures that these data can be verified, built upon and used to advance knowledge and its application to generate improvements in health.”
https://wellcome.ac.uk/funding/managing-grant/policy-data-management-and-sharing
“The National Institutes of Health (NIH) Genomic Data Sharing Policy expects that genomic research data from NIH-supported studies involving human specimens as well as non-human and model organisms will be submitted to an NIH-designated data repository. The list below provides examples of relevant databases.”
https://gds.nih.gov/02dr2.html
13. Limits to sharing human genetic data
• Ethics:
– Digital data (genomes) can be stored indefinitely, biobank specimens can be stored for up to 20 years – secondary use
– Rapid innovation with ‘omics technologies
• H3Africa: “Seven projects used broad consent, five projects used tiered consent and one used specific consent”§
• History of vulnerable populations, low education levels and exploitation
• Blood sample collection and visits to clinics associated with disease and treatment – even if a healthy control
• “All but one of the consent forms that we reviewed included a statement about data sharing.”§
§ Munung NS, Marshall P, Campbell M, et al. Obtaining informed consent for genomics research in Africa: analysis of H3Africa consent documents. Journal of Medical Ethics 2016;42:132-137.
Ethical considerations:
Informed consent
Participant identification
Stigmatisation
Benefit sharing
14. Limits to sharing human genetic data
• Non-harmonized national / regional laws and policies for ethics and genome data sharing within Africa
Image credits: https://en.wikipedia.org/wiki/African_Economic_Community
15. H3Africa data sharing and access policy
• Balance between ensuring adequate safeguards to protect participants while not being a barrier for scientists to advance research:
- Maximizing the availability of research data, in a timely and responsible manner.
- Protecting the rights and privacy of human subjects who participated in research studies.
- Recognizing the scientific contribution of researchers who generated the data.
- Considering the nature and ethics of the research proposed in establishing the timely release of data, and mechanisms of data sharing.
- Promoting deposition of genomic data in existing community data repositories whenever possible.
http://h3africa.org/images/DataSARWG_folders/FinalDocsDSAR/H3Africa%20Consortium%20Data%20Access%20%20Release%20Policy%20Aug%202014.pdf
16. Challenges in sharing data – metadata standards
• Metadata (phenotype data) is collected via case report forms (CRFs)
Project 1 CRF | Project 2 CRF | Project 3 CRF
Female | Woman | 1
Daily units | Weekly units | User defined time period
• Same question – data coded in different ways
• Similar measure – collected in different ways
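The two problems on this slide can be sketched as a small harmonization step: map each project's coding of the same answer onto one shared value, and convert measures collected over different periods onto one scale. The project keys, the target value "female" and the choice of a weekly scale are all assumptions made for illustration; the slide does not describe H3Africa's actual mapping.

```python
# Hypothetical per-project codings for the same answer, following the slide's example.
SEX_CODES = {
    "project_1": {"Female": "female"},
    "project_2": {"Woman": "female"},
    "project_3": {"1": "female"},
}


def harmonize_sex(project, raw_value):
    """Map a project-specific code onto the shared vocabulary, or fail loudly."""
    key = str(raw_value).strip()
    try:
        return SEX_CODES[project][key]
    except KeyError:
        raise ValueError(f"no mapping for {raw_value!r} in {project}") from None


def units_per_week(units, period_days):
    """Convert units reported over `period_days` days to a weekly rate.

    Daily reporting is period_days=1, weekly is 7, and a user-defined
    period is whatever number of days the form recorded.
    """
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return units * 7 / period_days
```

Failing on an unmapped code, rather than passing it through, keeps unharmonized values from silently entering a merged dataset.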
17. Use established standards - Ontologies
• “An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them.”*
* http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html
18. Options to aid data sharing
• Make data Findable, Accessible, Interoperable and Reusable (FAIR compliant)
• Do you see a genetic variant in a specific position within your dataset? Yes / No, as in the case for the South African Human Genome Program (SAHGP)
Global Alliance for Genomics and Health: http://ga4gh.org/#/beacon
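The yes/no question on this slide can be sketched as a membership test that reveals only whether a variant is present and nothing about the individuals who carry it. This is a local illustration with hypothetical variants and a hypothetical function name; it is not the GA4GH Beacon API or the SAHGP implementation.

```python
def beacon_exists(variants, chromosome, position, alternate_base):
    """Answer only True or False: is this allele observed anywhere in the dataset?

    `variants` is a set of (chromosome, position, alternate base) tuples.
    No sample identifiers or counts are returned.
    """
    return (str(chromosome), int(position), alternate_base.upper()) in variants


# Hypothetical dataset, for illustration only.
variants = {("1", 12345, "T"), ("X", 5000, "G")}

answer_present = beacon_exists(variants, 1, 12345, "t")
answer_absent = beacon_exists(variants, "2", 12345, "T")
```

Returning a bare boolean is the design point: a dataset becomes findable without releasing the underlying genotypes.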
19. H3Africa genotyping chip
• Current genotyping technologies are designed for European populations
• African populations are under-represented, although they have the most diversity
Image credits: National Human Genome Research Institute (https://www.genome.gov/imagegallery/)
20. Designing the H3Africa genotyping chip
Image credits: National Human Genome Research Institute (https://www.genome.gov/imagegallery/)
• Collaboration between H3ABioNet and National Center for Supercomputing Applications (NCSA, US based) via US partner at University of Illinois
• Utilized the Blue Waters supercomputer facilities and CHPC facilities
212,000 node computing hours used at Blue Waters
600 TB of storage needed
Chip has undergone assessment and is in use with positive results
https://twitter.com/billgates/status/800800954790465536?lang=en
21. Connectivity for data transfers
GO endpoints | Transfer speeds (Mbps) (min, max)
Baylor <-> Blue Waters | 340, 1900
Blue Waters -> UCT | 204, 322
CHPC <-> Blue Waters | 81, 243
UCT <-> CHPC | 34, 406
Sanger <-> UCT | 38, 76

GO source and destination | Files to transfer and size per sample | Total size of transfer for 350 samples | Min transfer speed | Time to transfer
Baylor to Blue Waters | Baylor FASTQ.gzs / 100GB | 75TB | 340Mbps | 21 days
Blue Waters to UCT | Baylor FASTQ.gzs / 100GB | 75TB | 200Mbps | 35 days
Blue Waters to UCT | BW BAMs / 100GB | 40TB | 200Mbps | 19 days
UCT to CHPC | BW BAMs / 100GB | 40TB | 34Mbps | 109 days
CHPC to UCT | Union set / VCFs | 1TB | 34Mbps | 3 days
UCT to Sanger | Union set / VCFs | 1TB | 34Mbps | 3 days
Globus Online installed at nodes
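The transfer times in the table are consistent with dividing the total size by the minimum sustained speed and rounding up to whole days, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second). The sketch below reproduces that arithmetic; the function name is hypothetical, and real transfers would also be affected by protocol overhead and retries, which are ignored here.

```python
import math


def transfer_days(size_tb, speed_mbps):
    """Whole days needed to move `size_tb` terabytes at a sustained `speed_mbps`.

    Assumes decimal units and no protocol overhead; rounds up to full days.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (speed_mbps * 1e6)
    return math.ceil(seconds / 86400)


# Rows from the table: (total size in TB, minimum speed in Mbps).
rows = [(75, 340), (75, 200), (40, 200), (40, 34), (1, 34)]
days = [transfer_days(size, speed) for size, speed in rows]
```

At the slowest link (34 Mbps), moving 40 TB takes over three months, which is the practical argument for keeping compute close to the data.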
22. Challenge of unequal infrastructures
• Diverse levels of expertise and infrastructure between different countries
www.project-redcap.org/map_fullscreen.php
Software and hardware sanctions exacerbate existing inequalities, e.g. Sudan Node
http://mgafrica.com/article/2015-01-14-17-startling-facts-about-the-state-of-science-and-research-in-africa
23. Bioinformatics education
Aim:
• Basic bioinformatics training for interested H3Africa members (bioinformatics users – Introduction to Bioinformatics Training)
• Web-based bioinformatics tools and resources and how to use them
Course logistics:
• 3 months, 2 days contact time per week (3 hours per session)
• Distance learning model – physical classrooms connected to virtual classroom
• Mconf – video conferencing
• Vula – course management virtual classroom
24. IBT_2017 classroom sites
27 in total (vs. 20 classrooms in 2016)
Countries that have joined IBT in 2017: Ethiopia, Burkina Faso
Some participants from first course are going to be TAs
Over 580 enrolled participants and over 130 volunteer staff
Paper published on course design
[Map: IBT 2017 Classrooms. Legend: virtual classroom; classroom site 2016; new classroom site 2017; classroom site 2016 and 2017]
25. Conclusion
• Bioinformatics = big data and needs computational power, storage, fast read and write for processing
• Well defined metadata standards are vital for interoperability and sharing of data
• Cyber infrastructure for moving and sharing large datasets is needed to foster open data and open science
• Education and skills development essential for African citizens to take advantage of the data revolution
• Perceptions and attitudes – no amount of infrastructure will drive open data and open science if the sentiment is absent
26. Acknowledgements
• Prof Nicky Mulder and H3ABioNet members
• Ina Smith and the Academy of Science of South Africa
• BioVisionAlexandria 2018 organizers
H3ABioNet Consortium Members 2017
27. Conclusions
Provide a data archiving solution for H3Africa projects to ensure that a local copy of the data remains on the continent
28. Communication – H3Africa
Image credit: https://commons.wikimedia.org/wiki/File:UTC_hue4map_X_world_Robinson.png
• H3Africa working groups meet every fortnight
• Regular meetings are challenging due to diversity of time zones (most funders in the US) and daylight saving hours
29. Communication – H3Africa
• H3Africa funders and project members meet face to face every six months to provide reports and for working groups to also wrap up deliverables
30. Communication – H3ABioNet
• Within H3ABioNet the nodes are located in Africa so time differences are not a hindrance
• Working groups meet once a month and the network meets annually for SAB review and network business
• Only some countries have toll-free access to a booked conference call, which is costly
• Challenges: communication platforms
http://mconf.org/
35. Ontologies work
OECD – WDS Workshop, Brussels 2017
Adapting OMIABIS ontology to H3Africa data
Mapping CRFs to ontologies, e.g. phenotype or disease ontology
Mapping genomics data to Experimental Factor ontology
Developing Sickle Cell Disease Ontology
36. Beacons in Africa
https://beacon-network.org//#/directory
• First Beacon in Africa “lit” in October 2016 for the SAHGP