I spoke on "Big Data in Biology". The talk concentrates on how biology has shaped big data, how big data has become a key player in biology, and how DNA storage can address long-term archival storage.
Dr. Dennis Wang discusses possible ways to make ML methods more powerful for discovery and to reduce ambiguity within translational medicine, allowing data-informed decision-making to deliver the next generation of diagnostics and therapeutics to patients more quickly, at lower cost, and at scale.
The talk by Dr. Dennis Wang was followed by a panel discussion with Mr. Albert Wang, M. Eng., Head, IT Business Partner, Translational Research & Technologies, Bristol-Myers Squibb.
Internal NIH Seminar to the BISTI Team on some early thoughts from the Associate Director for Data Science (ADDS). These ideas are for discussion only and in no way reflect what might happen subsequently. Presented April 1, 2014 (the date is purely a coincidence).
The NIH as a Digital Enterprise: Implications for PAG (Philip Bourne)
The document discusses the NIH's vision of becoming a digital enterprise to enhance biomedical research. It outlines how research is becoming more digital and data-driven. The NIH aims to foster open sharing of data and tools through its Commons platform to facilitate collaboration and reproducibility. It also stresses the importance of training the next generation of data scientists to enable the digital enterprise. The end goal is to accelerate discovery and improve health outcomes through more integrated and data-driven research.
The document discusses the rise of data science and its disruptive impact on higher education. It analyzes precedents like bioinformatics that were enabled by new digital data sources and technologies. The author advocates that universities should embrace data science by establishing interdisciplinary collaborations, investing in data infrastructure, and ensuring research has societal value and responsibility.
Big Data in Biomedicine: Where is the NIH Headed (Philip Bourne)
The National Institutes of Health (NIH) is taking actions to address the implications of big data for biomedical research and healthcare. These include developing a "Commons" approach to make data findable, accessible, interoperable and reusable. The NIH is also establishing initiatives like the Precision Medicine Initiative to generate large datasets and the Center for Predictive Computational Phenotyping to develop predictive models from electronic health records. Overall, the NIH aims to train a workforce equipped for data science and facilitate open collaboration to realize the potential of big data for improving health outcomes.
Genome sharing projects around the world, Nijmegen, Oct 29, 2015 (Fiona Nielsen)
Genome sharing projects across the world
Did you ever wonder what happened to the exponential increase in genome sequencing data? It is out there around the world and a lot of it is consented for research use. This means that if you just know where to find the data, you can potentially analyse gigabytes of data to power your research.
In this talk Fiona will present community genome initiatives and genome sharing projects across the world, how you can benefit from this wealth of data in your work, and how you can boost your academic career through sharing and collaboration.
by Fiona Nielsen, Founder and CEO of DNAdigest and Repositive
With a background in software development, Fiona pursued her career in bioinformatics research at Radboud University Nijmegen. Now a scientist-turned-entrepreneur, Fiona founded DNAdigest and its social enterprise spin-out Repositive Ltd. Both the charity and the company focus on efficient and ethical sharing of genetics data for research, to accelerate diagnostics and cures for genetic diseases.
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ..." (Jonathan Tedds)
This document discusses open access to research data and peer review of data publications. It notes that as a first step, data underpinning journal articles should be made concurrently available in accessible databases. The Royal Society report in 2012 advocated for all science literature and data to be online and interoperable. Key issues in linking data to the scientific record are data persistence, quality, attribution, and credit. The document provides examples from astronomy of data reuse leading to new publications and cites a study finding poor reproducibility of ecological data sets over time as data availability declines. It outlines different levels of research data from raw to processed to published and discusses initiatives for open data publication and peer review.
Developing data services: a tale from two Oregon universities (Amanda Whitmire)
While the generation or collection of large, complex research datasets is becoming easier and less expensive all the time, researchers often lack the knowledge and skills that are necessary to properly manage them. Having these skills is paramount in ensuring data quality, integrity, discoverability, integration, reproducibility, and reuse over time. Librarians have been preserving, managing and disseminating information for thousands of years. As scholarly research is increasingly carried out digitally, and products of research have expanded from primarily text-based manuscripts to include datasets, metadata, maps, software code etc., it is a natural expansion of scope for libraries to be involved in the stewardship of these materials as well. This kind of evolution requires that libraries bring in faculty with new skills and collaborate more intimately with researchers during the research data lifecycle, and this is exactly what is happening in academic libraries across the country. In this webinar, two researchers-turned-data-specialists, both based in academic libraries, will share their experiences and perspectives on the development of research data services at their respective institutions. Each will share their perspective on the important role that libraries can play in helping researchers manage, preserve, and share their data.
- The document discusses challenges related to biomedical data including that data is growing rapidly, stored across silos, and expensive to maintain while demands for sharing are increasing. It also notes a lack of data science skills.
- Solutions explored include developing the NIH Commons, which would integrate disparate cloud initiatives using BD2K standards to make data findable, accessible, interoperable and reusable. This could enable new insights from aggregate analysis across datasets.
- A 3-year BD2K-sponsored pilot of the Commons is underway to address questions around discoveries, productivity, reproducibility and cost-effectiveness compared to current approaches. The pilot involves moving model organism databases to the Commons as a test case.
Themes and objectives:
To position FAIR as a key enabler to automate and accelerate R&D process workflows
FAIR Implementation within the context of a use case
Grounded in precise outcomes (e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership)
To make data actionable through FAIR interoperability
Speakers:
Mathew Woodwark, Head of Data Infrastructure and Tools, Data Science & AI, AstraZeneca
Erik Schultes, International Science Coordinator, GO-FAIR
Georges Heiter, Founder & CEO, Databiology
This document discusses the promise and challenges of data analytics in healthcare and biomedical research. It notes that we are at a point of deception, where digitization is disrupting traditional models through increased data volume, velocity and variety. The document outlines NIH's Big Data to Knowledge initiative to accelerate biomedical discovery through open data sharing and improved analytics. Precision medicine is highlighted as one area that could see major breakthroughs through these approaches. Challenges around data standards, privacy, workforce needs and demonstrating value are also discussed.
Introduction to research data management; Lecture 01 for GRAD521 (Amanda Whitmire)
Lesson 1: Introduction to research data management. From a series of lectures from a 10-week, 2-credit graduate-level course in research data management (GRAD521, offered at Oregon State University).
The course description is: "Careful examination of all aspects of research data management best practices. Designed to prepare students to exceed funder mandates for performance in data planning, documentation, preservation and sharing in an increasingly complex digital research environment. Open to students of all disciplines."
Major course content includes: Overview of research data management, definitions and best practices; Types, formats and stages of research data; Metadata (data documentation); Data storage, backup and security; Legal and ethical considerations of research data; Data sharing and reuse; Archiving and preservation.
See also, "Whitmire, Amanda (2014): GRAD 521 Research Data Management Lectures. figshare. http://dx.doi.org/10.6084/m9.figshare.1003835. Retrieved 23:25, Jan 07, 2015 (GMT)"
This document summarizes Philip Bourne's presentation at the Open Eye Meeting in Santa Fe on March 8, 2016. The presentation provided evidence that data sharing and open science have advanced significantly since the early days of crystallography, though challenges remain. Specifically, it discussed how (1) data sharing was difficult in the past but resources like the PDB now see broad data contribution and use, (2) molecular graphics tools could be more integrated and collaborative, and (3) the commons framework may help optimize data accessibility and analysis across diverse users.
CINECA webinar slides: Making cohort data FAIR (CINECA Project)
Cohort studies, which recruit groups of individuals who share common characteristics and follow them over a period of time, are a robust and essential method in biomedical research for understanding the links between risk factors and diseases. Through questionnaires, medical assessments, and other interactions, voluminous and complex data are collected about the study participants. While cohort studies present a treasure trove of data, the data is often not FAIR (findable, accessible, interoperable and reusable). First, due to the sensitive and private nature of medical information, cohort data are often access controlled, and due to a lack of information about the studies (metadata), one often needs to dig deep to learn what data a cohort study holds. Many cohort datasets therefore suffer from findability and accessibility problems. Second, data collection is often performed with instruments and data specifications tailored to the study. As a result, combining data across cohorts, even ones with similar characteristics, is difficult, making interoperability and reusability a challenge. In this presentation, we will explore several informatics techniques, such as the use of ontologies, to make cohort data more FAIR. We will also consider the implications of making cohort data more open, and the ethical and governance issues associated with open science benefit sharing.
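As a toy illustration of the ontology-based harmonization mentioned above (the variable names, the `EX:0001` term identifier, and the unit conversions are all invented for this sketch, not taken from the webinar): cohort-specific variable names can be mapped onto a shared ontology term and a canonical unit, so the same measurement becomes queryable across cohorts.

```python
# Hypothetical sketch: mapping cohort-specific variable names to a shared
# ontology term so one measurement can be queried across cohorts.
# The ontology identifier "EX:0001" and variable names are illustrative.

ONTOLOGY_MAP = {
    # cohort-local variable name -> (ontology term ID, canonical unit)
    "sysbp_mmHg": ("EX:0001", "mmHg"),   # cohort A's systolic blood pressure
    "SBP": ("EX:0001", "mmHg"),          # cohort B uses a different name
    "bp_sys_kPa": ("EX:0001", "mmHg"),   # cohort C records it in kPa
}

UNIT_TO_MMHG = {"mmHg": 1.0, "kPa": 7.50062}  # conversion to canonical unit

def harmonize(cohort_records):
    """Rewrite (name, value, unit) records into records keyed by ontology ID."""
    out = []
    for name, value, unit in cohort_records:
        term, canon_unit = ONTOLOGY_MAP[name]
        out.append({"term": term,
                    "value": value * UNIT_TO_MMHG[unit],
                    "unit": canon_unit})
    return out

records = [("sysbp_mmHg", 120.0, "mmHg"), ("bp_sys_kPa", 16.0, "kPa")]
harmonized = harmonize(records)
```

After harmonization both records carry the same ontology term and unit, which is the property that makes cross-cohort aggregation possible.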
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as to disseminate CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 17th February 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
This document summarizes an update on the Big Data to Knowledge (BD2K) initiative at the National Institutes of Health (NIH). It discusses progress made in the first year of BD2K funding in three key areas: advancing data science research through centers and targeted awards; sharing data and software through the development of indexing tools and standards; and expanding training programs. It outlines funding amounts and recipient numbers for fiscal year 2015. Future plans are outlined through 2021 with the goals of further developing tools and applications, expanding the data sharing commons, and increasing training and sustainability efforts.
CINECA webinar slides: Open science through fair health data networks dream o... (CINECA Project)
Since the FAIR data principles were published in 2016, many organizations, including science funders and governments, have adopted these principles to promote and foster true open science collaborations. However, defining a vision and creating a video of a Personal Health Train that leverages worldwide FAIR health data in a federated manner is one step. Actually making this happen at scale and showing new scientific and medical insights from it is quite another!
In this webinar, we will dive into the basics of FAIR health data, but also take stock of the current situation in health data networks: after a year of frantic research and collaborations and many open datasets and hackathons on COVID-19, has the situation actually improved? Are we sharing health data on a global scale to improve medical practice, or is quality medical data still only accessible to researchers with the right credentials and deep pockets?
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as to disseminate CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 21st January 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
- The document discusses how biomedical research is entering a period of disruption due to factors like big data, digitization, and open science.
- Key points discussed include the history and changing nature of computational biomedicine, implications of large initiatives like the Precision Medicine Initiative, and how funders should respond by encouraging global open science and sharing infrastructure and policies.
- The author advocates for creating a "commons" environment to enable finding and reusing shared digital research objects according to FAIR principles in order to advance open collaborative science.
NITRD Big Data Interagency Working Group Workshop: Pioneering the Future of Federally Supported Data Repositories Jan 13, 2021 - Opening comments on where we are and one suggestion of where we might go with an International Data Science Institute (IDSI) - A blue sky view.
The document discusses the vision for data science at the National Institutes of Health (NIH). It outlines the goals of fostering an open ecosystem to enable biomedical research as a digital enterprise. Examples are provided of how precision medicine could benefit patients in the near future through large national research cohorts, improved understanding of diseases like diabetes through genomics, and new technologies. The document also discusses several key elements needed for the digital research enterprise, including communities, policies, infrastructure, and workforce training through initiatives like the Big Data to Knowledge program.
From Where Have We Come & Where Are We Going (Philip Bourne)
This document discusses the past and future of FORCE11, a community dedicated to improving scholarly communication. It notes that since 2011:
- New communities are defined by interests rather than domains
- Open data, identifiers, and data/software citation have emerged
It also discusses challenges like maintaining a biomedical focus and opportunities like engaging other communities and pursuing public-private partnerships. Specific opportunities mentioned include pursuing community funding, gaining traction for preprints in life sciences, and leveraging touchpoints with funders around issues like reproducibility, data management, and sustainability. The document encourages stakeholders to identify and pursue these opportunities to help shape the research ecosystem.
Big Data in Biomedicine – An NIH Perspective (Philip Bourne)
Keynote at the IEEE International Conference on Bioinformatics and Biomedicine, Washington DC, November 10, 2015.
https://cci.drexel.edu/ieeebibm/bibm2015/
Acting as Advocate? Seven steps for libraries in the data decade (Liz Lyon)
UKOLN advocates that libraries take seven steps to support data management and open science in the data decade:
1) Provide briefings on cloud data services in partnership with IT services.
2) Build usable data management tools in partnership with researchers.
3) Develop data sustainability strategies and articulate the costs and benefits.
4) Publish case studies on open science to show benefits of universal data sharing.
5) Present at university ethics committees to highlight open data issues.
6) Raise awareness of citizen science opportunities and guidelines for good practice.
7) Promote data citation and attribution to embed in publication practice.
The goal of the Very Open Data Project is to provide a software-technical foundation for this exchange of data, more specifically an open database platform covering data from the raw output of experimental measurements or models, through intermediate manipulations, to the final published results. The sheer amount of data involved creates some unique software-technical challenges. One of these, addressed in the part of the study presented here, is to characterize scientific data (with the initial focus being detailed chemistry data from the combustion kinetics community) so that efficient searches can be made. This characterization is formalized as schemas describing the tags and keywords that annotate the data, and as ontologies describing the relationships between data types and between the characterizations themselves. These will be translated into metadata tags connected to the data points within a non-relational database for the community.
The focus of the initial work will be on data and its accessibility. As the project progresses, the emphasis will shift from making available data accessible to the community toward enabling the community itself, with minimal effort, to contribute its own data. This will involve, for example, the concept of the 'electronic lab notebook' and the availability of extensive concept-extraction tools, primarily from the chemical informatics field.
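A minimal sketch of the tag-based characterization described above (the tag names, keywords, and dataset IDs are invented for illustration, not taken from the project): each dataset is annotated with metadata tags drawn from a schema, and a search returns the datasets whose tags satisfy a query.

```python
# Hypothetical sketch: schema-defined metadata tags attached to datasets so
# that efficient keyword searches can be made. All tag values are illustrative.

datasets = [
    {"id": "exp-001", "tags": {"domain": "combustion-kinetics",
                               "fuel": "methane", "stage": "raw"}},
    {"id": "exp-002", "tags": {"domain": "combustion-kinetics",
                               "fuel": "ethanol", "stage": "published"}},
]

def search(datasets, **query):
    """Return IDs of datasets whose tags match every key=value in the query."""
    return [d["id"] for d in datasets
            if all(d["tags"].get(k) == v for k, v in query.items())]
```

In a real deployment the tags would live as indexed fields in the non-relational store, but the matching logic is the same: a query is a conjunction of tag constraints.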
Research Data Management Services at UWA, November 2015 (Katina Toufexis)
Research Data Management Services at the University of Western Australia (November 2015).
Created by Katina Toufexis of the eResearch Support Unit (University Library).
CC-BY
Formal languages to map Genotype to Phenotype in Natural Genomes (madalladam)
The document discusses using formal language theory to model genotype to phenotype (G2P) mappings. It proposes that G2P mappings are non-linear networks rather than linear pathways, and that formal languages could be used to formally represent these networks. Specifically, it suggests using concepts from computational linguistics like context-free grammars, attribute grammars, and semantic actions to parse genetic sequences and compute their phenotypic outcomes. As an example, it presents a context-free grammar for designing genetic constructs and computing their chemical dynamics using an attribute grammar. In summary, formal languages may provide a way to rigorously define the complex non-linear relationships between genotypes and resulting phenotypes.
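The grammar-with-semantic-actions idea can be sketched in a few lines (this is a toy example under invented assumptions: the part names, strength values, and the "expression level" attribute are hypothetical, not from the document). A construct is parsed against the rule `construct -> promoter gene terminator`, and an attribute-grammar-style semantic action synthesizes a phenotypic attribute from the attributes of the parts.

```python
# Toy illustration of a context-free grammar for genetic constructs with
# attribute-grammar semantic actions. Part names and values are hypothetical.
# Grammar: construct -> promoter gene terminator

PROMOTER_STRENGTH = {"pWeak": 0.2, "pStrong": 1.0}   # attribute of promoters
GENE_YIELD = {"gfp": 100.0, "rfp": 80.0}             # attribute of genes
TERMINATORS = {"term1", "term2"}

def parse_construct(tokens):
    """Parse `promoter gene terminator`; return the synthesized expression level."""
    if len(tokens) != 3:
        raise ValueError("expected: promoter gene terminator")
    promoter, gene, terminator = tokens
    for symbol, table in ((promoter, PROMOTER_STRENGTH),
                          (gene, GENE_YIELD), (terminator, TERMINATORS)):
        if symbol not in table:
            raise ValueError(f"unknown part: {symbol}")
    # Semantic action: the construct's phenotypic attribute is computed
    # from the attributes of its constituent parts.
    return PROMOTER_STRENGTH[promoter] * GENE_YIELD[gene]

level = parse_construct("pStrong gfp term1".split())  # 1.0 * 100.0
```

A real G2P grammar would of course carry richer attributes (e.g. chemical dynamics, as the document mentions), but the pattern is the same: parse the sequence, then compute phenotype attributes bottom-up over the parse tree.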
- Comparative sequence studies of the repeat elements in diverse insect species can provide useful information on how to make use of them for developing abundant markers in those species.
- At the moment, a total of 8 species are in genome assembly stages and another 35 are in progress for genome sequencing.
- Different molecular marker systems in the field of entomology are expected to provide new directions for studying insect genomes in unprecedented ways in the years to come.
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds
This document discusses open access to research data and peer review of data publications. It notes that as a first step, data underpinning journal articles should be made concurrently available in accessible databases. The Royal Society report in 2012 advocated for all science literature and data to be online and interoperable. Key issues in linking data to the scientific record are data persistence, quality, attribution, and credit. The document provides examples from astronomy of data reuse leading to new publications and cites a study finding poor reproducibility of ecological data sets over time as data availability declines. It outlines different levels of research data from raw to processed to published and discusses initiatives for open data publication and peer review.
Developing data services: a tale from two Oregon universitiesAmanda Whitmire
While the generation or collection of large, complex research datasets is becoming easier and less expensive all the time, researchers often lack the knowledge and skills that are necessary to properly manage them. Having these skills is paramount in ensuring data quality, integrity, discoverability, integration, reproducibility, and reuse over time. Librarians have been preserving, managing and disseminating information for thousands of years. As scholarly research is increasingly carried out digitally, and products of research have expanded from primarily text-based manuscripts to include datasets, metadata, maps, software code etc., it is a natural expansion of scope for libraries to be involved in the stewardship of these materials as well. This kind of evolution requires that libraries bring in faculty with new skills and collaborate more intimately with researchers during the research data lifecycle, and this is exactly what is happening in academic libraries across the country. In this webinar, two researchers-turned-data-specialists, both based in academic libraries, will share their experiences and perspectives on the development of research data services at their respective institutions. Each will share their perspective on the important role that libraries can play in helping researchers manage, preserve, and share their data.
- The document discusses challenges related to biomedical data including that data is growing rapidly, stored across silos, and expensive to maintain while demands for sharing are increasing. It also notes a lack of data science skills.
- Solutions explored include developing the NIH Commons, which would integrate disparate cloud initiatives using BD2K standards to make data findable, accessible, interoperable and reusable. This could enable new insights from aggregate analysis across datasets.
- A 3-year BD2K-sponsored pilot of the Commons is underway to address questions around discoveries, productivity, reproducibility and cost-effectiveness compared to current approaches. The pilot involves moving model organism databases to the Commons as a test case.
Themes and objectives:
To position FAIR as a key enabler to automate and accelerate R&D process workflows
FAIR Implementation within the context of a use case
Grounded in precise outcomes (e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership)
To make data actionable through FAIR interoperability
Speakers:
Mathew Woodwark,Head of Data Infrastructure and Tools, Data Science & AI, AstraZeneca
Erik Schultes, International Science Coordinator, GO-FAIR
Georges Heiter, Founder & CEO, Databiology
This document discusses the promise and challenges of data analytics in healthcare and biomedical research. It notes that we are at a point of deception, where digitization is disrupting traditional models through increased data volume, velocity and variety. The document outlines NIH's Big Data to Knowledge initiative to accelerate biomedical discovery through open data sharing and improved analytics. Precision medicine is highlighted as one area that could see major breakthroughs through these approaches. Challenges around data standards, privacy, workforce needs and demonstrating value are also discussed.
Introduction to research data management; Lecture 01 for GRAD521Amanda Whitmire
Lesson 1: Introduction to research data management. From a series of lectures from a 10-week, 2-credit graduate-level course in research data management (GRAD521, offered at Oregon State University).
The course description is: "Careful examination of all aspects of research data management best practices. Designed to prepare students to exceed funder mandates for performance in data planning, documentation, preservation and sharing in an increasingly complex digital research environment. Open to students of all disciplines."
Major course content includes: Overview of research data management, definitions and best practices; Types, formats and stages of research data; Metadata (data documentation); Data storage, backup and security; Legal and ethical considerations of research data; Data sharing and reuse; Archiving and preservation.
See also, "Whitmire, Amanda (2014): GRAD 521 Research Data Management Lectures. figshare. http://dx.doi.org/10.6084/m9.figshare.1003835. Retrieved 23:25, Jan 07, 2015 (GMT)"
This document summarizes Philip Bourne's presentation at the Open Eye Meeting in Santa Fe on March 8, 2016. The presentation provided evidence that data sharing and open science have advanced significantly since the early days of crystallography, though challenges remain. Specifically, it discussed how (1) data sharing was difficult in the past but resources like the PDB now see broad data contribution and use, (2) molecular graphics tools could be more integrated and collaborative, and (3) the commons framework may help optimize data accessibility and analysis across diverse users.
CINECA webinar slides: Making cohort data FAIRCINECAProject
Cohort studies, which recruit groups of individuals who share common characteristics and follow them over a period of time, are a robust and essential method in biomedical research for understanding the links between risk factors and diseases. Through questionnaires, medical assessments, and other interactions, voluminous and complex data are collected about the study participants. While cohort studies present a treasure trove of data, the data is often not FAIR (findable, accessible, interoperable and reusable). First, due to the sensitive and private nature of medical information, cohort data are often access controlled. Due to the lack of information about the studies (metadata), often one needs to dig deep to know what data is available in a cohort study. Therefore, many cohort datasets suffer from the findable and accessible issues. Second, often data collection is performed with instruments and data specifications tailored to the study. As a result, combining data across cohorts, even ones with similar characteristics, is difficult, making interoperability and reusability a challenge. In this presentation, we will explore several informatics techniques, such as the use of ontology, to make cohort data more FAIR. We will also consider the implications of making cohort data more open and the ethical and governance issues associated with open science benefit sharing.
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as to disseminate CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 17th February 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
This document summarizes an update on the Big Data to Knowledge (BD2K) initiative at the National Institutes of Health (NIH). It discusses progress made in the first year of BD2K funding in three key areas: advancing data science research through centers and targeted awards; sharing data and software through the development of indexing tools and standards; and expanding training programs. It outlines funding amounts and recipient numbers for fiscal year 2015. Future plans are outlined through 2021 with the goals of further developing tools and applications, expanding the data sharing commons, and increasing training and sustainability efforts.
CINECA webinar slides: Open science through fair health data networks dream o... - CINECAProject
Since the FAIR data principles were published in 2016, many organizations, including science funders and governments, have adopted these principles to promote and foster true open science collaborations. However, defining a vision and creating a video of a Personal Health Train that leverages worldwide FAIR health data in a federated manner is one step. Actually making this happen at scale, and being able to derive new scientific and medical insights from it, is quite another!
In this webinar, we will dive into the basics of FAIR health data, but also take stock of the current situation in health data networks: after a year of frantic research and collaborations and many open datasets and hackathons on COVID-19, has the situation actually improved? Are we sharing health data on a global scale to improve medical practice, or is quality medical data still only accessible to researchers with the right credentials and deep pockets?
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as to disseminate CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 21st January 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
- The document discusses how biomedical research is entering a period of disruption due to factors like big data, digitization, and open science.
- Key points discussed include the history and changing nature of computational biomedicine, implications of large initiatives like the Precision Medicine Initiative, and how funders should respond by encouraging global open science and sharing infrastructure and policies.
- The author advocates for creating a "commons" environment to enable finding and reusing shared digital research objects according to FAIR principles in order to advance open collaborative science.
NITRD Big Data Interagency Working Group Workshop: Pioneering the Future of Federally Supported Data Repositories Jan 13, 2021 - Opening comments on where we are and one suggestion of where we might go with an International Data Science Institute (IDSI) - A blue sky view.
The document discusses the vision for data science at the National Institutes of Health (NIH). It outlines the goals of fostering an open ecosystem to enable biomedical research as a digital enterprise. Examples are provided of how precision medicine could benefit patients in the near future through large national research cohorts, improved understanding of diseases like diabetes through genomics, and new technologies. The document also discusses several key elements needed for the digital research enterprise, including communities, policies, infrastructure, and workforce training through initiatives like the Big Data to Knowledge program.
From Where Have We Come & Where Are We Going - Philip Bourne
This document discusses the past and future of FORCE11, a community dedicated to improving scholarly communication. It notes that since 2011:
- New communities are defined by interests rather than domains
- Open data, identifiers, and data/software citation have emerged
It also discusses challenges like maintaining a biomedical focus and opportunities like engaging other communities and pursuing public-private partnerships. Specific opportunities mentioned include pursuing community funding, gaining traction for preprints in life sciences, and leveraging touchpoints with funders around issues like reproducibility, data management, and sustainability. The document encourages stakeholders to identify and pursue these opportunities to help shape the research ecosystem.
Big Data in Biomedicine – An NIH Perspective - Philip Bourne
Keynote at the IEEE International Conference on Bioinformatics and Biomedicine, Washington DC, November 10, 2015.
https://cci.drexel.edu/ieeebibm/bibm2015/
Acting as Advocate? Seven steps for libraries in the data decade - LizLyon
UKOLN advocates that libraries take seven steps to support data management and open science in the data decade:
1) Provide briefings on cloud data services in partnership with IT services.
2) Build usable data management tools in partnership with researchers.
3) Develop data sustainability strategies and articulate the costs and benefits.
4) Publish case studies on open science to show benefits of universal data sharing.
5) Present at university ethics committees to highlight open data issues.
6) Raise awareness of citizen science opportunities and guidelines for good practice.
7) Promote data citation and attribution to embed in publication practice.
The goal of the Very Open Data Project is to provide a software-technical foundation for this exchange of data, more specifically to provide an open database platform for data ranging from the raw data coming from experimental measurements or models, through intermediate manipulations, to finally published results. The sheer amount of data involved creates unique software-technical challenges. One of these challenges is addressed in the part of the study presented here: characterizing scientific data (with the initial focus being detailed chemistry data from the combustion kinetics community) so that efficient searches can be made. This characterization is formalized as schemas of tags and keywords describing the data, together with ontologies describing the relationships between data types and between the characterizations themselves. These will be translated into metadata tags connected to the data points within a non-relational database for the community.
The focus of the initial work will be on data and its accessibility. As the project progresses, the emphasis will shift from merely making available data accessible to the community to enabling the community itself to contribute its own data with minimal effort. This will involve, for example, the concept of the ‘electronic lab notebook’ and the existence and availability of extensive concept extraction tools, primarily from the chemical informatics field.
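The tagging-and-search idea described above can be sketched concretely. All field names, tag strings, and ontology terms in this example are illustrative assumptions, not part of the project's actual schemas:

```python
# A hypothetical metadata record for a single combustion-kinetics data
# point, carrying descriptive tags and ontology links for search in a
# non-relational store (all names below are illustrative).
record = {
    "value": 1.23e13,
    "units": "cm3/mol/s",
    "quantity": "rate_constant",
    "tags": ["combustion", "kinetics", "H2-O2"],
    "ontology": {
        "reaction_class": "hydrogen_abstraction",
        "is_a": "chemical_kinetics_datum",
    },
}

def find_by_tag(records, tag):
    """Return records carrying a given metadata tag -- the kind of
    efficient lookup the schemas are meant to enable."""
    return [r for r in records if tag in r.get("tags", [])]
```

In a real deployment the lookup would be served by the database's index rather than a linear scan, but the schema-driven principle is the same: searches operate on the standardized tags, not the raw data.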
Research Data Management Services at UWA (November 2015)Katina Toufexis
Research Data Management Services at the University of Western Australia (November 2015).
Created by Katina Toufexis of the eResearch Support Unit (University Library).
CC-BY
Formal languages to map Genotype to Phenotype in Natural Genomes - madalladam
The document discusses using formal language theory to model genotype to phenotype (G2P) mappings. It proposes that G2P mappings are non-linear networks rather than linear pathways, and that formal languages could be used to formally represent these networks. Specifically, it suggests using concepts from computational linguistics like context-free grammars, attribute grammars, and semantic actions to parse genetic sequences and compute their phenotypic outcomes. As an example, it presents a context-free grammar for designing genetic constructs and computing their chemical dynamics using an attribute grammar. In summary, formal languages may provide a way to rigorously define the complex non-linear relationships between genotypes and resulting phenotypes.
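To make the context-free-grammar idea concrete, here is a minimal recognizer for a toy grammar of genetic constructs. The grammar and part names are illustrative assumptions for this sketch (the talk's actual grammars also attach attribute-grammar semantics to compute chemical dynamics, which this recognizer does not attempt):

```python
# Toy context-free grammar for genetic constructs:
#   Construct -> Cassette | Cassette Construct
#   Cassette  -> Promoter Gene Terminator
GRAMMAR = {
    "Construct": [["Cassette"], ["Cassette", "Construct"]],
    "Cassette": [["Promoter", "Gene", "Terminator"]],
}
TERMINALS = {"Promoter", "Gene", "Terminator"}

def parses(symbol, tokens):
    """Return the set of token counts `symbol` can consume from the
    front of `tokens` (a simple recursive-descent recognizer)."""
    if symbol in TERMINALS:
        return {1} if tokens and tokens[0] == symbol else set()
    results = set()
    for production in GRAMMAR[symbol]:
        consumed = {0}
        for sym in production:
            consumed = {c + n for c in consumed
                        for n in parses(sym, tokens[c:])}
        results |= consumed
    return results

def is_valid_construct(tokens):
    """True if the whole token sequence derives from Construct."""
    return len(tokens) in parses("Construct", tokens)
```

A sequence of one or more Promoter-Gene-Terminator cassettes parses; any out-of-order arrangement is rejected, which is exactly the kind of syntactic constraint a G2P grammar encodes before semantics are added.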
- Comparative sequence studies of the repeat elements in diverse insect species can provide useful information on how to make use of them for developing abundant markers that can be used in those species.
- At the moment, a total of 8 species are in genome assembly stages and another 35 are in progress for genome sequencing.
- Different molecular marker systems in the field of entomology are expected to provide new directions to study insect genomes in an unprecedented way in the years to come.
The document provides an overview of bioinformatics and examples of how it is used at different biological scales and levels of complexity, from genomics to proteomics to biological networks and systems biology. It discusses how bioinformatics integrates biological data from different sources and scales to offer new biological insights. Examples are given of how bioinformatics is applied to analyze genomic, metagenomic, and proteomic data as well as protein structures and interactions.
Mapping Genotype to Phenotype using Attribute Grammar, Laura Adam - madalladam
Defense -- thesis: “Mapping Genotype to Phenotype using Attribute Grammar.”
PhD degree in Genetics, Bioinformatics and Computational Biology (GBCB) in the tracks of Computer Science, Mathematics and Life Sciences.
DNA Marker Techniques for Plant Varietal Identification - Senthil Natesan
This document discusses DNA marker techniques for plant varietal identification. It provides background information on the importance of identifying crop varieties at different stages of seed production. While morphological traits can identify varieties, they are influenced by the environment and require a full growing season. The document then discusses various molecular marker techniques like RFLP, PCR, AFLP, SSR, and ISSR that can help with rapid and reliable varietal identification. It also covers the relative importance of markers, the skills and costs required for molecular marker analysis, and considerations for selecting the appropriate marker type.
Bioinformatics is the application of information technology to analyze biological data. This document provides an overview of bioinformatics, including publicly available genome sequences from 1998, promises for applications in medicine and biotechnology, the need for bioinformaticians to analyze growing biological databases, common bioinformatics tasks like sequence analysis and molecular modeling, and important databases like GenBank, SwissProt, and NCBI.
The document summarizes key aspects of genomics and the human genome project. It discusses that the human genome project mapped the human genome through linkage mapping, physical mapping, and DNA sequencing. It was completed in 2003 and found that humans have around 20,000 genes and repetitive non-coding DNA makes up around half the genome. Transposons and retrotransposons are types of repetitive DNA that can move locations within genomes.
Flow Cytometry Training talks - part 1
This forms the first session of the Garvan Flow Cytometry Training course, a 1.5-day training course aimed at giving new and experienced researchers a better understanding of cytometry in medical and biological research.
De novo genome assembly - T.Seemann - IMB winter school 2016 - Brisbane, AU ... - Torsten Seemann
This document discusses de novo genome assembly, which is the process of reconstructing long genomic sequences from many short sequencing reads without the aid of a reference genome. It is challenging due to factors like short read lengths, repetitive sequences that complicate the assembly graph, and sequencing errors. The goals of assembly are to produce contiguous sequences with high completeness and correctness by resolving overlaps between reads into consensus sequences. Metrics like N50, core gene content, and read remapping are used to assess assembly quality.
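The N50 metric mentioned above is simple to compute from a list of contig lengths; a minimal sketch:

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half the total assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input
```

For contigs of lengths 100, 80, 60, 40, and 20 (total 300), the N50 is 80, since the two longest contigs (180 bases) already cover more than half the assembly. Note that N50 rewards contiguity only; that is why it is paired with completeness checks such as core gene content and read remapping.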
Geared towards bioinformatics students and taking a somewhat humorous point of view, this presentation explains what bioinformaticians are and what they do.
This document discusses different concepts of genes including:
1. Classical concepts viewed genes as units of heredity, transmission of characters, and mutation.
2. Molecular concepts define genes as the entire nucleic acid sequence required for protein synthesis, including coding and regulatory regions.
3. Genes have a fine structure and can be divided into functional units called cistrons based on complementation testing of mutants.
Bioinformatics emerged from the marriage of computer science and molecular biology to analyze massive amounts of biological data, like that produced by the Human Genome Project. It uses algorithms and techniques from computer science to solve problems in molecular biology, like comparing genomic sequences to understand evolution. As genomic data exploded publicly, bioinformatics was needed to efficiently store, analyze, and make sense of this information, which has applications in molecular medicine, drug development, agriculture, and more.
1. The document discusses some early observations from the Associate Director for Data Science at the National Institutes of Health regarding data at NIH.
2. It notes that NIH does not fully understand how existing data is used, has focused more on why data should be shared rather than how to share it, and lacks plans for long-term sustainability of data.
3. Potential solutions discussed include developing a biomedical commons, modifying the review process, improving education in data science, and expanding the Big Data to Knowledge initiative. The goal is to create a digital research enterprise that better connects all aspects of the research lifecycle.
What Can Happen when Genome Sciences Meets Data Sciences? - Philip Bourne
The document discusses the intersection of genome sciences and data sciences. It provides context on data science definitions, relevant examples at NIH, and challenges. The author argues that fully integrating diverse biomedical data sources through open platforms could accelerate research by enabling new discoveries. However, changing entrenched work practices and incentivizing platform use are challenges. The DSI is working to break down silos through collaboration and practical training to help advance open data and digital integration of research workflows.
This document provides an overview of Philip Bourne's early observations and thoughts regarding data management at the NIH. Some of the key points are: 1) Existing data resources are not well understood in terms of how they are used; 2) There is a need to focus on how data will be managed and shared, not just why it should be; 3) There is no NIH-wide sustainability plan for data management; 4) Training in biomedical data science is inconsistent. The document discusses some potential solutions such as establishing a NIH data commons and improving training programs.
Will Biomedical Research Fundamentally Change in the Era of Big Data? - Philip Bourne
This document discusses how biomedical research may fundamentally change in the era of big data. It notes that biomedical research has always been data-driven, but the scope, variety, complexity and volume of data is now much greater. It also discusses the need for more open data sharing and new tools and methods for large-scale analysis. The document suggests biomedical research may move towards a more collaborative "platform" model, as seen with companies like Airbnb, with the goal of improving data access, reuse and reproducibility of research. However, overcoming challenges like incentives, trust and work practices will be important for any new platform to succeed.
Data Science in Biomedicine - Where Are We Headed? - Philip Bourne
The document discusses the future of data science in biomedicine. It notes that we are currently at a point of deception, where digitization is occurring but disruption has not yet fully happened. It outlines implications such as open collaborative science becoming more important, and data/analytics increasing in scholarly value. Initiatives like the Precision Medicine Initiative and Big Data to Knowledge are aiming to improve research efficiency and enable precision medicine through large datasets and new methodologies. The future will require cooperation across funders and changes to training to address new skills needed.
A Successful Academic Medical Center Must be a Truly Digital Enterprise - Philip Bourne
This document discusses how academic medical centers must become truly digital enterprises to succeed in the future. It outlines how data sharing and use of data analytics will become increasingly important in biomedical research. Academic medical centers will need to improve efficiency, embrace open collaboration, and ensure current training prepares researchers for working with large, diverse data sources. However, balancing accessibility and security of data will also be critical as these digital transformations occur. The implications discussed could shape opportunities, scientific practices, and the value of data and analytics for academic medical institutions.
The Thinking Behind Big Data at the NIH - Philip Bourne
- The document discusses the challenges and opportunities presented by big data in biomedical research. It highlights issues like lack of reproducibility, need for data sharing and standards, and ensuring sustainability of data resources.
- The NIH is organizing various initiatives through the Big Data to Knowledge program to address these issues. This includes developing a biomedical research data commons, training programs, funding for innovation, and modified review processes.
- The goal is to improve data accessibility, support for workflows, relationships with publishers, and metrics to measure reproducibility and reward data sharing. This will help close the research lifecycle loop and advance biomedical discovery.
Philip Bourne presented his viewpoint on the future of open science at an NIAID workshop. He argued that as science becomes more democratized, it will lead to more scrutiny of research, a need for new types of rewards beyond publications and citations, and a removal of artificial boundaries between fields. As an example, he discussed how open science allowed two researchers working in different areas to connect via common data references in their notebooks. Bourne believes this digitization and interconnection of research will accelerate, transforming institutions into digital enterprises where digital assets are identifiable and interoperable. However, fully realizing this vision will require coordinating tools across the research lifecycle through common frameworks and developing new support structures.
Evolution or revolution? The changing data landscape - LizLyon
This document summarizes a presentation on the changing data landscape and challenges of digital information management. It discusses how data sets are becoming core research instruments and potentially the new special collections. It covers perspectives on the increasing scale and complexity of data, as well as challenges regarding storage, incentives, costs and sustainability. It also examines gaps between data policies and practices in areas like data sharing, licensing, ethics and engagement with citizen science.
Why is the NIH investing $100M at the intersection of data science and health research? The NIH seeks to invest in ways to help researchers easily find, access, analyze, and curate research data. Researchers want visual analytics, and to build the database into a “social network” – being able to “friend” or “like” the data.
Why the food sector needs a research infrastructure on Food and Health Consum... - e-ROSA
Bent Egberg Mikkelsen and Karin Zimmermann's presentation at the eROSA Workshop “Towards Open Science in Agriculture & Food”, a side event to High Level conference on FOOD 2030, Plovdiv, Bulgaria (13/6/2018)
Towards Biomedical Research as a Digital Enterprise - Philip Bourne
Philip Bourne outlines his vision of transforming biomedical research into a digital enterprise by making data and other digital assets more open, interoperable, and accessible across boundaries through initiatives like the NIH's Big Data to Knowledge initiative; this would help address issues like the slow pace of discovery and non-reproducibility of research by better connecting scientists and their work.
The document summarizes a panel discussion on data sharing featuring the executive directors of PCORI and NIH who discussed their organizations' efforts to build large clinical research networks and promote genomic and clinical data sharing. They addressed challenges around data standards, privacy, and incentivizing data sharing and publication of results. The associate director for data science at NIH then outlined plans to develop a biomedical research data commons to enable discovery and innovation through open data access and analytics tools.
From Research to Practice - New Models for Data-sharing and Collaboration to ... - Health Data Consortium
Watch the webinar here: http://encore.meetingbridge.com/MB005418/140528/
Webinar transcript: http://hdc.membershipsoftware.org/Files/webinars/HDC-PwC%20NIH%20&%20PCORI%20Webinar%20Transcript%205_28_14.pdf
Patient-Centered Outcomes Research Institute (PCORI) Executive Director Joe Selby, MD, MPH; National Institutes of Health (NIH) Director and PCORI Board of Governors member Francis Collins, MD, PhD; and NIH Associate Director for Data Science Philip Bourne, PhD discussed new and emerging trends in big data for health, including:
- How researchers, patients, clinicians, and others are forging new models for data-sharing.
- Leveraging the quantity, variety, and analytic potential of health-related data for research and practice.
- Addressing patients’ perspectives, needs, and concerns in creating new opportunities for innovation and translational science.
- Exciting initiatives such as PCORnet, the National Patient-Centered Clinical Research Network initiative that PCORI is now helping to develop, and related open data and technology efforts such - as the NIH Health Systems Collaboratory and Big Data to Knowledge (BD2K) initiative.
Discover more health data resources on our website at http://www.healthdataconsortium.org/
Ginny Pannabecker, Life Science & Scholarly Communications Librarian at Virginia Tech, is an ACRL Science and Technology Section (STS) liaison to the American Institute of Biological Sciences (AIBS). This presentation shares key points for librarians and researchers from an AIBS workshop on "Changing Practices in Data Publications," which took place in December 2014 and involved representatives from federal funding agencies; publishers and librarians; scientific societies and journals; and data services / providers.
This document summarizes the agenda and discussions from the Genetic Engineering & Society Center's Internal Advisory Committee Meeting on August 30, 2017. The meeting included student presentations on various graduate research projects, as well as highlights from staff members and faculty. Staff highlights included an open philanthropy grant to promote biosafety practices in DIY biology labs and thoughts on redesigning the Center's communications materials. Faculty highlights included public engagement efforts for projects on synthetic biology and gene drives funded by DARPA and NSF/USDA. The discussion section focused on planning a future art and science exhibit.
Funding agencies are instituting requirements for data management and sharing as a condition of receiving research funds. This presentation addresses why researchers should care about research data management, what libraries have to do with it, and a case study of what one research specialist at the University of Colorado Anschutz Medical Campus is doing in this area.
Data Science and AI in Biomedicine: The World has Changed - Philip Bourne
This document discusses the changing landscape of data science and AI in biomedicine. Some key points:
- We are at a tipping point where data science is becoming a driver of biomedical research rather than just a tool. Biomedical researchers need to become data scientists.
- Data science is interdisciplinary and touches every field due to the rise of digital data. It requires openness, translation of findings, and consideration of responsibilities like algorithmic bias.
- Advances like AlphaFold2 show the power of large collaborative efforts combining data, computing resources, engineering, and domain expertise. This points to the need for public-private partnerships and new models of open data sharing.
- The definition of
AI in Medical Education: A Meta View to Start a Conversation - Philip Bourne
- AI has the potential to significantly impact medical education and healthcare.
- Chatbots and large language models can provide a rich training ground for students to learn, while augmented reality may change the student-patient dynamic.
- AI tools like predictive analytics and imaging analysis can assist in research, diagnosis, and personalized treatment, but models are still limited and education of implications is needed.
- If developed responsibly with oversight, AI could help democratize healthcare and create new industries, but history shows technology disruptions can also lead to deception if misused. The impacts and timeline of AI in medicine remain uncertain.
AI+ Now and Then: How Did We Get Here And Where Are We Going - Philip Bourne
The document discusses the past, present, and future of artificial intelligence (AI). It describes how AI has advanced due to increases in data and improvements in algorithms and computing technology. An example of AI, ChatGPT, is discussed as using large language models, pre-training, and transformers to generate language. The future of AI is uncertain but could involve neural networks that mimic the brain more closely. AI may disrupt many industries like education and research in the coming years or decades through forces of digitization, disruption, and other factors. The impacts and timeline of AI progress are difficult to predict precisely.
Thoughts on Biological Data Sustainability - Philip Bourne
This document discusses approaches to improving biological data sustainability. It proposes moving from the current BDS 1.0 model to a BDS 2.0 model. BDS 1.0 is characterized by increasing data and costs but decreasing funds for innovation. BDS 2.0 would recognize the monetary value of data and embrace public-private partnerships and a data economy. It suggests a "data credits" system where data curation is a service with monetary value. The document provides examples of how this could work for the Protein Data Bank (PDB) and more globally. It argues BDS 2.0 could encourage competition, globalization, and private sector engagement to better foster sustainable and FAIR biological data.
The document discusses FAIR data and its importance. FAIR stands for Findable, Accessible, Interoperable, and Reusable. The author argues that data science is becoming a major driver in many fields due to the large amounts of digital data being created. For data and data science to reach their full potential, data needs to be FAIR so it can be easily discovered, accessed, integrated and reused. An example is given of a researcher combining health and vehicle crash data using techniques from data science to improve emergency care. Making data FAIR enables greater collaboration, public-private partnerships and opportunities for translation.
Data Science Meets Biomedicine, Does Anything Change - Philip Bourne
Data science is driving major changes in biomedical research by enabling new types of integrative, multi-scale analyses. However, biomedical research may no longer lead data science due to a lack of comprehensive data infrastructure and cultural barriers. Responsible data science that balances openness, ethics, and benefiting patients could help establish biomedicine's continued leadership role. Major challenges include limited resources, attracting diverse talent, and prioritizing strategic initiatives over conforming to traditional models of research.
Presented online as part of the NASM series in Advancing Drug Discovery see https://www.nationalacademies.org/event/40883_09-2023_advancing-drug-discovery-data-science-meets-drug-discovery
Biomedical Data Science: We Are Not Alone - Philip Bourne
This document discusses biomedical data science and the opportunities and challenges presented by new developments in data science. Some key points:
- We are at a tipping point where biomedical research is no longer the sole leader in data science due to advances in many other fields. Biomedical researchers need to become data scientists to stay relevant.
- Data science is being driven by the massive growth of digital data and requires an interdisciplinary approach. It is touching every field and attracting many students.
- Developing effective data systems and infrastructure is a major challenge to enable open sharing and analysis of data. Initiatives are underway but more collaboration is needed across sectors.
- Advances in machine learning, like Alpha
BIMS7100-2023. Social Responsibility in Research - Philip Bourne
Social responsibility in research refers to conducting studies that benefit society while avoiding harm. It involves considering risks and benefits to human and animal subjects, ensuring transparency and integrity, and engaging stakeholders. Socially responsible research also addresses equity, diversity and inclusion. Data sharing is an important aspect of social responsibility, as it enables reproducibility and collaborative research. However, data must be shared in a FAIR manner and maintained over time to realize its full benefits. Researchers should consider social responsibility throughout the entire research lifecycle.
What Data Science Will Mean to You - One Person's View - Philip Bourne
This document provides an overview of data science from the perspective of Philip Bourne. Some key points:
- Data science is disruptive to higher education and all disciplines are being impacted by large amounts of digital data.
- Data science can be defined using a 4+1 model focusing on value, design, systems, analytics, and practice.
- Principles of excellence, inclusivity, openness, and fairness should guide data science work.
- Lessons from advances in computational biology and AlphaFold2 show the importance of open data, collaboration, and engineering challenges.
- A data science school should focus on responsible data practices while balancing open research that benefits patients.
The document provides an overview of the School of Data Science at the University of Virginia and its approach to collaborating with Novo Nordisk on diabetes research. It discusses that the School of Data Science aims to catalyze discovery through interdisciplinary research, educate a diverse workforce, and serve the community by applying data science. It also provides examples of using artificial intelligence to recognize patterns related to diabetes and details potential areas of collaboration between the School and Novo Nordisk, including student projects, visiting fellows, faculty partnerships, and PhD mentorship.
Towards a US Open research Commons (ORC)Philip Bourne
On August 2nd, 2021, US scientists and officials met to discuss establishing a US Open Research Commons (ORC) to make research data and computing resources more accessible and interoperable across public and private sectors. Currently, US resources are siloed and limited in discoverability. Other countries have established similar initiatives that the US is not formally represented in. An ORC could pool resources to benefit a more diverse group of researchers in addressing societal challenges, but establishing one requires overcoming cultural and institutional barriers between agencies through policy leadership. Immediate action is needed for the US to remain competitive in open science.
This document discusses opportunities for precision education arising from the move to digital education during the COVID pandemic. It notes that for the first time, essentially all educational materials were digital, creating opportunities to make content findable, accessible, interoperable, and reusable. This could improve content quality through transparency and ratings similar to academic publishing. It also enables aggregated views of content and student performance, improved content and syllabi, and recommender systems. Challenges include issues around content ownership, sharing rules, bias, privacy, security, and adoption of next generation learning management systems.
Philip Bourne presented on how data can advance sustainability. He discussed how high throughput DNA digital data changed biomedicine and spawned the new field of data science. Data science now touches all domains, including helping achieve UN Goal 10 of reducing inequalities through projects like using data to understand the history of Native American displacement. While data presents endless opportunities, it also has weaknesses like being messy and non-conclusive, and threats like bias and lack of training. Bourne advocates for building trust through evidence and creating an open data environment to realize data's potential, while acknowledging that sustaining open data faces challenges around proprietary concerns, security, and driving social change.
Frontiers of Computing at the Cellular and Molecular ScalesPhilip Bourne
3 basic points when establishing a new biomedical initiative. Presented at Frontiers of Computing in Health and Society, George Mason University, September 21, 2021.
The document discusses the importance of social responsibility in research. It makes three key points:
1. Research should maximize benefit to society by making findings accessible and usable by the public who funds the research.
2. Under certain conditions like privacy, all research should be openly accessible so others can build upon it.
3. Most research data is lost within 10-15 years of publication according to studies, highlighting the need for open data standards to ensure long-term availability.
The chapter Lifelines of National Economy in Class 10 Geography focuses on the various modes of transportation and communication that play a vital role in the economic development of a country. These lifelines are crucial for the movement of goods, services, and people, thereby connecting different regions and promoting economic activities.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPRAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
Chapter wise All Notes of First year Basic Civil Engineering.pptxDenish Jangid
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to objective, scope and outcome the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Unit of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary air pollutants, Harmful effects of Air Pollution, Control of Air Pollution. . Noise Pollution Harmful Effects of noise pollution, control of noise pollution, Global warming & Climate Change, Ozone depletion, Greenhouse effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
How to Setup Warehouse & Location in Odoo 17 InventoryCeline George
In this slide, we'll explore how to set up warehouses and locations in Odoo 17 Inventory. This will help us manage our stock effectively, track inventory levels, and streamline warehouse operations.
Walmart Business+ and Spark Good for Nonprofits.pdfTechSoup
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money.
The webinar may also give some examples on how nonprofits can best leverage Walmart Business+.
The event will cover the following::
Walmart Business + (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a 'Spend Analytics” feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
Walmart Business+ and Spark Good for Nonprofits.pdf
Bioinformatics in the Era of Open Science and Big Data
1. Bioinformatics in the Era of Open Science and Big Data
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
1/28/14
SIB Biel/Bienne
1
2. My Bias
• RCSB PDB/IEDB Database Developer – Views on community, quality, sustainability …
• PLOS Journal Co-founder – Open Science Advocate
• Associate Vice Chancellor for Innovation – Business models, interaction with the private sector, sustainability
• Professor – Mentoring, reward system, value (or not) of research
• Associate Director of NIH for Data Science - ??
3. The History of Bioinformatics According to Bourne
Searls (ed), The Roots in Bioinformatics series, PLOS Comp Biol
Timeline: 1980s → 1990s → 2000s → 2010s → 2020
The Discipline: Unknown → Expt. Driven → Emergent → Over-sold → A Service → A Partner → A Driver
The Raw Material: Non-existent → Limited/Poor → More/Ontologies → Big Data/Siloed → Open/Integrated
The People: No name → Technicians → Academics → Industry recognition: data scientists
4. We Need to Start By Asking How Are We Using the Data Now!
Only Then Can We Make Rational Decisions About Data – Large or Small
5. Web Logs etc. Are Not Enough
Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 – Jul. 2010
3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
1RUZ: 1918 H1 Hemagglutinin
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
[Andreas Prlic]
6. We Need to Learn from Industries Whose Livelihood Addresses the Question of Use
7. Next Consider What We Do Every Day
We take actions on digital data, increasingly across boundaries
8. Actions on Data Implies:
• Ensuring data quality and hence trust
• Making data sustainable
• Making data open and accessible
• Making data findable
• Providing suitable metadata and annotation
• Making data queryable
• Making data analyzable
• Presenting data so as to maximize its value
• Rewarding good data practices
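The checklist above can be made concrete as a machine-readable record. This is a minimal sketch, not an actual NIH or PDB schema: every field name here is an illustrative assumption, showing how a unique identifier, provenance, license terms, and keyword annotation make a dataset findable and queryable.

```python
# Hypothetical metadata record sketch: identity, provenance, access terms,
# and annotation supporting the findable/queryable bullets above.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    identifier: str                 # unique, persistent ID
    title: str
    creators: list                  # provenance: who produced the data
    created: str                    # ISO 8601 date
    license: str                    # openness/accessibility terms
    keywords: list = field(default_factory=list)   # findability
    annotations: dict = field(default_factory=dict)

def find(records, keyword):
    """Trivial query: return records tagged with a keyword."""
    return [r for r in records if keyword in r.keywords]

records = [
    DatasetRecord("pdb:3B7E", "1918 H1N1 neuraminidase + zanamivir",
                  ["Example Lab"], "2008-01-15", "CC0",
                  keywords=["influenza", "structure"]),
]
print([r.identifier for r in find(records, "influenza")])  # ['pdb:3B7E']
```

Even this toy record shows why "suitable metadata" underpins every other bullet: without the keyword and provenance fields, the query has nothing to act on.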
10. Boundaries on Data Implies:
• Working across biological scales
• Working across biomedical disciplines
• Working across basic and clinical research and practice
• Working across institutional boundaries
• Working across public and private sectors
• Working across national and international borders
• Working across funding agencies
12. These Issues Have Been Around Almost As Long As Bioinformatics
The Good News is That “Big Data” Has Brought More Attention to the Problem
13. What Are Big Data?
• Large datasets from high throughput experiments
• Large numbers of small datasets
• Data which are “ill-formed”
• The why (causality) is replaced by the what
• A signal that a fundamental change is taking place – a tipping point?
14. That Change is Embodied in: The Digital Enterprise
• Consists of digital assets
  – E.g. datasets, papers, software, lab notes
• Each asset is uniquely identified and has provenance, including access control
  – E.g. publishing simply involves changing the access control
• Digital assets are interoperable across the enterprise
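The slide's definition can be sketched in a few lines of code. This is a toy illustration of the idea, not any real NIH system: an asset carries a unique identifier, provenance, and access control, and "publishing" is nothing more than flipping that access control, with the flip itself recorded as provenance.

```python
# Toy sketch of a Digital Enterprise asset: unique ID, provenance trail,
# and access control, where publishing = changing the access control.
import uuid

class DigitalAsset:
    def __init__(self, kind, owner):
        self.identifier = str(uuid.uuid4())   # unique, persistent ID
        self.kind = kind                      # dataset, paper, software, lab note
        self.provenance = [f"created by {owner}"]
        self.access = "private"

    def publish(self):
        """Publishing simply changes the access control -- and logs it."""
        self.access = "public"
        self.provenance.append("access changed: private -> public")

note = DigitalAsset("lab note", "Jane")
note.publish()
print(note.access)  # public
```

The point of the sketch is the last bullet of the slide: because every asset has the same identifier/provenance/access shape, assets of different kinds stay interoperable across the enterprise.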
15. The Enterprise Is Almost Anything..
Your Lab, your Institution, the SIB, the NIH….
16. Consider an Academic Institution As A Digital Enterprise
Jane scores extremely well in parts of her graduate on-line neurology class. Neurology professors, whose research profiles are on-line and well described, are automatically notified of Jane’s potential based on a computer analysis of her scores against the background interests of the neuroscience professors. Consequently, professor Smith interviews Jane and offers her a research rotation. During the rotation she enters details of her experiments related to understanding a widespread neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line research space – an institutional resource where stakeholders provide metadata, including access rights and provenance beyond that available in a commercial offering. According to Jane’s preferences, the underlying computer system may automatically bring to Jane’s attention Jack, a graduate student in the chemistry department whose notebook reveals he is working on using bacteria for purposes of toxic waste cleanup. Why the connection? They reference the same gene a number of times in their notes, which is of interest to two very different disciplines – neurology and environmental sciences. In the analog academic health center they would never have discovered each other, but thanks to the Digital Enterprise, pooled knowledge can lead to a distinct advantage. The collaboration results in the discovery of a homologous human gene product as a putative target in treating the neurodegenerative disorder. A new chemical entity is developed and patented. Accordingly, by automatically matching details of the innovation with biotech companies worldwide that might have potential interest, a licensee is found. The licensee hires Jack to continue working on the project. Jane joins Joe’s laboratory, and he hires another student using the revenue from the license. The research continues and leads to a federal grant award. The students are employed, further research is supported and in time societal benefit arises from the technology.
From What Big Data Means to Me, JAMIA 2014
17. The NIH is Starting to Think About the Digital Enterprise, Witness…
bd2k.nih.gov
18. What Defines the Digital Enterprise
• Trans-NIH collaboration – change culture
• Long-term NIH strategic planning
• The BD2K Initiative
• A “hub” of data science activities
• International cooperation
• Interagency cooperation
• Data sharing policies
19. Consider One NIH Scenario
• NIH-Drive
  – Investigator A from the NCI makes frequent reference to the over expression of genes x and y
  – Investigator B from the NHLBI makes frequent reference to the under expression of genes x and y
  – Automatic notification of a potential common interest before publication or database deposition
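The matching step in this scenario is simple to sketch. The following is an illustrative toy, not NIH-Drive itself: the function names, the data layout (each investigator's notes as lists of gene mentions), and the "frequent" threshold are all assumptions made for the example.

```python
# Sketch of the NIH-Drive matching idea: flag two investigators when both
# frequently reference the same genes in their notes, before publication.
from collections import Counter

def frequent_genes(notes, min_mentions=3):
    """Genes mentioned at least min_mentions times across an investigator's notes."""
    counts = Counter(g for note in notes for g in note)
    return {g for g, n in counts.items() if n >= min_mentions}

def common_interest(notes_a, notes_b, min_mentions=3):
    """Genes both investigators frequently reference -> trigger a notification."""
    return frequent_genes(notes_a, min_mentions) & frequent_genes(notes_b, min_mentions)

# Each inner list is one note's gene mentions (toy data).
nci   = [["GENE_X", "GENE_Y"], ["GENE_X"], ["GENE_X", "GENE_Y", "GENE_Z"], ["GENE_Y"]]
nhlbi = [["GENE_X"], ["GENE_Y", "GENE_X"], ["GENE_X", "GENE_Y"], ["GENE_Y"]]

print(sorted(common_interest(nci, nhlbi)))  # ['GENE_X', 'GENE_Y']
```

Note that the match is made on the genes themselves, not on the direction of the effect: over-expression in one lab and under-expression in another is exactly the kind of complementary signal the slide wants surfaced.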
20. The NIH Process
An external advisory group provided a valuable blueprint for what should be done
http://acd.od.nih.gov/Data%20and%20Informatics%20Working%20Group%20Report.pdf
21. Blueprint Recommendations
• Promote central and federated catalogs
– Establish minimal metadata framework
– Tools to facilitate data sharing
– Elaborate on existing data sharing policies
• Support methods and applications
– Fund all phases of software development
– Leverage lessons from National Centers
• Training
– More funding
– Enhance review of training apps
– Quantitative component to all awards
• On campus IT strategic plan
– Catalog of existing tools
– Informatics laboratory
– Ditto big data
• Sustainable funding commitment
acd.od.nih.gov/diwg.htm
22. General Features of NIH Data Science
• Lightweight metadata standards
• Data & software registries
• Expanded policies on data sharing, open source software
• Training programs & reward systems
• Institutional incentives
• Private sector incentives
• Data centers serving community needs
23. What is Under Way?
• Now:
  – Data centers (under review)
  – Data science training grants (call Q1 14)
  – Pilot data catalog consortium (call out)
  – Genomic Research Data Alliance (being finalized)
  – Piloting “NIH-Drive”
• What Is Planned:
  – Extended public-private programs specifically for data science activities
  – Interagency activities
  – International exchange programs
  – Cold Spring Harbor-like training facilities – bi-coastal?
  – Programs for better data descriptions
  – Reward institutions/communities
  – Policies to get clinical trial data into the public domain
24. The History of Bioinformatics According to PEB
The Roots in Bioinformatics series, PLOS Comp Biol
Timeline: 1980s → 1990s → 2000s → 2010s → 2020
The Discipline: Unknown → Expt. Driven → Emergent → Over-sold → A Service → A Partner → Driver
The Raw Material: Non-existent → Limited/Poor → More/Ontologies → Big Data/Siloed → Open/Integrated
The People: No name → Technicians → Academics → Industry recognition: data scientists
25. Why Will Science Become More Open?
• The public (and hence the politicians) demand it
• It’s the right thing to do
• It’s part of the modern psyche
• The scholarly enterprise is broken and more stakeholders are acknowledging it
26. Personal Evidence
• I have a paper with 16,000 citations that no one has ever read
• I have papers in PLOS ONE that have more citations than ones in PNAS
• I have data sets I am proud of but no place to put them
• I “can’t” reproduce work from my own lab
27. Politicians Demand It: G8 Open Data Charter
http://opensource.com/government/13/7/open-data-charter-g8
28. What Are Some of the Ramifications of Open Science?
29. Open Science Has The Potential to Deinstitutionalize
Daniel Hulshizer/Associated Press
31. An Example of That Potential: The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
34. There Still Needs to be a Reward System
The Wikipedia Experiment – Topic Pages
1. Identify areas of Wikipedia that relate to the journal that are missing or stubs
2. Develop a Wikipedia page in the sandbox
3. Have a Topic Page Editor review the page
4. Publish the copy of record with associated rewards
5. Release the living version into Wikipedia
35. One Possible End Product of Open Science
Figure callouts, first sequence:
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Second sequence:
1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
36. Change in the Way we Support the Research Lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Tools spanning the lifecycle: Authoring Tools, Data Capture, Lab Notebooks, Software Repositories, Analysis Tools, Scholarly Communication, Visualization
Supporting elements: Commercial & Public Tools, Discipline-Based Metadata Standards, Community Portals, Git-like Resources By Discipline, Data Journals, New Reward Systems, Training, Institutional Repositories, Commercial Repositories
38. automate: workflows, pipeline & service integrative frameworks
pool, share & collaborate: web systems
CS / SE: scientific software engineering
semantics & ontologies; machine readable documentation; nanopub
[Carole Goble]
39. Why is This Important to Me Personally?
• My wife is being treated for stage 1 breast cancer
• This highlights for me the disparity between what is happening in the lab and what is happening in the clinic
  – In the lab cancer is a personalized and treatable condition
  – In the clinic we are still equally “poisoning” patients with drugs first introduced 10-20 years ago
42. Most Laboratories
• We are the long tail
• Goodbye to the student is goodbye to the data
• Very few of us have complied (or will comply) with the data management plans we write into grants
• Too much software is unusable
S. Veretnik, J.L. Fink, and P.E. Bourne 2008 Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136
43. Today’s Research Lifecycle is Digitally Fragmented at Best
• Proof:
  – I can’t immediately reproduce the research in my own laboratory
    • It took an estimated 280 hours for an average user to approximately reproduce the paper
  – Workflows are maturing and becoming helpful
  – Data and software versions and accessibility prevent exact reproducibility
Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLOS ONE 8(11) e80278
44. We Have Some Really Big Problems to Solve – The Commons Can Help
45. What Really Happens When You Take a Drug?
• Can we predict drug efficacy and toxicity?
• Can we reuse old drugs?
• Can we design personalized medicines?
46. One Drug, One Gene, One Disease
Bernard M. Nat Rev Drug Disc 8(2009), 959-968
47. Polypharmacology
• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive
Collins and Workman 2006 Nature Chemical Biology 2 689-700
48. Polypharmacology is Not Rare but Common
• Single gene knockouts only affect phenotype in 10-20% of cases
  A.L. Hopkins Nat. Chem. Biol. 2008 4:682-690
• 35% of biologically active compounds bind to two or more targets that do not have similar sequences or global shapes
  Paolini et al. Nat. Biotechnol. 2006 24:805–815
• Predict side effects; repurpose drugs
  Kaiser et al. Nature 462 (2009) 175-81
49. Drug Binding is Dynamic
• Drug effect depends not only on how strongly the drug binds (binding affinity) but also how long the drug is “stuck” in the protein (residence time).
• Molecular Dynamics (MD) simulation is powerful but computationally intensive: ~1 ns takes about 1 day of simulation; the ms–hour timescales of interest would take >10^6 days.
D. Huang et al. (2011), PLoS Comp Biol 7(2):e1002002
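The slide's ">10^6 days" figure follows directly from a unit conversion, sketched here with the rate the slide assumes (roughly one nanosecond of trajectory per wall-clock day; the variable names are illustrative):

```python
# Back-of-the-envelope arithmetic behind the slide's MD cost estimate:
# at ~1 ns of simulated time per compute-day, a 1 ms residence time
# needs about a million compute-days.
ns_per_day = 1.0                               # simulated ns per wall-clock day
target_ns = 1.0 * 1e6                          # 1 ms = 1e6 ns
days_needed = target_ns / ns_per_day
print(f"{days_needed:.0f} days")               # 1000000 days
```

This is why residence-time questions push the field toward enhanced-sampling and multiscale approaches rather than brute-force MD, which is where the next slide picks up.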
51. Multiscale Modeling of Drug Actions
• Traditional approach: understanding of the dynamics and kinetics of protein–ligand interactions
• Systems-based approach: prediction of molecular interaction networks on a genome scale; physiological processes; reconstruction, analysis and simulation of biological networks
• Knowledge representation and discovery & model integration
[Slide from Lei Xie]
52. More Generally, Any Translational-based Research That Involves Modeling at Multiple Scales
http://sagebase.org/
53. The History of Bioinformatics According to Bourne
The Roots in Bioinformatics series, PLOS Comp Biol
Timeline: 1980s → 1990s → 2000s → 2010s → 2020
The Discipline: Unknown → Expt. Driven → Emergent → Over-sold → A Service → A Partner → A Driver
The Raw Material: Non-existent → Limited/Poor → More/Ontologies → Big Data/Siloed → Open/Integrated
The People: No name → Technicians → Academics → Industry recognition: data scientists
54. In Summary:
By the End of the Decade Biomedical Research will Be a Truly Digital Enterprise and Computational Scientists Will Be At the Forefront
You Have Much to Look Forward To