Microsoft has invested massively in advanced clinical genomics services on its public cloud, Azure. This is part of a healthcare revolution in which richer genomic information enables the creation of new, more personalized treatments.
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture) - Robert Grossman
The document discusses lessons learned from Bionimbus, a petabyte-scale science cloud service provider. Some key points:
- Bionimbus provides cloud-based storage and computing resources for genomic and biomedical research projects dealing with large datasets, such as sequencing a million genomes, which would generate around 1 exabyte of data (a back-of-envelope check of this figure follows the list).
- Lessons from operating at this scale include how to effectively store, manage, analyze, and share extremely large datasets across many users and projects in an open science environment.
- The growth of "big data" sciences like genomics poses challenges around data volume but also opportunities to drive new scientific discoveries through large-scale genomic analysis.
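As a rough sanity check of the exabyte estimate in the first bullet, here is a back-of-envelope calculation in Python; the ~1 TB-per-genome figure is an illustrative assumption, not a number from the talk:

```python
# Back-of-envelope storage estimate for sequencing one million genomes.
# The ~1 TB/genome figure (raw reads plus alignments at ~30x coverage) is an
# illustrative assumption, not a number taken from the Bionimbus lecture.
GENOMES = 1_000_000
BYTES_PER_GENOME = 1_000_000_000_000  # ~1 TB per genome (assumed)

total_bytes = GENOMES * BYTES_PER_GENOME
print(f"Total: {total_bytes / 1e18:.1f} EB")  # -> Total: 1.0 EB
```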
Technology R&D Theme 2: From Descriptive to Predictive Networks - Alexander Pico
National Resource for Network Biology's TR&D Theme 2: Genomics is mapping complex data about human biology and promises major medical advances. However, the routine use of genomics data in medical research is in its infancy, due mainly to the challenges of working with highly complex “big data”. In this theme, we will use network information to help organize, analyze and integrate these data into models that can be used to make clinically relevant diagnoses and predictions about an individual.
The National Resource for Network Biology (NRNB) 2011 annual report highlights several accomplishments:
1) NRNB research studies identified correlated genotypes in friendship networks and genetic interactions underlying DNA damage responses.
2) Collaborations made progress mapping the yeast genetic interaction network and connecting networks to disease.
3) Cytoscape 3.0 development is on track to improve the software's modularity, APIs, and plugin architecture.
4) In its first year, NRNB established support services, research projects, and over 30 collaborations to advance network biology.
National Resource for Network Biology's TR&D Theme 3: Although networks have been very useful for representing molecular interactions and mechanisms, network diagrams do not visually resemble the contents of cells. Rather, the cell involves a multi-scale hierarchy of components – proteins are subunits of protein complexes which, in turn, are parts of pathways, biological processes, organelles, cells, tissues, and so on. In this technology research project, we will pursue methods that move Network Biology towards such hierarchical, multi-scale views of cell structure and function.
National Resource for Network Biology's TR&D Theme 1: In this theme, we will develop a series of tools and methodologies for conducting differential analyses of biological networks perturbed under multiple conditions. The novel algorithmic methodologies enable us to make use of high-throughput proteomic-level data to recover biological networks under specific biological perturbations. The software tools developed in this project enable researchers to further predict, analyze, and visualize the effects of these perturbations and alterations, while enabling researchers to aggregate additional information regarding the known roles of the involved interactions and their participants.
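To make "differential analysis of biological networks" concrete, here is a minimal sketch, not NRNB code: it contrasts toy interaction networks measured under two conditions and keeps only the edges that are gained or lost (networkx and the edge lists are assumptions for illustration):

```python
import networkx as nx

# Toy protein-interaction networks under two conditions (illustrative only).
control = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
treated = nx.Graph([("A", "B"), ("B", "D"), ("C", "D")])

# The differential network keeps edges present in exactly one condition.
diff = nx.Graph()
diff.add_edges_from(
    (u, v, {"change": "lost"}) for u, v in control.edges if not treated.has_edge(u, v)
)
diff.add_edges_from(
    (u, v, {"change": "gained"}) for u, v in treated.edges if not control.has_edge(u, v)
)
for u, v, d in diff.edges(data=True):
    print(u, v, d["change"])  # B C lost / B D gained
```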
The NRNB has been funded as an NIGMS Biomedical Technology Research Resource since 2010. During the previous five-year period, NRNB investigators introduced a series of innovative methods for network biology including network-based biomarkers, network-based stratification of genomes, and automated inference of gene ontologies using network data. Over the next five years, we will seek to catalyze major phase transitions in how biological networks are represented and used, working across three broad themes: (1) From static to differential networks, (2) From descriptive to predictive networks, and (3) From flat to hierarchical networks bridging across scales. All of these efforts leverage and further support our growing stable of network technologies, including the popular Cytoscape network analysis infrastructure.
The document discusses three main data domains in modern systems biology: 1) experimental data from genomics experiments like microarrays, 2) existing biological knowledge, and 3) genetics data from sequencing. It notes that large amounts of experimental and genetic data have become available in recent years but are difficult to understand without integrating existing biological knowledge. The document then provides examples of how pathways can be used to integrate different types of 'omics data and existing knowledge to better understand experimental results.
This document provides a summary of the 2013 annual progress report for the National Resource for Network Biology (NRNB). It describes the NRNB network in 2013, including personnel from various institutions collaborating on projects. It also summarizes the findings of the NRNB External Advisory Committee meeting in December 2012. The Committee found that the NRNB had made strong progress in developing new network analysis tools and demonstrating their value. They provided suggestions to optimize projects, measure success beyond citations, and prepare for renewal. Specific projects like TRD C on network-extracted ontologies were praised for progress. Cytoscape 3.0 development was also discussed.
1) Quantitative medicine uses large amounts of medical data and advanced analytics to determine the most effective treatment for individual patients based on their specific clinical profile and biomarkers. This approach can help reduce healthcare costs and improve outcomes compared to the traditional one-size-fits-all model.
2) However, realizing the promise of quantitative personalized medicine is challenging due to the huge quantities of diverse medical data located in dispersed systems, lack of computing capabilities, and barriers to data sharing.
3) Grid and service-oriented computing approaches are helping to address these challenges by enabling federated querying, analysis, and sharing of medical data and services across organizations through virtual integration rather than true consolidation.
The document summarizes the accomplishments of the National Resource for Network Biology (NRNB) over the past year, including:
- Over 100 publications citing NRNB funding and high usage of Cytoscape tools
- 18 supported tools, 93 collaborations, and training of over 100 users
- Progress on developing algorithms for differential network analysis, predictive networks, and multi-scale networks
- Launch of two new NRNB workgroups on single cell genomics and patient similarity networks
- 18 new collaboration projects in areas like cancer, neuroinflammation, and drug transporters
The National Resource for Network Biology (NRNB) aims to advance network biology science through bioinformatic methods, software, infrastructure, collaboration, and training. In the past year, the NRNB made progress in its specific aims, including developing new network analysis methods, catalyzing changes in network representation, establishing software and databases, engaging in collaborations, and providing training opportunities. Going forward, the NRNB plans to further develop methods for differential and predictive network analysis, multi-scale network representation, and pathway analysis tools.
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For..." - dkNET
This document discusses harnessing quantitative systems pharmacology to deliver personalized medicine through the Microphysiology Systems Database (MPS-Db). The MPS-Db is presented as a solution for aggregating, analyzing, sharing, and modeling in vitro data from microphysiology systems (MPS) to accelerate drug development and implementation of personalized medicine approaches. Key features and capabilities of the MPS-Db are described, including supporting various model types, integrating with other databases, and performing data analysis, reproducibility evaluation, and computational modeling. Commercial and non-profit versions are discussed.
Machine learning has many applications and opportunities in biology, though it also faces challenges. It can be used for tasks like disease detection from medical images. Deep learning models like convolutional neural networks have achieved performance exceeding human experts in detecting pneumonia from chest X-rays. Frameworks like DeepChem apply deep learning to problems in drug discovery, while platforms like Open Targets integrate data on drug targets and their relationships to diseases. Overall, machine learning shows promise for advancing biological research, though developing expertise through learning resources and implementing models to solve real-world problems is important.
Ranadip Pal and his team won a prize in 2012 for using multiple types of 'omics data to more successfully predict drug sensitivity than other teams. Pal then improved on this model by combining five types of data, reducing uncertainty by over 40%. Scientists are increasingly seeing the value of integrated omics analyses to better understand cellular processes. A company called General Metabolics offers integrated omics services to analyze multiple data types from cell samples together, providing insights that individual analyses cannot. This allows researchers to gain a clearer understanding of cellular physiology.
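Integration of this kind is often as simple as concatenating feature blocks from each omics layer before fitting a single predictor. A minimal scikit-learn sketch on synthetic data, not Pal's actual model:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 60

# Synthetic feature blocks standing in for different omics layers.
expression = rng.normal(size=(n_samples, 100))
methylation = rng.normal(size=(n_samples, 80))
mutations = rng.integers(0, 2, size=(n_samples, 50))
drug_sensitivity = rng.normal(size=n_samples)  # response variable

# Integrated model: concatenate the omics blocks into one feature matrix.
X = np.hstack([expression, methylation, mutations])
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, drug_sensitivity, cv=5)
print(scores.mean())
```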
The National Resource for Network Biology aims to provide freely available, open-source software tools to enable researchers to assemble biological data into networks and pathways and use these networks to better understand biological systems and disease; it pursues this mission through technology research and development projects, driving biological projects, collaboration and service projects, training, and dissemination; key components include the Cytoscape software platform, supercomputing infrastructure, and partnerships with over 30 external research groups.
This document describes a new type of heatmap called a "CoolMap" that allows for flexible multi-scale exploration of molecular network data. CoolMaps allow data to be collapsed and aggregated at different levels of a hierarchical tree, enabling visualization and pattern discovery across scales. This approach addresses limitations of conventional heatmaps and enables linking data to existing biological knowledge. Several case studies demonstrate how CoolMaps can provide new insights into gene expression, nutrition, DNA methylation, glucose monitoring, and network data. The core concepts and near-ready software releases are presented, along with acknowledgments.
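The core collapse-and-aggregate operation behind such a view can be approximated in a few lines of pandas. This is a conceptual sketch rather than the CoolMap software; the gene-to-pathway hierarchy is invented for illustration:

```python
import pandas as pd

# Expression matrix: genes x samples (toy values).
expr = pd.DataFrame(
    {"s1": [1.0, 2.0, 3.0, 4.0], "s2": [2.0, 1.0, 4.0, 3.0]},
    index=["geneA", "geneB", "geneC", "geneD"],
)

# One level of a hierarchical tree: genes grouped into pathways.
pathway = {"geneA": "P1", "geneB": "P1", "geneC": "P2", "geneD": "P2"}

# "Collapsing" the heatmap one level up: aggregate rows by pathway.
collapsed = expr.groupby(pathway).mean()
print(collapsed)
```

Repeating the aggregation at successive levels of the tree yields the kind of multi-scale exploration the document describes.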
This document summarizes the analysis of multi-omics data from a study of lung squamous cell carcinoma. The analysis integrated genomic, transcriptomic, clinical and phenotypic data from the study using various bioinformatics tools. Key findings included the identification of genes associated with patient survival outcomes and the stratification of patients into prognosis groups. Differential expression and mutational analysis revealed molecular differences between the basaloid and squamous cell carcinoma subtypes. Integration across molecular levels identified genes whose expression changed due to mutations. Pathway and network analysis provided functional insight. Integration with metabolic models highlighted potentially druggable targets. The multi-omics, multi-scale analysis framework generated novel biological insights from the data.
- The document discusses various approaches for applying machine learning and artificial intelligence to drug discovery.
- It describes how molecules and proteins can be represented as graphs, fingerprints, or sequences to be used as input for models (a minimal fingerprint example follows this list).
- Different tasks in drug discovery like target binding prediction, generative design of new molecules, and drug repurposing are framed as questions that AI models can aim to answer.
- Techniques discussed include graph neural networks, reinforcement learning, and conditional generation using techniques like translation models.
- Several recent works applying these approaches for tasks like predicting drug-target interactions and generating synthesizable molecules are referenced.
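As a minimal example of the fingerprint representation mentioned in the second bullet, here is a sketch using RDKit on two well-known molecules; the similarity comparison is added for illustration and is not from the document:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Represent two small molecules as Morgan (circular) fingerprints.
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
ibuprofen = Chem.MolFromSmiles("CC(C)Cc1ccc(cc1)C(C)C(=O)O")

fp1 = AllChem.GetMorganFingerprintAsBitVect(aspirin, 2, nBits=2048)
fp2 = AllChem.GetMorganFingerprintAsBitVect(ibuprofen, 2, nBits=2048)

# Tanimoto similarity between the two bit vectors.
print(DataStructs.TanimotoSimilarity(fp1, fp2))
```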
Beating Bugs with Big Data: Harnessing HPC to Realize the Potential of Genomi... - Tom Connor
Introducing the HPC challenges associated with developing a set of clinical microbial genomics services in the NHS in Wales, demonstrating the potential of these technologies and the impact they are already having for the patients of the Welsh NHS.
The document discusses the growing role of data sharing communities and standards in genomics research. It outlines key paradigm shifts from studying individual genomes to pangenomes and from culturable to unculturable organisms. It highlights the increasing data volumes from large sequencing projects and metagenomics. It emphasizes that community agreement on data stewardship and standards is needed to fully leverage these genomic data resources through the work of groups like the Genomic Standards Consortium.
Presentation by the Alliance of European Life Sciences Law Firms (Julian Hitchcock and Sofie van der Meulen) on legal aspects of big data in pharma. Topics: privacy, IP, medical devices and IVD.
This document discusses computational tools for analyzing biological networks and molecules. It begins by describing how networks can be inferred from molecular data using the Cytoscape platform and its apps. It then discusses how networks can help identify proteins of interest by describing a case study where network analysis identified additional proteins that provided an "explanation" for differences observed in proteomic data. The document concludes by discussing how computational analysis can help address questions that go beyond simply identifying the best network, such as which parts of a network are well-supported or could be modified to better fit the data.
Tijana Milenković is an assistant professor who develops algorithms for network alignment and mining of biological networks. Her lab has developed methods like GRAAL, H-GRAAL, and MAGNA for mapping similar nodes between networks to transfer knowledge across species. MAGNA directly optimizes edge conservation during alignment to improve accuracy. The lab has also applied network alignment to study aging networks and predict novel aging genes, and developed tools for dynamic network analysis and de-noising networks via link prediction.
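MAGNA's guiding objective, conserving edges under a node mapping, is easy to state in code. The toy sketch below (not the MAGNA implementation) scores a candidate alignment by the fraction of edges it preserves:

```python
import networkx as nx

# Two toy networks and a candidate node mapping between them.
g1 = nx.Graph([("a", "b"), ("b", "c"), ("c", "a")])
g2 = nx.Graph([("x", "y"), ("y", "z")])
alignment = {"a": "x", "b": "y", "c": "z"}

# Edge conservation: fraction of g1 edges mapped onto g2 edges.
conserved = sum(
    g2.has_edge(alignment[u], alignment[v]) for u, v in g1.edges
)
print(conserved / g1.number_of_edges())  # 2 of 3 edges conserved -> 0.67
```

MAGNA searches over many such mappings (via a genetic algorithm) to directly maximize this kind of conservation score rather than scoring it after the fact.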
In recent years there has been a dramatic increase in the number of freely accessible online databases serving the chemistry community. The internet provides chemistry data that can be used for data-mining, for computer models, and for integration into systems to aid drug discovery. There is, however, a responsibility to ensure that the data are of high quality, so that time is not wasted on erroneous searches, models are underpinned by accurate data, and the improved discoverability of online resources is not marred by incorrect data. In this article we provide an overview of some of the authors' experiences using online chemical compound databases, critique the approaches taken to assemble data, and suggest approaches to deliver definitive reference data sources.
Machine learning techniques are needed to analyze the vast amounts of genomic and clinical data being generated through techniques like next-generation sequencing and electronic medical records. This data holds promise for advancing cancer research, but traditional methods cannot handle the scale and complexity of big data. Machine learning algorithms can extract useful insights from these large datasets to help diagnose, treat, and cure cancer patients.
Presentation "The Impact of All Data on Healthcare"
Keith Perry
Associate VP & Deputy CIO
UT MD Anderson Cancer Center
With continuing advancement in both technology and medicine, the drive is on to make all data meaningful to drive medical discovery and create actionable outcomes. With tools and capabilities to capture more data than ever before, the challenge becomes linking existing structured and unstructured clinical data with genomic data to increase the industry’s analytical footprint.
Learning Objectives:
∙ Discuss the need to make all data meaningful in order to speed discovery of new knowledge
∙ Provide examples of an analytical direction that supports evolution in medicine
∙ Expose the challenges facing the industry with respect to 'omics
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh... - Robert Grossman
Data commons are emerging as a solution to challenges in analyzing and sharing large biomedical datasets. A data commons co-locates data with cloud computing infrastructure and software tools to create an interoperable resource for the research community. Examples include the NCI Genomic Data Commons and the Open Commons Consortium. The open source Gen3 platform supports building disease- or project-specific data commons to facilitate open data sharing while protecting patient privacy. Developing interoperable data commons can accelerate research through increased access to data.
This document discusses how machine learning can be applied to help plastic surgeons better analyze and interpret the large amounts of patient data that are now routinely collected. It begins by explaining that traditional data analysis techniques struggle with "big data," which contains complex patterns. Machine learning, a subfield of artificial intelligence, can generate algorithms capable of acquiring knowledge from historical examples to help address this challenge. The document then provides examples of how machine learning has already been successfully applied in other fields and in cancer treatment. It proposes that plastic surgeons should also look to machine learning approaches to more efficiently deliver healthcare and improve surgical outcomes by extracting meaningful insights from their extensive patient data collections. Specific potential applications discussed include burn surgery, microsurgery, and various types of reconstruction.
Caris Life Sciences needed to more efficiently process, analyze, manage and store terabytes of data daily from cancer patient samples. They deployed an IBM high-performance computing solution including servers, storage technologies, and analytics software. This enabled Caris to speed up analysis of molecular data to advance personalized cancer treatment options for patients. The solution provided massive scalability and processing power to handle Caris' large and growing data needs.
Federal Research & Development for the Florida system, Sept 2014 - Warren Kibbe
This document discusses challenges in cancer data integration and analysis. It proposes the development of open science models, standardized data elements, and sustainable informatics infrastructure. Emerging technologies like mobile devices, social media, and cloud computing create opportunities to build a national "learning health system" for cancer. The National Cancer Institute is pursuing initiatives like the Cancer Genomics Data Commons and cloud pilots to leverage large genomic and clinical datasets using these technologies and develop predictive models to improve outcomes. The ultimate goal is a system that facilitates data sharing, continuous learning from all cancer patients, and personalized, predictive oncology.
This document contains the table of contents and introductory letter for an issue of the Catalyst magazine. The table of contents lists 17 article titles covering topics in various scientific fields. The introductory letter from the Editor-in-Chief thanks the production team for their work on the issue over the past year, highlighting contributions from the editors, illustrators, authors, and translators who made the issue possible.
Block23 is a global initiative offering genomic scans and personal health data analysis using AI to discover cancer prevention and treatment. It aims to collect genomic and health data from over 100,000 people and store it on the blockchain, allowing for broad data analysis to identify statistically significant relationships between genetics, lifestyle, and disease. This will provide a basis for precision medicine. The blockchain and a crypto token called PMX will allow individuals to own and control their data, while incentivizing sharing with medical researchers and companies in exchange for tokens. Block23 hopes to build a global precision medicine ecosystem through this decentralized model.
The document discusses how genomics and blockchain technologies will transform healthcare by making genomic data and insights more accessible and affordable globally. It describes Shivom's vision of creating a genomic data ecosystem where individuals own and can choose to share their genomic data securely via blockchain, and how this could benefit research, precision medicine and personalized healthcare. Key features of the Shivom platform include genome sequencing and storage, a marketplace for healthcare services, and tools to incentivize data sharing and collaboration between individuals, researchers and organizations.
Genome sequencing technology available today can accurately sequence a whole genome from an individual’s test sample for a surprisingly low cost.
As a result, the adoption of this technology is rapidly expanding as medical centers around the world embrace its utility in informing healthcare decisions—an emerging reality of personalized medicine.
- The traditional business model of personal genomics companies sees individuals pay to sequence their genomes and receive analysis results, while the companies keep the genomic data and sell it to pharmaceutical companies. However, this model has limitations in addressing high sequencing costs for individuals, lack of individual control over their data, and lack of incentives.
- The proposed Nebula model uses blockchain technology to connect individuals directly with data buyers, eliminating personal genomics companies as middlemen. This is intended to reduce sequencing costs for individuals, give them control over their genomic data and how it is used, and provide incentives.
- The model aims to satisfy both individuals, by addressing the above issues, and data buyers' needs around data availability, acquisition, and
This document discusses using data mining techniques like association rule mining and improved apriori algorithm with fuzzy logic to develop an expert system that can predict the risk of osteoporosis based on a patient's clinical data and history. It aims to help doctors make more informed decisions early on to prevent osteoporosis. The system would find relationships between various risk factors and diagnose osteoporosis severity to identify at-risk patients before costly tests. Literature on using different algorithms like decision trees and neural networks for medical diagnosis and predicting osteoporosis risk is also reviewed.
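As a minimal sketch of the association-rule step described above, here is an example using the mlxtend library on invented one-hot risk-factor records (the factors, thresholds, and data are illustrative, not from the paper):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot patient records of risk factors and outcome (invented data).
records = pd.DataFrame(
    [
        {"low_calcium": 1, "smoker": 1, "age_over_60": 1, "osteoporosis": 1},
        {"low_calcium": 1, "smoker": 0, "age_over_60": 1, "osteoporosis": 1},
        {"low_calcium": 0, "smoker": 1, "age_over_60": 0, "osteoporosis": 0},
        {"low_calcium": 1, "smoker": 0, "age_over_60": 1, "osteoporosis": 1},
        {"low_calcium": 0, "smoker": 0, "age_over_60": 0, "osteoporosis": 0},
    ]
).astype(bool)

# Mine frequent itemsets, then derive rules such as
# {low_calcium, age_over_60} -> {osteoporosis}.
itemsets = apriori(records, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```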
Possible Solution for Managing the World's Personal Genetic Data - DNA Guide, ... - DNA Compass
World DNA Day and Genome Day, Dalian China 2011
"Possible Solution for Managing the Worlds Genetic Data" given by Alice Rathjen, Founder & President DNA Guide, Inc.
Proposes genetic tests be given a rating for quality of science, medical utility and viewing risk so as to facilitate the flow of genetic information in a responsible manner from the lab to the physician and patient. Explains how technology combined with public policy could enable both privacy and personalized medicine to thrive. Advocates individual ownership over personal genetic data and suggests the genome as a data format could provide the foundation for digital human rights.
tags: DNA, genetic testing, privacy, personalized medicine, FDA regulation
Benefits of Big Data in Health Care: A Revolution - ijtsrd
Human lifespans are increasing along with the world population, which creates new challenges in health care. Big data changes how health data are managed, leveraged, and analyzed; with its help, predictive analytics can reduce the cost of treatment, reduce unnecessary medication, and deliver better care. Health-related data collected from sources such as electronic health records (EHR), medical imaging systems, genomic sequencing, payer records, pharmaceutical research, and medical devices are referred to as big data in healthcare. Dr. Ritushree Narayan, "Benefits of Big Data in Health Care: A Revolution", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd22974.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-miining/22974/benefits-of-big-data-in-health-care-a-revolution/dr-ritushree-narayan
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014 - Robert Grossman
This document discusses how biomedical discovery is being disrupted by big data. Large genomic, phenotype, and environmental datasets are needed to understand complex diseases that result from combinations of many rare variants. However, analyzing large biomedical data is costly and difficult given the standard model of local computing. The document proposes creating large "commons" of community data and computing as an instrument for big data discovery. Examples are given of the Cancer Genome Atlas project, which has petabytes of research data on thousands of cancer patients, and how tumors evolve over time. Overall, the document argues that new models of shared biomedical clouds and commons are needed to enable cost-effective analysis of big biomedical data.
A look at the key trends and challenges in applying Big Data to transform healthcare by supporting research, self care, providers and building ecosystems. Purchase the report here: https://gumroad.com/l/PlXP
Similar to Microsoft genomics to advance clinical science:
The cost of acquiring information by natural selection - Carl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Authoring a personal GPT for your research and practice: How we created the Q... - Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
The debris of the 'last major merger' is dynamically young - Sérgio Sacani
The Milky Way's (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the 'last major merger.' Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the 'last major merger' did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at z = 2.9 wi... - Sérgio Sacani
We present the JWST discovery of SN 2023adsy, a transient object located in a host galaxy JADES-GS+53.13485−27.82088 with a host spectroscopic redshift of 2.903 ± 0.007. The transient was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) program. Photometric and spectroscopic followup with NIRCam and NIRSpec, respectively, confirm the redshift and yield UV-NIR light-curve, NIR color, and spectroscopic information all consistent with a Type Ia classification. Despite its classification as a likely SN Ia, SN 2023adsy is both fairly red (E(B−V) ∼ 0.9), despite a host galaxy with low extinction, and has a high Ca II velocity (19,000 ± 2,000 km/s) compared to the general population of SNe Ia. While these characteristics are consistent with some Ca-rich SNe Ia, particularly SN 2016hnk, SN 2023adsy is intrinsically brighter than the low-z Ca-rich population. Although such an object is too red for any low-z cosmological sample, we apply a fiducial standardization approach to SN 2023adsy and find that the SN 2023adsy luminosity distance measurement is in excellent agreement (≲ 1σ) with ΛCDM. Therefore, unlike low-z Ca-rich SNe Ia, SN 2023adsy is standardizable and gives no indication that SN Ia standardized luminosities change significantly with redshift. A larger sample of distant SNe Ia is required to determine if SN Ia population characteristics at high-z truly diverge from their low-z counterparts, and to confirm that standardized luminosities nevertheless remain constant with redshift.
The binding of cosmological structures by massless topological defects - Sérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is mitigated, at least in part.
Immersive Learning That Works: Research Grounding and Paths Forward - Leonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
1. Digital Transformation in Health
Partnering to Advance Clinical Genomics
The Microsoft Cloud empowers organizations of all sizes to re-envision the way they bring together people, data, and processes that better engage patients and customers, empower care teams and employees, optimize clinical and operational effectiveness, and digitally transform health.
2. Table of contents
Advancing genomic understanding
Microsoft's investment in genomics
Partnering for success
Curoverse
BC Platforms
DNAnexus
Democratizing genomics
3. Advancing genomic understanding
"We've done medicine the way we've done medicine because we haven't had the benefit of genomics," said Dr. Simon Kos, Chief Medical Officer at Microsoft. "We prescribe a therapy based on how the average population will respond, but for individual patients, that might not be the right therapy—which is cold comfort for you as an individual patient when you get a therapy that doesn't work. Over time, we've managed to elevate medicine to a science, but right now that science is driven by aggregate evidence rather than personalized evidence."
The availability of more powerful tools is driving advances in genomics research and applications that are bringing tailored, personalized care plans for patients into greater focus. "When you've got genomic information about a patient, with all of its implications understood, and add the patient's phenotypic and demographic information, then you start to get the real picture," said Kos.
4. A more in-depth understanding of how individuals who have certain genes might respond to certain therapies will open the doors to precision medicine, a tantalizing prospect to clinicians. Technology and medical research advances are combining to dramatically improve the timeliness and cost effectiveness of genomics, making it relevant in clinical scenarios, even for medical emergencies.
Of course, the ability to tailor treatments much more specifically based on an individual's genomic profile requires more than their genetic data. "It's not enough to have the genome," said Kos. "We've got to know what the genome means, and that requires collaborative research." Since volumes of genomes must be compared to yield insights, the more genomes that are available, the more accurate and informative the analysis will be.
"Any one organization may have petabytes of genomic information, which is huge, but it's just a fraction of all of the genomes that have been sequenced in the world," said Kos. "Because these are large file sizes, because historically they've been held on-premises, and because the data is highly sensitive, there's no way to aggregate the world's genomic information to do de-identified ethical studies into what these genomes mean. We need large amounts of data, aggregated in a single place, secure but accessible. That's what our cloud platforms afford."
Infinitely scalable, readily available resources in the cloud
As they gather information from more and more patients, researchers and clinicians may find that the volume of data outstrips their in-house resources. A paper published in the journal PLOS Biology reports that "as much as 2–40 exabytes of storage capacity will be needed by 2025 just for the human genomes."* Attempting to manage that level of information on-premises can quickly become a perfect storm of problems: massive storage requirements, massive compute requirements, and strict requirements around security compliance, privacy and data sovereignty.
Fortunately, the cloud, with its practically unlimited ability to scale, presents a viable alternative. The cloud can automatically provide additional storage in minutes—in other words, as soon as it's needed. In contrast, the process of requisitioning, purchasing, and setting up additional disk space on-premises can take months in many organizations.
Computing resources are similarly scalable on demand. Translating raw genomic data into useful information requires extensive processing that only supercomputers or cloud computing resources can handle. While only a small number of elite research centers, major medical centers, or large pharmaceutical companies have access to supercomputers, any organization—from the smallest startup to the largest enterprise—can access the cloud.
5. D I G I T A L T R A N S F O R M A T I O N I N H E A L T H 4
Cloud services can be resources, like utilities, that
genomics solutions draw on only as required, keeping
costs down. Industry experts who run datacenters for
a living keep the underlying software current with the
latest features and security patches as soon as they
become available, ensuring that clinicians and
researchers always have immediate access to the
latest technology. Moreover, cloud providers take
physical, technical, and operational measures to
achieve and maintain security and privacy standards
that meet compliance requirements.
Learn more about privacy, security, and compliance
in the Microsoft cloud
“Someday, with the added speeds and feeds from
cloud infrastructure, we’ll be able to sequence and
analyze a full genome for less than a hundred dollars,”
said Kos. “As more sequenced genomes become
available, genomics will move into the mainstream.
Your primary doctor will be able to use your genomic
information for practical purposes.”
Microsoft’s investment in genomics
In 2007, Microsoft established a team focused on
genomics in its Artificial Intelligence and Research
division. This team of engineers, data scientists, and
scientists, led by Geralyn Miller, works hand in hand
with Microsoft’s cloud computing team to create
genomics-specific platforms, consistently running
heavy, long-running workloads that push Microsoft's
cloud platform, Azure, to its limits. "We're the only
cloud provider that has a research team dedicated to
genomics," said Miller. "We understand the needs and
concerns of researchers because we're researchers
ourselves." To advance genomics, the team takes an
open-source approach, sharing much of its work with
the research community.
In addition to the work her team has done to improve
secondary and tertiary analysis, they've helped integrate
genomics-specific analytical tools and capabilities into
the Microsoft platform. "Combining these tools with the
research we've done makes our results significantly
more powerful," Miller explained. "In essence, we're
blending data science and artificial intelligence with
core bioinformatics."
Project Premonition seeks to detect pathogens in the
wild before they cause outbreaks of disease by capturing
mosquitoes and sequencing the genomes of the mosquitoes,
the blood they have ingested, and any viruses they are
carrying. The project uses drones to find mosquito hotspots
and breeding sites, and robotic traps to catch specimens.
A core focus of Microsoft’s genomics team is to
develop analytics, machine learning, and artificial
intelligence algorithms to aid clinicians who want to
make more accurate diagnoses more quickly, and
reduce trial-and-error when prescribing treatments.
Microsoft Research offers Factored Spectrally Transformed
Linear Mixed Models (FaST-LMM), a set of tools for genome-wide
association studies (GWAS). The tools help researchers handle huge
volumes of data by cataloging and comparing nucleotide variations
at specific positions.
Learn more here.
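For readers who want to experiment, FaST-LMM is also available as an
open-source Python package. The sketch below shows a minimal
association scan with it; the PLINK-format genotype file and the
phenotype file names are illustrative placeholders, not real data.

```python
# Minimal sketch of a genome-wide association scan with the open-source
# fastlmm package. Input file names are illustrative placeholders.
from fastlmm.association import single_snp

# single_snp fits a linear mixed model for each SNP, using a genetic
# similarity matrix built from the genotypes to correct for relatedness
# and population structure.
results = single_snp(test_snps="genotypes.bed",  # PLINK .bed/.bim/.fam set
                     pheno="phenotypes.txt",
                     count_A1=False)

# One row per tested variant, with its position and association p-value.
print(results[["SNP", "Chr", "ChrPos", "PValue"]].head())
```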
The goal is not only to assess a patient's current
condition, but also to predict, based on data collected
from the patient and similar subpopulations, how that
condition may evolve, and to help suggest the optimal
course of action.
“For example, just because you have lung cancer
doesn’t mean all lung cancers are the same,” Miller
explained. “Tumors don’t always identify with the tissue
of origin. Today, clinicians generally start with the
treatment that's most commonly used. With genomics, you
can get more specific about the type of cancer and the
best type of treatment. If you can tell upfront whether a
person has an inherited tendency for an adverse reaction
to a drug, you can make a better choice as the doctor.
It’s all about taking the raw data and getting to an
insight that’s actionable in the context of a care situation.”
Partnering for success
Building on its collaboration with researchers, Microsoft
has developed a vibrant ecosystem of solution partners
who focus solely on genomics. As these partners build
solutions based on input from researchers and scientists,
they help Microsoft further develop and accelerate
commercialization of its cloud-based platforms,
optimizing services for storing and analyzing vast
amounts of genomics data.
Curoverse: pooling genomic data while
protecting privacy
“The number of genomes being sequenced today is
growing at three times the rate of Moore’s Law,” said
Keith Elliston, Chief Commercial Officer for Curoverse,
which provides open-source genomics enabling technology.
"You're not done once you've sequenced a single genome.
The germline genome is just the base
map. Things change. When you develop tumors, that’s
a change. Then there’s the whole microbiome, which
is all the bacteria living on your skin, in your gut and in
your hair, in your nose. These have a big impact on
what’s happening. You look at terminally differentiated
cells. Your immune system has rearrangements of the
whole genome in every cell. You have all these
genomes that become very important, so we’re not
talking about one genome per person. We’re talking
about literally thousands of genomes per person.”
Learn more about genomic and biomedical
computing solutions from Curoverse
Since single genomes don’t yield insight in isolation,
the ability to amass genomic data, and then share it
broadly and easily, is particularly important. But
moving data into a single repository—even within
the cloud—isn’t practical.
“If you have a hundred thousand genome centers in
the world, and they each have a hundred thousand
genomes, the model of bringing all of them into one
database will never work,” said Elliston. “It’s all going
to live in the hundred thousand different centers out
there. Your genome sequence is the most private and
personal data you have. It’s really the fundamental
blueprint that makes you you, and from that
perspective, there are lots of laws about how that
data must be protected. If you put data up on the
cloud securely and privately, you’re not going to mix
it with anybody else’s.”
The cloud makes it possible to assemble genome
data from multiple organizations into a data lake,
while retaining the appropriate privacy and security
controls. Cloud-based algorithms can run on
individual datasets regardless of whether they’re
cloud-based or stored in localized datacenters.
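This federate-and-aggregate pattern is easiest to see in miniature. The
sketch below is hypothetical (not any vendor's actual API): each center
computes summary statistics beside its own data, and only those
aggregates are pooled.

```python
# Hypothetical sketch of federated analysis: each center computes a
# summary next to its own data, and only aggregates are pooled. The
# genotype lists stand in for data that never leaves each site.

def local_summary(genotypes):
    # Runs inside one center's datacenter: count alternate alleles
    # across diploid genotypes (0, 1, or 2 alt copies per person).
    return {"alt": sum(genotypes), "alleles": 2 * len(genotypes)}

# Simulated holdings of three separate genome centers.
sites = {
    "center-a": [0, 1, 2, 0],
    "center-b": [1, 1, 0],
    "center-c": [2, 0, 0, 1, 1],
}

# Only the exported aggregates cross organizational boundaries.
summaries = [local_summary(g) for g in sites.values()]
pooled = sum(s["alt"] for s in summaries) / sum(s["alleles"] for s in summaries)
print(f"Pooled alternate-allele frequency: {pooled:.3f}")
```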
While the algorithms generate insight, the genome
data stays where it is. "When our platform, Arvados,
does genetic variation analysis," Elliston explained, "it
federates the data in a way that encapsulates the
analysis in a workflow, which we can distribute across
the datacenters where those data live. We do the
analysis, aggregate the results, and only export
metadata. We don't move the private individual data.
Even on the same cloud infrastructure, it's stored in
distinct places. With Arvados, it can still be analyzed
with everyone else's data, while respecting all that
privacy and ownership."
BC Platforms: turning raw genomic data
into analytical insights
It took 13 years and $2.7 billion to sequence the
first genome. Steady improvements in algorithms
reduced sequencing time to 28 hours. Algorithms
Microsoft has developed with the Broad Institute can
yield results in just four hours.
Even with faster sequencing, however, the resources
required to process the millions of genomes that will
be necessary to yield a broad range of useful clinical
insights are gargantuan. “When you go from ten
thousand genotyped patients to one hundred thousand
to one million patients,” said Nino Da Silva,
Executive Vice President of BC Platforms, a provider
of bioinformatics and genome data management
software headquartered in Switzerland, “it will take
2.4 million CPU core hours just to impute the raw data.
And at this point we haven’t even started analyzing.”
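Scaled out across enough machines, those core-hours translate into
manageable wall-clock times. The rough arithmetic below, with purely
illustrative core counts, shows why that elasticity matters:

```python
# Back-of-the-envelope: wall-clock time for 2.4 million CPU core-hours
# of imputation at different levels of scale-out. Core counts are
# illustrative, not a claim about any particular cluster.
CORE_HOURS = 2_400_000

for cores in (1_000, 10_000, 100_000):
    days = CORE_HOURS / cores / 24
    print(f"{cores:>7,} cores -> about {days:,.1f} days")
```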
Fortunately, the cloud makes such intensive computing
requirements manageable without a large, upfront
investment in infrastructure. The calculations required
to sequence genomes, and then analyze them, can
take place in the cloud, even if sensitive patient data
remains on-premises. The role of companies like BC
Platforms is to offer end-to-end solutions that provide
ready access to genomic information, even for
clinicians who aren’t geneticists, or don’t have a high
level of training in genetics.
“We take the data and make it understandable,” Da
Silva explained. “When a patient comes in with a
referral for a specific question, such as whether they
carry a BRCA breast cancer mutation, our system will take
the raw data, align it, and create the usable file (gVCF),
and from there create the variant core report. We then
inject the report into the electronic medical record or
lab system without any manual intervention, so
doctors can see the results where they need them, with
the rest of the patient data.” The genomic information
becomes one of many tools that clinicians use in
treating their patients.
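Concretely, the "align and create the usable file (gVCF)" step Da Silva
describes typically maps onto standard open-source tools. The sketch
below strings together BWA, samtools, and GATK; file and sample names
are placeholders, and this is a generic pipeline, not BC Platforms'
proprietary system.

```python
# Sketch of the standard secondary-analysis steps behind "align and
# create the usable file (gVCF)", using open-source BWA, samtools, and
# GATK. Paths and sample names are illustrative placeholders.
import subprocess

def fastq_to_gvcf(ref, fq1, fq2, sample):
    bam = f"{sample}.sorted.bam"
    gvcf = f"{sample}.g.vcf.gz"
    # Align reads with BWA-MEM (a read group is required by GATK) and
    # coordinate-sort the output with samtools.
    subprocess.run(
        f"bwa mem -R '@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA' "
        f"{ref} {fq1} {fq2} | samtools sort -o {bam} -",
        shell=True, check=True)
    subprocess.run(["samtools", "index", bam], check=True)
    # Call variants per sample in GVCF mode with GATK HaplotypeCaller.
    subprocess.run(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
                    "-O", gvcf, "-ERC", "GVCF"], check=True)
    return gvcf

# Example (requires the tools on PATH and real input files):
print(fastq_to_gvcf("ref.fa", "reads_1.fq.gz", "reads_2.fq.gz", "patient01"))
```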
Learn more about genomic data management and
analysis solutions from BC Platforms
“Once we’ve optimized the whole flow from the
diagnostics devices in the laboratory to the patient
records, we’ll need to form a standard of what’s going
to be reported and how it will be used in terms of the
actual patient care,” said Da Silva. “It’s also important
to understand that clinical decision makers, university
researchers and potentially commercial partners can
all benefit from different pieces of the same knowledge
base. We must ensure that doctors and patients
always have the most up-to-date, personalized, valid
interpretation available.”
Microsoft’s method of running the Burrows-Wheeler Aligner
(BWA) and the Broad Institute’s Genome Analysis Toolkit (GATK)
on its Azure cloud computing system is seven times faster than
the previous version, allowing researchers and medical
professionals to get results in just four hours instead of 28.
Learn more here.
Microsoft has a unique ability to do hybrid cloud installations
using local datacenters in combination with Azure. Microsoft
can deploy the Azure Stack on-premises at the customer site or
at the facility of a hosting partner. The clinical data remains
securely in the local data warehouse to comply with data
sovereignty requirements, while the processing and analysis
takes place in the Azure cloud.
Da Silva believes that at some point, clinicians will rely
on genetic data about patients as a matter of course.
“Genomics is like any other medical system,” he said,
“and I think we should perceive it that way. Nobody
would today consider replacing a hip without doing
an X-ray or an MRI first and interpreting it. I think
genetics will become the same, that nobody will treat
a patient for cancer without having specific genetic
information. Not doing so will be considered as
uninformed as doing hip surgery without an X-ray.”
DNAnexus: employing genomics in
clinical settings
Dr. David Shaywitz, Chief Medical Officer at DNAnexus,
a Mountain View, California company that offers a
cloud-based genome informatics and data management
platform, agrees that genomics will become a
standard component of clinical evaluation. “There’s an
increased recognition of the relevance of genetics as
people have a better understanding of it and as the
tools have become basically faster, cheaper, and
better,” he said.
Yet many clinicians remain skeptical about genomics.
“For over a generation, physicians, even in medical
school, have been hearing about the promise of
genetics, and have caught glimpses of ways genetic
diagnosis and genetically informed treatments could
really be helpful,” Shaywitz explained. “But there’s
always been a disconnect between the promise
of genetic medicine and the reality of precision
or personalized medicine.” The successful use of
genomics to improve diagnoses, drug selection, and
outcomes for patients with specific cancers, such as
breast cancer, is encouraging skeptics to reevaluate.
“What’s so exciting now are the specific tools available
for patients,” said Shaywitz. “There are more and more
clear use cases—clear opportunities where you can
demonstrate that patients’ lives were dramatically
improved through the use of genetic information.”
For example, it’s estimated that about one-third of the
children under the age of one who are hospitalized
suffer from a genetic condition, and that 25 to 40
percent of mortality in high acuity neonatal intensive
care units (NICUs) may be attributable to a genetic
disorder. “An accurate genetic diagnosis—which in
some cases can now be achieved within about 24
hours —can immediately influence the direction of
care, and also avoid a prolonged and often excruciating
diagnostic odyssey," said Shaywitz. "While the
primary driver is, as always, humanistic, that is, it’s the
‘right thing to do,’ there are also immediate economic
benefits, given the high cost of keeping children in the
NICU. The result is that appropriate, early use of
genetics in these cases can be a win for everyone.”
Learn more about sharing and management of
genomic data and tools to accelerate genomics
from DNAnexus
DNAnexus has developed a platform for local and
distributed sequencing and analysis, built on the
Microsoft cloud, that enables an end-to-end, rapid
genomics solution to support both researchers and
clinicians. “Oncologists seeking to provide the best care
for their cancer patients are going to be performing
molecular evaluations,” Shaywitz predicted, “and cancer
centers that want to provide the best care for their
patients will want to collect and organize this
information—both the genetic information and the
patient journey information.” Over time, as more
medical centers and research facilities around the world
gather genetic data, it will become more useful for
identifying the best treatment options.
“Ideally, we’ll have a learning healthcare system,” said
Shaywitz. “We’d learn from the clinical journey of the first
patient, integrating their clinical and genetic information,
and apply it intelligently so that we treat the next patient
in a fashion that reflects the knowledge gained from
each previous interaction. It will be a virtuous cycle.”
Democratizing genomics
The successful use of genomics to improve diagnoses
and outcomes for neonatal infants and patients with
specific cancers, such as breast cancer, is encouraging
efforts to generate more genomic information.
Microsoft and its partners are hard at work making
this information easier to sequence, store, analyze,
and digest, thus transforming the promise of genomics
into real-world evidence that clinicians can use to
improve care.
“We’re taking our research investment and getting the
IP into the hands of our customers and partners through
a series of Azure-based offerings,” said Miller. “Our goal
with our partners is to offer a turnkey way to do the
bioinformatics pipeline—to reduce complexity. When
you run analyses across large sample sets through the
same pipeline, you end up with more consistent data.
In essence, we’re democratizing genomics.”
Learn more about Microsoft’s investments in genomics.
Project Hanover is a Microsoft Research effort that seeks to
empower biomedicine with cutting-edge computer science,
focusing on three areas: machine reading of millions of
biomedical articles to form queryable knowledge bases, cancer
decision support using AI with a current focus on Acute Myeloid
Leukemia (AML), and the use of AI in chronic disease management.
DNAnexus, in partnership with Microsoft, supports
The Stanford Center for Genomics and Personalized Medicine
(SCGPM), which is creating a new paradigm of patient-centered
medicine that encompasses the entire genome of individuals
“to improve disease prediction, prevention, and treatment of
conditions such as cancer, diabetes, heart disease, asthma,
schizophrenia, and many others.” Efforts include reducing the
time to achieve effective treatment, integrating bioinformatics
with the patient’s electronic medical record, eliminating trial
and error in prescribing drugs, identifying disease risks, offering
patients genetic counseling and engaging them to enhance
their compliance with prescribed treatments.
Benefits of cloud computing
Scale. The cloud provides an almost infinite set of computing
resources, which is necessary for storing or processing massive
amounts of data. When you need more resources, you can scale
up instantly and automatically. When you need fewer resources,
you scale down instantly and automatically. In other words, you
get exactly what you need, when you need it—no more, no less.
Economics. Few organizations have the resources to set up
or manage very large-scale datacenters. The cloud model, like a
utility, is pay-as-you-go. This way you never have to worry about
having fewer resources than you need, and your organization
never has to worry about paying for more than you're using.
Speed. Traditionally, getting access to new or better tools, more
storage space, or more computing power has involved a lot of
waiting. This happens when IT has to get budget, purchase
hardware and/or software, and set it up before they can give you
what you need. With the cloud, it takes minutes instead of days,
weeks, or months, to get more storage space, compute cycles, or
updated software.
Trust. Security is a big concern when it comes to cloud adoption.
Health data needs to be protected. The industry experts who
manage cloud datacenters take physical, technical, and
operational measures to achieve and maintain security and privacy
so your organization can meet the compliance requirements of
regulators. They deploy the latest security patches as soon as
they're available, so you get them automatically.
Choosing a cloud offering
Health organizations need cloud services that are not only
powerful, but also secure and compliant. Here is a short
checklist to consider when evaluating cloud offerings:
• Is it Enterprise-ready? Do the services it offers scale to large
organizations with many users or customers?
• Does it offer more than infrastructure, such as platforms
that support core functions for productivity,
communications, relationship management, and analytics,
so you can easily tailor solutions to meet your needs?
• Does it work with on-premises environments and with
technologies from multiple cloud providers and device
manufacturers?
• Does it have certifications to show it complies with
health security and privacy regulations in the countries
where you operate, so you can safely store and process
personal health information?