Big Data and machine learning are increasingly important in biomedical science and clinical practice. Big Data refers to datasets too large and complex for traditional tools to handle; machine learning involves algorithms that recognize patterns in data without being explicitly programmed. Key challenges include data volume, variety, and veracity, but techniques such as distributed analysis, shared standards, and rigorous validation can help address them.
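The "recognizing patterns without being explicitly programmed" idea can be illustrated with a minimal sketch: a nearest-centroid classifier that learns class prototypes from labeled examples instead of hand-written rules. All data, labels, and numbers below are invented for illustration.

```python
# Illustrative sketch: a nearest-centroid classifier "learns" class patterns
# from labeled examples rather than from hand-written rules.

def fit(examples):
    """examples: list of (features, label). Returns per-class centroids."""
    sums, counts = {}, {}
    for x, y in examples:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Hypothetical two-feature measurements for two patient groups.
train = [([1.0, 1.1], "healthy"), ([0.9, 1.0], "healthy"),
         ([3.0, 3.2], "disease"), ([3.1, 2.9], "disease")]
model = fit(train)
print(predict(model, [2.9, 3.0]))  # -> disease
```

The program was never told what distinguishes the classes; the decision rule emerges from the training data.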
Themes and objectives:
To position FAIR as a key enabler to automate and accelerate R&D process workflows
FAIR Implementation within the context of a use case
Grounded in precise outcomes (e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership)
To make data actionable through FAIR interoperability
Speakers:
Mathew Woodwark, Head of Data Infrastructure and Tools, Data Science & AI, AstraZeneca
Erik Schultes, International Science Coordinator, GO-FAIR
Georges Heiter, Founder & CEO, Databiology
With the explosion of interest in both enhanced knowledge management and open science, the past few years have seen considerable discussion about making scientific data “FAIR” — findable, accessible, interoperable, and reusable. The problem is that most scientific datasets are not FAIR. When left to their own devices, scientists do an absolutely terrible job creating the metadata that describe the experimental datasets that make their way into online repositories. The lack of standardization makes it extremely difficult for other investigators to locate relevant datasets, to re-analyse them, and to integrate those datasets with other data. The Center for Expanded Data Annotation and Retrieval (CEDAR) has the goal of enhancing the authoring of experimental metadata to make online datasets more useful to the scientific community. The CEDAR workbench for metadata management will be presented in this webinar. CEDAR illustrates the importance of semantic technology in driving open science. It also demonstrates a means of simplifying access to scientific datasets and enhancing the reuse of the data to drive new discoveries.
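The kind of template-driven metadata checking described above can be sketched in miniature. The field names and controlled vocabulary below are invented; real CEDAR templates are ontology-aware and far richer.

```python
# Toy sketch of template-driven metadata checking (field names invented;
# real metadata templates are far richer and ontology-aware).
TEMPLATE = {
    "organism": str,       # required free text
    "assay_type": str,     # must come from a controlled vocabulary
    "sample_count": int,
}
ALLOWED_ASSAYS = {"RNA-seq", "ChIP-seq", "mass spectrometry"}

def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, ftype in TEMPLATE.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            problems.append(f"wrong type for {field}")
    if record.get("assay_type") not in ALLOWED_ASSAYS:
        problems.append("assay_type not in controlled vocabulary")
    return problems

good = {"organism": "Homo sapiens", "assay_type": "RNA-seq", "sample_count": 12}
bad = {"organism": "human", "assay_type": "rnaseq"}
print(validate(good))  # -> []
print(validate(bad))   # missing sample_count, vocabulary violation
```

Enforcing even this much structure at authoring time is what makes datasets findable and integrable later.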
Innovative applications of microphysiological systems (MPS) have been growing over the past decade, especially with respect to the use of complex human tissues for assessing the safety of drug candidates – but broad industry adoption of MPS methods has not yet become a reality.
This webinar addresses some recent advances in MPS development and begins to explore the barriers to increased incorporation of MPS to improve drug safety assessment and to provide safer, more effective drugs into the clinical pipeline.
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs (Paul Groth)
A look at how the thinking about Web Data and the sources of semantics can help drive decisions on combining latent and explicit knowledge. Examples from Elsevier and lots of pointers to related work.
Machine learning, health data & the limits of knowledge (Paul Agapow)
Lecture for Imperial College London's MSc in Health Data Analytics, critiquing a recent paper on COVID diagnosis and moving out to talk about good practices (& limits) in ML and model building
Federated Learning (FL) is a learning paradigm that enables collaborative learning without centralizing datasets. In this webinar, NVIDIA present the concept of FL and discuss how it can help overcome some of the barriers seen in the development of AI-based solutions for pharma, genomics and healthcare. Following the presentation, the panel debate on other elements that could drive the adoption of digital approaches more widely and help answer currently intractable science and business questions.
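The core of federated averaging can be sketched in a few lines, assuming a toy one-parameter linear model: each site trains on its own private data, and only model parameters are aggregated. Real FL frameworks for deep networks are far more involved.

```python
# Minimal sketch of federated averaging (FedAvg): each site trains locally
# on its own data and only model parameters are shared, never the data.
import random

def local_step(weights, data, lr=0.1):
    """One pass of gradient descent for a 1-feature linear model y = w*x."""
    w = weights
    for x, y in data:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def fed_avg(site_datasets, rounds=50):
    global_w = 0.0
    for _ in range(rounds):
        # Each site refines the current global model on its private data...
        local = [local_step(global_w, d) for d in site_datasets]
        # ...and only the weights are aggregated (size-weighted average).
        total = sum(len(d) for d in site_datasets)
        global_w = sum(w * len(d) for w, d in zip(local, site_datasets)) / total
    return global_w

random.seed(0)
# Three "hospitals", all drawn from the same underlying y = 3x relationship.
sites = [[(x, 3 * x) for x in [random.uniform(0, 1) for _ in range(20)]]
         for _ in range(3)]
w = fed_avg(sites)
print(round(w, 2))  # converges near 3.0
```

The privacy benefit is visible in the structure: `fed_avg` only ever sees the locally trained weights, never the per-site `(x, y)` records.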
Dr. Dennis Wang discusses possible ways to make ML methods more powerful for discovery and to reduce ambiguity within translational medicine, allowing data-informed decision-making to deliver the next generation of diagnostics and therapeutics to patients more quickly, at lower cost, and at scale.
The talk by Dr. Dennis Wang was followed by a panel discussion with Mr. Albert Wang, M. Eng., Head, IT Business Partner, Translational Research & Technologies, Bristol-Myers Squibb.
Open interoperability standards, tools and services at EMBL-EBI (Pistoia Alliance)
In this webinar Dr Henriette Harmse presents how EMBL-EBI uses its ontology services to scale up the annotation of data and deliver added value through ontologies and semantics to its users.
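Ontology-backed annotation can be pictured as a toy dictionary lookup that maps free-text mentions to term IDs. The lexicon below is invented and the term IDs are illustrative; a production service such as EMBL-EBI's Ontology Lookup Service additionally resolves synonyms, hierarchies, and cross-references.

```python
# Toy dictionary-based annotator: map free-text mentions to ontology term IDs.
# The lexicon is invented and the IDs are illustrative; real ontology
# services handle synonyms, term hierarchies and cross-references.
LEXICON = {
    "seizure": "HP:0001250",
    "diabetes": "MONDO:0005015",
    "homo sapiens": "NCBITaxon:9606",
}

def annotate(text):
    """Return (mention, term_id) pairs for every lexicon entry found."""
    lowered = text.lower()
    return [(m, tid) for m, tid in LEXICON.items() if m in lowered]

hits = annotate("Cohort of Homo sapiens patients with diabetes")
print(hits)
```

Once mentions resolve to stable identifiers, annotated datasets from different groups become queryable and joinable.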
Towards Automated AI-guided Drug Discovery Labs (Ola Spjuth)
Presentation by Ola Spjuth (Uppsala University and Scaleout Systems) on 2019-10-16 at Swedish e-Science Academy 2019 in Lund, Sweden.
Research website at Uppsala University: https://pharmb.io
Scaleout Systems: https://scaleoutsystems.com
Data Harmonization for a Molecularly Driven Health System (Warren Kibbe)
Maximizing the value of data, computing, and data science in an academic medical center, or 'towards a molecularly informed Learning Health System'. Given in October at the University of Florida in Gainesville.
This workshop is a hands-on introduction to machine learning with R and was presented on December 8, 2017 at the University of South Carolina for the 2017 Computational Biology Symposium held by the International Society for Computational Biology Regional Student Group-Southeast USA.
Building an informatics solution to sustain AI-guided cell profiling with hig... (Ola Spjuth)
Presentation at SLAS Europe 2019 in Barcelona on 28 June 2019.
High-content microscopy in automated laboratories presents many challenges for storing and processing data, and for building AI models to aid decision making. We have established an informatics system to serve a robotized cell profiling setup with incubators, liquid handling and high-content microscopy for microplates. The informatics system consists of computational infrastructure (CPUs, GPUs, storage), middleware (Kubernetes), an imaging database and software (OMERO), and a workflow system (Pachyderm) to perform online prioritization of new data and automate the process from acquired images to continuously updated and deployed AI models. The AI methodologies include deep learning models trained on image data and conventional machine learning models trained on data from Cell Painting experiments. The microservice architecture makes the system scalable and expandable, and a key objective is improving screening and toxicity assessment using AI-aided intelligent experimental design.
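The online-prioritization step mentioned above can be sketched generically: rank incoming image batches by model uncertainty so the most informative plates are processed first. The batch names, scores, and scoring rule below are invented for illustration; the real system runs on OMERO and Pachyderm with deep-learning models.

```python
# Schematic sketch of online prioritization: rank incoming batches by model
# uncertainty so the most informative plates are processed first.
# Batch ids and probabilities are invented for illustration.
import heapq

def uncertainty(probs):
    """Simple uncertainty score: how close the top probability is to 0.5."""
    return 1.0 - abs(max(probs) - 0.5) * 2

def prioritize(batches):
    """Return batch ids, most uncertain (most informative) first."""
    heap = [(-uncertainty(p), bid) for bid, p in batches]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# (batch id, predicted class probabilities from the current model)
incoming = [("plate-A", [0.95, 0.05]),   # confident -> low priority
            ("plate-B", [0.55, 0.45]),   # uncertain -> high priority
            ("plate-C", [0.70, 0.30])]
print(prioritize(incoming))  # -> ['plate-B', 'plate-C', 'plate-A']
```

Feeding the most uncertain plates back into training first is the essence of the AI-aided intelligent experimental design the abstract describes.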
It seems that AI is also becoming a buzzword, like design thinking. Everyone is talking about AI or wants to have AI and sees all the ideas and benefits – that’s fine, but how do you get started? And what’s different now? Three innovations have finally put AI on the fast track: Big Data, with the internet and sensors everywhere; massive computing power, especially through the Cloud; and breakthrough algorithms, so computers can be trained with deep learning to accomplish more sophisticated tasks on their own. If you use new technology, you need to explore and know what’s possible. Design thinking helps outline the steps and define the ways in which you’re going to create the solution, starting with mapping the customer journey and defining who will be using that service enhanced with intelligent technology, or who will benefit and gain value from it. We discuss how these two worlds are coming together, and how you can get started transforming your venture with Artificial Intelligence using Design Thinking.
Speaker: Claudio Mirti, Principal Solution Specialist – Data & AI, Microsoft
Advancing Foundation and Practice of Software Analytics (Tao Xie)
Vision Statement Presentation on "Advancing Foundation & Practice of Software Analytics" at the 2nd International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013) http://promisedata.org/raise/2013/
Presentation on how past medical records can be used to provide appropriate and timely treatment for patients using genetic algorithms and feature selection.
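Genetic-algorithm feature selection, the technique the presentation names, can be sketched as follows. The fitness function here is a stand-in: real work would score a predictive model on held-out patient records, and all parameters are illustrative.

```python
# Hedged sketch of genetic-algorithm feature selection. The fitness
# function is a stand-in for scoring a model on held-out medical records.
import random

random.seed(1)
N_FEATURES = 10
RELEVANT = {1, 4, 7}          # pretend only these features matter

def fitness(mask):
    """Reward picking relevant features, penalize picking too many."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & RELEVANT) - 0.1 * len(chosen)

def evolve(pop_size=30, generations=40, mut_rate=0.1):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]       # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_FEATURES)   # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(N_FEATURES):             # bit-flip mutation
                if random.random() < mut_rate:
                    child[i] = 1 - child[i]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print([i for i, bit in enumerate(best) if bit])  # ideally close to {1, 4, 7}
```

Because selection keeps the fittest masks and crossover recombines them, the population converges toward small feature subsets that carry most of the signal.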
PA webinar on benefits & costs of FAIR implementation in life sciences (Pistoia Alliance)
Slides from the Pistoia Alliance Debates webinar, in which a panel of experts from technology providers and the biopharma industry were invited to share their views on the benefits and costs of FAIR implementation for the life-science industry.
Considerations and challenges in building an end-to-end microbiome workflow (Eagle Genomics)
Many of the data management and analysis challenges in microbiome research are shared with genomics and other life-science big-data disciplines. However, there are aspects that are specific: some are intrinsic to microbiome data, some relate to the maturity of the field, and others relate to extracting business value from the data.
In this talk I'll discuss work in biomedical image and volume segmentation and classification, as well as outcome prediction modeling from insurance claims data that I've pursued at LifeOmic here in the Triangle. In the former case datasets include radiological image volumes, retinal fundus images, and cell images created with fluorescent microscopy. The latter includes MIMIC-III data represented as FHIR objects. I'll discuss the relative challenges and advantages of doing ML locally vs. on a cloud-based platform.
The MD Anderson / IBM Watson Announcement: What does it mean for machine lear... (Health Catalyst)
It’s been over six years since IBM’s Watson amazed all of us on Jeopardy, but it has yet to deliver similar breakthroughs in healthcare. The headlines in last week’s Forbes article read, “MD Anderson Benches IBM Watson In Setback For Artificial Intelligence In Medicine.” Is it really a setback for the entire industry or not? Health Catalyst’s EVP for Product Development, Dale Sanders, believes that the challenges are unique to IBM’s machine learning strategy in healthcare. If they adjust that strategy and better manage expectations about what’s possible for machine learning in medicine, the future will be brighter for Watson, their clients, and AI in healthcare in general. Watson’s success is good for all of us, but its failure is bad for all of us, too.
Join Dale as he discusses:
The good news: Machine learning technology is accelerating at a rate beyond Moore’s Law. Dale believes that machine learning algorithms and models are doubling in capability every six months.
The bad news: The healthcare data ecosystem is not nearly as rich as many would believe, and certainly not as rich as that used to train Watson for Jeopardy. Without high-volume, high-quality data, Watson’s potential and the constant advances in machine learning algorithms will hit a glass ceiling in healthcare.
The best news: By adjusting strategy and expectations, there are still plenty of opportunities to do great things with machine learning by using the current data content in healthcare, while we build out the volume and breadth of data we need to truly understand the patient at the center of the healthcare picture… and you don’t need an army of PhD data scientists to do it.
I gave this talk in the "Presidential Symposium" at the annual meeting of the American Association of Physicists in Medicine, in Anaheim, California. The President of AAPM, Dr. Maryellen Giger, wanted some people to give some visionary talks. She invited (I kid you not) Foster, Gates, and Obama. Fortunately Bill and Barack had other commitments, so I did not need to share the time with them.
ODSC East 2017: Data Science Models For Good (Karry Lu)
Abstract: The rise of data science has been largely fueled by the promise of changing the business landscape: enhancing one's competitive advantage, increasing business optimization and efficiency, and ultimately delivering a better bottom line. This promise reaches across sectors as machine learning methods get better, data access continues to grow, and computational power becomes easily accessible. However, because the practice of doing data science can be expensive, there is a danger that this so-called promise of data science may only be available to the most well-resourced organizations with sophisticated data capabilities and staff. For the past five years, DataKind has been working to ensure social change organizations too have access to data science, teaming them up with data scientists to build machine learning and artificial intelligence solutions that aim to reduce human suffering. In doing so, DataKind has learned what it takes to apply data science in the social sector and the many applications it has for creating positive change in the world. This session presents DataKind projects showcasing the wide range of applications of ML/AI for social good: using satellite imagery and remote sensing to detect wheat farm boundaries and protect livelihoods in Ethiopia; leveraging NLP to automate the time-consuming synthesis of findings from academic studies to inform conservation efforts; classifying text records to better understand human rights conditions across the world; and using machine learning to reduce traffic fatalities in U.S. cities. Learn about some of the latest breakthroughs in the data science for social good space and how you can get involved.
Frankie Rybicki slide set for Deep Learning in Radiology / Medicine (Frank Rybicki)
These are my #AI slides for medical deep learning using #radiology and medical imaging examples. Please use them & modify to teach your own group about medical AI.
By Riccardo Bellazzi
Università di Pavia
ICS Maugeri Pavia
Slides for the talk titled "Big data, machine learning and precision medicine."
10 May 2018, Milan, Fondazione Giannino Bassetti
Full video: https://www.fondazionebassetti.org/it/focus/2018/08/big_data_machine_learning_e_me.html
Using Bioinformatics Data to inform Therapeutics discovery and development (Eleanor Howe)
Diamond Age Data Science and Zafgen, Inc. co-present on their work using bioinformatics data effectively in the context of a small therapeutics company.
Eleanor Howe, PhD, CEO of Diamond Age, presents on the different types of computational biologist, the characteristics of a good bioinformatics team, and the pluses and minuses of using deep learning/AI in a discovery biology context.
Huseyin Mehmet, VP of Discovery Research at Zafgen, describes his team's work with Diamond Age and how its capabilities inform Zafgen's drug development. He discusses biotech companies' need for a diverse, experienced bioinformatics team.
Can drug repurposing be saved with AI? (Paul Agapow)
Presented at DigiTechPharma, London May 2024.
What is drug repurposing? Why is it needed? What systematic approaches are there? Is AI a solution? Why not?
IA, la clave de la genomica (May 2024) (Paul Agapow)
A.k.a. AI, the key to genomics. Presented at 1er Congreso Español de Medicina Genómica. Spanish language.
On the failure of applied genomics. On the complexity of genomics, biology, medicine. The need for AI. Barriers.
Digital Biomarkers, a (too) brief introduction (Paul Agapow)
Presentation at the Artid workshop, U. Bristol, March 2024, on digital biomarkers for improved clinical trials and monitoring of complex diseases, including neurological & movement disorders.
Journal club and talk given to Health Data Analytics MSc, February 2023. Reflecting on how to do good machine learning over biomedical data, the pitfalls and good practices
Where AI will (and won't) revolutionize biomedicine (Paul Agapow)
Presented at AI & Big Data Expo, London, December 2022.
- Given the hype and success of machine learning and AI in other fields, its application in healthcare is only natural.
- However, the actual successes in medicine have been limited, with a number of high-profile failures.
- Here, I propose that biology is uniquely complex, with our lack of domain knowledge limiting the application of AI.
- However, there is reason for cautious optimism, with AI-led approaches shifting the odds in our favour.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
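Cytoplasmic segregation and the resulting drift in heteroplasmy can be illustrated with a toy simulation; the organelle counts and genotype labels below are invented for illustration.

```python
# Toy simulation of cytoplasmic segregation: a heteroplasmic cell's
# organelles are partitioned at random between daughter cells, so the
# mutant fraction drifts between lineages (numbers are illustrative).
import random

random.seed(42)

def divide(organelles):
    """Randomly split a list of organelle genotypes into two daughters."""
    shuffled = organelles[:]
    random.shuffle(shuffled)
    half = len(shuffled) // 2
    # Each daughter duplicates its share back to the full organelle count.
    return (shuffled[:half] * 2, shuffled[half:] * 2)

cell = ["mutant"] * 50 + ["wildtype"] * 50   # 50% heteroplasmic cell
for generation in range(5):
    daughters = divide(cell)
    cell = daughters[0]                       # follow one lineage
    frac = cell.count("mutant") / len(cell)
    print(f"generation {generation + 1}: {frac:.0%} mutant")
```

Running this repeatedly shows lineages drifting toward higher or lower mutant load, which is why heteroplasmic expression varies between cells and tissues.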
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
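One way to picture the neurosymbolic idea described above is a sketch in which a learned model proposes scored predictions and symbolic knowledge-graph constraints veto those that contradict known facts. Every triple, rule, and score below is invented for illustration.

```python
# Minimal sketch of neurosymbolic filtering: a learned model proposes
# scored drug-target links (the "latent" side), and symbolic
# knowledge-graph constraints veto contradictory ones. All facts invented.
TRIPLES = {
    ("drugX", "contraindicated_for", "liver_disease"),
    ("geneA", "expressed_in", "liver"),
    ("drugX", "targets", "geneB"),
}

def violates_constraint(drug, gene):
    """Hypothetical safety rule: a drug contraindicated for liver disease
    must not be proposed against a gene expressed in the liver."""
    return (((drug, "contraindicated_for", "liver_disease") in TRIPLES)
            and ((gene, "expressed_in", "liver") in TRIPLES))

# Hypothetical scores from a predictive model.
predictions = [("drugX", "geneA", 0.91), ("drugX", "geneB", 0.74)]

accepted = [(d, g, s) for d, g, s in predictions
            if not violates_constraint(d, g)]
print(accepted)  # drugX-geneA is vetoed by the symbolic rule
```

Note that the highest-scoring prediction is the one rejected: symbolic knowledge supplies a check the statistical model has no way to express, which is also the basis for generating plausible explanations.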
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61 nt precursor that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that these transcripts must cause the silencing through RNA-RNA interactions.
Types of RNAi (non-coding RNA)
miRNA
Length: 23-25 nt
Trans-acting
Binds its target mRNA with mismatches
Causes translational inhibition
siRNA
Length: 21 nt
Cis-acting
Binds its target mRNA with a perfectly complementary sequence
piRNA (Piwi-interacting RNA)
Length: 25-36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAi:
First the double-stranded RNA is bound by the enzyme Dicer, which cuts the long RNA into short pieces.
Then another protein complex, RISC (RNA-induced silencing complex), discards one of the two RNA strands.
The RISC-bound single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is a large (>500 kDa) RNA-multiprotein complex that triggers degradation of the target mRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: endonuclease (RNase III family)
Argonaute: central component of the RNA-induced silencing complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute.
ARGONAUTE PROTEIN:
1. PAZ (Piwi/Argonaute/Zwille): recognition of the target mRNA
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H-like activity)
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Big Data & ML for Clinical Data
1. Big Data & Machine Learning for
Clinical Data
Paul Agapow <p.agapow@imperial.ac.uk>
Data Science Institute, Imperial College London
2. Biomedical science is now data
science
I was a biochemist, immunologist,
and then an infectious disease
bioinformatician
I’m now a “biomedical data
scientist”
I will be a Health Informatics
Director at AstraZeneca
About me & these lectures
WikiMedia Commons
3. We increasingly use & need:
Lots of complex data
Real world evidence (outside RCTs)
Computational tools
Statistical analysis
Complex interactions
Precision medicine: prediction &
(sub)typing
Also:
Cheap
Successful in other domains
But lots of hype and jargon
Biomedical science is now data science
WikiMedia Commons
4. The world is increasingly
“datafied” – we make more and
bigger datasets
Devices
Routine collection
Aggregation & integration
Big Data is “too big” for
conventional approaches
Part 1: Big Data
WikiMedia Commons
5. “Quantity has a quality of its
own”
Often free
Real
Rich, deep, interactions
Needed for ML and other
assumption-light approaches
Why Big Data?
By Ender005 - Own work, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=49888192
6. Many diseases with the same clinical presentation have different
molecular phenotypes
Several overlapping terms
stratified: separate patients into groups for treatment
precision:
tailor treatment to individual
improved targeted therapies with fewer side effects
“Right medication, right dose, right patient, right time, right route”
Also personalised, P4 …
E.g. asthma
Why Big Data? Precision medicine
7. Volume
Velocity
Variety
Veracity
Value
The 3 / 4 / 5 Vs of Big Data
By MuhammadAbuHijleh - Own work, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=46431834
Limits shift with technological
progress
Memory
Compute
Data schema
Solutions: distributed & parallel
computation, new high-end
databases
The problem with volume: tools & platforms
WikiMedia Commons
9. Multiple hypothesis testing
and false discovery
Bias: a sample is not the
population
The Past is not the Present
Observation without
understanding
The curse of dimensionality
Privacy
Some ML-specific issues
The problem with volume: methodology
From KDNuggets
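The multiple-testing problem on this slide can be made concrete with a minimal sketch: when many hypotheses are tested at once, some raw p-values will look significant purely by chance, so corrections such as Bonferroni shrink the significance threshold by the number of tests. The p-values below are made up for illustration.

```python
# Bonferroni correction sketch: control false discoveries across many
# tests by dividing the significance threshold alpha by the test count.

def bonferroni_significant(p_values, alpha=0.05):
    threshold = alpha / len(p_values)          # stricter per-test cutoff
    return [p <= threshold for p in p_values]

# Hypothetical raw p-values from five independent tests
p_values = [0.001, 0.02, 0.04, 0.3, 0.9]
print(bonferroni_significant(p_values))  # only 0.001 survives (0.05/5 = 0.01)
```

Note that 0.02 and 0.04 would pass an uncorrected 0.05 cutoff but fail the corrected one; with big data the number of tests, and so the severity of the correction, grows quickly.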
10. Many, many types of data
How do we use multiple types?
Which type do we use?
Disease is systemic
Interactions
Evidence
Solutions: integrated analysis,
independent analysis with
validation
The problem with variety
Wu, Sanin, Wang (2016) Clinical Applications and Systems
Biomedicine
11. Much biodata is uncertain
Noise
Mistakes
People lie
A sample is not a population
Incompatible systems
Most analyses are not reproducible
Solutions: imputation, standards,
cross-validation etc.
The problem with veracity
By Khaydock - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=25102900
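One of the veracity solutions listed above, imputation, can be sketched in a few lines: missing values are replaced with a plausible estimate, here simply the mean of the observed values. This is a crude baseline (real pipelines use model-based imputation), and the data are invented for illustration.

```python
# Mean imputation sketch: replace missing values (None) with the mean
# of the observed values for that variable.

def impute_mean(values):
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical lab measurements with two missing entries
print(impute_mean([4.0, None, 6.0, None]))  # [4.0, 5.0, 6.0, 5.0]
```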
12. How do we
Re-use data
Compare data
Store data from multiple sources
Even know what data is
FAIR, OHDSI / OMOP, HPO
Even just metadata helps for
cataloguing
But: multiple & incomplete
standards, translation, complexity
Solution: Standards & ontologies
WikiMedia Commons
13. Much data cannot leave its
home institution
Hospitals
Registries
Insurance companies
Governance is hard & slow
So take the analysis to the data
Data looks the same but may
be internally different
Solution: Federated analysis
International Collaboration for Autism Registry Epidemiology
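"Taking the analysis to the data" can be illustrated with a minimal federated-analysis sketch: each institution computes summary statistics locally and shares only those, never the raw patient values, and a coordinator pools them. The sites and values below are hypothetical.

```python
# Federated analysis sketch: raw data never leaves each site; only
# aggregate summaries (count and sum) are shared with the coordinator.

def site_summary(values):
    """Computed locally at each institution."""
    return {"n": len(values), "total": sum(values)}

def pooled_mean(summaries):
    """Coordinator combines per-site summaries into a global estimate."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n

# Three hypothetical hospitals, each holding its own cohort
site_a = site_summary([120, 135, 150])
site_b = site_summary([110, 140])
site_c = site_summary([125])

print(pooled_mean([site_a, site_b, site_c]))  # 130.0, same as the global mean
```

The pooled mean is exactly what a centralised analysis would give, but each hospital's governance boundary is respected.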
14. In a vast sea of biodata, how do you
discover anything? How do you avoid
cherry-picking?
Solutions:
Distinguish discovery from
exploration
Non-parametric methods (e.g.
machine learning)
Some problems don’t have a single
solution but many (e.g. prediction)
The problem with it all: discoverability
EnterpriseKnowledge.com
15. Write analyses as recipes
Snakemake, Nextflow, Flowr
Use recreatable computational
systems
Docker
“Your biggest collaborator is
you, six months ago”
But: it’s work
Solution: Reproducibility
From RevolutionR
16. Big Data is “too big” for current conventional tools & practices
But it’s ideal for solving many biomedical problems
There are problems with valid discovery and just handling the data
Standards, distributed databases and federated analysis help address these problems
Summary: Big Data
17. “a field of Artificial Intelligence”
“(the science of) getting computers to learn and act like humans do”
“getting computers to act without being explicitly programmed”
“computer systems that automatically improve with experience”
“neural networks”
“using statistical techniques to give computer systems the ability to
learn”
Part 2: Machine Learning
18. In practice:
broadly-defined set of
algorithms that recognise &
generalise patterns in data
“non-parametric” or
assumption-light
may require training over
initial dataset
What is Machine Learning?
By Chire - Own work, Public Domain,
https://commons.wikimedia.org/w/index.php?curid=11711077
19. Enough data
Enough compute
Technical progress
Need 'good enough'
solutions
Prediction & forecasting
Categorization
Pattern recognition
Early, startling success
Why now?
Ray Kurzweil The Singularity is Near
21. How is ML different to stats?
                 Statistical      Machine learning
Assumptions      strong           weak
Data             small            large
Optimize by      fitting          training
Solutions        “the best”       “good enough”
Hypothesis       proof            exploration
Test             p-values etc.    validation
22. In practice:
a field of scientific research
machine learning
neural networks
deep learning
more of an objective than a methodology
computational systems that duplicate / emulate / replace human effort
What is Artificial Intelligence?
23. • Many methods
• Broadly split into:
• Unsupervised: finds structure within data
• e.g. (most) clustering, self-organised maps, principal component
analysis
• Supervised: trained using labelled examples
• e.g. regression, decision trees, naive bayes, neural networks
• Categories can blur
• e.g. k-means, nearest neighbour?
• Which is better?
What are ML methods?
24. • (Train a model from data)
• This model encapsulates or generalizes the data
• (Validate the model against test data)
• This model transforms features into labels
• Continuous outputs (e.g. real numbers) are regressions
• Discrete outputs (e.g. categories) are classifications
ML terms & process
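The train/validate loop above can be sketched with the simplest possible classifier, 1-nearest-neighbour: the "model" is just the training data, prediction copies the label of the closest training example, and accuracy is measured on held-out test data. The one-dimensional biomarker values and labels are invented for illustration.

```python
# Minimal train/validate sketch using a 1-nearest-neighbour classifier.
# Discrete output labels make this a classification, per the slide.

def predict(train, x):
    """Return the label of the closest training example."""
    nearest = min(train, key=lambda example: abs(example[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Validate: fraction of held-out examples predicted correctly."""
    correct = sum(1 for x, y in test if predict(train, x) == y)
    return correct / len(test)

# Toy dataset: feature = biomarker level, label = disease status
data = [(1.0, "healthy"), (1.2, "healthy"), (3.8, "ill"),
        (4.1, "ill"), (1.1, "healthy"), (3.9, "ill")]
train, test = data[:4], data[4:]   # hold out the last examples for validation

print(accuracy(train, test))  # 1.0 on this cleanly separable toy data
```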
25. • Take gene expression profiles from patients and cluster to:
• See genes with similar expression profiles
• Similar patients
• Train a model on radiographs with tumours labelled, use to diagnose
unlabelled images
• Find patients with similar symptoms & signs (computational
phenotypes) in EHRs
• Train on histories of patients to forecast their future condition
• Find out how terms in a medical corpus relate to each other
Examples of ML
28. What does ‘similar’ mean? How
do we measure it?
Which features & how weighted?
Noise & overlapping clusters
Non-numeric, non-ordered data
What shapes can clusters be?
How many clusters? When do we
stop?
…
Clustering isn’t simple
By Chire - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=17085331
29. Varies but:
Start with record-feature matrix
Normalise data
(“Supervised”: select number of
clusters)
Run algorithm
Validate
Clustering process
WikiMedia Commons
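The assign-and-update loop at the heart of the process above can be sketched with a tiny one-dimensional k-means, assuming illustrative expression values and a pre-chosen k of 2 (the "supervised" step on the slide).

```python
# Minimal k-means sketch: alternate assigning points to their nearest
# centroid and moving each centroid to its cluster mean.

def kmeans_1d(values, centroids, iters=10):
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Toy (already normalised) expression values with two obvious groups
values = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 1.0])
print(sorted(round(c, 2) for c in centroids))  # [0.15, 5.03]
```

Real data need the normalisation step first, and k-means inherits all the caveats from the previous slide (cluster shape, choice of k, initialisation).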
31. A cluster partitioning is a hypothesis
How do we assess? Validate:
External: compare against external label or data
e.g. accuracy, entropy
Internal: goodness of clustering
e.g. sum squared errors, cluster cohesion & separation,
silhouette
Relative: against another clustering scheme
e.g. is this better with 3 or 4 clusters
Validating clusters
32. Average over each point:
1. Calculate the average distance to all
other members of its cluster, a
2. For each other cluster, calculate the
average distance to every member.
The minimum of these is b
3. The silhouette width is (b−a) /
max(a,b), the higher the better
Clustering process
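The three steps above translate directly into code; the sketch below computes the silhouette width for one point, with invented one-dimensional cluster values and absolute difference as the distance.

```python
# Silhouette width sketch for a single point, following the slide:
#   a = mean distance to the other members of its own cluster
#   b = smallest mean distance to any other cluster
#   width = (b - a) / max(a, b), higher (closer to 1) is better

def silhouette_width(point, own_cluster, other_clusters):
    others = [p for p in own_cluster if p != point]
    a = sum(abs(point - p) for p in others) / len(others)
    b = min(sum(abs(point - p) for p in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)

# A point sitting tightly in its own cluster, far from the other one
print(silhouette_width(1.0, [1.0, 1.2, 0.8], [[5.0, 5.5]]))  # ≈ 0.95
```

Averaging this over every point gives the overall silhouette score for a clustering; values near 0 or below suggest a point sits between clusters or in the wrong one.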
33. What if there are sub-clusters or
structure?
• Use hierarchical clustering
• Use homogeneity or
completeness metrics to
compare
Nesting & hierarchies
34. • Complex, heterogeneous
disease
• Many attempts at clustering
• Use transcriptomic &
proteomic data
• Validate with clinical
• 4 clusters with characteristic
genes & clinical behaviour
Example: asthma
35. a.k.a. deep learning, (artificial)
neural networks, “AI”
A series of layers of nodes, each of
which transforms the previous layer.
Training sets weights on
transformations
Capable of learning representations
Supervised learning: deep networks
WikiMedia Commons
36. There’s little information in an
individual pixel (gene, data point …)
But individual data points make up
more complete entities
Each layer takes the layer below and
creates higher-level entities
(representations) from it.
The system “recognises” higher-
level features that can appear
anywhere in the data.
What’s a representation?
WikiMedia Commons
37. Radiologists are overwhelmed
Want to catch errors &
double-check
Train ANN over medical
imagery with tumour labelled
Accuracy similar to humans
Example: diagnosis from medical imagery
From Nvidia
38. • The model is right but learns
the wrong thing (from our
point of view)
• Solutions:
• Interpreting models
• Better (more examined) data
Problem: useless solutions
Ribeiro et al. (2016) Why Should I Trust You?
39. Reversing the model & asking “why”
What features are important
Mechanistic insight
But many ML models are tangled & horribly complex
And ML community often uninterested
Solutions:
Choose an interpretable model
Software that explores feature space (LIME, Lift, IML)
Problem: interpretability
40. • Bias (systematic error) vs. Variance
(random error)
• Want a model that captures the
regularities in training data AND
generalizes to unseen data.
• This is impossible to achieve perfectly (the bias vs. variance trade-off)
• Solutions:
• Use a variety of data
• Feature selection
• Regularization
Problem: how do models get it wrong?
From KDNuggets
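The regularization bullet above can be illustrated with single-feature ridge regression, which has a one-line closed form: the penalty term lam in the denominator shrinks the fitted slope toward zero, deliberately adding a little bias in exchange for lower variance. The data are invented for illustration.

```python
# Regularization sketch: ridge regression with one feature (no intercept)
# has the closed form  w = sum(x*y) / (sum(x*x) + lam).
# Larger lam shrinks the slope toward zero.

def ridge_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [1.1, 1.9, 3.2]   # roughly y = x with a little noise

print(ridge_slope(xs, ys, lam=0.0))   # ordinary least-squares slope, ≈ 1.04
print(ridge_slope(xs, ys, lam=5.0))   # shrunk toward zero, ≈ 0.76
```

In practice lam is chosen by validation: too small and the model overfits (high variance), too large and it underfits (high bias).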
41. • What do we want from our ML
models?
• Power / accuracy
• Insight
• Error tolerance
• e.g. drug discovery vs drug safety
Problem: how good do models have to be?
After Harel
42. • Much (most) data has few positives
• Results in an imbalanced model
• Solutions:
• Over- and under-sampling
• Pre-train with poor data
• Ensemble methods
Problem: imbalanced data & lack of data
DataScience.com
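The over-sampling solution above can be sketched in a few lines: duplicate minority-class examples at random until every class matches the size of the largest one. The labelled records below are invented, and real pipelines oversample only the training split to avoid leaking duplicates into validation.

```python
import random

# Random over-sampling sketch: balance classes by duplicating
# minority-class examples until they match the majority-class count.

def oversample(data, label_of):
    groups = {}
    for item in data:
        groups.setdefault(label_of(item), []).append(item)
    target = max(len(items) for items in groups.values())
    balanced = []
    for items in groups.values():
        balanced.extend(items)
        # random.choices samples with replacement; k=0 adds nothing
        balanced.extend(random.choices(items, k=target - len(items)))
    return balanced

# Imbalanced toy data: 5 negatives, 1 positive
data = [("neg", 1), ("neg", 2), ("neg", 3), ("neg", 4), ("neg", 5), ("pos", 9)]
balanced = oversample(data, label_of=lambda item: item[0])
print(len(balanced))  # 10: five of each class
```

Under-sampling is the mirror image (drop majority examples); ensemble methods combine many such resampled models.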
43. Machine learning uses large amounts of data with few assumptions to
make models that generalise that data
This is useful for situations where we don’t have an explicit model and
just need ‘a’ solution.
But this means we need to examine our data and validate our
solutions
A ‘bad’ solution can be useful, depending on what you want to
achieve.
Summary: Machine Learning