Foundation Multimodels
2 October 2023
Foundation Model vs. Task-Specific Models
• What are foundation models?
• (Extremely) vast (and diverse) training data
• Unlabelled data (mostly)
• Self-supervised learning (SSL) (usually)
• Often based on large language models (LLM, e.g. BERT, GPT)
• Fine-tuned for specific tasks after initial (emergent) learning
• Generative AI? (potentially, depends on task)
• Contrast with task-specific models
• Single model intended to perform specific task
• Relatively “fragile” – can’t usually be effectively repurposed to other tasks,
breaks easily with different data sources
Task-specific Models
[Diagram: each input feeds its own separate model]
• Chest x-ray → Chest X-ray Model → Atelectasis, Effusion, Pneumonia, Fibrosis
• Abdominal CT scan → Abdominal CT Scan Model → Ascites, Cyst, Tumour, Stomach Cancer
• Retinal Fundus photo → Retinal Image Model → Diabetic Retinopathy, Age-related Macular Degeneration, Glaucoma
• Medical Notes
• Learns relation between single input/modality, and one (or more) targets
• Task generally prospectively defined (inputs known to be correlated with labels)
(where we started)
Foundation AI Model (FAI)
[Diagram: a single Foundation Model takes all inputs and produces all predictions]
• Inputs: Chest x-ray, Abdominal CT scan, Retinal Fundus photo, Medical Notes, other inputs
• Predictions: Atelectasis, Effusion, Pneumonia, Fibrosis; Ascites, Cyst, Tumour, Stomach Cancer; Diabetic Retinopathy, Age-related Macular Degeneration, Glaucoma; other predictions (e.g. Age, Gender, Alzheimer risk, various Cancer risks, etc.)
• Model has underlying “foundation” of “general knowledge”: initial “general/foundational” Self-Supervised Learning (SSL), on vast amount of image/textual data (possibly related)
• Main disadvantage of a Foundation AI model compared to task-specific models would be its computational requirements
Why Build A Foundation?
• When humans “know” something, they can:
• Check it against diverse pieces of knowledge (discriminative/deductive)
• Derive new concepts from diverse pieces of knowledge (generative, often inductive/probabilistic, sometimes called “creativity”)
• E.g. a dataset may never state explicitly whether Ivory Coast and Mali are next to each other
• But it may contain multiple statements of “crossing the border from Ivory Coast to Mali”, or vice versa
• From these, one can infer that the two countries are adjacent (in theory)
• Since there is often no easy way to figure out in advance whether some knowledge is useful, just learn as much as possible (reflected in the ever-increasing size of GPT/LLM models)!
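The border example can be made concrete with a toy sketch. This is only an illustration of the inference step, not how an LLM represents knowledge: the statements, the place list and the helper `infer_adjacency` are all hypothetical, and simple phrase matching stands in for learned representations.

```python
from itertools import permutations

# Hypothetical free-text statements; none says "X borders Y" outright.
statements = [
    "Traders reported crossing the border from Ivory Coast to Mali last week.",
    "The convoy was seen crossing the border from Mali to Ivory Coast.",
    "She flew from Ivory Coast to Kenya for the conference.",
]
places = ["Ivory Coast", "Mali", "Kenya"]

def infer_adjacency(statements, places):
    """Infer adjacent pairs from 'crossing the border from A to B' phrases."""
    adjacent = set()
    for a, b in permutations(places, 2):
        phrase = f"crossing the border from {a} to {b}"
        if any(phrase in s for s in statements):
            # Adjacency is symmetric, so store the pair without order.
            adjacent.add(frozenset((a, b)))
    return adjacent

inferred = infer_adjacency(statements, places)
print(sorted(sorted(pair) for pair in inferred))
```

The flight statement mentions two countries but no border crossing, so it contributes no adjacency; only indirect evidence of the right kind supports the inference.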
First FAI Model for Ophthalmology
Source: “A foundation model for generalizable disease detection from retinal images”, Zhou et al., Nature 2023
RetFound FAI
• Main distinction from previous (multitask) models appears to be the initial large-scale self-supervised (foundation) learning
• 904,170 colour fundus photos, 736,442 OCT scans, all rescaled to 256x256
• A number of SSL models were tried; masked autoencoder (with ViT-large encoder) found to be the best
RetFound FAI vs. Supervised Learning
• RetFound was compared against a supervised learning (SL) model with the same transformer architecture, and other SSL pre-training combinations
• SL-ImageNet actually performs fairly close to RetFound on internal validation
• Value of SSL is more apparent with external validation (but generally still not overwhelming)
Masked Autoencoder (Generative SSL)
[Diagram: FAI Task Adaptation: Test CFP → (trained) ViT Encoder → high-level features → Multilayer Perceptron → Prediction]
• RetFound uses a masked autoencoder, with the training objective being to reconstruct input images from a randomly-masked version of the image
• Once the autoencoder is trained, only the (ViT) encoder is used to generate the high-level features for task-specific classification
• Comparison against established SL DCNN models (trained directly on the [augmented] images, instead of high-level features) would have been interesting
• Main value of FAI here lies in producing good high-level features?
• Self-supervised since no labels are required in training the autoencoder, only the (randomly masked) image itself
• Note that the task labels are actually still needed, to train the task-specific classifiers (standard classification)!
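A minimal sketch of the masked-autoencoder objective described on this slide, under loud assumptions: a toy 8x8 "image", a 75% mask ratio, and a trivial mean-value "decoder" are all illustrative stand-ins, not RetFound's actual configuration or network. The point is only that the training signal comes from the image itself, with the loss measured on the hidden patches.

```python
import random

def split_into_patches(image, patch):
    """Split a 2D list (H x W) into flattened patch x patch blocks."""
    h, w = len(image), len(image[0])
    return [
        [image[r][c] for r in range(top, top + patch)
                     for c in range(left, left + patch)]
        for top in range(0, h, patch)
        for left in range(0, w, patch)
    ]

def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of patches hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))

def masked_mse(original, reconstructed, masked_idx):
    """Mean squared error computed on the masked patches only."""
    errors = [
        (a - b) ** 2
        for i in masked_idx
        for a, b in zip(original[i], reconstructed[i])
    ]
    return sum(errors) / len(errors)

rng = random.Random(0)
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # toy image
patches = split_into_patches(image, patch=4)        # 4 patches of 16 pixels
masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)
visible_idx = [i for i in range(len(patches)) if i not in masked_idx]

# Stand-in for encoder + decoder (assumption): predict every masked pixel
# as the mean of the visible pixels. A real model would learn this mapping.
visible_pixels = [p for i in visible_idx for p in patches[i]]
mean_visible = sum(visible_pixels) / len(visible_pixels)
reconstructed = [list(p) for p in patches]
for i in masked_idx:
    reconstructed[i] = [mean_visible] * len(patches[i])

loss = masked_mse(patches, reconstructed, masked_idx)
print(masked_idx, visible_idx, loss)
```

No labels appear anywhere in this loop; labels only enter later, when a classifier is trained on the encoder's features.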
ELIXR – X-ray LLM+Vision FAI
• Embeddings for Language/Image-aligned X-Rays (thus ELIXR)
• Language-aligned image encoder+Fixed LLM (PaLM 2)
• ELIXR-C is first trained using Contrastive Language-Image Pre-training (CLIP)
• This aligns a vision-only SupCon image encoder with a T5 language encoder (i.e. learns to bring representations of an image & associated text closer, in a shared high-dimensional space)
Source: “ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders”, Xu et al., arXiv 2023
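The idea of bringing matching image and text representations closer can be sketched with a CLIP-style symmetric contrastive loss. This is a generic pure-Python illustration, not ELIXR's code: the function names, the temperature of 0.07 and the two-dimensional toy embeddings are assumptions chosen for readability.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_row(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i of images matches pair i of texts."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    sim = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: each image should pick out its own text (rows).
    loss_i2t = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Text-to-image: each text should pick out its own image (columns).
    loss_t2i = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2

image_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = clip_loss(image_embs, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = clip_loss(image_embs, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss, shuffled_loss)
```

When each image embedding points the same way as its paired text embedding, the loss is near zero; swapping the texts makes it large, which is the gradient signal that pulls true pairs together during training.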
ELIXR – X-ray LLM+Vision FAI
• ELIXR-B then uses the trained ELIXR-C image encoder and a fixed PaLM2-S LLM; only the attention-based adapter between the image encoder and the LLM is trained
• Phase 1: A vision-language model (Q-Former) is trained to understand & represent both images and text reports in a shared embedding space, by:
• Image-text contrastive learning (ITC)
• Image-grounded text generation (ITG)
• Image-text matching (ITM)
• Phase 2: The Q-Former, plus an extra MLP connecting it to the LLM, is then trained to generate the impressions section of the text report, from the image embeddings (as image-based LLM token inputs)
Hugging Face IDEFICS
• IDEFICS is adapted from the Flamingo architecture, and combines two frozen models: LLaMA (text, main backbone) & OpenCLIP (vision)
• Major contribution is the preparation of a (very large) OBELICS multimodal (text & image) web dataset
Sources: “OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents”, Laurençon et al., arXiv 2023;
“Flamingo: a Visual Language Model for Few-Shot Learning”, Alayrac et al., arXiv 2022;
https://huggingface.co/docs/transformers/model_doc/idefics
General FAI for Medicine (MedFAI)
• Obvious extension: from “ophthalmology FAI” to “medical FAI”
• Several desirable attributes going “beyond human physicians”:
• Holistic
Modern medicine is necessarily fragmented into specialties
(too much for any single physician to know/learn)
Boundaries between specialties are difficult to cross
(e.g. eye [images] as window to heart/brain/cardiovascular health etc.)
• Comprehensive
Can in theory query implications of Variable(s) A → Condition/Outcome B, for
any A and B, with evidence-based justification
• Predictive
Physicians generally can only diagnose current conditions, not future ones
Holistic MedFAI
• Previously, AI models in medicine were generally designed to replicate
existing capabilities, or were at least defined prospectively
• For example, it is known that retinal fundus photographs can be used
to diagnose diabetic retinopathy (DR)
• So we plan to train an AI model to classify DR from retinal photos
• Then it is just a matter of collecting sufficient labelled data
(both for model development and [external] validation)
• Often encounter delays with data acquisition, and problems with model
robustness (if insufficient data)
Holistic MedFAI
• For a general foundation model, the idea is instead to
(retrospectively) throw in all available (reasonably valid) data
• Then, gaps, missing labels and (minor) inaccuracies in data can be
addressed by the vast foundational base of knowledge (possibly from
other specialties, or even outside medicine proper)
• Might expect general knowledge (e.g. “Is this a retinal photograph?”,
“Is this a blurred photograph?”) to be answerable by an FAI with
minimal/no specific training
Comprehensive MedFAI
• For task-specific AI models, a single model is trained to perform a
single, narrowly-scoped task (relate one set of inputs, to one output)
• For multitask AI models, the single model can perform multiple such
tasks, but typically the tasks are still all predefined during
development
[Diagram: task-specific model: Retinal Fundus photo → Retinal Image Model → Diabetic Retinopathy]
[Diagram: multitask model: RFP, OCT Images → Retinal Image Model → Diabetic Retinopathy, Age-related Macular Degeneration, Glaucoma, Heart Attack, Stroke, Parkinson’s (from the Ophthalmology FAI)]
Comprehensive MedFAI
• For a (comprehensive) MedFAI, there are multiple (very many)
possible inputs (images/medical notes/clinical variables), and also
multiple (very many) possible outputs (conditions/diseases/etc.)
• Consider a very conservative model of 100 inputs (with one set of
clinical variables as just one input) and 100 outputs: there are already
10,000 combinations (of course, some more important than others)!
• Then note that the usual task-specific AI (or major journal paper)
covers just one (or a few) of these combinations
• MedFAI Application: in theory, the FAI can systematically go through
all possible combinations, and flag out (discover) promising novel
correlations, for further investigation if necessary
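The combination count and the "go through all pairs and flag promising ones" idea can be sketched as follows. This is a toy screen, not a MedFAI: the synthetic data, the names `input_i`/`output_j`, the planted dependency and the 0.8 correlation threshold are all assumptions for illustration, and plain Pearson correlation stands in for whatever a foundation model would actually learn.

```python
from itertools import product
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def screen_pairs(inputs, outputs, threshold):
    """Flag every (input, output) pair whose |correlation| meets the threshold."""
    flagged = []
    for (in_name, xs), (out_name, ys) in product(inputs.items(), outputs.items()):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            flagged.append((in_name, out_name, round(r, 3)))
    return flagged

# 100 inputs x 100 outputs already gives 10,000 candidate relations.
num_pairs = len(list(product(range(100), range(100))))

# Small synthetic cohort (assumption): 5 inputs, 5 outputs, 200 patients.
rng = random.Random(0)
n_patients = 200
inputs = {f"input_{i}": [rng.gauss(0, 1) for _ in range(n_patients)]
          for i in range(5)}
outputs = {f"output_{j}": [rng.gauss(0, 1) for _ in range(n_patients)]
           for j in range(5)}
# Plant one real dependency so the screen has something to find.
outputs["output_3"] = [x + 0.1 * rng.gauss(0, 1) for x in inputs["input_2"]]

flagged = screen_pairs(inputs, outputs, threshold=0.8)
print(num_pairs, flagged)
```

With thousands of pairs screened at once, some correlations will look promising by chance, so flagged pairs are candidates for further investigation rather than findings in themselves.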
Comprehensive MedFAI – Sparsity
• MedFAI Application: from the available (limited) patient data, is it
possible to diagnose for the condition(s) of interest with (reasonable)
accuracy?
• In practice, patient data is limited (tests are
inconvenient/expensive/invasive/painful)
• Thus, physicians do not have complete data
• Often, it is not known whether the available data is relevant to the condition of interest
Comprehensive MedFAI – Test Optimization
• MedFAI Application: if the available patient data is insufficient, what
data (i.e. medical tests) would be needed, to diagnose the condition
to the desired level of accuracy?
• FAI should in theory be able to present various plausible medical test
options, with different advantages/disadvantages (availability,
accuracy, cost, comfort, reduced side-effects, etc.)
• Both patient needs and organizational/national needs can be taken
into account
• On the organizational side, tests can be planned/administered taking
into account utility vs. costs, with evidential backing
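One way to picture the test-option listing is a simple filter and ranking. Everything here is hypothetical: the `MedicalTest` fields, the test names and the accuracy and cost figures are placeholder values, and in the scenario described above the expected accuracies would come from the FAI rather than being typed in by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MedicalTest:
    name: str
    expected_accuracy: float  # assumed diagnostic accuracy if this test is added
    cost: float               # arbitrary cost units
    invasive: bool

def rank_options(options, target_accuracy, allow_invasive=True):
    """Return tests that reach the target accuracy, cheapest first."""
    eligible = [
        o for o in options
        if o.expected_accuracy >= target_accuracy
        and (allow_invasive or not o.invasive)
    ]
    # Break cost ties in favour of the more accurate test.
    return sorted(eligible, key=lambda o: (o.cost, -o.expected_accuracy))

# Placeholder options (made-up numbers, for illustration only).
options = [
    MedicalTest("test_A", 0.80, 20.0, False),
    MedicalTest("test_B", 0.90, 150.0, False),
    MedicalTest("test_C", 0.95, 400.0, True),
]

ranked = rank_options(options, target_accuracy=0.85)
non_invasive = rank_options(options, target_accuracy=0.85, allow_invasive=False)
print([o.name for o in ranked], [o.name for o in non_invasive])
```

The `allow_invasive` switch shows how a patient preference (comfort) can be applied alongside an organizational one (cost) without changing the ranking logic.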
Comprehensive MedFAI – (Automatic) Imputation
• MedFAI Application: from the available (limited) patient data, is it
possible to impute (synthesize/predict) the rest of the patient data?
• In this case, what is predicted is not the ultimate (desired) outcome
itself, but the (unknown) input
• For example, if HbA1c is unknown for a patient, perhaps it might be
imputed to high accuracy given other data such as age, gender, blood
pressure, various imaging scans, etc. in an individualized manner
• The updated patient profile (with imputed data) might then improve
accuracy on the actual desired outcomes
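As a rough sketch of individualized imputation, the snippet below fills in a missing HbA1c value from the most similar patients. A k-nearest-neighbour average is swapped in here as a simple stand-in for the FAI; the records, field names and values are invented for illustration, and the features are left unscaled to keep the example short.

```python
def knn_impute(records, target, features, query, k=3):
    """Impute `target` for `query` as the mean over its k nearest records."""
    known = [r for r in records if r.get(target) is not None]

    def distance(record):
        # Euclidean distance over the chosen features (unscaled, for brevity).
        return sum((record[f] - query[f]) ** 2 for f in features) ** 0.5

    nearest = sorted(known, key=distance)[:k]
    return sum(r[target] for r in nearest) / len(nearest)

# Invented patient records; the last one has no HbA1c measurement.
records = [
    {"age": 40, "systolic_bp": 120, "hba1c": 5.4},
    {"age": 45, "systolic_bp": 125, "hba1c": 5.6},
    {"age": 65, "systolic_bp": 150, "hba1c": 7.2},
    {"age": 70, "systolic_bp": 155, "hba1c": 7.6},
    {"age": 68, "systolic_bp": 152, "hba1c": None},
]

patient = records[-1]
imputed = knn_impute(records, "hba1c", ["age", "systolic_bp"], patient, k=2)
completed = {**patient, "hba1c": imputed}
print(completed)
```

Here the predicted quantity is an input variable rather than the final outcome; the completed profile could then be passed on to whichever model predicts the outcome of interest.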
Predictive MedFAI
• Existing AI models largely try to replicate existing physician/grader
abilities/workflow, i.e. diagnose an existing condition
• However, for an (F)AI model, future projection vs. current diagnosis is
“just another task”, that may be possible with sufficient data/labels
• Future prediction generally has lower accuracy than current
diagnosis, which is expected to an extent since patient
agency/external circumstances are not fixed in the intervening period
• Therefore, the potential for performance improvement via additional
data (and FAI) may be relatively greater, than for diagnosis tasks
• Especially relevant for mass preventive programs/interventions
Local Advantages Towards MedFAI Application
• Patient data is (relatively):
• Centralized (only a few integrated healthcare clusters)
• Comprehensive (developed public health system)
• Digitized (readily available for MedFAI development)
• Diverse (multiple ethnicities)
• Unbiased (high and broad coverage)
• Available computing resources
• Chroma @ Alice @ SGH Campus
• Note that current projects are relatively “deep”, i.e. still prospectively define a
small set of inputs, towards some output (with specific engineering)
• FAI would in contrast be relatively “broad”, i.e. from all available inputs,
discover connections with all available outputs
Towards Rapid MedFAI Development
• Data acquisition has often been the major factor delaying past
medical projects (models etc. often standard)
• FAI has no strict data requirements
(can start working with what is available/easily obtained)
• Initial prototype can go forward without complete coverage of all
specialties, with further specialties added incrementally as data becomes available
• Linking (anonymized) patient data from various sources would
probably be the major issue (when validating)

Foundation Multimodels.pptx

  • 2. Foundation Model vs. Task-Specific Models • What are foundation models? • (Extremely) vast (and diverse) training data • Unlabelled data (mostly) • Self-supervised learning (SSL) (usually) • Often based on large language models (LLM, e.g. BERT, GPT) • Fine-tuned for specific tasks after initial (emergent) learning • Generative AI? (potentially, depends on task) • Contrast with task-specific models • Single model intended to perform specific task • Relatively “fragile” – can’t usually be effectively repurposed to other tasks, breaks easily with different data sources
  • 3. Task-specific Models Chest x-ray Chest X-ray Model Atelectasis Effusion Pneumonia Fibrosis Abdominal CT scan Abdominal CT Scan Model Ascite Cyst Tumour Stomach Cancer Retinal Image Model Retinal Fundus photo Diabetic Retinopathy Age-related Macular Degeneration Glaucoma Medical Notes • Learns relation between single input/modality, and one (or more) targets • Task generally prospectively defined (inputs known to be correlated with labels) (where we started)
  • 4. Foundation AI Model (FAI) Chest x-ray Atelectasis Effusion Pneumonia Fibrosis Abdominal CT scan Foundation Model Ascite Cyst Tumour Stomach Cancer Retinal Fundus photo Diabetic Retinopathy Age-related Macular Degeneration Glaucoma Medical Notes • Model has underlying “foundation” of “general knowledge” Initial “general/foundational” Self-Supervised Learning (SSL), on vast amount of image/textual data (possibly related) Other inputs Other predictions (e.g. Age, Gender, Alzheimer risk, various Cancer risks, etc.) Main disadvantage of a Foundation AI model compared to task-specific models would be its computational requirements
  • 5. Why Build A Foundation? • When humans “know” something, • Check against diverse pieces of knowledge (discriminative/deductive) • Derive (new concepts) from diverse pieces of knowledge (generative, often inductive/probabilistic, sometimes called “creativity”) • E.g. No idea if Ivory Coast and Mali are next to each other, not explicitly stated in dataset • But multiple statements of “crossing border from Ivory Coast to Mali”, or vice-versa • Can infer that they are adjacent (in theory) • Since there is often no easy way to figure out if some knowledge is useful, just learn as much as possible (reflected in ever-increasing size of GPT/LLM models)!
  • 6. First FAI Model for Ophthalmology Source: “A foundation model for generalizable disease detection from retinal images”, Zhou et al., Nature 2023
  • 7. RetFound FAI • Main distinction from previous (multitask) models appears to be the initial large-scale self- supervised (foundation) learning • 904,170 colour fundus photos, 736,442 OCT scans, all rescaled to 256x256 • A number of SSL models were tried, masked autoencoder (with ViT-large encoder) found to be the best
  • 8. RetFound FAI vs. Supervised Learning • RetFound was compared against a supervised learning (SL) model with the same transformer architecture, and other SSL pre- training combinations • SL-ImageNet actually performs pretty closely to RetFound on internal validation • Value of SSL appears more with external validation (but generally still not overwhelming)
  • 9. Masked Autoencoder (Generative SSL) (trained) ViT Encoder FAI Task Adaptation Test CFP Encoder high-level features Multilayer Perceptron Prediction • RetFound uses a masked autoencoder, with the training objective being to reconstruct input images from a randomly- masked version of the image • Once the autoencoder is trained, only the (ViT) encoder is used to generate the high-level features for task-specific classification • Comparison against established SL DCNN models (trained directly on the [augmented] images, instead of high-level features) would have been interesting Main value of FAI here lies in producing good high-level features? Self-supervised since no labels are required in training the autoencoder, only the (randomly masked) image itself Note that actually still need the task labels, to train the task-specific classifiers! Standard Classification
  • 10. ELIXR – X-ray LLM+Vision FAI • Embeddings for Language/Image-aligned X-Rays (thus ELIXR) • Language-aligned image encoder+Fixed LLM (PaLM 2) • ELIXR-C is first trained using Contrastive Language-Image Pre-training (CLIP) • This aligns a vision-only SupCon image encoder, with a T5 language encoder (i.e. learns to bring representations of an image & associated text closer, in a shared high-dimensional space) Source: “ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders”, Xu et al., arXiv 2023
  • 11. ELIXR – X-ray LLM+Vision FAI • ELIXR-B then uses the trained ELIXR-C image encoder and a fixed PaLM2-S LLM; only the adapter between the image encoder and LLM is trained with an attention mechanism • Phase 1: A vision-language model (Q-Former) is trained to understand & represent both images and text reports in a shared embedding space, by: • Image-text contrastive learning (ITC) • Image-grounded text generation (ITG) • Image-test matching (ITM) • Phase 2: The Q-Former + extra MLP to the LLM is then trained to generate the impressions section of the text report, from the image embeddings (as image-based LLM token inputs)
  • 12. Hugging Face IDEFICS • IDEFICS is adapted from the Flamingo architecture, which combines two frozen models: LLaMA (text, main backbone) & OpenCLIP (vision) • Major contribution is the preparation of a (very large) OBELICS multimodal (text & image) web dataset Sources: “OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents”, Laurençon et al., arXiv 2023; “Flamingo: a Visual Language Model for Few-Shot Learning”, Alayrac et al., arXiv 2022; https://huggingface.co/docs/transformers/model_doc/idefics
  • 13. General FAI for Medicine (MedFAI) • Obvious extension: from “ophthalmology FAI” to “medical FAI” • Several desirable attributes going “beyond human physicians”: • Holistic: modern medicine is necessarily fragmented into specialties (too much for any single physician to know/learn), and specialty boundaries are difficult to cross (e.g. eye [images] as window to heart/brain/cardiovascular health etc.) • Comprehensive: can in theory query implications of Variable(s) A → Condition/Outcome B, for any A and B, with evidence-based justification • Predictive: physicians generally can only diagnose current conditions, not future ones
  • 14. Holistic MedFAI • Previously, AI models in medicine were generally designed to replicate existing capabilities, or at least for prospectively defined tasks • For example, it is known that retinal fundus photographs can be used to diagnose diabetic retinopathy (DR) • So we plan to train an AI model to classify DR from retinal photos • Then it is just a matter of collecting sufficient labelled data (both for model development and [external] validation) • Often encounter delays with data acquisition, and problems with model robustness (if insufficient data)
  • 15. Holistic MedFAI • For a general foundation model, the idea is instead to (retrospectively) throw in all available (reasonably valid) data • Then, gaps, missing labels and (minor) inaccuracies in data can be addressed by the vast foundational base of knowledge (possibly from other specialities, or even outside medicine proper) • Might expect general knowledge (e.g. “Is this a retinal photograph?”, “Is this a blurred photograph?”) to be answerable by an FAI with minimal/no specific training
  • 16. Comprehensive MedFAI • For task-specific AI models, a single model is trained to perform a single, narrowly-scoped task (relate one set of inputs, to one output) • For multitask AI models, the single model can perform multiple such tasks, but typically the tasks are still all predefined during development • [Figure: task-specific Retinal Image Model (retinal fundus photo → Diabetic Retinopathy) vs. multitask Retinal Image Model from the Ophthalmology FAI (RFP, OCT images → Diabetic Retinopathy, Age-related Macular Degeneration, Glaucoma, Heart Attack, Stroke, Parkinson’s)]
  • 17. Comprehensive MedFAI • For a (comprehensive) MedFAI, there are multiple (very many) possible inputs (images/medical notes/clinical variables), and also multiple (very many) possible outputs (conditions/diseases/etc.) • Consider a very conservative model of 100 inputs (with one set of clinical variables as just one input) and 100 outputs: there are already 100 × 100 = 10,000 input-output combinations (of course, some more important than others)! • Then note that the usual task-specific AI (or major journal paper) covers just one (or a few) of these combinations • MedFAI Application: in theory, the FAI can systematically go through all possible combinations, and flag (discover) promising novel correlations, for further investigation if necessary
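The counting argument on this slide, and the idea of systematically screening every combination, can be sketched as follows. The input and output names, the `flag_promising` helper and the toy scoring function are hypothetical placeholders; a real system would substitute a measure of association estimated from data.

```python
from itertools import product


def candidate_pairs(inputs, outputs):
    """Enumerate every (input, output) combination."""
    return list(product(inputs, outputs))


def flag_promising(pairs, score, threshold):
    """Keep the pairs whose (hypothetical) score reaches the threshold."""
    return [pair for pair in pairs if score(pair) >= threshold]


# Placeholder names standing in for 100 inputs and 100 outputs.
inputs = [f"input_{i}" for i in range(100)]
outputs = [f"output_{j}" for j in range(100)]

pairs = candidate_pairs(inputs, outputs)
print(len(pairs))  # 10000


def toy_score(pair):
    """Toy score for illustration only: 1.0 when the indices match."""
    input_name, output_name = pair
    same_index = input_name.split("_")[1] == output_name.split("_")[1]
    return 1.0 if same_index else 0.0


flagged = flag_promising(pairs, toy_score, threshold=0.5)
print(len(flagged))  # 100
```

This counts only single-input to single-output pairs. If sets of inputs were considered jointly, each output would have 2^100 − 1 non-empty input sets, so exhaustive search would no longer be feasible without some prioritisation.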
  • 18. Comprehensive MedFAI – Sparsity • MedFAI Application: from the available (limited) patient data, is it possible to diagnose the condition(s) of interest with (reasonable) accuracy? • In practice, patient data is limited (tests are inconvenient/expensive/invasive/painful) • Thus, physicians do not have complete data • Often, it is not known whether the available data is relevant to the condition of interest
  • 19. Comprehensive MedFAI – Test Optimization • MedFAI Application: if the available patient data is insufficient, what data (i.e. medical tests) would be needed, to diagnose the condition to the desired level of accuracy? • FAI should in theory be able to present various plausible medical test options, with different advantages/disadvantages (availability, accuracy, cost, comfort, reduced side-effects, etc.) • Both patient needs and organizational/national needs can be taken into account • On the organizational side, tests can be planned/administered taking into account utility vs. costs, with evidential backing
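One way to make the test-optimization question concrete is a brute-force search for the cheapest set of tests whose estimated accuracy reaches a target. This is a sketch under strong assumptions: the function name, the test names, the costs and the additive accuracy estimate are all invented for illustration, and in practice the accuracy estimate would have to come from the model itself.

```python
from itertools import combinations


def cheapest_sufficient_tests(costs, estimate_accuracy, target):
    """Return (tests, total_cost) for the cheapest subset meeting the target.

    costs: dict mapping test name to cost.
    estimate_accuracy: function taking a tuple of test names and returning
        an estimated diagnostic accuracy (assumed to be supplied externally).
    Returns None if no subset reaches the target.
    """
    best = None
    tests = sorted(costs)
    for size in range(len(tests) + 1):
        for subset in combinations(tests, size):
            if estimate_accuracy(subset) >= target:
                total_cost = sum(costs[name] for name in subset)
                if best is None or total_cost < best[1]:
                    best = (subset, total_cost)
    return best


# Hypothetical tests, costs and accuracy gains (not real clinical figures).
costs = {"blood_test": 1, "ct_scan": 5, "fundus_photo": 2}
gains = {"blood_test": 0.2, "ct_scan": 0.3, "fundus_photo": 0.1}


def toy_accuracy(subset):
    """Toy estimate: a 0.5 baseline plus a fixed gain per test."""
    return 0.5 + sum(gains[name] for name in subset)


print(cheapest_sufficient_tests(costs, toy_accuracy, target=0.75))
```

Exhaustive enumeration is only workable for a handful of candidate tests; other criteria mentioned on the slide (availability, comfort, side-effects) could be folded into the cost term.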
  • 20. Comprehensive MedFAI – (Automatic) Imputation • MedFAI Application: from the available (limited) patient data, is it possible to impute (synthesize/predict) the rest of the patient data? • In this case, what is predicted is not the ultimate (desired) outcome itself, but the (unknown) input • For example, if HbA1c is unknown for a patient, perhaps it might be imputed to high accuracy given other data such as age, gender, blood pressure, various imaging scans, etc. in an individualized manner • The updated patient profile (with imputed data) might then improve accuracy on the actual desired outcomes
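The imputation idea can be illustrated with a much simpler stand-in for an FAI: a k-nearest-neighbours estimate that fills in a missing HbA1c value from the most similar complete records. The function name, field names and all patient values below are synthetic and chosen only for illustration; a foundation model would instead draw on far richer inputs such as imaging.

```python
import math


def knn_impute(records, target, query, k=3):
    """Estimate query[target] as the mean over the k most similar records.

    records: list of dicts with complete data (including the target field).
    query: dict with the known fields of the patient whose target is missing.
    """
    features = [name for name in query if name != target]

    def distance(record):
        return math.sqrt(sum((record[f] - query[f]) ** 2 for f in features))

    neighbours = sorted(records, key=distance)[:k]
    return sum(record[target] for record in neighbours) / len(neighbours)


# Synthetic records (not real patient data).
records = [
    {"age": 40, "systolic_bp": 120, "hba1c": 5.0},
    {"age": 42, "systolic_bp": 122, "hba1c": 5.4},
    {"age": 70, "systolic_bp": 160, "hba1c": 8.0},
    {"age": 72, "systolic_bp": 158, "hba1c": 8.4},
]

# Patient with unknown HbA1c.
patient = {"age": 41, "systolic_bp": 121}
estimate = knn_impute(records, "hba1c", patient, k=2)
print(round(estimate, 2))
```

The features here are used unscaled for brevity; with variables on very different scales, they would normally be standardised before computing distances.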
  • 21. Predictive MedFAI • Existing AI models largely try to replicate existing physician/grader abilities/workflow, i.e. diagnose an existing condition • However, for an (F)AI model, future projection vs. current diagnosis is “just another task”, that may be possible with sufficient data/labels • Future prediction generally has lower accuracy than current diagnosis, which is expected to an extent since patient agency/external circumstances are not fixed in the intervening period • Therefore, the potential for performance improvement via additional data (and FAI) may be relatively greater, than for diagnosis tasks • Especially relevant for mass preventive programs/interventions
  • 22. Local Advantages Towards MedFAI Application • Patient data is (relatively): • Centralized (only a few integrated healthcare clusters) • Comprehensive (developed public health system) • Digitized (readily available for MedFAI development) • Diverse (multiple ethnicities) • Unbiased (high and broad coverage) • Available computing resources • Chroma @ Alice @ SGH Campus • Note that current projects are relatively “deep”, i.e. still prospectively define a small set of inputs, towards some output (with specific engineering) • FAI would in contrast be relatively “broad”, i.e. from all available inputs, discover connections with all available outputs
  • 23. Towards Rapid MedFAI Development • Data acquisition has often been the major factor delaying past medical projects (models etc. are often standard) • FAI has no strict data requirements (can start working with what is available/easily obtained) • Initial prototype can go forward without complete coverage of all specialties, with further specialties added incrementally as data becomes available • Linking (anonymized) patient data from various sources would probably be the major issue (when validating)