2024 Personal Information Protection Commission (개보위) Personal Information Future Forum: Medical AI Models and Privacy Issues
1. Why does medical AI need more datasets? And its privacy concerns
Medical AI Models and Privacy Issues (의료 인공지능 모델과 프라이버시 이슈)
Medical Imaging & Intelligent Reality Lab.
Convergence Medicine/Radiology, University of Ulsan College of Medicine, Asan Medical Center
Namkug Kim, PhD www.mi2rl.co
2. Conflict of Interests
Founder of Somansa Inc, CyberMed Inc, Clinical Imaging Solution Inc, and AnyMedi Inc.
Research collaborations with LG Electronics, Coreline Soft Inc., Osstem Implant, CGBio, VUNO, Kakaobrain, AnyMedi Inc.
3. Venn Diagram of Generative AI
[Venn diagram: AI ⊃ ML (1950s–) ⊃ DL (2012–) ⊃ Generative AI; application fields: Computer Vision (image, video), NLP (text, speech), Data Science / Big Data; generative model families: LLM, LVM, LMM. Modified by NKim]
5. Five Tribes of AI Research
From Pedro Domingos, Professor, University of Washington, at MLconf ATL
6. Connectionism
Thorndike, a comparative psychologist, worked in the S-R framework of behavioral psychology: associations form between stimuli and responses, and are strengthened or weakened by S-R pairings; learning proceeds by trial and error driven by rewards. Connectionism holds that learning can be adequately explained without referring to any unobservable internal states.
Ivan P. Pavlov (1849-1936), Nobel Prize 1904
Edward Lee Thorndike (1874-1949)
Oliver Gordon Selfridge (1926-2008)
Thorndike's "Law of Effect" (1920s): reward strengthens the connections for an operant response
Selfridge's Pandemonium (1955): a parallel, distributed model of pattern recognition
Lindsay & Norman's 'Human Information Processing' (1977)
9. Data Labeling
Rule of thumb (Goodfellow et al.):
Acceptable performance: ~5,000 labels per class
Human-level performance: ~10 million labeled examples
Issues, especially in medical data: too expensive, inaccurate, incomplete/ambiguous, laborious
Ian Goodfellow, Yoshua Bengio, Aaron Courville, 'Deep Learning'
https://www.deeplearningbook.org/
How infants may learn …
10. Baby/Animal perception
Courtesy of Le Cun's presentation. Hubel and Wiesel, The Journal of Physiology 148(3), 574–591 (1959)
20. LLM: Size matters!
GPT-1: 2018, one-way (causal) attention with transformer (117M params)
GPT-2: 2019, 1.5B params on 40GB of text (10x bigger); released sizes 117M / 345M / 762M / 1542M
GPT-3: 2020, 175B params on 570GB? of text (100x bigger)
GPT-3.5: 2022.11(5), 175?B params
GPT-4: 2023.3(8), 1.74?TB params (context: 4K -> 8K -> 32K -> 128K tokens)
PaLM: 540B params
PaLM 2: ?B params
Claude: ? params (100K-token context), with custom transformer
Gemini: 2023.11, ?T params (32K-token context), with PaLM
Gemini Ultra: 2024.2
https://jalammar.github.io/illustrated-gpt2/
21. Emergent Ability of LLMs
Emergent ability: non-linearity of LLM performance at a specific size!
Interactive
Zero-shot, few-shot learning
Prompting, Chain-of-thought
Etc.
arXiv:2206.07682; Google Research, Stanford, UNC Chapel Hill, DeepMind
23. Scaling Law in LLMs
LLM performance as a function of dataset size D, compute C, and model size N (w/o embeddings):
• Model shape (width, depth) matters far less than model size.
• When training is not bottlenecked by the other factors, performance follows a power law in each of N, D, and C.
• Scaling N and D together increases performance predictably; with one held fixed, performance stops improving at some point.
→ Larger models need more data.
• With embedding layers included, increasing model size != better performance; excluding embedding layers, increasing model size -> better performance.
→ The embedding layer is very large.
• A large model reaches the same performance with less data than a small model.
→ Better learning efficiency per unit of data (by test loss).
arXiv:2001.08361, Jared Kaplan et al., OpenAI -> Anthropic
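The power-law relation can be sketched numerically. A minimal Python illustration of the loss-versus-model-size fit, using the constants Kaplan et al. report for non-embedding parameter count in one setup (treat the exact values as illustrative, not definitive):

```python
import math

# Fitted power law from Kaplan et al. (arXiv:2001.08361):
#   L(N) = (N_c / N) ** alpha_N
# with N the non-embedding parameter count. Constants below are the
# paper's reported fits for one configuration; values are illustrative.
ALPHA_N = 0.076
N_C = 8.8e13

def loss_vs_params(n_params: float) -> float:
    """Predicted test loss (nats/token) for n_params non-embedding parameters."""
    return (N_C / n_params) ** ALPHA_N

# Loss falls predictably (but slowly) as model size grows.
for n in (117e6, 1.5e9, 175e9):
    print(f"N={n:.0e}: predicted loss ~ {loss_vs_params(n):.3f}")
```

The shallow exponent (0.076) is why each constant-factor loss improvement requires an order-of-magnitude increase in parameters.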
24. Data scale for LLM
https://github.com/Mooler0410/LLMsPracticalGuide
25. RLHF: RL with Human Feedback
Training language models to follow instructions with human feedback,
arXiv:2203.02155
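The reward-modeling step at the heart of RLHF can be sketched in a few lines. This is a minimal illustration of the pairwise preference loss used in InstructGPT-style training (arXiv:2203.02155), not the paper's full pipeline:

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
    r_chosen / r_rejected are scalar reward-model scores for the completion
    a human labeler preferred vs. the one they rejected; minimizing this
    pushes the chosen completion's reward above the rejected one's."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward margin between chosen and rejected grows.
print(reward_model_loss(2.0, 0.0))  # small: model agrees with the labeler
print(reward_model_loss(0.0, 2.0))  # large: model disagrees
```

The trained reward model then scores policy samples, and the language model is optimized against it with RL (PPO in the paper).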
28. Evolutionary Tree of LLMs: a Precambrian-style explosion
Explosive, exponential evolution:
2017: transformer
2020: giant models, emergent abilities
2022: giant models go mainstream
2023.5-7: over ten thousand LLMs within three months
10 / 100 / 10,000:
10 foundation models
100 application models (2 weeks each)
10,000 fine-tuned models (given the will)
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
30. LLM Extensions
AlphaZero (2018) -> AlphaTensor (2022), AlphaDev (2023) with transformer
AlphaCode 2 (2023) with Gemini Pro, AlphaMissense (2023), GNoME (2023)
AlphaGeometry (2024)
FunSearch (PaLM 2): making new discoveries in mathematical sciences using LLMs
Nature 625, 468–475 (2024).
31. Foundation Model
“AI is undergoing a paradigm shift with the rise of models that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.” -> foundation models
• Keys for foundation models
• Unsupervised/self-supervised learning: no labels
• Easy adaptation
• Zero-shot: apply to new tasks without any training examples for those specific tasks
• Linear probe: train a linear model on the frozen features
• Fine-tune: adjust the entire network to perform better on the target task
• Examples: BERT, GPT, CLIP, etc.
Rishi Bommasani et al., arXiv:2108.07258, Stanford, 200 pp.
Opportunities: increase developer velocity; reduce infrastructure cost; tap into state-of-the-art AI
Risks: bias propagation; increasing model scale; closed-source models
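The adaptation modes above can be illustrated with a toy linear probe. Everything here (the encoder, the data, the task) is a hypothetical stand-in, not a real foundation model; the point is only that the pretrained features stay frozen while a small linear head is trained on top:

```python
import random
random.seed(0)

# Stand-in for a pretrained foundation model's feature extractor.
# In a real linear probe these weights are frozen and never updated.
def frozen_encoder(x):
    return [x[0] + x[1], x[0] - x[1]]

# Toy binary task: label = 1 iff x0 + x1 > 0 (points kept off the boundary).
data = []
while len(data) < 200:
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    if abs(x[0] + x[1]) > 0.2:
        data.append(x)
labels = [1 if x[0] + x[1] > 0 else 0 for x in data]

# Linear head trained with perceptron updates; only w and b change.
w, b = [0.0, 0.0], 0.0
for _ in range(50):
    for x, y in zip(data, labels):
        f = frozen_encoder(x)
        pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
        err = y - pred  # 0 when correct; the encoder is never touched
        w[0] += 0.1 * err * f[0]
        w[1] += 0.1 * err * f[1]
        b += 0.1 * err

accuracy = sum(
    (1 if w[0] * frozen_encoder(x)[0] + w[1] * frozen_encoder(x)[1] + b > 0 else 0) == y
    for x, y in zip(data, labels)
) / len(data)
print(f"linear-probe training accuracy: {accuracy:.2f}")
```

Fine-tuning would instead backpropagate through the encoder as well; zero-shot would skip training entirely and rely on the pretrained model's own outputs.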
32.
• Long tail?
• Explainable and generalizable models
• Privacy preserving
• Integration with LLMs
33. Dataset:
904,170 CFPs from MEH-MIDAS, Kaggle EyePACS
736,442 OCT scans from Cell 172, 1122–1131.e9 (2018)
Processing: AutoMorph, 256x256
Masked autoencoder: encoder ViT-large; decoder ViT-small
Compared with SimCLR, SwAV, DINO, MoCo-v3
Eight NVIDIA Tesla A100 (40 GB), 14 days
Nature, 2023
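The self-supervised pretraining above hinges on the masking step of a masked autoencoder, which can be sketched independently of the model. A minimal illustration of MAE-style random patch masking (the 75% mask ratio follows the original MAE recipe; that this study used exactly that ratio is an assumption):

```python
import random
random.seed(0)

# Masked autoencoder (MAE) masking step: hide a large fraction of image
# patches; the encoder sees only the visible patches and the decoder must
# reconstruct the masked ones, so no labels are needed.
def mask_patches(num_patches: int, mask_ratio: float = 0.75):
    """Return (visible, masked) patch indices via random shuffling."""
    idx = list(range(num_patches))
    random.shuffle(idx)
    n_masked = int(num_patches * mask_ratio)
    return sorted(idx[n_masked:]), sorted(idx[:n_masked])

# A 256x256 image split into 16x16 patches gives 16*16 = 256 patches.
visible, masked = mask_patches(256)
print(len(visible), len(masked))  # 64 192
```

Because the encoder processes only the 25% of visible patches, pretraining is also substantially cheaper per image than processing the full grid.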
34. My Recent Research
Generative Model
Diffusion Inversion/AE
LVM, LMM
Foundation model
Digital Twin
AR/3DP/Robot Integration
35. Where does the innovation race come from?
China vs USA/Europe
USA vs Europe
And Korea?
36. Big Data: How to Create Value
IoT thermometer: Kinsa, a startup in the USA
Real-time body-temperature big data across the USA: patient-derived health data, on a regional basis
Influenza stats: Kinsa in real time vs CDC with a 3-week delay
B2B model: demand and production of flu vaccines, antibiotics, disinfectants, and related goods such as toothbrushes, orange juice, and soup
Influenza trends: comparison with CDC over 2.5 years
38. IoT + AI?
Dexcom: continuous glucose monitor for type 1 diabetes
Proteus Digital Health: ingestible pill sensor recording whether the patient took the medication; hepatitis C patients on Gilead's Harvoni (tens of thousands of USD per month)
Apple Watch (wearable device): insomnia, hypertension, cardiac arrhythmia, (cancer, AD)
Color Genomics, 23andMe: genetic testing kits
[Photo] The insulin injections a pediatric type 1 diabetes patient needs over one month; in addition, up to 20 fingersticks a day to check blood glucose. (Courtesy of 대니재단)
40. In 2019, GPT-2 could not reliably count to ten. Only four years later, deep learning systems can write software, generate photorealistic scenes on demand, advise on intellectual topics, and combine language and image processing to steer robots. As AI developers scale these systems, unanticipated abilities and behaviors emerge spontaneously, without explicit programming. To many, progress in AI has been surprisingly fast.
41. Can pseudonymized data be re-identified?
Legal issues: legal liability; who the re-identifying party is
Technical issues: an endless war! Hackers vs security, locks vs keys
Hashing: once not accepted -> now accepted (but quantum computing?)
보건의료데이터가이드라인 (Healthcare Data Guidelines), 2024.1.19
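As a sketch of hash-based pseudonymization, a keyed HMAC replaces the patient ID with a value that cannot be reversed without the secret key, while still allowing record linkage for key holders. This is an illustration of the technique, not the guideline's prescribed procedure, and the ID format is hypothetical:

```python
import hashlib
import hmac
import os

# Keyed-hash pseudonymization: an unkeyed hash of a low-entropy ID can be
# brute-forced by hashing all candidate IDs; keying with a secret defeats
# that, but anyone holding the key (a de facto master table) can re-link.
SECRET_KEY = os.urandom(32)  # must be stored separately from the data

def pseudonymize(patient_id: str) -> str:
    """Replace a patient ID with an HMAC-SHA-256 pseudonym."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()

p1 = pseudonymize("AMC-0001")  # hypothetical ID format
p2 = pseudonymize("AMC-0001")
p3 = pseudonymize("AMC-0002")
print(p1 == p2, p1 == p3)  # True False: deterministic linkage, distinct IDs
```

This is exactly the "lock and key" trade-off on the slide: the scheme's safety reduces to who controls the key, which is a governance question, not a purely technical one.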
42. Identifying personal information other than the face?
Fingerprints for the National Forensic Service (국과수) generated with a diffusion model @MI2RL
Retina image GAN: https://arxiv.org/pdf/1907.12296
ECG GAN: https://arxiv.org/abs/1909.09150
43. Re-identification in detail (for imaging)
Delete the DICOM header
Delete the skin surface of 3D scans (face vs whole body)
Fundamentalist thinking rooted in a Neo-Confucian worldview
Is medical imaging the only problem? The image itself is a biometric: fingerprint, retinal pattern
Without a master table, re-identification is not possible
Policy solutions are needed:
Punish the party that re-identifies!
Use honest brokers
Handle outlier cases via opt-in/opt-out
Blue Button; data accounts; opt-out / opt-in
A "good-Samaritan researcher" concept (as distinguished from company researchers)
Evaluation/processing should be sampling-based
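The first step above, deleting the DICOM header, can be sketched on a plain dict standing in for a header. A real pipeline would use a DICOM library and follow the DICOM PS3.15 confidentiality profiles; the tag selection below is illustrative, not a complete PHI list:

```python
# Minimal sketch of header de-identification. Keys mirror real DICOM
# attribute keywords, but this dict is a stand-in for a parsed header,
# and PHI_TAGS is an illustrative subset, not an exhaustive profile.
PHI_TAGS = {
    "PatientName", "PatientID", "PatientBirthDate",
    "InstitutionName", "ReferringPhysicianName",
}

def strip_phi(header: dict) -> dict:
    """Drop direct identifiers; keep acquisition parameters needed for research."""
    return {tag: value for tag, value in header.items() if tag not in PHI_TAGS}

header = {
    "PatientName": "HONG^GILDONG",  # hypothetical example values
    "PatientID": "AMC-0001",
    "Modality": "CT",
    "SliceThickness": 1.0,
}
print(strip_phi(header))  # {'Modality': 'CT', 'SliceThickness': 1.0}
```

As the slide notes, header removal alone is not sufficient: the pixel data itself (face surface, fingerprint, retinal pattern) can remain a biometric identifier.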
46. Ethical Use of AI in Radiology
Promote well-being, minimize harm, and ensure that the benefits and harms are distributed among the possible stakeholders in a just manner.
Be appropriately transparent and highly dependable, curtail bias in decision making, and ensure that responsibility and accountability remain with human designers or operators.
The radiology community should start now to develop codes of ethics and practice for AI.
Radiologists will remain ultimately responsible for patient care and will need to acquire new skills to do their best for patients in the new AI ecosystem.
Balance between regulation and utilization
47. Collaborators
Radiology
Joon Beom Seo, SangMin LeeA,B
Dong Hyun Yang, Ji Eun Park, Jeong Hyun Lee,
Gilsun Hong, Haejeon Hwang, Yangshin Choi
Pathology
Hyunjeong Go, Gyuheon Choi, Gyungyub Gong,
Yongmi Cho, Seongmo Hong, Sangjeong Ahn
Orthopedics
Kyeonghwan Ko
Anesthesiology
Sung-Hoon Kim, Eun Ho Lee
Neurology
Dong-Wha Kang, Jaehong Lee, Beomjun
Kim, Eun-Jae Lee,
Surgery
Beom Seok Ko, JongHun Jeong
Songchuk Kim, Tae-Yon Sung,
Gastroenterology
Jeongsik Byeon, Kang Mo Kim, Do-hoon Kim,
Jiyoung Lee,
Emergency Medicine
Dong-Woo Seo
Pulmonology and Critical Care Medicine
Sei Won Lee, Jin-won Huh
Plastic Surgery
Woosik Jeong, Jongwoo Choi
Ophthalmology
Yoonjun Kim
ENT
Jongwoo Jeong