Causal inference from nonrandomized data: key concepts and recent trends
이봉호
25 June 2021
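Contents
1. Introduction
2. Neyman-Rubin potential outcomes model
3. Estimating the average treatment effect
3.1. Propensity-score-based estimation
3.2. Estimating the propensity score
3.3. Regression-based estimation
4. Recent research trends
4.1. Conditional average treatment effect and the use of machine learning
4.2. The curse of dimensionality
4.3. Structural causal models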
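1. Introduction
Big data has brought an explosive increase in observational data, and with it growing interest in causal inference.
Since the advent of experimental design, the randomized experiment has been the most reliable way to establish a causal relationship.
Most real-world data, however, are collected through non-experimental observation → causal inference methods suited to such data are needed.
The field began to develop rapidly out of the debate over the causal link between smoking and lung cancer.
Causal inference has mainly been used in clinical trials and policy decisions.
Many papers are now appearing in public health, economics, marketing, and other fields.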
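2. Neyman-Rubin potential outcomes model
One way to interpret a causal question is as a comparison of counterfactual outcomes.
For a single unit i, let Y_i(1) and Y_i(0) denote the outcomes the unit would have if assigned to the treatment group and to the control group, respectively. The average treatment effect (ATE) is then
τ := E[Y(1) − Y(0)]
In practice, only one of Y_i(0) and Y_i(1) is observed for each i.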
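The missing-data structure described above can be made concrete with a small simulation. This is a minimal sketch under invented assumptions: the data-generating process (Y_i(0) standard normal, a constant effect of 2, coin-flip assignment) and all function names are hypothetical and only serve to show that the analyst sees one potential outcome per unit.

```python
import random
from statistics import fmean

def simulate_units(n, seed=0):
    """Generate hypothetical units that carry BOTH potential outcomes.

    Assumed toy process: Y_i(0) ~ N(0, 1) and Y_i(1) = Y_i(0) + 2,
    so the true ATE is 2 by construction.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)
        y1 = y0 + 2.0
        a = rng.randint(0, 1)            # randomized assignment
        y_obs = y1 if a == 1 else y0     # only one potential outcome is revealed
        units.append({"y0": y0, "y1": y1, "a": a, "y": y_obs})
    return units

units = simulate_units(10_000)

# Oracle ATE: needs both potential outcomes, which real data never provide.
true_ate = fmean(u["y1"] - u["y0"] for u in units)

# What can actually be computed from the observed pairs (A_i, Y_i).
treated = [u["y"] for u in units if u["a"] == 1]
control = [u["y"] for u in units if u["a"] == 0]
diff_in_means = fmean(treated) - fmean(control)

print(f"oracle ATE          : {true_ate:.3f}")
print(f"observed difference : {diff_in_means:.3f}")
```

Because assignment is randomized in this toy setup, the observed difference in means lands close to the oracle value; the later sections deal with the case where assignment depends on covariates.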
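2. Neyman-Rubin potential outcomes model (continued)
Estimating τ despite this structural missingness requires the following assumptions.
Ignorability: among units with identical covariates, the potential outcomes (that is, their distribution) are already fixed, independently of the treatment assignment mechanism.
With A → B and A ← Z → B, this amounts to assuming that Z is the only confounder.
Positivity: among units with any given covariate value X = x, both treated and control units are present.
To compare apples and oranges, there must be people who have eaten both.
Consistency: the treatment actually received is recorded without error, so the observed outcome equals the potential outcome under that treatment, Y = Y(A).
Under these assumptions the ATE can be written in terms of observable quantities (shown here without covariates; with covariates, each expectation is additionally conditioned on X = x and the difference is then averaged over X):
τ = E[Y(1)|A = 1] − E[Y(0)|A = 0] (ignorability)
= E[Y|A = 1] − E[Y|A = 0] (consistency)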
3. Estimating the average treatment effect
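3.1. Propensity-score-based estimation
Approaches to estimating the ATE: 1) propensity-score-based, 2) regression-based, 3) a combination of 1 and 2.
Intuitively, the ATE can be estimated by the difference between the mean outcomes of the treatment and control groups:
τ̂_Naive = (1/n_1) Σ_{i=1}^{n} A_i Y_i − (1/n_0) Σ_{i=1}^{n} (1 − A_i) Y_i
where n_1 and n_0 are the numbers of treated and control units.
When units with a particular covariate profile make up a disproportionate share of the sample, this estimator is biased.
In survey sampling this is corrected by post-stratification; in causal inference, by the propensity score.
Propensity score matching: split the sample into 100 strata by score and, within each stratum, randomly draw equal numbers of units from group 0 and group 1 (the score is computed by a logistic regression of treatment status on the confounders).
The propensity score is defined as the probability of being assigned to the treatment group given the covariates:
π(x) := P(A = 1|X = x)
Given an estimate of the propensity score, an inverse probability weighted estimator of the ATE can be considered.
Each sampled unit is weighted by the inverse of its probability of appearing in its group, which creates a pseudo-population.
There is ongoing work on using propensity scores in a stable way (matching, stratification, and so on).
An R package for this is MatchIt.
More recently, as sample sizes grow, so does the number of covariates, and remedies for the resulting curse of dimensionality are also being studied.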
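To illustrate the difference between the naive estimator and inverse probability weighting, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: a single binary confounder X, a true ATE of 1, and a propensity score estimated by the treated share within each level of X rather than by logistic regression. The weighted estimator is written in the common form (1/n) Σ [A_i Y_i / π(X_i) − (1 − A_i) Y_i / (1 − π(X_i))], which the slides do not spell out.

```python
import random
from statistics import fmean

def simulate(n, seed=1):
    """Hypothetical confounded data: X drives both treatment and outcome.

    The true ATE is 1.0 by construction (an assumption of this toy example).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x == 1 else 0.2      # true propensity pi(x)
        a = 1 if rng.random() < p_treat else 0
        y = 1.0 * a + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows

def naive_estimate(rows):
    """Difference in mean outcomes between treated and control units."""
    y1 = [y for _, a, y in rows if a == 1]
    y0 = [y for _, a, y in rows if a == 0]
    return fmean(y1) - fmean(y0)

def estimate_propensity(rows):
    """Estimate pi(x) = P(A = 1 | X = x) by the treated share at each level of X."""
    levels = {x for x, _, _ in rows}
    return {lv: fmean(a for x, a, _ in rows if x == lv) for lv in levels}

def ipw_estimate(rows, pi):
    """Inverse probability weighted estimate of the ATE."""
    total = 0.0
    for x, a, y in rows:
        total += a * y / pi[x] - (1 - a) * y / (1.0 - pi[x])
    return total / len(rows)

rows = simulate(20_000)
pi_hat = estimate_propensity(rows)
naive = naive_estimate(rows)
ipw = ipw_estimate(rows, pi_hat)

print(f"naive difference in means: {naive:.3f}")
print(f"IPW estimate             : {ipw:.3f}")
```

In this setup treated units are mostly X = 1, so the naive difference absorbs the effect of X, while the weighted estimate should land near the assumed effect of 1.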
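3.2. Estimating the propensity score
With inverse probability weighting, the performance of the estimator depends on the choice of propensity score model and on how it is estimated.
Instead of a maximum-likelihood propensity score, the covariate balancing propensity score (CBPS), which balances the covariate distributions, has been proposed.
With α̂_CB the solution of the following equation, π_{α̂_CB} is used as the propensity score:
(1/n) Σ_{i=1}^{n} ( A_i X_i / π_α(X_i) − (1 − A_i) X_i / (1 − π_α(X_i)) ) = 0
The R package CBPS implements this procedure.
A covariate here means any variable, other than the variable of interest, that can affect the dependent variable.
CBPS is known empirically to give more robust effect-size estimates than maximum-likelihood methods when the propensity score model is misspecified.
When some units have very low or very high propensity scores, the positivity assumption is more likely to be violated; even with a correctly specified propensity score model, the estimator then has large variance and poor finite-sample performance → consider trimming.
Keep only the units whose estimated propensity score falls in a specified interval, say [a_1, a_2], and run the subsequent analysis on them.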
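The balancing equation and the trimming step can both be checked numerically. The sketch below does not fit a CBPS model (that is what the R package does); it only evaluates the left-hand side of the balancing equation for two candidate propensity scores and then trims. The data, the three-level covariate, the candidate scores, and the interval [0.1, 0.9] are all hypothetical choices for illustration.

```python
import random
from statistics import fmean

def simulate(n, seed=2):
    """Hypothetical data: X in {0, 1, 2} with true propensities 0.05, 0.5, 0.95."""
    rng = random.Random(seed)
    true_pi = {0: 0.05, 1: 0.5, 2: 0.95}
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        a = 1 if rng.random() < true_pi[x] else 0
        rows.append((x, a))
    return rows

def balance_residual(rows, pi):
    """Left-hand side of the balancing equation for a scalar covariate X.

    pi is a function mapping x to a candidate propensity score.
    """
    return fmean(a * x / pi(x) - (1 - a) * x / (1.0 - pi(x)) for x, a in rows)

def trim(rows, pi, a1, a2):
    """Keep only units whose propensity score lies in [a1, a2]."""
    return [(x, a) for x, a in rows if a1 <= pi(x) <= a2]

rows = simulate(6000)

# Candidate 1: treated share within each level of X (flexible for discrete X).
share = {lv: fmean(a for x, a in rows if x == lv) for lv in (0, 1, 2)}

def pi_stratum(x):
    return share[x]

# Candidate 2: a deliberately misspecified constant score.
p_bar = fmean(a for _, a in rows)

def pi_constant(x):
    return p_bar

res_stratum = balance_residual(rows, pi_stratum)
res_constant = balance_residual(rows, pi_constant)
kept = trim(rows, pi_stratum, 0.1, 0.9)

print(f"balance residual, stratum score : {res_stratum:.6f}")
print(f"balance residual, constant score: {res_constant:.6f}")
print(f"units kept after trimming       : {len(kept)} of {len(rows)}")
```

The stratum-wise score satisfies the balancing equation up to rounding, the constant score does not, and trimming discards the two extreme strata, so the later analysis describes only the units with overlap.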
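3.3. Regression-based estimation
An intuitive approach, expressed as follows:
E(Y|A, X) = β_0 + Aγ + X^T β_1 + A X^T β_2
Evaluating the contrast E(Y|A = 1, X) − E(Y|A = 0, X) under this model and averaging over X gives
τ = γ + E(X)^T β_2
If the assumptions hold and the linear model is true, then γ equals τ when E(X) = 0.
The regression coefficient estimate γ̂ can therefore be used as an estimator of the ATE.
More generally, the outcome model can be written as E(Y|A = a, X = x) = Q(a, x).
If the model family for Q can be chosen correctly, τ can be estimated.
Regression-based estimation requires specifying a model for the outcome.
For propensity-score-based, regression-based, and doubly robust methods to give stable estimates, the model families for the outcome function Q and the propensity score π must be specified correctly (doubly robust methods need at least one of the two to be correct).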
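The relation τ = γ + E(X)^T β_2 can be tried out with a scalar covariate. The following is a sketch under assumed values: the coefficients (β_0 = 0.5, γ = 2, β_1 = 1, β_2 = 0.5), the distribution of X with mean 1, and the logistic assignment rule are all invented, giving a true τ of 2.5. With one covariate, fitting a separate least-squares line in each treatment arm is equivalent to fitting the interaction model, which keeps the code dependency-free.

```python
import math
import random
from statistics import fmean

def simulate(n, seed=3):
    """Hypothetical data from E(Y|A, X) = 0.5 + 2*A + 1*X + 0.5*A*X, E(X) = 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(1.0, 1.0)
        p_treat = 1.0 / (1.0 + math.exp(-x))   # assignment depends on X
        a = 1 if rng.random() < p_treat else 0
        y = 0.5 + 2.0 * a + 1.0 * x + 0.5 * a * x + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def regression_ate(rows):
    """Return (tau_hat, gamma_hat, beta2_hat) with tau = gamma + E(X) * beta2."""
    x0 = [x for x, a, _ in rows if a == 0]
    y0 = [y for _, a, y in rows if a == 0]
    x1 = [x for x, a, _ in rows if a == 1]
    y1 = [y for _, a, y in rows if a == 1]
    int_ctrl, slope_ctrl = fit_line(x0, y0)    # beta0, beta1
    int_trt, slope_trt = fit_line(x1, y1)      # beta0 + gamma, beta1 + beta2
    gamma_hat = int_trt - int_ctrl
    beta2_hat = slope_trt - slope_ctrl
    x_mean = fmean(x for x, _, _ in rows)
    return gamma_hat + x_mean * beta2_hat, gamma_hat, beta2_hat

rows = simulate(20_000)
tau_hat, gamma_hat, beta2_hat = regression_ate(rows)

print(f"gamma_hat: {gamma_hat:.3f}")
print(f"beta2_hat: {beta2_hat:.3f}")
print(f"tau_hat  : {tau_hat:.3f}")
```

Because E(X) is 1 rather than 0 here, γ̂ alone would understate the effect; centering X before fitting would make γ̂ itself the ATE estimate.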
4. Recent research trends
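4.1. Conditional average treatment effect and the use of machine learning
Traditionally the average treatment effect was a group-level quantity, but individual-level effects also need to be considered.
Conditional Average Treatment Effect (CATE):
τ(x) := E[Y(1) − Y(0)|X = x]
= E[Y|A = 1, X = x] − E[Y|A = 0, X = x] (∵ ignorability/consistency)
Individualized Treatment Rule (ITR):
δ(x) = 1 if τ(x) ≥ 0, and 0 if τ(x) < 0
The most intuitive way to estimate τ(x) is regression-based estimation built on a model E[Y|A = a, X = x] = Q(a, x); δ(x) is then computed indirectly from it.
Recently, nonparametric models, deep neural networks, and similar tools have been proposed to allow as flexible a function space as possible for π(x) and Q(a, x).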
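The two-step recipe, estimate Q(a, x) first and derive δ(x) from it, can be sketched with the simplest possible outcome model. The example below assumes a discrete covariate with three levels and estimates Q(a, x) by cell means; the data-generating process and the assumed effects (−1, 0.5, 2) are invented for illustration, and a flexible regression or neural network would take the place of `fit_q` in a realistic setting.

```python
import random
from collections import defaultdict
from statistics import fmean

def simulate(n, seed=4):
    """Hypothetical data with a covariate-dependent treatment effect."""
    rng = random.Random(seed)
    true_cate = {0: -1.0, 1: 0.5, 2: 2.0}    # assumed tau(x) for each level
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        a = 1 if rng.random() < 0.3 + 0.2 * x else 0   # assignment depends on x
        y = x + true_cate[x] * a + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows

def fit_q(rows):
    """Estimate Q(a, x) = E[Y | A = a, X = x] by the mean outcome in each cell."""
    cells = defaultdict(list)
    for x, a, y in rows:
        cells[(a, x)].append(y)
    return {key: fmean(values) for key, values in cells.items()}

def cate(q, x):
    """tau(x) = Q(1, x) - Q(0, x)."""
    return q[(1, x)] - q[(0, x)]

def treatment_rule(q, x):
    """delta(x) = 1 if the estimated tau(x) is non-negative, else 0."""
    return 1 if cate(q, x) >= 0 else 0

rows = simulate(9000)
q_hat = fit_q(rows)
rule = {x: treatment_rule(q_hat, x) for x in (0, 1, 2)}

for x in (0, 1, 2):
    print(f"x={x}: tau_hat={cate(q_hat, x):+.3f}, delta={rule[x]}")
```

The rule treats the two groups whose estimated effect is non-negative and withholds treatment from the group with a negative estimate.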
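4.2. The curse of dimensionality
Intuitively, including every possible confounder in the covariates X may make the ignorability assumption more likely to hold.
The problem is violation of the positivity assumption: as the covariate dimension grows, it becomes harder to find near neighbors of a given unit, so it becomes more likely that P(A = 1|X = x_0) = 1 or 0 for some covariate value x_0.
Observational studies with high-dimensional covariates have mostly assumed that the true propensity score follows a sparse logistic linear model.
Some authors have also considered cross-fitting: the sample is split into two sets, one used only to estimate the propensity score and the other to compute the treatment effect.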
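How quickly empirical overlap disappears as covariates are added can be seen in a short simulation. This sketch assumes d independent binary covariates, a fixed sample of 2000 units, and coin-flip treatment, so positivity holds in the population and any failure is purely a finite-sample effect; the function name and the chosen dimensions are hypothetical.

```python
import random
from collections import defaultdict

def share_without_overlap(n, d, seed=5):
    """Share of units whose exact covariate cell holds only one treatment arm."""
    rng = random.Random(seed)
    arms_in_cell = defaultdict(set)
    cells = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(d))   # d binary covariates
        a = rng.randint(0, 1)                            # coin-flip treatment
        arms_in_cell[x].add(a)
        cells.append(x)
    no_overlap = sum(1 for x in cells if len(arms_in_cell[x]) < 2)
    return no_overlap / n

shares = {d: share_without_overlap(2000, d) for d in (2, 6, 10, 14)}

for d, share in shares.items():
    print(f"d={d:2d}: share of units without an opposite-arm match = {share:.3f}")
```

With few covariates every cell contains both arms; with 14 covariates most units sit alone in their cell, so the estimated propensity score within the cell would be exactly 0 or 1.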
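4.3. Structural causal models
In the social sciences, the traditional main tool for studying causal effects has been the structural equation model, for example
Z = U_Z, X = Zα + U_X, Y = βX + U_Y
Structural equation models have been criticized for their strong parametric and distributional assumptions, but the more fundamental limitation is that they cannot clearly express the concept of an 'intervention'.
Pearl, J. extended them to DAG-based nonparametric structural causal models and proposed the do-calculus.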
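Example
X, Y, Z are the random variables of interest, and U_X, U_Y, U_Z are independent error terms.
The operation do(X = x_0) describes the situation in which the random variable X is fixed exogenously, as X = x_0.
The causal path Z → X is cut.
Left (observational model):
Z = f_Z(U_Z), X = f_X(Z, U_X), Y = f_Y(X, Z, U_Y)
The observed data (X, Y, Z) are a sample from a probability distribution P that follows these equations (Eq. (4.2)).
Right (model after the intervention):
Z = f_Z(U_Z), X = x_0, Y = f_Y(x_0, Z, U_Y)
The joint distribution of (Y, Z) in this model is denoted P_m(Y, Z), and P(Y, Z|do(X = x_0)) is defined as P_m(Y, Z).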
Example (continued)
Comparing the two sets of equations gives the following facts:
1. P_m(Z = z|X = x_0) = P_m(Z = z)
2. P_m(Z) = P(Z)
3. P_m(Y|Z, X) = P(Y|Z, X)
Therefore
P(Y = y|do(X = x_0)) = P_m(Y = y|X = x_0)
= Σ_z P_m(Y = y|X = x_0, Z = z) ⋅ P_m(Z = z|X = x_0)
= Σ_z P_m(Y = y|X = x_0, Z = z) ⋅ P_m(Z = z) (by 1)
= Σ_z P(Y = y|X = x_0, Z = z) ⋅ P(Z = z) (by 2 and 3)
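The last line of the derivation, P(Y = y|do(X = x_0)) = Σ_z P(Y = y|X = x_0, Z = z) ⋅ P(Z = z), can be evaluated directly on a small discrete distribution. The sketch below uses binary X, Y, Z with made-up probabilities (all numbers and helper names are hypothetical) and compares the interventional quantity with the ordinary conditional probability P(Y = 1|X = 1).

```python
from itertools import product

# Hypothetical observational model: Z -> X, Z -> Y, X -> Y (all binary).
p_z = {0: 0.5, 1: 0.5}
p_x1_given_z = {0: 0.2, 1: 0.8}
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7}

def joint(x, y, z):
    """P(X = x, Y = y, Z = z) built from the factors above."""
    px = p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]
    py = p_y1_given_xz[(x, z)] if y == 1 else 1.0 - p_y1_given_xz[(x, z)]
    return p_z[z] * px * py

# The observational distribution P, stored as a table keyed by (x, y, z).
P = {(x, y, z): joint(x, y, z) for x, y, z in product((0, 1), repeat=3)}

def prob(table, **fixed):
    """Probability of the event that fixes some of x, y, z to given values."""
    names = ("x", "y", "z")
    return sum(p for key, p in table.items()
               if all(key[names.index(k)] == v for k, v in fixed.items()))

def p_y_do_x(table, y, x0):
    """Adjustment formula: sum over z of P(Y=y | X=x0, Z=z) * P(Z=z)."""
    total = 0.0
    for z in (0, 1):
        p_y_given_xz = prob(table, x=x0, y=y, z=z) / prob(table, x=x0, z=z)
        total += p_y_given_xz * prob(table, z=z)
    return total

def p_y_given_x(table, y, x0):
    """Ordinary conditional probability P(Y=y | X=x0)."""
    return prob(table, x=x0, y=y) / prob(table, x=x0)

p_do = p_y_do_x(P, y=1, x0=1)
p_obs = p_y_given_x(P, y=1, x0=1)

print(f"P(Y=1 | do(X=1)) = {p_do:.2f}")
print(f"P(Y=1 | X=1)     = {p_obs:.2f}")
```

The two quantities differ because conditioning on X = 1 also shifts the distribution of Z, whereas the do-operation leaves P(Z) unchanged.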
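Example (summary)
Because P(Y = y|do(X = x_0)) is expressed entirely in terms of P, it can be estimated empirically from data.
That is, P(Y = y|do(X = x_0)) can be computed from the observed data.
Taking Z as the covariates, X as the treatment, and Y as the outcome, the following can be shown under the ignorability and consistency assumptions:
τ (ATE) = E[Y(1)|A = 1] − E[Y(0)|A = 0] (ignorability)
= E[Y|A = 1] − E[Y|A = 0] (consistency)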
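Conclusion
A fundamental limitation is that the approach relies on assumptions that cannot be verified from the observational data themselves.
Even so, it is useful in that it provides a conceptual framework for measuring the effect of an intervention.
Analysis techniques omitted here for lack of space include:
Sensitivity analysis: techniques that gauge the stability of an estimator by examining how the treatment effect estimate changes in scenarios where the propensity score is misspecified.
Mediation analysis: techniques for estimating treatment effects when a mediator lies on the causal path.
Interaction analysis: techniques for estimating treatment effects when decisions about two or more treatments are required.
Techniques that adjust for indirect (spillover) effects in scenarios where the treatment effect carries over between units in the sample.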
