7 July 2014
Christophe Geissler,
Quinten,
IAF.
PROFILE DETECTION:
APPLICATIONS IN HEALTHCARE AND
ECONOMETRICS
1) QUINTEN IN BRIEF
2) LIFE SCIENCES AND PREDICTION
3) CASE STUDIES
4) COMPARISON OF METHODS
5) RESEARCH TOPICS
OUTLINE
Tribute to AK, ~780-850
QUINTEN IN SHORT
A company providing data-oriented strategic advisory.
Since 2008, over 100 missions for more than 25 clients
Historical focus on Life Sciences and Healthcare
Now extending to CRM, Insurance and Investment
18 employees, self-financed, annual average growth of 40%
80% of the revenue reinvested in R&D each year, including a proprietary learning
technology
Active member of several technology clusters: Medicen
THE HEALTHCARE SECTOR AS A PRESCRIBER OF ADVANCED
ALGORITHMS?
The prediction/classification needs in life sciences have evolved.
Huge increase of available variables
Limited size of samples (often < 1000) for economic reasons
These needs are not fully met by predictive approaches.
Need for evidence-based methods
Trade-off between predictive power and auditability of recommendations
The exponential increase in computation capacity opens the way for exploration-based
methods
With an increasing risk of overfitting the data
Correlation with similar trends in CRM.
Customer profiling: data gathering is key.
ALGORITHMIC NEEDS IN EPIDEMIOLOGICAL STUDIES
Databases have large sets of variables (#V >> #Obs)
Practitioners often wish to avoid any a priori selection (or ranking) of
variables
Poor tractability by most kinds of regression models
Using ‘sparsity’, i.e. penalizing complexity in order to simplify the model, does not fully solve the
problem
Leaving the Cartesian paradigm: a single (very) complex function globally driving the
entirety of the visible phenomena
For a heuristic approach: accepting the possibility of multiple, local, partially
correlated causes to be discovered: the ‘profiles’.
Interpretability of the profiles and descriptive parsimony are mandatory: no black-box
or randomized results.
PREDICTION VS DESCRIPTION IN SUPERVISED
METHODS
Supervised problems, i.e. where the training data are ‘labeled’ by a variable Y to be
explained. Y is the ‘phenomenon of interest’.
Y can be a boolean (treatment outcome) or a continuous variable (loss amount, etc.).
Explanatory variables $X = (X_i)_{i=1..V} \in \mathbb{R}^V$, continuous or discrete, possibly with missing values.
Predictor: a function $\hat{Y} = F(X) : \mathbb{R}^V \to \mathrm{Dom}(Y)$ satisfying $\mathrm{Var}(\hat{Y} - Y \mid X) < \mathrm{Var}(Y)$
Explanatory power: capacity to ‘simply’ describe the sets $F^{-1}([s, 1])$, i.e. answering the
question ‘Who are the strong responders?’
Simplicity can be formalized; any formalization involves the number of variables used in the
predictors.
Simplicity is key when targeting large sets of ‘new’ individuals (not in the training sample).
THE PREDICTIVE VS EXPLANATORY TRADE-OFF
Problem: separating ‘nicely’ red from blue points in $\mathbb{R}^2$.
Dark colors in the training sample, light colors in the test sample.
Running four prediction techniques on the previous set.
Colored areas depend on the predicted value.
How many words are needed to describe the dark shaded areas?
The poor response of linear separators (SVM) indicates that more dimensions could be needed in order to improve the description.
PROFILE SEARCH VS DECISION TREES
Decision trees look for optimal cut-offs on explanatory variables: partition of space into non-overlapping regions.
Profile search allows for some controlled degree of intersection.
Toy database with a phenomenon taking place on two overlapping rectangles on variables a and b, hidden among 250 random variables.
CART response: up to 14 levels to partition space.
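The toy database described above can be sketched as follows. This is only an illustration under assumptions that the slide does not give: the rectangle bounds, the sample size and the uniform distribution of all variables are hypothetical. It shows that two profiles of two conditions each describe the whole phenomenon, even though they intersect.

```python
import random

random.seed(0)

N_OBS = 2000     # hypothetical sample size
N_NOISE = 250    # random variables hiding a and b, as on the slide

# Two overlapping rectangles on (a, b); the bounds are illustrative assumptions.
RECT_1 = {"a": (0.2, 0.6), "b": (0.3, 0.5)}
RECT_2 = {"a": (0.4, 0.8), "b": (0.4, 0.9)}


def matches(row, profile):
    """True if the row satisfies every interval condition of the profile."""
    return all(lo <= row[var] <= hi for var, (lo, hi) in profile.items())


def make_row():
    """Draw one observation: a, b, the noise variables and the label y."""
    row = {"a": random.random(), "b": random.random()}
    for j in range(N_NOISE):
        row[f"noise_{j}"] = random.random()
    # The phenomenon takes place on the union of the two rectangles.
    row["y"] = int(matches(row, RECT_1) or matches(row, RECT_2))
    return row


data = [make_row() for _ in range(N_OBS)]

in_1 = [r for r in data if matches(r, RECT_1)]
in_2 = [r for r in data if matches(r, RECT_2)]
in_both = [r for r in data if matches(r, RECT_1) and matches(r, RECT_2)]
positives = [r for r in data if r["y"] == 1]

# Inclusion-exclusion: the two intersecting profiles cover every positive
# observation, whereas a tree has to cut the union into disjoint boxes.
covered = len(in_1) + len(in_2) - len(in_both)
print(len(positives), covered, len(in_both))
```

A tree-based method would have to find the same union through successive axis-aligned cuts, which is where the extra levels come from.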
USE CASE IN HEALTHCARE
CLUSTERING: AN UNSUPERVISED APPROACH
Database: 2000 patients / 1000 variables
10% got the Adverse Event X (AEX) (200 patients)
Legend: patient without Adverse Event X / patient with Adverse Event X
Singular value decomposition (SVD), clustering (PCA, K-Means ...)
Typology 1: 507 patients, 6.4% AEX
Typology 2: 507 patients, 10% AEX
Typology 3: 808 patients, 13% AEX
Are there various typologies of patients in this database?
Do these typologies show any deviations with regard to Adverse Event X?
Are these differences important enough to avoid treating some typologies?
ASSOCIATION RULE DISCOVERY: QFINDER ALGORITHM
Identification and characterization of singular profiles
Database: 2000 patients / 1000 variables
10% got the Adverse Event X (200 patients)
Legend: patient without Adverse Event X / patient with Adverse Event X
Data processing (QFinder)
What are the various profiles of patients with the highest risk of Adverse Event X?
What are the key characteristics of each of these profiles?
How to prevent Adverse Event X?
Profile: Age > 56, Average Daily Dose = High, Treatment duration > 50 days: 126 patients, 47% Adverse Event X
Profile: Gender: female, Diabetes = Yes, Menopause = Yes: 108 patients, 60% Adverse Event X
Profile: Blood Pressure = High, Dyslipidemia = Yes: 59 patients, 75% Adverse Event X
Interpretable and actionable results
Optimality of recommendations
EXAMPLE OF PROFILE
Detection of mutually influential factors not seen by regressions
Database size: 2000 patients (100%)
Average rate of adverse events: 10% (90% without)
MANY CRITERIA HAVE LITTLE OR NO INFLUENCE
AGE > 56: 739 patients (37%), 11% with adverse events, 89% without
TREATMENT DURATION > 50 days: 936 patients (47%), 8% with adverse events, 92% without
AVERAGE DAILY DOSE: HIGH: 647 patients (32%), 13% with adverse events, 87% without
HOWEVER Q-FINDER WAS ABLE TO DETECT THEIR COMBINED INFLUENCE WHEN RELEVANT
Patients matching the following characteristics:
AGE > 56
TREATMENT DURATION > 50 days
AVERAGE DAILY DOSE: HIGH
are 4.7 times more likely to trigger adverse events
Profile size: 126 patients (6.3%), 47% with adverse events, 53% without
(Other values shown on the chart without a recoverable label: 65%, 69%, 84%.)
ACTION: AVOID THE HIGH DOSE ON PATIENTS > 56 TREATED > 50 DAYS
AVOID TREATING MORE THAN 50 DAYS PATIENTS > 56 WITH THE HIGH DOSE
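The "4.7 times more likely" figure is the ratio between the event rate inside the profile and the average rate in the database. A minimal sketch of that arithmetic, using only the counts and rates quoted on the slide (the function and variable names are illustrative, not part of Q-Finder):

```python
def profile_stats(n_profile, rate_profile, n_total, rate_total):
    """Return coverage, lift and share of all events captured by a profile."""
    coverage = n_profile / n_total              # fraction of the database in the profile
    lift = rate_profile / rate_total            # event rate relative to the average
    share_of_events = (n_profile * rate_profile) / (n_total * rate_total)
    return coverage, lift, share_of_events


# Combined profile: age > 56, duration > 50 days, high dose.
coverage, lift, share = profile_stats(126, 0.47, 2000, 0.10)
print(f"profile: coverage={coverage:.1%}, lift={lift:.1f}, share of events={share:.1%}")

# Each criterion taken alone stays close to the 10% average rate.
single_criteria = {
    "age > 56": (739, 0.11),
    "treatment duration > 50 days": (936, 0.08),
    "average daily dose: high": (647, 0.13),
}
single_lifts = {}
for name, (size, rate) in single_criteria.items():
    _, single_lifts[name], _ = profile_stats(size, rate, 2000, 0.10)
    print(f"{name}: lift={single_lifts[name]:.1f}")
```

Taken alone, each criterion has a lift close to 1, which is why a model looking at variables one at a time would miss their combined effect.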
USING PROFILE DETECTION IN INVESTMENT
Using machine learning for the detection of recurrent biases on the returns of the main asset
classes (interest rates, equity indices, currencies).
Empirical facts:
Financial markets are interaction hubs for investors having a huge diversity in horizon and
risk aversion.
Fluctuations can therefore be caused by a large number of potential factors.
The influence of these factors is not uniform through time.
GLM-type approaches are too difficult to calibrate and yield unstable results.
Retained approach:
Search for significant profiles, characterized by conditions on a limited number of variables.
Profiles may partially overlap. No predefined hierarchy on the variables.
Creating derived variables from primary variables: stationarity and variety.
Commercial presentation 2014
Example:
• Y(t) = Δ Bund (1 month) / stdev (Δ Bund (1 month))
• 250 explanatory variables:
  • Eurozone, US economic indicators
  • Interest rate levels and dynamics
  • Central bank money data
  • Inflation expectations (inflation swaps)
  • Risk premia on equity markets
  • Energy prices
  • Volatilities, correlations
• Training period: 1999-2013.
Average (Y(t), <Training period>) = +0.15 σ
[Figure: ΔBund = f(t), from 1999-11-01 to 2012-09-19; vertical axis from -15 to 10]
Stylized fact 1: Sharp drop in German equities → increase in risk aversion → rise in
German Govt Bonds.
Validating hypothesis:
X1 = Decile (Δ (E/P_ratio (DAX) – Bobl yield)).
Interpretation: 3 month variation in German equity risk
premium
r = Correlation (X1, Y) = 9%, $R^2$ = 0.8%
Decile analysis: E(Y | X1)
Non-linearity
General trend consistent with intuition
[Figure: E(ΔBund) = f(Δ DAX risk premium), by overlapping decile ranges of X1 (0-2, 1-3, ..., 8-10); vertical axis from -0.3 to 0.6]
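The decile analysis above can be sketched as follows: rank the explanatory variable, group observations into overlapping windows of deciles, and average Y in each window. The data below are synthetic and purely hypothetical (a step-shaped dependence plus noise). The window width and the 0 to 9 decile indexing are also assumptions, since the slide does not spell out its binning.

```python
import random
from statistics import mean


def decile_ranks(values):
    """Map each value to a decile index 0..9 according to its rank in the sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order):
        ranks[i] = position * 10 // len(values)
    return ranks


def conditional_means(x, y, width=3):
    """E(Y | decile of X in [d, d + width - 1]) for each overlapping window."""
    deciles = decile_ranks(x)
    result = {}
    for d in range(0, 10 - width + 1):
        window = [yi for di, yi in zip(deciles, y) if d <= di < d + width]
        result[(d, d + width - 1)] = mean(window)
    return result


# Synthetic, hypothetical data: Y shifts upwards only when X is positive,
# a non-linear effect that a plain correlation coefficient summarizes poorly.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(5000)]
y = [random.gauss(0, 1) + (0.5 if xi > 0 else 0.0) for xi in x]

means = conditional_means(x, y)
for (low, high), value in means.items():
    print(f"deciles {low}-{high}: E(Y) = {value:+.2f}")
```

Overlapping windows smooth the curve at the cost of correlated neighbouring points, which is worth keeping in mind when reading the trend.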
Stylized fact 2: Growth acceleration in monetary aggregates → future rise in inflation →
loss in Govt Bonds.
Hypothesis validation:
X2 = Decile (Δ M3 (3 month)).
r = Correlation (X2, Y) = -1.5%, $R^2$ = 0.4%.
Decile analysis:
Non-linearity
General trend consistent with intuition
[Figure: E(ΔBund) = f(ΔM3), by overlapping decile ranges of X2 (0-3, 1-4, ..., 8-11); vertical axis from -0.2 to 0.5]
When: X1 >= 5, i.e. Δ (DAX Risk Premium) > 5th decile
AND
X2 in [2, 6], i.e. Δ (M3) between 2nd and 6th decile
Then:
E(Y | X1, X2) = +0.83 σ, true on 21.5% of observations between 1999 and 2012.
These conditions form a market profile. Information ratio: $0.83 \times (21.5\% \times 260/20)^{0.5} = 1.05$
Strong synergy between variables: +90% increase in conditional expectation on Bund performance.
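The information ratio expression above scales the conditional mean (in units of σ) by the square root of the number of periods per year on which the profile is active. Below is a minimal sketch of that formula, assuming independent one-month periods of unit standard deviation and 260/20 = 13 such periods per year. The function name is illustrative.

```python
import math


def information_ratio(cond_mean_sigma, frequency, periods_per_year):
    """Annualised information ratio of a signal active on a fraction of periods.

    Assumes independent periods, each with unit standard deviation
    (Y is expressed in units of sigma).
    """
    active_periods_per_year = frequency * periods_per_year
    return cond_mean_sigma * math.sqrt(active_periods_per_year)


# Inputs as printed on the slide: +0.83 sigma, 21.5% of observations,
# 260 trading days split into 20-day periods.
ir = information_ratio(0.83, 0.215, 260 / 20)
print(f"information ratio = {ir:.2f}")
```

Evaluated with the inputs as printed, this expression gives about 1.39 rather than the 1.05 quoted, so the quoted figure may rest on inputs or conventions that differ from those shown on the slide.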
[Figure: Combined influence of the two variables. Conditional expectation, from -1.5 σ to 1.5 σ in bands of 0.5 σ, as a function of the deciles (0 to 9) of each variable]
[Figure: Historical occurrences of the profile (0/1 indicator), from 1999-11-01 to 2013-01-25]
113 independent occurrences in 14 years
MANAGING THE RISK OF OVERFITTING
Parameter | Role | Influence on overfitting risk
P | Size of training sample | P↑: risk↓
ρ | Average coding compression rate of variables (= #modalities / P) | ρ↓: risk↓
y | Proportion of 1’s in dependent variable | y↑: risk↓
k | Maximum profile complexity | k↓: risk↓
V | Total number of variables | V↓: risk↓
ε | Maximum admissible probability of finding any configuration by random search | ε↓: risk↓
[Figure: Maximum number of profiles (Nb max, 0 to 60) as a function of the coding compression of variables, for #V = 1, 2, 3, 4]
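One way to see why V, k and the number of modalities drive the overfitting risk is to count how many candidate profiles an exhaustive search can test. The sketch below is not the author's formula. It assumes a profile is a conjunction of at most k conditions, each fixing one modality of one variable, and it swaps in a plain Bonferroni-style correction to link the count to ε.

```python
from math import comb


def candidate_profiles(n_variables, modalities, max_complexity):
    """Count conjunctions of 1..max_complexity conditions, one modality per chosen variable."""
    return sum(
        comb(n_variables, j) * modalities ** j
        for j in range(1, max_complexity + 1)
    )


def per_profile_threshold(epsilon, n_candidates):
    """Bonferroni-style bound (an assumption, not the slide's method): the
    per-profile significance level that keeps the overall probability of a
    chance finding below epsilon."""
    return epsilon / n_candidates


# Hypothetical setting: 250 variables coded on 10 modalities each.
counts = {k: candidate_profiles(250, 10, k) for k in (1, 2, 3)}
for k, n in counts.items():
    print(f"k={k}: {n} candidates, per-profile threshold {per_profile_threshold(0.05, n):.2e}")
```

The count grows polynomially in V and exponentially in k, so each extra condition allowed in a profile tightens the evidence required from every single candidate.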
RISKS AND REWARDS OF
COMBINATORIAL EXPLORATION
No preselection of variables, no hierarchy, localized search: more freedom is
granted
No free lunch: computation time increases (linear in #Obs, polynomial in #V)
But parallel computation and cloud-computing are perfectly adapted
Risk of overfitting must be carefully controlled
The richness of the descriptive language must be kept at a parsimonious level
in order to prevent ‘nugget-fishing’: interesting maths behind the scenes.
CURRENT RESEARCH AREAS
Improving the dynamic aggregation of predictors:
Using prediction as a topology on data: COBRA algorithm (G. Biau, B. Guedj).
Weighting schemes based on regret (Lugosi, Stoltz) or regularity (Wintenberger).
Embedding time stationarity requirements in profile search.
Incremental production of backtests.
Visualization of an audit trail between variables and final recommendations.
GPU calculations
…
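As an illustration of regret-based weighting schemes for aggregating predictors, here is a minimal sketch of the classical exponentially weighted average forecaster with squared loss. Whether Quinten uses this exact scheme is not stated in the deck. The learning rate `eta` and the toy inputs are hypothetical.

```python
import math


def exponential_weights(cumulative_losses, eta):
    """Weights proportional to exp(-eta * cumulative loss), normalised to sum to 1."""
    best = min(cumulative_losses)  # subtract the best loss for numerical stability
    raw = [math.exp(-eta * (loss - best)) for loss in cumulative_losses]
    total = sum(raw)
    return [w / total for w in raw]


def aggregate(predictions_by_round, outcomes, eta=0.5):
    """Run the forecaster online; return aggregated predictions and final weights."""
    n_predictors = len(predictions_by_round[0])
    losses = [0.0] * n_predictors
    aggregated = []
    for preds, y in zip(predictions_by_round, outcomes):
        weights = exponential_weights(losses, eta)
        aggregated.append(sum(w * p for w, p in zip(weights, preds)))
        # Losses are updated only after the prediction is made (online setting).
        for i, p in enumerate(preds):
            losses[i] += (p - y) ** 2
    return aggregated, exponential_weights(losses, eta)


# Toy example: predictor 0 is always right, predictor 1 is always wrong.
rounds = [[1.0, 0.0] for _ in range(5)]
aggregated, final_weights = aggregate(rounds, [1.0] * 5)
print(aggregated, final_weights)
```

The weights start uniform and shift towards the predictor with the lowest cumulative loss, so the aggregate tracks the best predictor without having to pick it in advance.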
CONTACT
11, rue Galvani 75017 Paris, France
+33 (0)1 45 74 33 05
http://www.quinten-france.com
@QuintenFrance
Christophe GEISSLER
33 (0)6 08 60 46 14
c.geissler@quinten-france.com

ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 

Profile detection: applications in healthcare and econometrics (Geissler)

  • 1. 7 July 2014. Christophe Geissler, Quinten, IAF. PROFILE DETECTION: APPLICATIONS IN HEALTHCARE AND ECONOMETRICS
  • 2. OUTLINE: 1) QUINTEN IN BRIEF 2) LIFE SCIENCES AND PREDICTION 3) CASE STUDIES 4) COMPARISON OF METHODS 5) RESEARCH TOPICS. Tribute to AK, ~780-850
  • 3. QUINTEN IN SHORT: A company providing data-oriented strategic advisory. Since 2008, over 100 missions for more than 25 clients. Historical focus on Life Sciences and Healthcare, now extending to CRM, Insurance and Investment. 18 employees, self-financed, average annual growth of 40%. 80% of the revenue reinvested in R&D each year, including a proprietary learning technology. Active member of several technology clusters: Medicen.
  • 5. THE HEALTHCARE SECTOR AS ADVANCED ALGORITHMIC PRESCRIBER? The prediction/classification needs in life sciences have evolved: huge increase in available variables; limited sample sizes (often < 1000) for economic reasons. These needs are not fully met by predictive approaches: need for evidence-based methods; trade-off between predictive power and auditability of recommendations. The exponential increase in computation capacity opens the way for exploration-based methods, with an increasing risk of overfitting the data. Correlation with similar trends in CRM. Customer profiling: data gathering is key.
  • 6. ALGORITHMIC NEEDS IN EPIDEMIOLOGICAL STUDIES: Databases have large sets of variables (#V >> #Obs). Practitioners often wish to get rid of a priori selection (or hierarchization) of variables. Poor tractability by most kinds of regression models. Using ‘sparsity’, i.e. penalizing complexity in order to simplify the model, does not fully solve the problem. Leaving the Cartesian paradigm: a single (very) complex function globally driving the entirety of the visible phenomena. For a heuristic approach: accepting the possibility of multiple, local, partially correlated causes to be discovered: the ‘profiles’. Interpretability of the profiles and descriptive parsimony are mandatory: no black-box or randomized results.
  • 7. PREDICTION VS DESCRIPTION IN SUPERVISED METHODS: Supervised problems, i.e. where the training data are ‘labeled’ by a variable Y to be explained. Y is the ‘phenomenon of interest’. Y can be a boolean (treatment outcome) or a continuous variable (loss amount, etc.). Explanatory variables X = (X_i), i = 1..V, in R^V, continuous or discrete, with possibly missing values. Predictor: a function Ŷ = F(X): R^V → Dom(Y) verifying Var(Ŷ − Y | X) < Var(Y). Explanatory power: capacity to ‘simply’ describe the sets F^(-1)([s, 1]), i.e. answering the question ‘Who are the strong responders?’ Simplicity can be formalized, and always involves the number of variables used by the predictor. Simplicity is key when targeting large sets of ‘new’ individuals (not in the training sample).
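The two notions on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `improves_on_baseline` and `strong_responders` are hypothetical, the toy numbers are invented, and the variance test uses the plain sample variance of the residuals rather than the conditional variance written on the slide.

```python
from statistics import pvariance
from typing import Callable, List, Sequence


def improves_on_baseline(y: Sequence[float], y_hat: Sequence[float]) -> bool:
    """Check Var(Y_hat - Y) < Var(Y) on a sample (unconditional simplification)."""
    residuals = [p - t for p, t in zip(y_hat, y)]
    return pvariance(residuals) < pvariance(y)


def strong_responders(rows: Sequence[Sequence[float]],
                      predictor: Callable[[Sequence[float]], float],
                      s: float) -> List[int]:
    """Indices of the rows in F^(-1)([s, 1]), i.e. with predicted score >= s."""
    return [i for i, x in enumerate(rows) if predictor(x) >= s]


# Invented toy data: a boolean outcome and a score in [0, 1].
y = [0.0, 0.0, 1.0, 1.0]
y_hat = [0.1, 0.2, 0.8, 0.9]
print(improves_on_baseline(y, y_hat))

# Hypothetical one-variable predictor: the score is the first feature itself.
rows = [(0.1,), (0.9,), (0.4,), (0.7,)]
print(strong_responders(rows, lambda x: x[0], s=0.5))
```

The explanatory question is then how few variables are needed to describe the set returned by `strong_responders`.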
  • 8. THE PREDICTIVE VS EXPLANATORY TRADE-OFF. Problem: separating ‘nicely’ red from blue points in R^2. Dark colors in the training sample, light colors in the test sample.
  • 9. THE PREDICTIVE VS EXPLANATORY TRADE-OFF. Running four prediction techniques on the previous set. Colored areas depend on the predicted value. How many words are needed to describe the dark shaded areas? The poor response of linear separators (SVM) indicates that more dimensions could be needed in order to improve the description.
  • 10. PROFILE SEARCH VS DECISION TREES. Decision trees look for optimal cut-offs on explanatory variables: partition of space into non-overlapping regions. Profile search allows for some controlled degree of intersection. Toy database with a phenomenon taking place on two overlapping rectangles on variables a and b, hidden among 250 random variables. CART response: up to 14 levels to partition space.
  • 11. USE CASE IN HEALTHCARE. CLUSTERING: A NON-SUPERVISED APPROACH. Database: 2000 patients / 1000 variables; 10% got the Adverse Event X (200 patients). Singular Value Decomposition (SVD): clustering (PCA, K-Means ...). Are there various typologies of patients in this database? Do these typologies show any deviations with regard to Adverse Event X? Are these differences important enough to avoid treating some typologies? Result: Typology 1: 507 patients, 6.4% AEX; Typology 2: 507 patients, 10% AEX; Typology 3: 808 patients, 13% AEX. (Figure legend: patient without Adverse Event X / patient with Adverse Event X.)
  • 12. ASSOCIATION RULE DISCOVERY: QFINDER ALGORITHM. Identification and characterization of singular profiles. Database: 2000 patients / 1000 variables; 10% got the Adverse Event X (200 patients). Data processing (QFinder). What are the various profiles of patients with the highest risk of Adverse Event X? What are the key characteristics of each of these profiles? How to prevent Adverse Event X? Profiles found: Age > 56, Average Daily Dose = High, Treatment duration > 50 days: 126 patients, 47% Adverse Event X. 108 patients, 60% Adverse Event X: Gender = female, Diabetes = Yes, Menopause = Yes. 59 patients, 75% Adverse Event X: Blood Pressure = High, Dyslipidemia = Yes. Interpretable and actionable results. Optimality of recommendations. (Figure legend: patient without Adverse Event X / patient with Adverse Event X.)
  • 13. EXAMPLE OF PROFILE: Detection of mutually influential factors not seen by regressions. MANY CRITERIA HAVE LITTLE OR NO INFLUENCE ON THEIR OWN. Database size: 2000 patients (100%); average rate of adverse events: 10% (90% without). AGE > 56: 739 patients (37%), 11% with adverse events (89% without). TREATMENT DURATION > 50 days: 936 patients (47%), 8% with adverse events (92% without). AVERAGE DAILY DOSE: HIGH: 647 patients (32%), 13% with adverse events (87% without). HOWEVER, Q-FINDER WAS ABLE TO DETECT THEIR COMBINED INFLUENCE WHEN RELEVANT. Profile size: 126 patients (6.3%). Patients matching all of the following characteristics are 4.7 times more likely to experience adverse events (47% with, 53% without): AGE > 56, TREATMENT DURATION > 50 days, AVERAGE DAILY DOSE: HIGH. ACTION: AVOID THE HIGH DOSE ON PATIENTS > 56 TREATED > 50 DAYS; AVOID TREATING FOR MORE THAN 50 DAYS PATIENTS > 56 ON THE HIGH DOSE.
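The figures on this slide (profile size, event rate, and the 4.7 ratio, which is 47% divided by the 10% base rate) can be reproduced for any conjunctive profile. The following is a minimal sketch, not the Q-Finder implementation: `evaluate_profile`, the field names and the eight toy records are all hypothetical, invented for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Condition = Callable[[Record], bool]


def evaluate_profile(records: List[Record],
                     conditions: List[Condition],
                     outcome_key: str = "adverse_event") -> Dict[str, float]:
    """Size, coverage, event rate and lift of a profile (AND of all conditions)."""
    matched = [r for r in records if all(cond(r) for cond in conditions)]
    base_rate = sum(bool(r[outcome_key]) for r in records) / len(records)
    rate = sum(bool(r[outcome_key]) for r in matched) / len(matched) if matched else 0.0
    lift = rate / base_rate if base_rate > 0 else float("nan")
    return {"size": len(matched), "coverage": len(matched) / len(records),
            "rate": rate, "lift": lift}


# Invented toy records (NOT the 2000-patient database of the slide).
records = [
    {"age": 60, "duration": 70, "dose": "high", "adverse_event": True},
    {"age": 62, "duration": 55, "dose": "high", "adverse_event": False},
    {"age": 40, "duration": 70, "dose": "high", "adverse_event": False},
    {"age": 65, "duration": 20, "dose": "high", "adverse_event": False},
    {"age": 70, "duration": 80, "dose": "low", "adverse_event": False},
    {"age": 35, "duration": 10, "dose": "low", "adverse_event": False},
    {"age": 58, "duration": 60, "dose": "high", "adverse_event": True},
    {"age": 45, "duration": 30, "dose": "low", "adverse_event": False},
]

# The three conditions of the slide's profile, applied to the toy records.
profile = [
    lambda r: r["age"] > 56,
    lambda r: r["duration"] > 50,
    lambda r: r["dose"] == "high",
]
print(evaluate_profile(records, profile))
```

On the toy data each condition alone selects a mixed group, while their conjunction isolates both events, which is the kind of combined influence the slide describes.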
  • 14. USING PROFILE DETECTION IN INVESTMENT. Using machine learning for the detection of recurrent biases in the returns of the main asset classes (interest rates, equity indices, currencies). Empirical facts: Financial markets are interaction hubs for investors with a huge diversity in horizon and risk aversion. Fluctuations can therefore be caused by a large number of potential factors. The influence of these factors is not uniform through time. GLM-type approaches are too difficult to calibrate and yield unstable results. Retained approach: Search for significant profiles, characterized by conditions on a limited number of variables. Profiles may partially overlap. No predefined hierarchy on the variables. Creating derived variables from primary variables: stationarity and variety.
  • 15. USING PROFILE DETECTION IN INVESTMENT. Commercial presentation 2014. Example: Y(t) = Δ Bund (1 month) / stdev(Δ Bund (1 month)). 250 explanatory variables: Eurozone and US economic indicators; interest rate levels and dynamics; central bank money data; inflation expectations (inflation swaps); risk premia on equity markets; energy prices; volatilities, correlations. Training period: 1999-2013. Average(Y(t), <Training period>) = +0.15 σ. Chart: ΔBund = f(t), November 1999 to September 2012.
  • 16. USING PROFILE DETECTION IN INVESTMENT. Stylized fact 1: sharp drop in German equities → increase in risk aversion → rise in German Govt Bonds. Validating hypothesis: X1 = Decile(Δ(E/P_ratio(Dax) − Bobl yield)). Interpretation: 3-month variation in the German equity risk premium. r = Correlation(X1, Y) = 9%, R² = 0.8%. Decile analysis of E(Y | X1): non-linearity; general trend consistent with intuition. Chart: E(ΔBund) = f(Δ DAX risk premium), by overlapping decile windows of X1 (0-2, 1-3, ..., 8-10).
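A decile analysis of this kind can be sketched as follows. It is a simplified illustration: the helper names `decile_ranks` and `conditional_means` are hypothetical, the data are synthetic, and plain non-overlapping deciles are used instead of the overlapping windows shown on the chart.

```python
from statistics import mean
from typing import Dict, List, Sequence


def decile_ranks(values: Sequence[float]) -> List[int]:
    """Map each value to its decile (0..9) according to its rank in the sample."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for position, index in enumerate(order):
        ranks[index] = position * 10 // n
    return ranks


def conditional_means(x: Sequence[float], y: Sequence[float]) -> Dict[int, float]:
    """Estimate E(Y | decile of X) by averaging Y inside each decile of X."""
    buckets: Dict[int, List[float]] = {}
    for decile, y_value in zip(decile_ranks(x), y):
        buckets.setdefault(decile, []).append(y_value)
    return {decile: mean(v) for decile, v in sorted(buckets.items())}


# Synthetic, deliberately non-linear relation: Y responds only in the middle of X.
x = list(range(100))
y = [1.0 if 30 <= xi < 60 else 0.0 for xi in x]
for decile, value in conditional_means(x, y).items():
    print(decile, value)
```

A relation like this one has a weak linear correlation while the decile means still show a clear local effect, which is why the slide looks at E(Y | X1) by decile rather than at r alone.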
  • 17. USING PROFILE DETECTION IN INVESTMENT. Stylized fact 2: growth acceleration in monetary aggregates → future rise in inflation → loss in Govt Bonds. Hypothesis validation: X2 = Decile(Δ M3 (3 month)). r = Correlation(X2, Y) = -1.5%, R² = 0.4%. Decile analysis: non-linearity; general trend consistent with intuition. Chart: E(ΔBund) = f(ΔM3), by overlapping decile windows of X2 (0-3, 1-4, ..., 8-11).
  • 18. USING PROFILE DETECTION IN INVESTMENT. When: X1 >= 5, i.e. Δ(DAX risk premium) > 5th decile, AND X2 in [2, 6], i.e. Δ(M3) between 2nd and 6th decile. Then: E(Y | X1, X2) = +0.83 σ, true on 21.5% of observations between 1999 and 2012. These conditions form a market profile. Information ratio: 0.83 × (21.5% × 260/20)^0.5 = 1.05. Strong synergy between variables: +90% increase in conditional expectation on Bund performance. Chart: conditional expectation, combined influence of the two variables (deciles 0-9 on each axis, bands from −1.5 σ to +1.5 σ). Chart: historical occurrences of the profile, November 1999 to January 2013: 113 independent occurrences in 14 years.
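The two quantities attached to a market profile, its frequency and the conditional expectation of Y inside it, can be computed as below. This is a sketch on invented observations, with a hypothetical helper name `market_profile_stats`; the thresholds are the ones quoted on the slide (X1 >= 5 and X2 in [2, 6]).

```python
from statistics import mean
from typing import List, Tuple

Observation = Tuple[int, int, float]  # (decile of X1, decile of X2, standardized Y)


def market_profile_stats(observations: List[Observation],
                         x1_min: int = 5,
                         x2_low: int = 2,
                         x2_high: int = 6) -> Tuple[float, float]:
    """Frequency of the profile and mean of Y over the observations inside it."""
    inside = [y for x1, x2, y in observations
              if x1 >= x1_min and x2_low <= x2 <= x2_high]
    if not inside:
        return 0.0, float("nan")
    return len(inside) / len(observations), mean(inside)


# Invented observations, for illustration only.
observations = [
    (5, 2, 1.0), (7, 6, 0.5), (9, 4, 1.5), (4, 3, -0.5),
    (6, 8, 0.0), (2, 1, -1.0), (5, 7, 0.2), (8, 5, 1.0),
]
frequency, conditional_mean = market_profile_stats(observations)
unconditional_mean = mean(y for _, _, y in observations)
print(frequency, conditional_mean, unconditional_mean)
```

Comparing `conditional_mean` with `unconditional_mean` is the toy counterpart of comparing the +0.83 σ inside the profile with the +0.15 σ average over the training period.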
  • 19. MANAGING THE RISK OF OVERFITTING
      Parameter | Role | Influence on overfitting risk
      P | Size of training sample | P↑: risk↓
      ρ | Average coding compression rate of variables (= #modalities / P) | ρ↓: risk↓
      y | Proportion of 1’s in dependent variable | y↑: risk↓
      k | Maximum profile complexity | k↓: risk↓
      V | Total number of variables | V↓: risk↓
      ε | Maximum admissible probability of finding any configuration by random search | ε↓: risk↓
      Chart: maximum number of profiles as a function of the coding compression of variables, for #V = 1, 2, 3, 4.
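One way to see why smaller k, V and ε reduce the risk is to count how many candidate profiles a search can visit. The sketch below is an assumption, not the formula behind the slide: it counts conjunctions of up to k variables chosen among V, each restricted to one of m modalities, and then splits ε evenly across candidates (a Bonferroni-style correction, swapped in here for illustration). The function names are hypothetical.

```python
from math import comb


def count_candidate_profiles(n_variables: int,
                             max_complexity: int,
                             n_modalities: int) -> int:
    """Number of conjunctions using 1..k of V variables, one modality per variable."""
    return sum(comb(n_variables, j) * n_modalities ** j
               for j in range(1, max_complexity + 1))


def per_profile_threshold(epsilon: float, n_candidates: int) -> float:
    """Bonferroni-style significance level per profile (illustrative assumption)."""
    return epsilon / n_candidates


# Search space for a few (V, k) pairs with m = 10 modalities per variable.
for n_variables, max_complexity in [(250, 1), (250, 2), (250, 3), (1000, 3)]:
    n = count_candidate_profiles(n_variables, max_complexity, n_modalities=10)
    print(n_variables, max_complexity, n, per_profile_threshold(0.05, n))
```

The count grows polynomially in V for fixed k, so each extra variable or level of complexity tightens the evidence required from any single profile.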
  • 20. RISKS AND REWARDS OF COMBINATORIAL EXPLORATION. No preselection of variables, no hierarchy, localized search: more freedom is granted. No free lunch: computation time increases (linear in #Obs, polynomial in #V), but parallel computation and cloud computing are well suited to it. The risk of overfitting must be carefully controlled. The richness of the descriptive language must be kept at a parsimonious level in order to prevent ‘nugget-fishing’: interesting maths behind the scenes.
  • 21. CURRENT RESEARCH AREAS. Improving the dynamic aggregation of predictors: using prediction as a topology on data: COBRA algorithm (G. Biau, B. Guedj); weighting schemes based on regret (Lugosi, Stoltz) or regularity (Wintenberger). Embedding time stationarity requirements in profile search. Incremental production of backtests. Visualization of an audit trail between variables and final recommendations. GPU calculations …
  • 22. CONTACT: 11, rue Galvani, 75017 Paris, France; +33 (0)1 45 74 33 05; http://www.quinten-france.com ; @QuintenFrance; Christophe GEISSLER, 33 (0)6 08 60 46 14, c.geissler@quinten-france.com