Ohio Center of Excellence in Knowledge-Enabled Computing
Ph.D. Dissertation Defense:
Contrast Pattern Aided Regression and
Classification
February 19, 2016
Vahid Taslimitehrani
Kno.e.sis Center, CSE Dept., Wright State University, USA
Committee Members: Prof. Guozhu Dong (advisor, WSU), Prof. Amit Sheth (WSU),
Prof. T.K. Prasad (WSU), Dr. Keke Chen (WSU), and Prof. Jyotishman Pathak
(Cornell University)
Does Asthma decrease
the mortality risk from
Pneumonia?
Accuracy vs. Interpretability
[Figure: methods placed by accuracy (low to high) against interpretability (low to high): Lasso, Linear/Logistic Regression, Naïve Bayes, Decision Trees, Splines, Nearest Neighbors, Bagging, Neural Nets, SVM, Boosting, Random Forest, Deep Learning, and CPXR/CPXC. Slide note: *on real dataset. Source: Joshua Bloom and Henrik Brink of wise.io]
Modeling Techniques Lack Accuracy
and Interpretability
Heterogeneity &
Diversity of Given
Dataset
Predictors-Response
Interactions
Universal Model’s
Assumption
Predictors-Response Interactions
Interactive effect:
The effect of a variable on the prediction changes with the
values of the other independent variable(s) that interact with it.
It is not the genes or the environment!
It is their interaction that’s important.
Universal Model’s Assumption &
Heterogeneity
What is the universal model’s
assumption?
What are heterogeneous and
diverse data points?
Solution
Our proposed methodology has three components:
1. A new type of regression and classification model called Pattern Aided Regression and Classification (PXR and PXC)
2. New algorithms to build PXR and PXC models, called Contrast Pattern Aided Regression and Classification (CPXR and CPXC)
3. A new algorithm to handle imbalanced datasets, called Contrast Pattern Aided Classification on Imbalanced datasets (CPXCim)
Preliminaries: patterns
• A pattern (rule) is a set of conditions describing a set of objects.
• Example:
"Age ≥ 60" AND "History of hypertension = YES"
is a pattern (rule) describing:
all patients who are at least 60 years old AND have a history of hypertension.
• An object matches a pattern if it satisfies every condition in the pattern.

Patient ID  Age  BMI  History of Hypertension  Diagnosed with Heart Failure
1           75   22   YES                      YES
2           67   27   NO                       NO
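A minimal sketch of pattern matching on the two example patients. The dictionary keys, the tuple representation of a condition, and the `matches` helper are illustrative assumptions and do not come from the slides.

```python
import operator

# A pattern is a list of conditions; each condition is
# (attribute, comparison function, value). This encoding is an assumption.
PATTERN = [
    ("age", operator.ge, 60),
    ("hypertension", operator.eq, "YES"),
]

PATIENTS = [
    {"id": 1, "age": 75, "bmi": 22, "hypertension": "YES", "heart_failure": "YES"},
    {"id": 2, "age": 67, "bmi": 27, "hypertension": "NO", "heart_failure": "NO"},
]

def matches(instance, pattern):
    """Return True if the instance satisfies every condition in the pattern."""
    return all(compare(instance[attr], value) for attr, compare, value in pattern)

matched_ids = [p["id"] for p in PATIENTS if matches(p, PATTERN)]
print(matched_ids)
```

Only patient 1 satisfies both conditions; patient 2 meets the age condition but has no history of hypertension.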
Preliminaries: matching dataset and
contrast patterns
• The matching dataset of pattern $P$ in dataset $D$, written $\mathrm{mds}(P, D)$, is the set of all instances of $D$ matching pattern $P$.
• The support of pattern $P$ in $D$ is $\mathrm{supp}(P, D) = \frac{|\mathrm{mds}(P, D)|}{|D|}$.
• Contrast patterns: patterns that distinguish objects in different classes. A pattern is a contrast pattern if it matches many more objects in one class than in another class.
• An equivalence class (EC) is a set of patterns with the same matching dataset (having the same behavior).
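The definitions of matching dataset and support can be sketched as follows. The toy data, the predicate encoding of a pattern, and the `support_ratio` variable (the support in one class divided by the support in the other) are assumptions made for illustration.

```python
def mds(pattern, dataset):
    """Matching dataset: all instances that satisfy the pattern (a predicate here)."""
    return [x for x in dataset if pattern(x)]

def supp(pattern, dataset):
    """Support: fraction of the dataset matched by the pattern."""
    return len(mds(pattern, dataset)) / len(dataset)

# Toy data: (age, class label). Values are made up for illustration.
DATA = [(72, "pos"), (65, "pos"), (61, "pos"), (45, "pos"),
        (70, "neg"), (50, "neg"), (38, "neg"), (29, "neg")]

def is_older(x):
    """Pattern "Age >= 60"."""
    return x[0] >= 60

pos = [x for x in DATA if x[1] == "pos"]
neg = [x for x in DATA if x[1] == "neg"]

supp_all = supp(is_older, DATA)   # 4 of 8 instances
supp_pos = supp(is_older, pos)    # 3 of 4 instances
supp_neg = supp(is_older, neg)    # 1 of 4 instances
support_ratio = supp_pos / supp_neg
print(supp_all, supp_pos, supp_neg, support_ratio)
```

A support ratio well above 1 indicates that the pattern matches one class far more often than the other, which is the property that makes it a contrast pattern.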
Introduction: CPXR/CPXC overview
$P$: pattern; $f$: model
• A pattern logically characterizes a subgroup of data.
• A local model represents predictor-response interactions among the data points of a subgroup of data.
• Local model algorithms can be as simple as linear regression.
[Diagram: a regression/classification algorithm produces a single model $f$, while CPXR/CPXC produces pattern-model pairs $(P_1, f_1)$, $(P_2, f_2)$.]
Diversity of predictor-response
relationships
• Different pattern-model pairs emphasize different sets of
variables.
• Different pattern-model pairs use highly different
regression/classification models.
• Diverse predictor-response relationships may be neutralized
at the global level.
Introduction: Thesis Statement
Study regression and classification techniques to produce accurate
and interpretable models capable of adequately representing
complex and diverse predictor-response interactions and revealing
high intra-dataset heterogeneity.
Contrast Pattern Aided Regression
(CPXR)
Guozhu Dong, Vahid Taslimitehrani. Pattern-Aided Regression Modeling and Prediction Model Analysis. IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 9, pp. 2452-2465, Sept. 1, 2015.
A pictorial illustration of a simple PXR
model
A small dataset with 100 instances and 2 numerical
predictor variables.
• Different patterns can involve different sets of variables
[describing data regions in different subspaces]
• Matching datasets of different patterns can overlap
[Figure: plot of the example dataset; both axes run from 0 to 10.]
PXR concepts
• Given a training dataset $D = \{(x_i, y_i) \mid 1 \le i \le n\}$, a regression model built on $D$ is called the baseline model and is denoted $f_b$.
• Given the matching dataset $\mathrm{mds}(P, D)$ of a pattern $P$, a regression model built on $\mathrm{mds}(P, D)$ is called a local model and is denoted $f_P$.
[Diagram: a regression/classification algorithm yields the baseline model $f_b$; CPXR/CPXC yields pattern-model pairs $(P_1, f_{P_1})$, $(P_2, f_{P_2})$.]
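To make the baseline/local distinction concrete, here is a small sketch that fits one line to all of a toy dataset and another line to the matching dataset of a single pattern. The data, the pattern `x >= 5`, and the `fit_line` helper (ordinary least squares with one predictor) are illustrative assumptions.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

# Toy data with two regimes: y = x for x < 5 and y = 10 - x for x >= 5.
D = [(x, x if x < 5 else 10 - x) for x in range(10)]

def pattern(x):
    return x >= 5

f_b = fit_line(D)                                      # baseline model on all of D
f_P = fit_line([(x, y) for x, y in D if pattern(x)])   # local model on mds(P, D)

# At x = 9 the true response is 1; compare the two models there.
print(f_b(9), f_P(9))
```

The single baseline line averages over both regimes and misses badly at x = 9, while the local model recovers the relationship inside the subgroup exactly.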
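Pattern Aided Regression (PXR)
• $PXR = ((P_1, f_1, w_1), (P_2, f_2, w_2), \ldots, (P_k, f_k, w_k), f_d)$
• The regression function of PXR is:
$$f_{PXR}(x) = \begin{cases} \dfrac{\sum_{P_i \in \pi_x} w_i f_i(x)}{\sum_{P_i \in \pi_x} w_i}, & \text{if } \pi_x \neq \emptyset \\ f_d(x), & \text{otherwise} \end{cases}$$
where $\pi_x = \{P_i \mid 1 \le i \le k,\ x \text{ matches } P_i\}$.
[Table on slide: columns Pattern, Items, Local Model, Match, listing patterns $P_1$ to $P_6$ with local models $f_1$ to $f_6$; the slide also marks Case 1, Case 2, and Case 3, whose details are not recoverable from the extracted text.]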
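The PXR regression function can be sketched in a few lines. Patterns are encoded as predicates and local models as plain functions; the two patterns, their weights, and the default model below are hypothetical values chosen only to exercise each branch.

```python
def pxr_predict(x, triples, f_d):
    """Evaluate a PXR model at x.

    triples: list of (pattern, local_model, weight); pattern is a predicate
    and local_model a function of x. If no pattern matches, fall back to f_d.
    """
    matching = [(f, w) for pattern, f, w in triples if pattern(x)]
    if not matching:
        return f_d(x)
    total_weight = sum(w for _, w in matching)
    return sum(w * f(x) for f, w in matching) / total_weight

# Hypothetical model with two overlapping patterns on one numeric predictor.
triples = [
    (lambda x: x >= 5, lambda x: 10 - x, 2.0),
    (lambda x: x >= 8, lambda x: 0.5 * x, 1.0),
]

def f_d(x):
    """Default model for instances that match no pattern."""
    return x

no_match = pxr_predict(2, triples, f_d)    # no pattern matches: f_d(2)
one_match = pxr_predict(6, triples, f_d)   # only the first pattern matches
two_match = pxr_predict(8, triples, f_d)   # both match: weighted average
print(no_match, one_match, two_match)
```

When matching datasets overlap, as at x = 8, the prediction is the weight-normalized average of the matching local models rather than the output of any single one.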
CPXR/CPXC: Quality Measures
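• The average residual reduction ($arr$) of a pattern $P$ w.r.t. a baseline prediction model $f_b$ on a dataset $D$ is:
$$arr(P) = \frac{\sum_{x \in \mathrm{mds}(P, D)} r_x(f_b) - \sum_{x \in \mathrm{mds}(P, D)} r_x(f_P)}{|\mathrm{mds}(P, D)|}$$
• The total residual reduction ($trr$) of a PXR/PXC model is:
$$trr(PXR/PXC) = \frac{\sum_{x \in \mathrm{mds}(PS, D)} r_x(f_b) - \sum_{x \in \mathrm{mds}(PS, D)} r_x(f_{PXR/PXC})}{\sum_{x \in D} r_x(f_b)}$$
where $PS = \{P_1, \ldots, P_k\}$ is the pattern set, $r_x(f)$ is the residual of model $f$ on an instance $x$, and $\mathrm{mds}(PS, D) = \bigcup_{i=1}^{k} \mathrm{mds}(P_i, D)$.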
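A rough sketch of the two quality measures on toy data. It assumes that the residual is the absolute prediction error and that the baseline model is the reference model in the trr denominator; the constant baseline and the hand-written local model are made up for illustration.

```python
def arr(mds_P, f_b, f_P, residual):
    """Average residual reduction of a pattern over its matching dataset."""
    baseline = sum(residual(f_b, x, y) for x, y in mds_P)
    local = sum(residual(f_P, x, y) for x, y in mds_P)
    return (baseline - local) / len(mds_P)

def trr(D, mds_PS, f_b, f_model, residual):
    """Total residual reduction of a PXR/PXC model f_model."""
    baseline = sum(residual(f_b, x, y) for x, y in mds_PS)
    improved = sum(residual(f_model, x, y) for x, y in mds_PS)
    return (baseline - improved) / sum(residual(f_b, x, y) for x, y in D)

def abs_residual(f, x, y):
    # Assumption: the residual is the absolute prediction error.
    return abs(f(x) - y)

D = [(x, x if x < 5 else 10 - x) for x in range(10)]
mds_P = [(x, y) for x, y in D if x >= 5]   # matching dataset of one pattern

def f_b(x):
    return 2.5        # constant baseline, for illustration only

def f_P(x):
    return 10 - x     # local model that fits the x >= 5 regime exactly

def f_pxr(x):
    return f_P(x) if x >= 5 else f_b(x)

arr_value = arr(mds_P, f_b, f_P, abs_residual)
trr_value = trr(D, mds_P, f_b, f_pxr, abs_residual)
print(arr_value, trr_value)
```

Here the local model removes all of the baseline's error on its five matching instances, so arr is the baseline's mean error on the subgroup and trr is the share of the baseline's total error that the subgroup accounted for.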
CPXR Algorithm
[Diagram: dataset D passes through CPXR in three phases (Phase 1, Phase 2, Phase 3).]
Goal: a small set of cooperating patterns, where each pattern
characterizes a subgroup of data points.
• A baseline model makes large residual errors on data points in
the subgroup.
• A highly accurate model is found to correct those errors.
CPXR Algorithm
[Diagram: a baseline regression/classification model is built on the training dataset, which is then split into large-error (LE) and small-error (SE) instances. Pattern mining produces candidate patterns $P_1, P_2, P_3, P_4, \ldots, P_k$, each paired with a local model and weight $(f_i, w_i)$. The selected triples form the final model $[(P_1, f_1, w_1), (P_4, f_4, w_4), \ldots, (P_k, f_k, w_k)]$.]
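CPXR Algorithm
• How to determine the splitting point $\kappa$?
Minimize $\left| \rho - \frac{\sum_{r_i > \kappa} r_i}{\sum_i r_i} \right|$
• How to select patterns from $CPS$?
Let $PS = \{P_0\}$, where $P_0$ is the pattern $P$ in $CPS$ with the highest $arr$.
[Figure: plot with horizontal axis from 0 to 200 and vertical axis from 0 to 6, divided into SE and LE regions.]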
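One way to read the splitting criterion is as a search for the cut at which the large-error instances carry roughly a fraction rho of the total residual. The sketch below assumes non-negative residuals, restricts candidate cuts to the observed residual values, and uses a made-up rho; none of these choices is stated on the slide.

```python
def choose_kappa(residuals, rho):
    """Pick kappa minimizing |rho - (sum of residuals above kappa) / (total)|.

    Candidate cuts are the observed residual values (an assumption).
    """
    total = sum(residuals)
    best_kappa, best_gap = None, float("inf")
    for kappa in sorted(set(residuals)):
        ratio = sum(r for r in residuals if r > kappa) / total
        gap = abs(rho - ratio)
        if gap < best_gap:
            best_kappa, best_gap = kappa, gap
    return best_kappa

residuals = [1, 2, 3, 4, 10, 20, 60]          # made-up absolute residuals
kappa = choose_kappa(residuals, rho=0.6)
LE = [r for r in residuals if r > kappa]      # large-error instances
SE = [r for r in residuals if r <= kappa]     # small-error instances
print(kappa, LE, SE)
```

With these numbers the single largest residual already holds 60% of the total error, so LE contains one instance and SE the remaining six.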
CPXR/CPXC: Filtering methods
• Contrast patterns of LE with support ratio less than 1.
• Patterns with tiny residual reduction (𝑎𝑟𝑟).
• Patterns with Jaccard similarity more than 0.9, where
$$J(P_1, P_2) = \frac{|\mathrm{mds}(P_1, D) \cap \mathrm{mds}(P_2, D)|}{|\mathrm{mds}(P_1, D) \cup \mathrm{mds}(P_2, D)|}$$
• Patterns with the size of matching datasets less than the number of
predictor variables.
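The last three filters can be sketched as below (the support-ratio filter on contrast patterns of LE is left out). The candidate records, the `min_arr` threshold, and the choice to visit patterns in decreasing arr order so that the better of two near-duplicates survives are all assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two matching datasets given as sets of instance ids."""
    return len(a & b) / len(a | b)

def filter_patterns(candidates, n_predictors, min_arr=0.01, max_jaccard=0.9):
    """Drop tiny-arr, too-small, and near-duplicate patterns.

    candidates: list of dicts with keys "name", "mds" (set of ids), "arr".
    min_arr is a hypothetical threshold for "tiny residual reduction".
    """
    kept = []
    for c in sorted(candidates, key=lambda c: c["arr"], reverse=True):
        if c["arr"] < min_arr or len(c["mds"]) < n_predictors:
            continue
        if any(jaccard(c["mds"], k["mds"]) > max_jaccard for k in kept):
            continue
        kept.append(c)
    return kept

candidates = [
    {"name": "P1", "mds": set(range(0, 20)), "arr": 0.80},
    {"name": "P2", "mds": set(range(0, 19)), "arr": 0.70},    # Jaccard with P1 is 0.95
    {"name": "P3", "mds": set(range(30, 32)), "arr": 0.90},   # fewer instances than predictors
    {"name": "P4", "mds": set(range(40, 60)), "arr": 0.001},  # tiny arr
    {"name": "P5", "mds": set(range(10, 30)), "arr": 0.50},   # Jaccard with P1 is 1/3
]
kept_names = [c["name"] for c in filter_patterns(candidates, n_predictors=3)]
print(kept_names)
```

P2 is dropped as a near-duplicate of P1, P3 because its matching dataset is smaller than the number of predictors, and P4 because its residual reduction is negligible.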
CPXR: Prediction Accuracy Evaluation
• 50 real datasets and 23 synthetic datasets
• Different criteria to generate synthetic datasets
• Compare CPXR’s performance with 5 state-of-the-art
regression methods
• Overfitting and noise sensitivity
• Analysis of parameters
$$\text{RMSE reduction} = \frac{RMSE(LR) - RMSE(X)}{RMSE(LR)}$$
where LR is linear regression and X is the method being evaluated.
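The RMSE reduction measure is straightforward to compute. In this sketch the helper names and the prediction vectors are made up: the stand-in for linear regression is off by 2 on every instance and the evaluated method is off by 1.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmse_reduction(y_true, pred_lr, pred_x):
    """Relative RMSE reduction of method X over the LR reference."""
    base = rmse(y_true, pred_lr)
    return (base - rmse(y_true, pred_x)) / base

y_true = [1.0, 2.0, 3.0, 4.0]
pred_lr = [3.0, 4.0, 5.0, 6.0]   # every prediction off by 2
pred_x = [2.0, 3.0, 4.0, 5.0]    # every prediction off by 1
reduction = rmse_reduction(y_true, pred_lr, pred_x)
print(reduction)
```

A value of 0.5 corresponds to a 50% RMSE reduction; a negative value means the method is worse than the reference.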
CPXR: Prediction Accuracy Evaluation
CPXR’s performance vs. other methods
Dataset  PLR    SVR   BART   GBM     CPXR
Tecator  40.62  0.16  19.35  -14.15  65.1
Tree     17.68  7.92  -7.23  -10.82  61.73
Wage     12.2   9.15  25.42  11.86   38.45
Average  18.41  4.94  20.18  14.6    42.89
• CPXR has the highest accuracy in 41 out of 50 datasets.
• CPXR’s results are more accurate than LR in all 50 datasets.
• In 20% of datasets, CPXR achieved more than 60% RMSE
reduction.
CPXR: Overfitting and Noise Sensitivity
[Figure: drop in accuracy compared to clean test data (%) against noise level (%, 5 to 20), for BART, CPXR, and Gradient Boosting.]
[Figures: two panels comparing NN, SVR, BART, and CPXR, captioned "RMSE reduction on synthetic datasets" and "Train - Test".]
Method    Training  Test    Drop in accuracy
PLR       37.11%    18.76%  49%
SVR       7.65%     4.8%    37%
BART      41.02%    20.15%  51%
CPXR(LL)  51.4%     39.88%  22%
CPXR(LP)  53.85%    42.89%  21%
CPXR: Analysis of Parameters
[Figures: RMSE improvement over LR on the Fat, Mussels, and Price datasets as a function of k, the number of patterns (5 to 20); of minSup (0.02 to 0.10); and of r (0.40 to 0.70).]
• 2% is the optimal minSup.
• 7 patterns on average over the 50 datasets.
Contrast Pattern Aided Classification
(CPXC)
Guozhu Dong, Vahid Taslimitehrani, Pattern Aided
Classification, SIAM Data Mining Conference, 2016
CPXC: PXC Concept
CPXC techniques are quite similar to those of CPXR,
but CPXC has more challenges as well as more
opportunities than CPXR.
[Diagram: CPXC surrounded by four components: Confidence of Match, Objective Functions, Classification Algorithms, Loss Functions.]
CPXC: Confidence of Match
• Given $PXC = ((P_1, h_{P_1}, w_1), (P_2, h_{P_2}, w_2), \ldots, (P_k, h_{P_k}, w_k), h_d)$, the class variable of an instance $x$ is defined as:
$$\text{weighted-vote}(PXC, C_j, x) = \begin{cases} \dfrac{\sum_{P_i \in \pi_x} w_i \times match(x, P_i) \times h_{P_i}(x, C_j)}{\sum_{P_i \in \pi_x} w_i \times match(x, P_i)}, & \text{if } \pi_x \neq \emptyset \\ h_d(x, C_j), & \text{otherwise} \end{cases}$$
where $\pi_x = \{P_i \mid 1 \le i \le k,\ match(x, P_i) > 0\}$ and
$$match(x, P_i) = \frac{|\{q \in MG(P_i) \mid x \text{ matches } q\}|}{|MG(P_i)|}$$
• $match(x, P_i)$ is the fraction of the MGs $q$ in $MG(P_i)$ such that $x$ matches $q$.
• $h_P(x, C_j)$ is the confidence score of local model $h_P$ on instance $x$ for class $C_j$.
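A small sketch of the weighted vote with confidence of match. Each pattern is represented only by its list of MG patterns (encoded as predicates), and the local classifiers return fixed confidence scores; the generators, weights, and scores are hypothetical values chosen so the arithmetic is easy to follow.

```python
def match(x, generators):
    """Fraction of the MG patterns q (predicates) that x satisfies."""
    return sum(1 for q in generators if q(x)) / len(generators)

def weighted_vote(x, label, triples, h_d):
    """Score of class `label` for instance x under a PXC model.

    triples: list of (generators, local_classifier, weight), where
    local_classifier(x, label) returns a confidence score in [0, 1].
    """
    terms = []
    for generators, h, w in triples:
        m = match(x, generators)
        if m > 0:                       # the pattern belongs to pi_x
            terms.append((w * m, h(x, label)))
    if not terms:
        return h_d(x, label)            # default classifier
    return sum(c * s for c, s in terms) / sum(c for c, _ in terms)

def h1(x, label):
    return 0.9 if label == "pos" else 0.1

def h2(x, label):
    return 0.2 if label == "pos" else 0.8

def h_d(x, label):
    return 0.5

triples = [
    ([lambda x: x["age"] >= 60, lambda x: x["bmi"] >= 30], h1, 2.0),
    ([lambda x: x["age"] >= 80], h2, 1.0),
]

score_pos = weighted_vote({"age": 85, "bmi": 25}, "pos", triples, h_d)
score_default = weighted_vote({"age": 30, "bmi": 20}, "pos", triples, h_d)
print(score_pos, score_default)
```

For the first instance the first pattern matches one of its two generators (coefficient 2.0 × 0.5) and the second matches fully (coefficient 1.0 × 1.0), so the two local scores 0.9 and 0.2 are averaged with equal coefficients.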
CPXC: Loss Functions
[Figure: AUC (0.60 to 0.90) for the Binary, Probabilistic, and Standardized classification error loss functions on the ILPD, Hillvalley, and Planning datasets.]
The probabilistic error loss function returns the best results.
CPXC: Base/Local Algorithms & Objective
Functions
• Different methods for baseline and local classifiers:
– We used 6 classification algorithms for learning the
baseline and local classifiers.
• Quality measures on pattern sets:
– We used $trr$, AUC, and ACC (accuracy) to measure the
quality of a pattern set.
• Quality measures on patterns and weights on local classifiers:
– We used $arr$, AUC, and ACC (accuracy) to measure the
quality of a pattern: $arr$ is the winner!
Experimental results
• 19 public datasets
• 8 classification algorithms
• Noise sensitivity and overfitting
• Running time
• 7-fold cross validation
• minSup = 0.02, rho = 0.45
CPXC: Performance
Dataset      Boosting  DT    NBC   Log   RF    SVM   Max   CPXC (NBC-DT)
Congress     0.58      0.66  0.6   0.57  0.58  0.58  0.66  0.86
Poker        0.6       0.6   0.5   0.5   0.76  0.5   0.76  0.85
HillValley   0.5       0.63  0.65  0.66  0.6   0.67  0.67  0.89
Climate      0.96      0.81  0.9   0.94  0.97  0.98  0.98  0.97
Mammography  0.94      0.91  0.94  0.94  0.93  0.93  0.94  0.98
Steel        0.96      0.88  0.91  0.95  0.95  0.94  0.95  0.99
• CPXC achieved average AUC of 0.886 on the 8 hard datasets.
• Average AUC of the best performing traditional classifier (RF) on hard datasets is 0.638.
• CPXC’s AUC is never lower than RF on the hard datasets.
• CPXC achieved average AUC of 0.983 on the easy datasets while the best performing
traditional algorithms obtained average AUC of 0.968.
CPXC: Noise Sensitivity
Drop of AUC vs. noise levels
Method/Noise  0%    5%     10%    15%    20%    Average
RF            5.73  6.61   12.48  25.83  33.54  16.84
CPXC          5.87  6.79   12.92  24.7   32.7   16.6
Boosting      7.02  8.93   14.2   26.8   34.65  18.32
Log           7.04  10.56  14.63  24.7   33.94  18.17
NBC           7.06  10.58  15.26  27.89  35.1   19.18
SVM           8.6   10.34  16.28  29.59  38.02  20.57
DT            8.8   11.04  16.78  30.3   43.1   22.00
CPXC: Impact of Parameters
[Figures: AUC on the Blood, Congress, Hillvalley, and Planning datasets as a function of k, the number of patterns (4 to 14); of minSup (0.02 to 0.10); and of r (0.3 to 0.7). A fourth panel shows AUC for the TER, AUC, and ACC objective functions on the ILPD, Hillvalley, and Planning datasets.]
Classification on Imbalanced Datasets
• What is an imbalanced classification problem?
• What are the real world applications?
• Why do traditional classification algorithms not perform well on
imbalanced datasets?
• What is our proposed solution?
Classifying minority instances might be more important than classifying majority instances.
New weighting idea
[Diagram: a baseline classification model is built on the training dataset, and its weighted errors are used to split the data into LE and SE.]
• $err^*(h_b, x) = \begin{cases} err(h_b, x) \times \delta, & \text{if } x \in \text{minority class instances} \\ err(h_b, x), & \text{if } x \in \text{majority class instances} \end{cases}$
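A minimal sketch of the weighting idea: the baseline error of a minority-class instance is multiplied by delta before the data are split into LE and SE. The error values and the value of delta are made up; the slides do not state how delta is chosen.

```python
def weighted_error(err, is_minority, delta):
    """err*(h_b, x): scale the baseline error by delta for minority-class instances."""
    return err * delta if is_minority else err

# (baseline error, belongs to minority class); values are made up.
instances = [(0.30, True), (0.30, False), (0.10, True), (0.60, False)]
delta = 3.0   # hypothetical value
weighted = [weighted_error(e, m, delta) for e, m in instances]
print(weighted)
```

With delta greater than 1, a minority instance with a moderate error (0.30) can outrank a majority instance with a larger raw error (0.60), which makes minority instances more likely to fall on the large-error side of the split.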
A Filtering Method to Remove Imbalanced
Local Models
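• $IR(\mathrm{mds}(P, D)) = \frac{\text{Number of instances in the majority class}}{\text{Number of instances in the minority class}}$
[Diagram: patterns $P_1, P_2, P_3, P_4, \ldots, P_k$ paired with local models $(f_1, w_1), (f_2, w_2), (f_3, w_3), (f_4, w_4), \ldots, (f_k, w_k)$; pairs with imbalanced local models are filtered out.]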
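The imbalance ratio of a matching dataset can be computed directly from its class labels. The filter below, including the `max_ir` threshold and the rule of dropping patterns whose matching dataset contains a single class, is a guess at how such a filter might look, not the procedure from the slides.

```python
from collections import Counter

def imbalance_ratio(labels):
    """IR = (# instances in the majority class) / (# instances in the minority class)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def keep_balanced(pattern_labels, max_ir):
    """Keep patterns whose matching dataset has both classes and IR at most max_ir.

    max_ir is a hypothetical threshold.
    """
    return [name for name, labels in pattern_labels.items()
            if len(set(labels)) > 1 and imbalance_ratio(labels) <= max_ir]

# Class labels of the instances matched by each pattern (made-up data).
pattern_labels = {
    "P1": ["neg"] * 6 + ["pos"] * 3,    # IR = 2
    "P2": ["neg"] * 19 + ["pos"] * 1,   # IR = 19
    "P3": ["neg"] * 5,                  # no minority instances at all
}
kept = keep_balanced(pattern_labels, max_ir=5.0)
print(kept)
```

Only P1 survives: a local classifier trained on P2's or P3's matching dataset would see almost no minority examples.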
Experimental results
• The average AUC of CPXCim is 15.2% and 14% higher than the AUC of
SMOTE and SMOTE-TL, respectively.
• The performance of CPXCim is always better than other imbalanced
classifiers on these 10 datasets.
CPXCim’s performance
Dataset       # of instances  # of variables  Imbalance ratio  CPXCim  SMOTE   SMOTE-TL
Yeast         1004            8               9.14             0.942   0.7728  0.772
Led7digit     443             7               10.97            0.978   0.8919  0.897
flareF        1066            11              23.79            0.883   0.7463  0.809
Wine Quality  1599            11              29.17            0.76    0.6008  0.59
Average       -               -               -                0.92    0.798   0.807
Applications of CPXR & CPXC
• Vahid Taslimitehrani, Guozhu Dong. A New CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury. IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, pp. 283-290 (Best Student Paper).
• Behzad Ghanbarian, Vahid Taslimitehrani, Guozhu Dong, Yakov Pachepsky. Sample dimensions effect on prediction of soil water retention curve and saturated hydraulic conductivity. Journal of Hydrology, 528 (2015): 127-137.
• Vahid Taslimitehrani, Guozhu Dong, Naveen Pereira, Maryam Panahiazar, Jyotishman Pathak. Developing EHR-driven Heart Failure Models using CPXR(Log) with the probabilistic loss function. Journal of Biomedical Informatics (2016).
Application: Traumatic Brain Injury
What is Traumatic Brain Injury (TBI)?
It is an important public health problem and a leading
cause of death and disability worldwide.
Problem definition: prediction of patient outcomes
within 6 months after a TBI event, using the admission data.
• Dataset: 2159 patients collected from a trial and 15 predictor variables
• Two class variables: mortality and unfavorable outcome.
Vahid Taslimitehrani, Guozhu Dong. A New CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury. IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, pp. 283-290 (Best Student Paper Award).
Application: Traumatic Brain Injury
CPXR(Log)’s performance
Model          Basic       Basic+CT    Basic+CT+Lab
Unfavorable
  Specificity  0.89(0.85)  0.87(0.85)  0.91(0.84)
  Sensitivity  0.54(0.52)  0.65(0.6)   0.72(0.61)
  Accuracy     0.75(0.72)  0.79(0.75)  0.87(0.75)
  F1           0.63(0.59)  0.7(0.66)   0.76(0.66)
  AUC          0.82(0.76)  0.87(0.8)   0.93(0.81)

Performance changes when we add more variables
                         Mortality         Unfavorable
Variable set change      CPXR(Log)  Log    CPXR(Log)  Log
Basic → Basic+CT         10%        7.7%   6%         5.2%
Basic+CT → Basic+CT+Lab  4.5%       2.5%   6.8%       1.25%
Basic → Basic+CT+Lab     15%        11.1%  13.4%      6.6%

[Figure: ROC curves (true positive rate against false positive rate) for CPXR(Log), SLogR, SVM, and RF. AUC_CPXR(Log) = 0.87, AUC_SLogR = 0.8, AUC_RF = 0.72, AUC_SVM = 0.7.]
Application: Heart Failure Survival Risk
Models
• Collaboration with Mayo Clinic
• Problem definition: Heart Failure survival prediction models.
• An EHR dataset on 119,749 patients admitted to Mayo Clinic.
• Predictor variables are grouped in the following categories:
– Demographic, Vitals, Labs, Medications and 24 major chronic conditions as co-
morbidities.
• Three groups of CPXC models are developed to predict survival in 1, 2 and 5 years
after heart failure event.
Vahid Taslimitehrani, Guozhu Dong, Naveen Pereira, Maryam Panahiazar, Jyotishman Pathak. Developing EHR-driven Heart Failure Models using CPXR(Log) with the probabilistic loss function. Journal of Biomedical Informatics (2016).
Application: Heart Failure Survival Risk
Models
Performance of different classifiers
Algorithm            1 Year  2 Year  5 Year
Decision Tree        0.66    0.5     0.5
Random Forest        0.8     0.72    0.72
Ada Boost            0.74    0.71    0.68
SVM                  0.59    0.52    0.52
Logistic Regression  0.81    0.74    0.73
CPXC                 0.937   0.83    0.786

Odds ratios of PXC local models
Variable       Log   f1    f2    f3    f4    f5    f6    f7
Alzheimer      1.75  1.74  0.80  1.88  1.59  1.29  1.58  0.75
Breast Cancer  0.63  1.15  1.62  2.73  1.00  1.00  2.08  0.59
Application: Heart Failure Survival Risk
Models
Performance changes when we add more variables
Variable sets                                          CPXC   Log    RF     SVM    DT      Boosting
(Demo&Vital) → (Demo&Vital)+Lab                        4.8%   11.5%  19%    17.3%  0%      14.7%
(Demo&Vital) → (Demo&Vital)+Lab+Med                    8.9%   13.4%  21.2%  21.7%  0%      5.7%
(Demo&Vital) → (Demo&Vital)+Lab+Med+Co-morbid          27.8%  9.6%   19.1%  19.5%  -10.4%  7.6%
(Demo&Vital)+Lab → (Demo&Vital)+Lab+Med                3.2%   1.7%   1.7%   3.7%   0%      -9.8%
(Demo&Vital)+Lab → (Demo&Vital)+Lab+Med+Co-morbid      20.9%  -1.7%  0%     1.8%   -10.4%  -8.1%
(Demo&Vital)+Lab+Med → (Demo&Vital)+Lab+Med+Co-morbid  15.9%  -3.3%  -1.7%  -1.7%  -10.4%  1.8%

Adding co-morbidities:
• decreased the AUC of other classifiers by 5.3% on average.
• increased the AUC of CPXC by 21.5% on average.
Application: Saturated Hydraulic
Conductivity
• Collaboration with University of Texas at Austin and USDA-ARS
• Problem definition:
1. Prediction of the soil water retention curve (SWRC)
2. Prediction of Saturated Hydraulic Conductivity (SHC)
3. Investigating the effect of sample dimensions on
prediction accuracy.
• Number of predictor variables: 6-13
• Number of response variables: 10
• 32 CPXR models are developed.
Behzad Ghanbarian, Vahid Taslimitehrani, Guozhu Dong, Yakov Pachepsky. Sample
dimensions effect on prediction of soil water retention curve and saturated hydraulic
conductivity. Journal of Hydrology. 528 (2015): 127-137.
Application: Saturated Hydraulic
Conductivity
[Figures: two scatter plots of predicted log(Ksat) [cm day-1] against measured log(Ksat) [cm day-1] for SHC2, both axes from -4 to 10; one panel reports RMSLE = 0.456 and the other RMSLE = 1.936.]

Model              s     t     10    30    50    100   300   500   1000  1500
Linear Regression
  SWRC1            0.79  0.73  0.77  0.84  0.85  0.84  0.83  0.84  0.81  0.77
  SWRC2            0.79  0.72  0.77  0.85  0.84  0.84  0.84  0.83  0.80  0.78
CPXR
  SWRC1            0.94  0.97  0.97  0.94  0.97  0.97  0.95  0.96  0.95  0.94
  SWRC2            0.95  0.96  0.94  0.95  0.97  0.96  0.95  0.98  0.97  0.94
Conclusion
• A new type of highly accurate and interpretable regression and classification
model, PXR/PXC, is presented.
• New techniques to build PXR and PXC models are discussed.
• Each pattern-model pair represents a distinct predictor-response interaction.
• PXR and PXC models are more accurate, more interpretable, and less prone to overfitting than
other regression and classification algorithms.
• A new method adapted from CPXC is presented for classifying imbalanced
datasets.
• Several applications of CPXR and CPXC are discussed.
Related publications
• Guozhu Dong, Vahid Taslimitehrani. Pattern-Aided Regression Modeling and Prediction Model Analysis. IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 9, pp. 2452-2465, Sept. 1, 2015.
• Vahid Taslimitehrani, Guozhu Dong. A New CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury. IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, pp. 283-290 (Best Student Paper).
• Behzad Ghanbarian, Vahid Taslimitehrani, Guozhu Dong, Yakov Pachepsky. Sample dimensions effect on prediction of soil water retention curve and saturated hydraulic conductivity. Journal of Hydrology, 528 (2015): 127-137.
• Vahid Taslimitehrani, Guozhu Dong, Naveen Pereira, Maryam Panahiazar, Jyotishman Pathak. Developing EHR-driven Heart Failure Models using CPXR(Log) with the probabilistic loss function. Journal of Biomedical Informatics (2016).
• Guozhu Dong, Vahid Taslimitehrani. Pattern Aided Classification. SIAM Data Mining Conference, 2016.
Acknowledgement
50

More Related Content

What's hot

32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..butest
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AIDatabricks
 
Drug Discovery and Development Using AI
Drug Discovery and Development Using AIDrug Discovery and Development Using AI
Drug Discovery and Development Using AIDatabricks
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei LinChien-Wei Lin
 
Semantic Web for Health Care and Biomedical Informatics
Semantic Web for Health Care and Biomedical InformaticsSemantic Web for Health Care and Biomedical Informatics
Semantic Web for Health Care and Biomedical InformaticsAmit Sheth
 
A survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applicationsA survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applicationsJoseph Paul Cohen PhD
 
A Semantic Retrieval System for Extracting Relationships from Biological Corpus
A Semantic Retrieval System for Extracting Relationships from Biological CorpusA Semantic Retrieval System for Extracting Relationships from Biological Corpus
A Semantic Retrieval System for Extracting Relationships from Biological Corpusijcsit
 
2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dcc.titus.brown
 
A Self-Adaptive Evolutionary Negative Selection Approach for Anom
A Self-Adaptive Evolutionary Negative Selection Approach for AnomA Self-Adaptive Evolutionary Negative Selection Approach for Anom
A Self-Adaptive Evolutionary Negative Selection Approach for AnomLuis J. Gonzalez, PhD
 
Data Provenance and Scientific Workflow Management
Data Provenance and Scientific Workflow ManagementData Provenance and Scientific Workflow Management
Data Provenance and Scientific Workflow ManagementNeuroMat
 
An efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data miningAn efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data miningijcisjournal
 
Performance Evaluation of Different Data Mining Classification Algorithm and ...
Performance Evaluation of Different Data Mining Classification Algorithm and ...Performance Evaluation of Different Data Mining Classification Algorithm and ...
Performance Evaluation of Different Data Mining Classification Algorithm and ...IOSR Journals
 
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryA Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
 
Data Mining in Rediology reports
Data Mining in Rediology reportsData Mining in Rediology reports
Data Mining in Rediology reportsSaeed Mehrabi
 
Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Enayat Rajabi
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomicsShyam Sarkar
 

What's hot (18)

32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AI
 
Drug Discovery and Development Using AI
Drug Discovery and Development Using AIDrug Discovery and Development Using AI
Drug Discovery and Development Using AI
 
AI for drug discovery
AI for drug discoveryAI for drug discovery
AI for drug discovery
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei Lin
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
Semantic Web for Health Care and Biomedical Informatics
Semantic Web for Health Care and Biomedical InformaticsSemantic Web for Health Care and Biomedical Informatics
Semantic Web for Health Care and Biomedical Informatics
 
A survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applicationsA survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applications
 
A Semantic Retrieval System for Extracting Relationships from Biological Corpus
A Semantic Retrieval System for Extracting Relationships from Biological CorpusA Semantic Retrieval System for Extracting Relationships from Biological Corpus
A Semantic Retrieval System for Extracting Relationships from Biological Corpus
 
2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc
 
A Self-Adaptive Evolutionary Negative Selection Approach for Anom
A Self-Adaptive Evolutionary Negative Selection Approach for AnomA Self-Adaptive Evolutionary Negative Selection Approach for Anom
A Self-Adaptive Evolutionary Negative Selection Approach for Anom
 
Data Provenance and Scientific Workflow Management
Data Provenance and Scientific Workflow ManagementData Provenance and Scientific Workflow Management
Data Provenance and Scientific Workflow Management
 
An efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data miningAn efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data mining
 
Performance Evaluation of Different Data Mining Classification Algorithm and ...
Performance Evaluation of Different Data Mining Classification Algorithm and ...Performance Evaluation of Different Data Mining Classification Algorithm and ...
Performance Evaluation of Different Data Mining Classification Algorithm and ...
 
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryA Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
 
Data Mining in Rediology reports
Data Mining in Rediology reportsData Mining in Rediology reports
Data Mining in Rediology reports
 
Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomics
 

Viewers also liked

Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...Artificial Intelligence Institute at UofSC
 
Delroy Cameron's Dissertation Defense: A Contenxt-Driven Subgraph Model for L...
Delroy Cameron's Dissertation Defense: A Contenxt-Driven Subgraph Model for L...Delroy Cameron's Dissertation Defense: A Contenxt-Driven Subgraph Model for L...
Contrast Pattern Aided Regression and Classification

  • 1. Ohio Center of Excellence in Knowledge-Enabled Computing. Ph.D. Dissertation Defense: Contrast Pattern Aided Regression and Classification. February 19, 2016. Vahid Taslimitehrani, Kno.e.sis Center, CSE Dept., Wright State University, USA. Committee Members: Prof. Guozhu Dong (advisor, WSU), Prof. Amit Sheth (WSU), Prof. T.K. Prasad (WSU), Dr. Keke Chen (WSU), and Prof. Jyotishman Pathak (Cornell University)
  • 3. Does asthma decrease the mortality risk from pneumonia?
  • 4. Accuracy vs. Interpretability. [Figure: modeling methods placed on an accuracy axis and an interpretability axis (low to high), on real datasets: Lasso, Linear/Logistic Regression, Naïve Bayes, Decision Trees, Splines, Nearest Neighbors, Bagging, Neural Nets, SVM, Boosting, Random Forest, Deep Learning, and CPXR/CPXC. Source: Joshua Bloom and Henrik Brink of wise.io]
  • 5. Modeling Techniques Lack Accuracy and Interpretability
      • Heterogeneity & Diversity of Given Dataset
      • Predictors-Response Interactions
      • Universal Model’s Assumption
  • 6. Predictors-Response Interactions. Interactive effect: the effect of a variable on the prediction changes with the values of the other independent variable(s) that interact with it. It is not the genes or the environment! It is their interaction that’s important.
  • 7. Universal Model’s Assumption & Heterogeneity. What is the universal model’s assumption? What are heterogeneous and diverse data points?
  • 8. Solution. Our proposed methodology has three components:
      1. A new type of regression and classification model called Pattern Aided Regression and Classification (PXR and PXC).
      2. New algorithms to build PXR and PXC models, called Contrast Pattern Aided Regression and Classification (CPXR and CPXC).
      3. A new algorithm to handle imbalanced datasets, called Contrast Pattern Aided Classification on Imbalanced datasets (CPXCim).
  • 9. Preliminaries: patterns
      • A pattern (rule) is a set of conditions describing a set of objects.
      • Example: "Age ≥ 60" AND "History of hypertension = YES" is a pattern (rule) describing all patients who are at least 60 years old AND have a history of hypertension.
      • An object matches a pattern if it satisfies every condition in the pattern.
      Patient ID | Age | BMI | History of Hypertension | Diagnosed with Heart Failure
      1          | 75  | 22  | YES                     | YES
      2          | 67  | 27  | NO                      | NO
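The matching rule on this slide can be illustrated with a minimal Python sketch. The representation of a pattern as a list of (attribute, predicate) pairs and the field names are assumptions made for illustration; they are not taken from the slides.

```python
# Hypothetical representation: a pattern is a list of (attribute, predicate)
# conditions, and an object is a plain dictionary.
patients = [
    {"id": 1, "age": 75, "bmi": 22, "hypertension": "YES", "heart_failure": "YES"},
    {"id": 2, "age": 67, "bmi": 27, "hypertension": "NO", "heart_failure": "NO"},
]

# "Age >= 60" AND "History of hypertension = YES"
pattern = [
    ("age", lambda v: v >= 60),
    ("hypertension", lambda v: v == "YES"),
]


def matches(obj, pattern):
    """Return True if obj satisfies every condition in the pattern."""
    return all(pred(obj[attr]) for attr, pred in pattern)


matched_ids = [p["id"] for p in patients if matches(p, pattern)]
print(matched_ids)
```

Patient 2 satisfies the age condition but not the hypertension condition, so only patient 1 matches.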
  • 10. Preliminaries: matching dataset and contrast patterns
      • The matching dataset of pattern $P$ in dataset $D$, written $mds(P, D)$, is the set of all instances in $D$ that match $P$.
      • The support of pattern $P$ in $D$ is $supp(P, D) = \frac{|mds(P, D)|}{|D|}$.
      • Contrast patterns: patterns that distinguish objects in different classes. A pattern is a contrast pattern if it matches many more objects in one class than in another class.
      • An equivalence class (EC) is a set of patterns with the same matching dataset (that is, the same behavior).
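A small sketch of the matching dataset and support computations follows. The support ratio between two classes is used here as one plausible way to quantify "contrast"; that choice, the toy data, and all function names are assumptions for illustration.

```python
def mds(pattern, dataset):
    """Matching dataset: all rows that satisfy every condition of the pattern."""
    return [row for row in dataset
            if all(pred(row[attr]) for attr, pred in pattern)]


def supp(pattern, dataset):
    """Support: fraction of the dataset that matches the pattern."""
    return len(mds(pattern, dataset)) / len(dataset)


def support_ratio(pattern, d_a, d_b):
    """Ratio of supports in two classes (an assumed contrast measure)."""
    s_a, s_b = supp(pattern, d_a), supp(pattern, d_b)
    if s_b == 0:
        return float("inf") if s_a > 0 else 0.0
    return s_a / s_b


# Toy data: two classes described by a single attribute.
class_a = [{"age": 70}, {"age": 65}, {"age": 40}, {"age": 80}]
class_b = [{"age": 30}, {"age": 62}, {"age": 45}, {"age": 50}]
older = [("age", lambda v: v >= 60)]

print(supp(older, class_a), supp(older, class_b), support_ratio(older, class_a, class_b))
```

Here the pattern matches three of four instances in the first class and one of four in the second, so it is three times as frequent in the first class.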
  • 11. Introduction: CPXR/CPXC overview. $P$: pattern; $f$: model. A pattern logically characterizes a subgroup of data. A local model represents predictor-response interactions among the data points of a subgroup of data. Local model algorithms can be as simple as linear regression. [Figure: a regression/classification model $f$ next to the CPXR/CPXC pattern-model pairs $(P_1, f_1)$, $(P_2, f_2)$]
  • 12. Diversity of predictor-response relationships
      • Different pattern-model pairs emphasize different sets of variables.
      • Different pattern-model pairs use highly different regression/classification models.
      • Diverse predictor-response relationships may be neutralized at the global level.
  • 13. Introduction: Thesis Statement. Study regression and classification techniques to produce accurate and interpretable models capable of adequately representing complex and diverse predictor-response interactions and revealing high intra-dataset heterogeneity.
  • 14. Contrast Pattern Aided Regression (CPXR). Guozhu Dong, Vahid Taslimitehrani, "Pattern-Aided Regression Modeling and Prediction Model Analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 9, pp. 2452-2465, Sept. 1, 2015.
  • 15. A pictorial illustration of a simple PXR model. A small dataset with 100 instances and 2 numerical predictor variables. [Figure: plot of the dataset; both axes run from 0 to 10]
      • Different patterns can involve different sets of variables (describing data regions in different subspaces).
      • Matching datasets of different patterns can overlap.
  • 16. PXR concepts
      • Given a training dataset $D = \{(x_i, y_i) \mid 1 \le i \le n\}$, a regression model built on $D$ is called the baseline model and is denoted $f_b$.
      • Given the matching dataset $mds(P, D)$ of a pattern $P$, a regression model built on $mds(P, D)$ is called a local model and is denoted $f_P$.
      [Figure: the regression/classification baseline model $f_b$ next to the CPXR/CPXC pattern-model pairs $(P_1, f_{P_1})$, $(P_2, f_{P_2})$]
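  • 17. Pattern Aided Regression (PXR)
      • $PXR = ((P_1, f_1, w_1), (P_2, f_2, w_2), \ldots, (P_k, f_k, w_k), f_d)$
      • The regression function of PXR is:
        $f_{PXR}(x) = \begin{cases} \dfrac{\sum_{P_i \in \pi_x} w_i f_i(x)}{\sum_{P_i \in \pi_x} w_i}, & \text{if } \pi_x \neq \emptyset \\ f_d(x), & \text{otherwise} \end{cases}$
        where $\pi_x = \{P_i \mid 1 \le i \le k,\ x \text{ matches } P_i\}$.
      [Figure: table with columns Pattern, Items, Local Model, and Match, listing patterns $P_1$ to $P_6$ with local models $f_1$ to $f_6$ and illustrating three matching cases (Case 1, Case 2, Case 3)]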
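The PXR regression function can be sketched in a few lines of Python: an instance is predicted by the weighted average of the local models whose patterns it matches, and by the default model when it matches none. Patterns are represented as boolean predicates here, and the toy patterns, models, and weights are hypothetical; weights are assumed to be positive.

```python
def pxr_predict(x, triples, f_default):
    """Predict x with a PXR model given as a list of (pattern, f, w) triples."""
    matched = [(f, w) for pattern, f, w in triples if pattern(x)]
    if not matched:
        # pi_x is empty: fall back to the default model f_d.
        return f_default(x)
    total_w = sum(w for _, w in matched)
    return sum(w * f(x) for f, w in matched) / total_w


# Hypothetical two-pattern model over instances with features x1 and x2.
triples = [
    (lambda x: x["x1"] < 3, lambda x: 2 * x["x1"] + 1, 1.0),  # (P1, f1, w1)
    (lambda x: x["x2"] > 4, lambda x: x["x2"] - 1, 3.0),      # (P2, f2, w2)
]
f_d = lambda x: 0.5 * x["x1"] + 0.5 * x["x2"]

both = pxr_predict({"x1": 2.0, "x2": 5.0}, triples, f_d)   # matches P1 and P2
none = pxr_predict({"x1": 5.0, "x2": 1.0}, triples, f_d)   # matches no pattern
print(both, none)
```

For the first instance both patterns match, giving (1.0 × 5.0 + 3.0 × 4.0) / 4.0 = 4.25; the second instance falls through to the default model.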
  • 18. CPXR/CPXC: Quality Measures
      • The average residual reduction (arr) of a pattern $P$ with respect to a baseline prediction model $f_b$ on a dataset $D$ is:
        $arr(P) = \dfrac{\sum_{x \in mds(P, D)} |r_x(f_b)| - \sum_{x \in mds(P, D)} |r_x(f_P)|}{|mds(P, D)|}$
      • The total residual reduction (trr) of a PXR/PXC model is:
        $trr(PXR/PXC) = \dfrac{\sum_{x \in mds(PS, D)} |r_x(f_b)| - \sum_{x \in mds(PS, D)} |r_x(f_{PXR/PXC})|}{\sum_{x \in D} |r_x(f_b)|}$
        where $PS = \{P_1, \ldots, P_k\}$ is the pattern set, $r_x(f)$ is the residual of a model $f$ on an instance $x$, and $mds(PS, D) = \bigcup_{i=1}^{k} mds(P_i, D)$.
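The two quality measures can be computed directly from residual lists, as in the sketch below. It assumes residuals are compared by absolute value and that the baseline model supplies the residuals in the trr denominator; the function signatures and toy numbers are illustrative only.

```python
def arr(res_baseline, res_local):
    """Average residual reduction over the instances in mds(P, D).

    Both lists hold residuals for the same matching instances, in the same order.
    """
    reduction = sum(abs(r) for r in res_baseline) - sum(abs(r) for r in res_local)
    return reduction / len(res_baseline)


def trr(res_baseline_all, res_model_all, covered):
    """Total residual reduction of a pattern-aided model on the whole dataset.

    covered[i] is True when instance i belongs to mds(PS, D).
    """
    numerator = sum(abs(b) - abs(m)
                    for b, m, c in zip(res_baseline_all, res_model_all, covered)
                    if c)
    return numerator / sum(abs(b) for b in res_baseline_all)


# Toy residuals for a dataset of four instances; the first two are covered.
baseline = [4.0, -2.0, 1.0, -1.0]
pattern_aided = [1.0, -1.0, 1.0, -1.0]
covered = [True, True, False, False]

print(arr(baseline[:2], pattern_aided[:2]), trr(baseline, pattern_aided, covered))
```

On the covered instances the absolute residual drops from 6.0 to 2.0, which is an average reduction of 2.0 per instance and half of the baseline's total absolute residual of 8.0.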
  • 19. CPXR Algorithm. [Figure: dataset D passed through the three phases of CPXR (Phase 1, Phase 2, Phase 3)] Goal: a small set of cooperating patterns, where each pattern characterizes a subgroup of data points such that:
      • A baseline model makes large residual errors on data points in the subgroup.
      • A highly accurate model is found to correct those errors.
  • 20. CPXR Algorithm. [Figure: flow diagram. A baseline regression/classification model is built on the training dataset, which is divided into LE and SE; pattern mining produces patterns $P_1, P_2, P_3, P_4, \ldots, P_k$ with local models and weights $(f_1, w_1), (f_2, w_2), (f_3, w_3), (f_4, w_4), \ldots, (f_k, w_k)$; the output is the selected set $[(P_1, f_1, w_1), (P_4, f_4, w_4), \ldots, (P_k, f_k, w_k)]$]
  • 21. CPXR Algorithm
      • How to determine the splitting point $\kappa$? Minimize $\left| \rho - \dfrac{\sum_{|r_i| > \kappa} |r_i|}{\sum_{i} |r_i|} \right|$.
      • How to select patterns from $CPS$? Let $PS = \{P_0\}$, where $P_0$ is the pattern $P$ in $CPS$ with the highest $arr$.
      [Figure: plot marking the SE and LE regions]
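The choice of the splitting point can be sketched as a search over candidate cut-offs. This sketch assumes that LE and SE denote the large-error and small-error instances of the baseline model, that candidate values of κ are the observed residual magnitudes, and that ties are broken in favor of the smaller κ; none of these details are stated on the slide.

```python
def split_point(residuals, rho):
    """Pick kappa so that instances with |r| > kappa hold about a rho share
    of the total absolute residual."""
    mags = sorted(abs(r) for r in residuals)
    total = sum(mags)
    if total == 0:
        raise ValueError("all residuals are zero; no split is needed")
    best_kappa, best_gap = None, float("inf")
    for kappa in mags:
        share = sum(m for m in mags if m > kappa) / total
        gap = abs(rho - share)
        if gap < best_gap:
            best_kappa, best_gap = kappa, gap
    return best_kappa


residuals = [1.0, -1.0, 2.0, -2.0, 4.0]
kappa = split_point(residuals, rho=0.45)
large_error = [r for r in residuals if abs(r) > kappa]
small_error = [r for r in residuals if abs(r) <= kappa]
print(kappa, large_error, small_error)
```

With these toy residuals the total absolute residual is 10.0; cutting at κ = 2.0 leaves a share of 0.4 above the cut, which is the closest achievable value to ρ = 0.45.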
  • 22. CPXR/CPXC: Filtering methods
      • Contrast patterns of LE with support ratio less than 1.
      • Patterns with tiny residual reduction ($arr$).
      • Patterns with Jaccard similarity more than 0.9, where $J(P_1, P_2) = \dfrac{|mds(P_1, D) \cap mds(P_2, D)|}{|mds(P_1, D) \cup mds(P_2, D)|}$.
      • Patterns whose matching dataset has fewer instances than the number of predictor variables.
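The Jaccard-based filter can be sketched as follows. Matching datasets are represented as sets of instance ids, and of two near-duplicate patterns the one encountered first is kept; that tie-breaking rule and the helper names are assumptions for illustration (in practice the list might be ordered by a quality measure first).

```python
def jaccard(ids_a, ids_b):
    """Jaccard similarity of two matching datasets given as collections of ids."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_near_duplicates(mds_by_pattern, threshold=0.9):
    """Keep a pattern only if it is not too similar to any pattern already kept."""
    kept = []
    for name, ids in mds_by_pattern:
        if all(jaccard(ids, kept_ids) <= threshold for _, kept_ids in kept):
            kept.append((name, ids))
    return [name for name, _ in kept]


candidates = [
    ("P1", range(0, 20)),
    ("P2", range(0, 19)),   # overlaps P1 on 19 of 20 ids: J = 0.95
    ("P3", range(10, 30)),  # overlaps P1 on 10 of 30 ids: J = 1/3
]
print(drop_near_duplicates(candidates))
```

P2 is dropped because its matching dataset is almost identical to that of P1, while P3 is kept.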
  • 23. CPXR: Prediction Accuracy Evaluation
      • 50 real datasets and 23 synthetic datasets
      • Different criteria to generate synthetic datasets
      • Compare CPXR’s performance with 5 state-of-the-art regression methods
      • Overfitting and noise sensitivity
      • Analysis of parameters
      $RMSE\ reduction = \dfrac{RMSE(LR) - RMSE(X)}{RMSE(LR)}$
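The RMSE reduction measure compares a method X against linear regression (LR) as the reference. A minimal sketch with made-up predictions (the numbers are illustrative and not taken from the experiments):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of a list of predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction(rmse_lr, rmse_x):
    """Relative RMSE reduction of method X over linear regression."""
    return (rmse_lr - rmse_x) / rmse_lr


y = [1.0, 2.0, 3.0, 4.0]
lr_pred = [3.0, 4.0, 5.0, 6.0]  # every prediction is off by 2
x_pred = [2.0, 3.0, 4.0, 5.0]   # every prediction is off by 1

reduction = rmse_reduction(rmse(y, lr_pred), rmse(y, x_pred))
print(reduction)
```

A positive value means X has a lower RMSE than LR; a negative value means X is worse than LR.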
  • 24. CPXR: Prediction Accuracy Evaluation. CPXR’s performance vs. other methods:
      Dataset | PLR   | SVR  | BART  | GBM    | CPXR
      Tecator | 40.62 | 0.16 | 19.35 | -14.15 | 65.1
      Tree    | 17.68 | 7.92 | -7.23 | -10.82 | 61.73
      Wage    | 12.2  | 9.15 | 25.42 | 11.86  | 38.45
      Average | 18.41 | 4.94 | 20.18 | 14.6   | 42.89
      • CPXR has the highest accuracy in 41 out of 50 datasets.
      • CPXR’s results are more accurate than LR in all 50 datasets.
      • In 20% of datasets, CPXR achieved more than 60% RMSE reduction.
  • 25. CPXR: Overfitting and Noise Sensitivity. [Figure: drop in accuracy compared to clean test data (%) vs. noise (%), for BART, CPXR, Gradient Boosting, NN, and SVR] [Figure: two panels comparing NN, SVR, BART, and CPXR, titled "RMSE reduction on synthetic datasets" and "Train - Test"]
      Method   | Training | Test   | Drop in accuracy
      PLR      | 37.11%   | 18.76% | 49%
      SVR      | 7.65%    | 4.8%   | 37%
      BART     | 41.02%   | 20.15% | 51%
      CPXR(LL) | 51.4%    | 39.88% | 22%
      CPXR(LP) | 53.85%   | 42.89% | 21%
  • 26. CPXR: Analysis of Parameters. [Figure: three panels showing RMSE improvement over LR on the Fat, Mussels, and Price datasets as a function of k (number of patterns), minSup, and r] 2% is the optimal minSup. 7 patterns on average across the 50 datasets.
  • 27. Contrast Pattern Aided Classification (CPXC)
      Guozhu Dong, Vahid Taslimitehrani. Pattern Aided Classification. SIAM Data Mining Conference, 2016.
  • 28. CPXC: PXC Concept
      CPXC techniques are quite similar to those of CPXR, but CPXC has more challenges as well as more opportunities than CPXR.
      [Diagram: CPXC with four components: Confidence of Match, Objective Functions, Classification Algorithms, Loss Functions]
  • 29. CPXC: Confidence of Match
    • Given $PXC = ((P_1, h_{P_1}, w_1), (P_2, h_{P_2}, w_2), \ldots, (P_k, h_{P_k}, w_k), h_d)$, the class variable of an instance $x$ is defined as:
      $$\text{weighted-vote}(PXC, C_j, x) = \begin{cases} \dfrac{\sum_{P_i \in \pi_x} w_i \times match(x, P_i) \times h_{P_i}(x, C_j)}{\sum_{P_i \in \pi_x} w_i \times match(x, P_i)}, & \text{if } \pi_x \neq \emptyset \\ h_d, & \text{otherwise} \end{cases}$$
      where $\pi_x = \{P_i \mid 1 \le i \le k,\ match(x, P_i) > 0\}$ and
      $$match(x, P_i) = \frac{|\{q \in MG(P_i) : x \text{ matches } q\}|}{|MG(P_i)|}$$
    • $match(x, P_i)$ is the fraction of the $q$ in $MG(P_i)$ such that $x$ matches $q$.
    • $h_{P_i}(x, C_j)$ is the confidence score of local model $h_{P_i}$ on instance $x$ for class $C_j$.
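The weighted vote can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: instances and the members of $MG(P_i)$ are represented as attribute-to-value dictionaries, every $MG(P_i)$ is non-empty, weights are positive, and the default model $h_d$ is called with the same `(x, label)` signature as the local models. All type and function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, str]                    # attribute -> (discretized) value
Itemset = Dict[str, str]                     # attribute -> required value
Scorer = Callable[[Instance, str], float]    # confidence score for a class
PatternEntry = Tuple[List[Itemset], Scorer, float]  # (MG(P_i), h_{P_i}, w_i)


def matches(x: Instance, q: Itemset) -> bool:
    """True when x satisfies every attribute/value condition in q."""
    return all(x.get(attr) == val for attr, val in q.items())


def match_fraction(x: Instance, generators: List[Itemset]) -> float:
    """Fraction of the itemsets in MG(P_i) that x matches."""
    return sum(matches(x, q) for q in generators) / len(generators)


def weighted_vote(
    pxc: List[PatternEntry], h_d: Scorer, x: Instance, label: str
) -> float:
    """Weighted average of local-model scores over the patterns x matches."""
    num = 0.0
    den = 0.0
    for generators, h_p, w in pxc:
        m = match_fraction(x, generators)
        if m > 0:  # P_i belongs to pi_x
            num += w * m * h_p(x, label)
            den += w * m
    if den == 0.0:  # pi_x is empty: fall back to the default model
        return h_d(x, label)
    return num / den
```

To classify an instance, the vote would be evaluated once per class $C_j$ and the class with the largest score chosen; that final step is not shown on the slide and is mentioned here only as the natural use of the score.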
  • 30. CPXC: Loss Functions
      [Figure: AUC by loss function (ClassError: Binary, Probabilistic, Standardized); datasets: ILPD, Hillvalley, Planning]
    • The probabilistic error loss function returns the best results.
  • 31. CPXC: Base/Local Algorithms & Objective Functions
    • Classification algorithms: different methods for baseline and local classifiers
      – We used 6 classification algorithms for learning the baseline and local classifiers.
    • Objective functions: quality measures on pattern sets
      – We used $trr$, AUC, and ACC (accuracy) to measure the quality of a pattern set.
    • Objective functions: quality measures on patterns and weights on local classifiers
      – We used $arr$, AUC, and ACC (accuracy) to measure the quality of a pattern: $arr$ is the winner!
  • 32. Experimental results
    • 19 public datasets
    • 8 classification algorithms
    • Noise sensitivity & overfitting
    • Running time
    • 7-fold cross validation
    • minSup = 0.02, rho = 0.45
  • 33. CPXC: Performance
      Dataset       Boosting  DT    NBC   Log   RF    SVM   Max   CPXC (NBC-DT)
      Congress      0.58      0.66  0.6   0.57  0.58  0.58  0.66  0.86
      Poker         0.6       0.6   0.5   0.5   0.76  0.5   0.76  0.85
      HillValley    0.5       0.63  0.65  0.66  0.6   0.67  0.67  0.89
      Climate       0.96      0.81  0.9   0.94  0.97  0.98  0.98  0.97
      Mammography   0.94      0.91  0.94  0.94  0.93  0.93  0.94  0.98
      Steel         0.96      0.88  0.91  0.95  0.95  0.94  0.95  0.99
    • CPXC achieved an average AUC of 0.886 on the 8 hard datasets.
    • The average AUC of the best performing traditional classifier (RF) on the hard datasets is 0.638.
    • CPXC’s AUC is never lower than RF’s on the hard datasets.
    • CPXC achieved an average AUC of 0.983 on the easy datasets, while the best performing traditional algorithms obtained an average AUC of 0.968.
  • 34. CPXC: Noise Sensitivity
      Drop of AUC vs. noise levels:
      Method/Noise   0%     5%      10%     15%     20%     Average
      RF             5.73   6.61    12.48   25.83   33.54   16.84
      CPXC           5.87   6.79    12.92   24.7    32.7    16.6
      Boosting       7.02   8.93    14.2    26.8    34.65   18.32
      Log            7.04   10.56   14.63   24.7    33.94   18.17
      NBC            7.06   10.58   15.26   27.89   35.1    19.18
      SVM            8.6    10.34   16.28   29.59   38.02   20.57
      DT             8.8    11.04   16.78   30.3    43.1    22.00
  • 35. CPXC: Impact of Parameters
      [Figures: AUC vs. k (number of patterns), vs. minSup, and vs. r; datasets: Blood, Congress, Hillvalley, Planning]
      [Figure: AUC vs. objective function (TER, AUC, ACC); datasets: ILPD, Hillvalley, Planning]
  • 36. Classification on Imbalanced Datasets
    • What is an imbalanced classification problem?
    • What are the real-world applications?
    • Why do traditional classification algorithms not perform well on imbalanced datasets?
    • What is our proposed solution?
      Classifying minority-class instances might be more important than classifying majority-class instances.
  • 37. New weighting idea
      [Diagram elements: Training Dataset, Baseline model, Classification, Weighting, LE, SE]
    • $$err^*(h_b, x) = \begin{cases} err(h_b, x) \times \delta, & \text{if } x \in \text{minority class instances} \\ err(h_b, x), & \text{if } x \in \text{majority class instances} \end{cases}$$
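A minimal sketch of this reweighting is shown below. It assumes that the baseline errors $err(h_b, x)$ have already been computed per instance and that the minority class is known in advance; the function name and argument names are hypothetical, and the slide does not say how $\delta$ is chosen.

```python
from typing import Hashable, List, Sequence


def reweight_errors(
    errors: Sequence[float],      # err(h_b, x) for each training instance
    labels: Sequence[Hashable],   # class label of each training instance
    minority_label: Hashable,
    delta: float,                 # multiplier applied to minority-class errors
) -> List[float]:
    """Return err*(h_b, x): minority-class errors are scaled by delta."""
    if len(errors) != len(labels):
        raise ValueError("errors and labels must have the same length")
    return [
        e * delta if y == minority_label else e
        for e, y in zip(errors, labels)
    ]
```

With `delta` greater than 1, minority-class instances receive larger errors, so more of them end up in the large-error (LE) group than their raw baseline error alone would allow.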
  • 38. A Filtering Method to Remove Imbalanced Local Models
    • $$IR(mds(P, D)) = \frac{\text{Number of instances in the majority class}}{\text{Number of instances in the minority class}}$$
      Patterns:      $P_1$, $P_2$, $P_3$, $P_4$, …, $P_k$
      Local Models:  $(f_1, w_1)$, $(f_2, w_2)$, $(f_3, w_3)$, $(f_4, w_4)$, …, $(f_k, w_k)$
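The imbalance-ratio filter might look like the sketch below. Several choices here are assumptions, not taken from the slide: the problem is binary, "majority" and "minority" are determined inside each matching dataset rather than globally, a matching dataset containing a single class is treated as infinitely imbalanced, and `max_ir` is a hypothetical threshold.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        return math.inf  # only one class present (or no instances at all)
    return max(counts.values()) / min(counts.values())


def keep_balanced_patterns(
    mds_labels: Dict[str, Sequence[Hashable]],  # pattern -> labels in mds(P, D)
    max_ir: float,                              # hypothetical cutoff
) -> List[str]:
    """Names of patterns whose matching dataset is balanced enough."""
    return [
        name
        for name, labels in mds_labels.items()
        if imbalance_ratio(labels) <= max_ir
    ]
```

Dropping a pattern also drops its local model $(f_i, w_i)$, since a classifier trained on a matching dataset with almost no minority instances has little to learn from.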
  • 39. Experimental results
    • The average AUC of CPXCim is 14% and 15.2% more than the AUC of SMOTE and SMOTE-TL, respectively.
    • The performance of CPXCim is always better than other imbalanced classifiers on these 10 datasets.
      CPXCim’s performance:
      Dataset        # of instances  # of variables  Imbalance ratio  CPXCim  SMOTE   SMOTE-TL
      Yeast          1004            8               9.14             0.942   0.7728  0.772
      Led7digit      443             7               10.97            0.978   0.8919  0.897
      flareF         1066            11              23.79            0.883   0.7463  0.809
      Wine Quality   1599            11              29.17            0.76    0.6008  0.59
      Average        -               -               -                0.92    0.798   0.807
  • 40. Applications of CPXR & CPXC
    • Vahid Taslimitehrani, Guozhu Dong. "A New CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury". IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, pp. 283–290 (Best Student Paper).
    • Behzad Ghanbarian, Vahid Taslimitehrani, Guozhu Dong, Yakov Pachepsky. Sample dimensions effect on prediction of soil water retention curve and saturated hydraulic conductivity. Journal of Hydrology 528 (2015): 127–137.
    • Vahid Taslimitehrani, Guozhu Dong, Naveen Pereira, Maryam Panahiazar, Jyotishman Pathak. Developing EHR-driven Heart Failure Models using CPXR(Log) with the probabilistic loss function. Journal of Biomedical Informatics (2016).
  • 41. Application: Traumatic Brain Injury
      What is Traumatic Brain Injury (TBI)? It is an important public health problem and a leading cause of death and disability worldwide.
      Problem definition: prediction of patient outcome within 6 months after the TBI event, using the admission data.
    • Dataset: 2159 patients collected from a trial and 15 predictor variables.
    • Two class variables: mortality and unfavorable outcome.
      Vahid Taslimitehrani, Guozhu Dong. "A New CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury". IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, pp. 283–290 (Best Student Paper Award).
  • 42. Application: Traumatic Brain Injury
      CPXR(Log)’s performance (Unfavorable outcome):
      Model         Basic        Basic+CT     Basic+CT+Lab
      Specificity   0.89(0.85)   0.87(0.85)   0.91(0.84)
      Sensitivity   0.54(0.52)   0.65(0.6)    0.72(0.61)
      Accuracy      0.75(0.72)   0.79(0.75)   0.87(0.75)
      F1            0.63(0.59)   0.7(0.66)    0.76(0.66)
      AUC           0.82(0.76)   0.87(0.8)    0.93(0.81)
      Performance changes when we add more variables:
                                    Mortality           Unfavorable
      Variable set change           CPXR(Log)   Log     CPXR(Log)   Log
      Basic → Basic+CT              10%         7.7%    6%          5.2%
      Basic+CT → Basic+CT+Lab       4.5%        2.5%    6.8%        1.25%
      Basic → Basic+CT+Lab          15%         11.1%   13.4%       6.6%
      [Figure: ROC curves (true positive rate vs. false positive rate) for CPXR(Log), SLogR, SVM, and RF; AUC_CPXR(Log) = 0.87, AUC_SLogR = 0.8, AUC_RF = 0.72, AUC_SVM = 0.7]
  • 43. Application: Heart Failure Survival Risk Models
    • Collaboration with Mayo Clinic.
    • Problem definition: heart failure survival prediction models.
    • An EHR dataset on 119,749 patients admitted to Mayo Clinic.
    • Predictor variables are grouped in the following categories:
      – Demographics, vitals, labs, medications, and 24 major chronic conditions as co-morbidities.
    • Three groups of CPXC models are developed to predict survival at 1, 2, and 5 years after the heart failure event.
      Vahid Taslimitehrani, Guozhu Dong, Naveen Pereira, Maryam Panahiazar, Jyotishman Pathak. Developing EHR-driven Heart Failure Models using CPXR(Log) with the probabilistic loss function. Journal of Biomedical Informatics (2016).
  • 44. Application: Heart Failure Survival Risk Models
      Performance of different classifiers:
      Algorithm             1 Year   2 Year   5 Year
      Decision Tree         0.66     0.5      0.5
      Random Forest         0.8      0.72     0.72
      Ada Boost             0.74     0.71     0.68
      SVM                   0.59     0.52     0.52
      Logistic Regression   0.81     0.74     0.73
      CPXC                  0.937    0.83     0.786
      Odds ratios of PXC local models:
      Variable        Log    f1     f2     f3     f4     f5     f6     f7
      Alzheimer       1.75   1.74   0.80   1.88   1.59   1.29   1.58   0.75
      Breast Cancer   0.63   1.15   1.62   2.73   1.00   1.00   2.08   0.59
  • 45. Application: Heart Failure Survival Risk Models
      Performance changes when we add more variables:
      Variable sets                                                     CPXC    Log     RF      SVM     DT       Boosting
      (Demo&Vital) → (Demo&Vital)+Lab                                   4.8%    11.5%   19%     17.3%   0%       14.7%
      (Demo&Vital) → (Demo&Vital)+Lab+Med                               8.9%    13.4%   21.2%   21.7%   0%       5.7%
      (Demo&Vital) → (Demo&Vital)+Lab+Med+Co-morbid                     27.8%   9.6%    19.1%   19.5%   -10.4%   7.6%
      (Demo&Vital)+Lab → (Demo&Vital)+Lab+Med                           3.2%    1.7%    1.7%    3.7%    0%       -9.8%
      (Demo&Vital)+Lab → (Demo&Vital)+Lab+Med+Co-morbid                 20.9%   -1.7%   0%      1.8%    -10.4%   -8.1%
      (Demo&Vital)+Lab+Med → (Demo&Vital)+Lab+Med+Co-morbid             15.9%   -3.3%   -1.7%   -1.7%   -10.4%   1.8%
      Adding co-morbidities:
    • decreased the AUC of other classifiers by 5.3% on average.
    • increased the AUC of CPXC by 21.5% on average.
  • 46. Application: Saturated Hydraulic Conductivity
    • Collaboration with University of Texas at Austin and USDA-ARS.
    • Problem definition:
      1. Prediction of the soil water retention curve (SWRC)
      2. Prediction of saturated hydraulic conductivity (SHC)
      3. Investigating the effect of sample dimensions on prediction accuracy
    • Number of predictor variables: 6-13
    • Number of response variables: 10
    • 32 CPXR models are developed.
      Behzad Ghanbarian, Vahid Taslimitehrani, Guozhu Dong, Yakov Pachepsky. Sample dimensions effect on prediction of soil water retention curve and saturated hydraulic conductivity. Journal of Hydrology 528 (2015): 127–137.
  • 47. Application: Saturated Hydraulic Conductivity
      [Figure: predicted log(Ksat) [cm day-1] vs. measured log(Ksat) [cm day-1]; SHC2, RMSLE = 0.456]
      [Figure: predicted log(Ksat) [cm day-1] vs. measured log(Ksat) [cm day-1]; SHC2, RMSLE = 1.936]
      Model                       s      t      10     30     50     100    300    500    1000   1500
      Linear Regression, SWRC1    0.79   0.73   0.77   0.84   0.85   0.84   0.83   0.84   0.81   0.77
      Linear Regression, SWRC2    0.79   0.72   0.77   0.85   0.84   0.84   0.84   0.83   0.80   0.78
      CPXR, SWRC1                 0.94   0.97   0.97   0.94   0.97   0.97   0.95   0.96   0.95   0.94
      CPXR, SWRC2                 0.95   0.96   0.94   0.95   0.97   0.96   0.95   0.98   0.97   0.94
  • 48. Conclusion
    • A new type of highly accurate and interpretable regression and classification models, PXR/PXC, is presented.
    • New techniques to build PXR and PXC models are discussed.
    • Each pattern-model pair represents a distinct predictor-response interaction.
    • PXR and PXC models are more accurate, more interpretable, and less prone to overfitting than other regression and classification algorithms.
    • A new method adapted from CPXC is presented for classifying imbalanced datasets.
    • Several applications of CPXR and CPXC are discussed.
  • 49. Related publications
    • Guozhu Dong, Vahid Taslimitehrani. Pattern-Aided Regression Modeling and Prediction Model Analysis. IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 9, pp. 2452–2465, Sept. 2015.
    • Vahid Taslimitehrani, Guozhu Dong. "A New CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury". IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, pp. 283–290 (Best Student Paper).
    • Behzad Ghanbarian, Vahid Taslimitehrani, Guozhu Dong, Yakov Pachepsky. Sample dimensions effect on prediction of soil water retention curve and saturated hydraulic conductivity. Journal of Hydrology 528 (2015): 127–137.
    • Vahid Taslimitehrani, Guozhu Dong, Naveen Pereira, Maryam Panahiazar, Jyotishman Pathak. Developing EHR-driven Heart Failure Models using CPXR(Log) with the probabilistic loss function. Journal of Biomedical Informatics (2016).
    • Guozhu Dong, Vahid Taslimitehrani. Pattern Aided Classification. SIAM Data Mining Conference, 2016.
  • 50. Acknowledgement

Editor's Notes

  1. Reference:
  2. HF example, old and young patient
  3. We propose a methodology that addresses those challenges.