SlideShare a Scribd company logo
1 of 21
PREDICTION MODELS
BASED ON MAX-STEMS
(or harnessing imbalanced data)
Episode Two: A Combinatorial Approach
Ahmet Furkan EMREHAN
(matahmet@gmail.com)
MOTIVATION
 This study is an extension of previous study (chapter one) with combinatiorial
approach.
 In chapter one, I examine five models using distribution of stems separately.
Combination of stems with s-elements have a potential to help efficient
prediction. Because documents including comination of stems may be
semantically closer than one stem based prediction (defined in chapter one).
ADAPTATION OF COMPONENTS OF
MODELS
 𝑝, 𝐿𝑎𝑏𝑒𝑙𝑝
, 𝑛, 𝑑𝑜𝑐𝑖 𝑎𝑛𝑑 Σ𝑝
, components of general parameters, are defined in
Chapter One−Slide 8
 𝑐𝑜𝑚𝑏𝑜𝑖𝑗
𝑠
∶ 𝑐𝑜𝑚𝑏𝑖𝑛𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑠𝑡𝑒𝑚𝑠 𝑖𝑛𝑑𝑒𝑥𝑒𝑑 𝑗 𝑤𝑖𝑡ℎ 𝑠 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠 𝑜𝑓𝑑𝑜𝑐𝑖
𝑠𝑡𝑒𝑚 𝑐𝑎𝑛 𝑏𝑒 𝑐ℎ𝑜𝑠𝑒𝑛 𝑎𝑠 𝑚𝑎𝑥 − 𝑠𝑡𝑒𝑚 𝑚𝑒𝑛𝑡𝑖𝑜𝑛𝑒𝑑 𝑝𝑟𝑒𝑣𝑖𝑜𝑢𝑠 𝑠𝑙𝑖𝑑𝑒𝑠.
 𝑚𝑖
𝑠
: 𝑐𝑜𝑢𝑛𝑡𝑠 𝑜𝑓𝑐𝑜𝑚𝑏𝑜𝑖𝑗
𝑠
 Σ𝑖𝑗
𝑝,𝑠
: 𝑐𝑜𝑢𝑛𝑡𝑠 𝑜𝑓 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡𝑠, 𝑤ℎ𝑖𝑐ℎ 𝑖𝑛𝑐𝑙𝑢𝑑𝑒 𝑎𝑙𝑙 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠 𝑜𝑓𝑐𝑜𝑚𝑏𝑜𝑖𝑗
𝑠
, 𝑙𝑎𝑏𝑒𝑙𝑙𝑒𝑑 𝑤𝑖𝑡ℎ 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦 𝑤𝑖𝑡ℎ 𝑖𝑛𝑑𝑒𝑥 𝑝 𝑖𝑛 𝑡𝑟𝑎𝑖𝑛 𝑠𝑒𝑡
 Λ𝑖𝑗
𝑠
: = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg max
𝑝
Σ𝑖𝑗
𝑝,𝑠
 Λ𝑖
𝑝,𝑠
: counts of Λ𝑖𝑗
𝑠
𝑤ℎ𝑖𝑐ℎ 𝑒𝑞𝑢𝑎𝑙𝑠 𝑡𝑜 𝐿𝑎𝑏𝑒𝑙𝑝
ADAPTATION OF COMPONENTS OF
MODELS
 𝜆𝑖𝑗
𝑠
: 𝑠𝑢𝑚 𝑜𝑓 𝑙𝑒𝑛𝑔𝑡ℎ 𝑜𝑓 𝑠𝑡𝑒𝑚𝑠 𝑖𝑛 𝑐𝑜𝑚𝑏𝑜𝑖𝑗
𝑠
 𝜌𝑖
𝑝,𝑠
≔
𝑗=1
𝑚𝑖 Σ𝑖𝑗
𝑝,𝑠
Σ𝑝 *
 ∗ 𝑖𝑛 𝑐𝑎𝑠𝑒 𝑡ℎ𝑎𝑡 Σ𝑝=0, 𝜌𝑖
𝑝
=0
 Π𝑖𝑗
𝑝,𝑠
≔
Σ𝑖𝑗
𝑝,𝑠
𝑞=1
𝑛 Σ𝑖𝑗
𝑞
(𝑖𝑡 𝑐𝑎𝑛 𝑏𝑒 𝑐𝑜𝑛𝑠𝑖𝑑𝑒𝑟𝑒𝑑 𝑎𝑠 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓𝑐𝑜𝑚𝑏𝑜𝑖𝑗
𝑠
, 𝑙𝑎𝑏𝑒𝑙𝑙𝑒𝑑 𝑤𝑖𝑡ℎ 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦 𝑤𝑖𝑡ℎ 𝑝 𝑖𝑛𝑑𝑒𝑥)
 Π𝑖
𝑝,𝑠
: = 𝑎𝑣𝑒𝑟𝑎𝑔𝑒𝑗∗ Π𝑖𝑗∗
𝑝,𝑠
𝑠𝑢𝑐ℎ 𝑡ℎ𝑎𝑡 𝑎𝑙𝑙 "j∗"s 𝑚𝑒𝑒𝑡 𝑡ℎ𝑒 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛 Π𝑖𝑗∗
𝑝,𝑠
> 0 *
∗ in case that Σ𝑖𝑗
𝑝,𝑠
= 0 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑝 = 1,2, … 𝑛, Π𝑖
𝑝,𝑠
=0
 Π𝑖
𝑝,𝑠
≔ max 𝑗(Π𝑖𝑗
𝑝,𝑠
)
General Scheme for Prediction Models
with Combinatorial Approach
X_test 𝑑𝑜𝑐𝑖
𝑑𝑜𝑐𝑖 = 𝑙𝑜𝑤𝑒𝑟(𝑑𝑜𝑐𝑖) 𝑡𝑜𝑘𝑒𝑛𝑖 = 𝑐𝑙𝑒𝑎𝑛(𝑑𝑜𝑐𝑖)
𝑑𝑜𝑐𝑖 = 𝑐𝑙𝑒𝑎𝑛(𝑑𝑜𝑐𝑖)
𝑠𝑡𝑒𝑚𝑠𝑖 = max _𝑠𝑡𝑒𝑚𝑠(𝑡𝑜𝑘𝑒𝑛𝑖)
𝑐𝑜𝑚𝑏𝑜𝑖𝑖1
𝑐𝑜𝑚𝑏𝑜𝑖𝑖2
𝑐𝑜𝑚𝑏𝑜𝑖𝑖𝑚𝑖
𝑠
.
.
.
Σ𝑖1
𝑝,𝑠
, Π𝑖1
𝑝,𝑠
, Λ𝑖1
𝑠
,𝜆𝑖1
𝑠
Σ𝑖2
𝑝,𝑠
, Π𝑖2
𝑝,𝑠
, Λ𝑖2
𝑠
,𝜆𝑖2
𝑠
Σ𝑖𝑚𝑖
𝑝,𝑠
, Π𝑖𝑚𝑖
𝑝,𝑠
, , Λ𝑖𝑚𝑖
𝑠
𝑠
, 𝜆𝑖𝑚𝑖
𝑠
𝑠
.
.
.
Analyze_word_s
Predictions
With
Combinatiorial
Approach
X_train, y_train
Evaluations (accuracy,
confusion etc.) y_test
𝑐𝑜𝑚𝑏𝑜𝑖
𝑠
= get_combo(𝑠𝑡𝑒𝑚𝑠𝑖)
Model 1
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏1 𝑑𝑜𝑐𝑖, 𝑠 =
𝐿𝑎𝑏𝑒𝑙𝑞 𝑖𝑓 Σ𝑖𝑗
𝑝,𝑠
= 0 < Σ𝑖𝑗
𝑞,𝑠
𝑓𝑜𝑟 𝑎𝑙𝑙 𝑝 ≠ 𝑞
𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 2𝑛𝑑 𝑚𝑎𝑥𝑝 Λ𝑖
𝑝,𝑠
𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 ∗
∗ 𝑖𝑛 𝑐𝑎𝑠𝑒 𝑡ℎ𝑎𝑡 𝑞 𝑖𝑠 𝑛𝑜𝑡 𝑢𝑛𝑖𝑞𝑒, 𝑞 𝑖𝑠 𝑐ℎ𝑜𝑠𝑒𝑛 𝑎𝑠 𝑡ℎ𝑒 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑖𝑛𝑑𝑒𝑥 𝑚𝑒𝑒𝑡𝑖𝑛𝑔 𝑡ℎ𝑒 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛
4/6/2022
EMREHAN 6
Model 2
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏2 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞
𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 𝑚𝑎𝑥𝑝 𝜌𝑖
𝑝,𝑠
4/6/2022
EMREHAN 7
Model 3
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏3 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 𝑚𝑎𝑥𝑝Π𝑖
𝑝,𝑠
4/6/2022
EMREHAN 8
Model 4
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏4 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 𝑚𝑎𝑥𝑝Π𝑖
𝑝,𝑠
∗ 𝑖𝑛 𝑐𝑎𝑠𝑒 𝑡ℎ𝑎𝑡 𝑞 𝑖𝑠 𝑛𝑜𝑡 𝑢𝑛𝑖𝑞𝑒, 𝑞 𝑖𝑠 𝑐ℎ𝑜𝑠𝑒𝑛 𝑎𝑠 𝑡ℎ𝑒 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑖𝑛𝑑𝑒𝑥 𝑚𝑒𝑒𝑡𝑖𝑛𝑔 𝑡ℎ𝑒 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛
4/6/2022
EMREHAN 9
Model 5
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏5 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞
𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 𝑚𝑎𝑥𝑝 (𝑚𝑎𝑥𝑗 𝜆𝑖𝑗 ∗
Σ𝑖𝑗
𝑝
Σ𝑝
)
4/6/2022
EMREHAN 10
A Trivial Result
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏𝑘 𝑑𝑜𝑐𝑖, 1 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑘 𝑑𝑜𝑐𝑖 𝑓𝑜𝑟 𝑘 = 1,2, … , 5
 Moreover all parameters in case s = 1, equal to corresponding parameters in
chapter one (slides 8-10) For example 𝑚𝑖
𝑠
= 𝑚𝑖, Σ𝑖𝑗
𝑝,1
= Σ𝑖𝑗
𝑝
and Π𝑖
𝑝,1
= Π𝑖
𝑝
.
Application (introduction)
 We use data of «nayn.co» a news portal in Turkish Language. Data is imported by url
«"https://raw.githubusercontent.com/naynco/nayn.data/master/classification_clean.c
sv"» as done in chapter one.
 Head of data is presented below
There are 11622 documents («Title» column) with label («DÜNYA» (World), «SPOR» (Sports),
«SANAT»(Art) and «Teknoloji»(Technology)). But data is imbalanced in favor of category «DÜNYA» such
that the counts [and percentages] of categories 9226[%79] ,1967 [%17],285 [%2] and 144 [%1]
respectively.
General Scheme for Application of Prediction
Models with Combinatorial Approach
X_test 𝑑𝑜𝑐𝑖
𝑑𝑜𝑐𝑖 = 𝑙𝑜𝑤𝑒𝑟(𝑑𝑜𝑐𝑖) 𝑡𝑜𝑘𝑒𝑛𝑖 = 𝑐𝑙𝑒𝑎𝑛(𝑑𝑜𝑐𝑖)
𝑑𝑜𝑐𝑖 = 𝑐𝑙𝑒𝑎𝑛(𝑑𝑜𝑐𝑖)
𝑠𝑡𝑒𝑚𝑠𝑖 = max _𝑠𝑡𝑒𝑚𝑠(𝑡𝑜𝑘𝑒𝑛𝑖)
𝑐𝑜𝑚𝑏𝑜𝑖𝑖1
𝑐𝑜𝑚𝑏𝑜𝑖𝑖2
𝑐𝑜𝑚𝑏𝑜𝑖𝑖𝑚𝑖
𝑠
.
.
.
Σ𝑖1
𝑝,𝑠
, Π𝑖1
𝑝,𝑠
, Λ𝑖1
𝑠
,𝜆𝑖1
𝑠
Σ𝑖2
𝑝,𝑠
, Π𝑖2
𝑝,𝑠
, Λ𝑖2
𝑠
,𝜆𝑖2
𝑠
Σ𝑖𝑚𝑖
𝑝,𝑠
, Π𝑖𝑚𝑖
𝑝,𝑠
, , Λ𝑖𝑚𝑖
𝑠
𝑠
, 𝜆𝑖𝑚𝑖
𝑠
𝑠
.
.
.
Analyze_word_s
Predictions
With
Combinatiorial
Approach
X_train, y_train
Evaluations (accuracy,
confusion etc.) y_test
𝑐𝑜𝑚𝑏𝑜𝑖
𝑠
= get_combo(𝑠𝑡𝑒𝑚𝑠𝑖)
Turkish
Stem List
(32001 stems)
(for application)
Step2_Preprocess_word.ipynb
Step3_Classifier_V10.ipynb
Step4_Prediction_Models_cb.ipynb
Step6_Run_Integrated_Model_cb.ipynb
13
Application (computations)
 Pandas and Sklearn libraries in Python is used for application of methods. Test size is
chosen as 0.2 and random_state parameter for partition as 57.
 Values of parameters in model are computed below
 Counts of categories: 𝑛 = 4
 Indexes and name of categories: 𝑝 = 1,2,3 𝑎𝑛𝑑 4 , 𝐿𝑎𝑏𝑒𝑙𝑝
= "DÜNYA","SPOR","SANAT
and "Teknoloji", 𝑟𝑒𝑠𝑝𝑒𝑐𝑡𝑖𝑣𝑒𝑙𝑦
 Counts of categories in train set : Σ1
= 7384, Σ2
= 1568, Σ3
= 229 𝑎𝑛𝑑 Σ4
= 116
4/6/2022
EMREHAN 14
Application (computations)
 Now Let’s show an example and compute its parameters (or compounds of models). We apply models to document,
used in chapter one, with index number 𝑖 = 38296 , 𝑟𝑎𝑛𝑘 𝑖𝑛 𝑡𝑒𝑠𝑡 𝑠𝑒𝑡 = 1356 (index number may not be related to
rank). We examine case 𝑠 = 2, because set of 𝑐𝑜𝑚𝑏𝑜𝑖𝑗
𝑠
of the document is empty for s > 2 .
 𝑑𝑜𝑐𝑖: 2 𝑘𝑒𝑑𝑖 2 yıldır sanat müzesine girmeye çalışıyor. (𝑒𝑛: 2 𝑐𝑎𝑡𝑠 𝑡𝑟𝑦 𝑡𝑜 𝑒𝑛𝑡𝑒𝑟 𝑎𝑟𝑡 𝑚𝑢𝑠𝑒𝑢𝑚 𝑓𝑜𝑟 2 𝑦𝑒𝑎𝑟𝑠)
 𝑙𝑎𝑏𝑒𝑙𝑖: 𝐷Ü𝑁𝑌𝐴 (𝑒𝑛: 𝑤𝑜𝑟𝑙𝑑)
 𝑠𝑡𝑒𝑚𝑠: [′𝑘𝑒𝑑𝑖′, ′𝑦𝚤𝑙𝑑𝚤𝑟′, ′𝑠𝑎𝑛𝑎𝑡′, ′𝑚ü𝑧𝑒′, ′𝑔𝑖𝑟′, ′ç𝑎𝑙𝚤′]
 Output of analyze_doc(doc,X_train,y_train): 𝑗 = 1, … , 𝑚𝑖
2
= 7
 Output of analyze_doc_s(doc,X_train,y_train,2):𝑗 = 1, … , 7 𝑚𝑖 = 6
𝑐𝑜𝑚𝑏𝑜𝑖𝑗
2
𝜆𝑖𝑗
2
Σ𝑖𝑗
𝑝,2
Π𝑖𝑗
𝑝,2
Λ𝑖
𝑝,2 Category of
2nd max Σ𝑖𝑗
𝑝,2
(not used)
4/6/2022
EMREHAN 15
Application (computations)
 𝑖 = 38296, 𝑠 = 2
 𝑐𝑜𝑚𝑏𝑜: comboi1
2
= [′sanat′,′müze′], comboi2
2
= [′kedi′,′gir′], comboi3
2
= [′müze′,′çalı′], comboi4
2
= [′kedi′,′yıldır′],
comboi5
2
= [′gir′,′çalı′], comboi6
2
= [′yıldır′,′çalı′], comboi7
2
= [′sanat′,′çalı′]
 𝑠𝑢𝑚 𝑜𝑓 𝑙𝑒𝑛𝑔𝑡ℎ 𝑜𝑓 𝑠𝑡𝑒𝑚𝑠 𝑖𝑛 𝑐𝑜𝑚𝑏𝑜𝑖𝑗
𝑠
∶ 𝜆𝑖1
2
= 9, 𝜆𝑖2
2
= 7, 𝜆𝑖3
2
= 8, 𝜆𝑖4
2
= 9, 𝜆𝑖5
2
= 7, 𝜆𝑖6
2
= 10, 𝜆𝑖7
2
= 9
 Σ𝑖𝑗
𝑝,𝑠
: 𝑐𝑜𝑢𝑛𝑡𝑠 𝑜𝑓 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡𝑠, 𝑤ℎ𝑖𝑐ℎ 𝑖𝑛𝑐𝑙𝑢𝑑𝑒 𝑎𝑙𝑙 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠 𝑜𝑓𝑐𝑜𝑚𝑏𝑜𝑖𝑗
𝑠
, 𝑙𝑎𝑏𝑒𝑙𝑙𝑒𝑑 𝑤𝑖𝑡ℎ 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦 𝑤𝑖𝑡ℎ 𝑖𝑛𝑑𝑒𝑥 𝑝 𝑖𝑛 𝑡𝑟𝑎𝑖𝑛 𝑠𝑒𝑡
 𝑓𝑜𝑟 𝑗 = 3 𝑎𝑛𝑑 𝑗 = 7 : Σ𝑖3
1,2
= 2,: Σ𝑖3
2,2
= 0, Σ𝑖3
3,2
= 1, Σ𝑖3
,4,2
= 0 , Σ𝑖7
1,2
= 1, Σ𝑖7
1,2
= 0, Σ𝑖7
1,2
= 0, Σ𝑖7
1,2
= 0
 Λ𝑖𝑗: = 𝐿𝑎𝑏𝑒𝑙𝑞
𝑤ℎ𝑒𝑟𝑒 𝑞 = arg max
𝑝
Σ𝑖𝑗
𝑝
: Λi1 = "SANAT", Λi2 = "DÜNYA", Λi3 = "DÜNYA" , Λi4="DÜNYA",
Λi5="DÜNYA", Λi6 = "SPOR", Λi7="DÜNYA"
 Λ𝑖
𝑝
: counts of Λ𝑖𝑗 𝑤ℎ𝑖𝑐ℎ 𝑒𝑞𝑢𝑎𝑙𝑠 𝑡𝑜 𝐿𝑎𝑏𝑒𝑙𝑝
, Λ𝑖
1
= 5, Λ𝑖
2
= 1, Λ𝑖
3
= 1, Λ𝑖
4
= 0
 Π𝑖𝑗
𝑝
for j = 1 and j = 5 ∶ Π𝑖1
1,2
= 0, Π𝑖1
2,2
= 0, Π𝑖1
3,2
= 1, Π𝑖1
4,2
= 0, Π𝑖5
1,2
= 0.75, Π𝑖5
2,2
= 0.25, Π𝑖5
3,2
= 0, Π𝑖5
4,2
= 0
4/6/2022
EMREHAN 16
Application (computations)
 𝜌𝑖
𝑝
: 𝜌𝑖
1
=
8
7384
= 0.001, 𝜌𝑖
2
=
3
1568
= 0.002, 𝜌𝑖
3
=
3
229
= 0.013, 𝜌𝑖
4
=
0
116
= 0
 Π𝑖
𝑝
: Π𝑖
1
=
1+0.67+1+0.75+1
4
= 0.884, Π𝑖
2
=
0.25+1
2
= 0.625, Π𝑖
3
=
1+0.33
2
= 0.665, Π𝑖
4
= 0
 Π𝑖
𝑝
: Π𝑖
1
= 1, Π𝑖
2
= 1, Π𝑖
3
= 1, Π𝑖
4
= 0
4/6/2022
EMREHAN 17
Application (Predictions)
 𝑓𝑜𝑟 𝑖 = 38296
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏1 𝑑𝑜𝑐𝑖, 2 = SPOR
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏2 𝑑𝑜𝑐𝑖, 2 = SANAT
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏3 𝑑𝑜𝑐𝑖, 2 = DÜNYA
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏4 𝑑𝑜𝑐𝑖, 2 = DÜNYA
 𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏5 𝑑𝑜𝑐𝑖, 2 = SANAT
Application (Results)
Confusion Matrix for Model 1 (count) s= 2 Confusion Matrix for Model 1 (percentage)
Prediction Prediction (rounded to 2 digits)
DÜNYA SANAT SPOR Teknoloji
No
Prediction
Total DÜNYA SANAT SPOR Teknoloji No Prediction Total
Observed
DÜNYA 1204 47 519 47 25 1842 0.65 0.03 0.28 0.03 0.01 1
SANAT 15 23 15 0 3 56 0.27 0.41 0.27 0 0.05 1
SPOR 201 3 182 0 13 399 0.5 0.01 0.46 0 0.03 1
Teknoloji 18 0 8 2 0 28 0.64 0 0.29 0.07 0 1
Accuracy Rate For Model 1
0.61
Confusion Matrix for Model 2 (count) s= 2 Confusion Matrix for Model 2 (percentage)
Prediction Prediction (rounded to 2 digits)
DÜNYA SANAT SPOR Teknoloji
No
Prediction
Total DÜNYA SANAT SPOR Teknoloji No Prediction Total
Observed
DÜNYA 1286 174 130 227 25 1842 0.7 0.09 0.07 0.12 0.01 1
SANAT 3 41 9 0 3 56 0.05 0.73 0.16 0 0.05 1
SPOR 30 18 324 14 13 399 0.08 0.05 0.81 0.04 0.03 1
Teknoloji 14 0 2 12 0 28 0.5 0 0.07 0.43 0 1
Accuracy Rate For Model 2
0.72
Application (Results)
Confusion Matrix for Model 3 (count) s= 2 Confusion Matrix for Model 3 (percentage)
Prediction Prediction (rounded to 2 digits)
DÜNYA SANAT SPOR Teknoloji
No
Prediction
Total DÜNYA SANAT SPOR Teknoloji No Prediction Total
Observed
DÜNYA 1755 17 25 20 25 1842 0.95 0.01 0.01 0.01 0.01 1
SANAT 39 8 6 0 3 56 0.7 0.14 0.11 0 0.05 1
SPOR 159 8 214 5 13 399 0.4 0.02 0.54 0.01 0.03 1
Teknoloji 28 0 0 0 0 28 1 0 0 0 0 1
Accuracy Rate For Model 3
0.85
Confusion Matrix for Model 4 (count) s= 2 Confusion Matrix for Model 4 (percentage)
Prediction Prediction (rounded to 2 digits)
DÜNYA SANAT SPOR Teknoloji
No
Prediction
Total DÜNYA SANAT SPOR Teknoloji No Prediction Total
Observed
DÜNYA 1805 3 7 2 25 1842 0.98 0 0 0 0.01 1
SANAT 47 2 4 0 3 56 0.84 0.04 0.07 0 0.05 1
SPOR 284 0 102 0 13 399 0.71 0 0.26 0 0.03 1
Teknoloji 28 0 0 0 0 28 1 0 0 0 0 1
Accuracy Rate For Model 4
0.82
Application (Results)
Confusion Matrix for Model 5 (count) s= 2 Confusion Matrix for Model 5 (percentage)
Prediction Prediction (rounded to 2 digits)
DÜNYA SANAT SPOR Teknoloji
No
Prediction
Total DÜNYA SANAT SPOR Teknoloji No Prediction Total
Observed
DÜNYA 977 289 189 362 25 1842 0.53 0.16 0.1 0.2 0.01 1
SANAT 3 40 10 0 3 56 0.05 0.71 0.18 0 0.05 1
SPOR 25 27 308 26 13 399 0.06 0.07 0.77 0.07 0.03 1
Teknoloji 10 0 3 15 0 28 0.36 0 0.11 0.54 0 1
Accuracy Rate For Model 5
0.58
A Note:
All predictions of 25,2,13 documents labelled with «DÜNYA», «SANAT» and
«SPOR» respectively are «No prediction». Because no combinations, with s = 2
stems,of those documents in test set are covered by a document in train set.
Trivially prediction based combinations of stems of these documents, with s > 2
stems, are «No Predidiction».
End of Chapter Two

More Related Content

What's hot

mathematical model
mathematical modelmathematical model
mathematical modelKt Silva
 
ePoster_Saunak.Amitangshu
ePoster_Saunak.AmitangshuePoster_Saunak.Amitangshu
ePoster_Saunak.AmitangshuSaunak Saha
 
Evaluation of 6 noded quareter point element for crack analysis by analytical...
Evaluation of 6 noded quareter point element for crack analysis by analytical...Evaluation of 6 noded quareter point element for crack analysis by analytical...
Evaluation of 6 noded quareter point element for crack analysis by analytical...eSAT Publishing House
 
Numerical approximation and solution of equations
Numerical approximation and solution of equationsNumerical approximation and solution of equations
Numerical approximation and solution of equationsRobinson
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Seval Çapraz
 
What is Outlier Analysis and How Can It Improve Analysis?
What is Outlier Analysis and How Can It Improve Analysis?What is Outlier Analysis and How Can It Improve Analysis?
What is Outlier Analysis and How Can It Improve Analysis?Smarten Augmented Analytics
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET Journal
 
Approximation and error
Approximation and errorApproximation and error
Approximation and errorrubenarismendi
 
Mathematical modelling
Mathematical modellingMathematical modelling
Mathematical modellingSadia Zareen
 
Matematika terapan week 6
Matematika terapan week 6 Matematika terapan week 6
Matematika terapan week 6 nellylawar
 

What's hot (15)

mathematical model
mathematical modelmathematical model
mathematical model
 
ePoster_Saunak.Amitangshu
ePoster_Saunak.AmitangshuePoster_Saunak.Amitangshu
ePoster_Saunak.Amitangshu
 
A comparative study of nonlinear circle criterion based observer and H∞ obser...
A comparative study of nonlinear circle criterion based observer and H∞ obser...A comparative study of nonlinear circle criterion based observer and H∞ obser...
A comparative study of nonlinear circle criterion based observer and H∞ obser...
 
Evaluation of 6 noded quareter point element for crack analysis by analytical...
Evaluation of 6 noded quareter point element for crack analysis by analytical...Evaluation of 6 noded quareter point element for crack analysis by analytical...
Evaluation of 6 noded quareter point element for crack analysis by analytical...
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
 
Numerical approximation and solution of equations
Numerical approximation and solution of equationsNumerical approximation and solution of equations
Numerical approximation and solution of equations
 
Outlier
OutlierOutlier
Outlier
 
Mine Death Estimation
Mine Death EstimationMine Death Estimation
Mine Death Estimation
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
 
What is Outlier Analysis and How Can It Improve Analysis?
What is Outlier Analysis and How Can It Improve Analysis?What is Outlier Analysis and How Can It Improve Analysis?
What is Outlier Analysis and How Can It Improve Analysis?
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
 
Ijetcas14 608
Ijetcas14 608Ijetcas14 608
Ijetcas14 608
 
Approximation and error
Approximation and errorApproximation and error
Approximation and error
 
Mathematical modelling
Mathematical modellingMathematical modelling
Mathematical modelling
 
Matematika terapan week 6
Matematika terapan week 6 Matematika terapan week 6
Matematika terapan week 6
 

Similar to PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach

APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...cscpconf
 
MIXTURES OF TRAINED REGRESSION CURVESMODELS FOR HANDRITTEN ARABIC CHARACTER R...
MIXTURES OF TRAINED REGRESSION CURVESMODELS FOR HANDRITTEN ARABIC CHARACTER R...MIXTURES OF TRAINED REGRESSION CURVESMODELS FOR HANDRITTEN ARABIC CHARACTER R...
MIXTURES OF TRAINED REGRESSION CURVESMODELS FOR HANDRITTEN ARABIC CHARACTER R...ijaia
 
research on journaling
research on journalingresearch on journaling
research on journalingrikaseorika
 
LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATION
LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATIONLOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATION
LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATIONorajjournal
 
A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...Aboul Ella Hassanien
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataIRJET Journal
 
Lecture-1-Algorithms.pptx
Lecture-1-Algorithms.pptxLecture-1-Algorithms.pptx
Lecture-1-Algorithms.pptxxalahama3
 
A computational method for system of linear fredholm integral equations
A computational method for system of linear fredholm integral equationsA computational method for system of linear fredholm integral equations
A computational method for system of linear fredholm integral equationsAlexander Decker
 
Design of predictive controller for smooth set point tracking for fast dynami...
Design of predictive controller for smooth set point tracking for fast dynami...Design of predictive controller for smooth set point tracking for fast dynami...
Design of predictive controller for smooth set point tracking for fast dynami...eSAT Journals
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_fariaPaulo Faria
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural NetworkDessy Amirudin
 
Quality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionQuality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionIJTET Journal
 
BU_FCAI_SCC430_Modeling&Simulation_Ch05-P2.pptx
BU_FCAI_SCC430_Modeling&Simulation_Ch05-P2.pptxBU_FCAI_SCC430_Modeling&Simulation_Ch05-P2.pptx
BU_FCAI_SCC430_Modeling&Simulation_Ch05-P2.pptxMaiGaafar
 
PATTERN SYNTHESIS OF NON-UNIFORM AMPLITUDE EQUALLY SPACED MICROSTRIP ARRAY AN...
PATTERN SYNTHESIS OF NON-UNIFORM AMPLITUDE EQUALLY SPACED MICROSTRIP ARRAY AN...PATTERN SYNTHESIS OF NON-UNIFORM AMPLITUDE EQUALLY SPACED MICROSTRIP ARRAY AN...
PATTERN SYNTHESIS OF NON-UNIFORM AMPLITUDE EQUALLY SPACED MICROSTRIP ARRAY AN...IAEME Publication
 
Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...AmirParnianifard1
 
PATTEM JAGADESH_21mt0269_research proposal presentation.pptx
PATTEM JAGADESH_21mt0269_research proposal presentation.pptxPATTEM JAGADESH_21mt0269_research proposal presentation.pptx
PATTEM JAGADESH_21mt0269_research proposal presentation.pptxPATTEMJAGADESH
 
Symmetric quadratic tetration interpolation using forward and backward opera...
Symmetric quadratic tetration interpolation using forward and  backward opera...Symmetric quadratic tetration interpolation using forward and  backward opera...
Symmetric quadratic tetration interpolation using forward and backward opera...IJECEIAES
 

Similar to PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach (20)

APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
 
MIXTURES OF TRAINED REGRESSION CURVESMODELS FOR HANDRITTEN ARABIC CHARACTER R...
MIXTURES OF TRAINED REGRESSION CURVESMODELS FOR HANDRITTEN ARABIC CHARACTER R...MIXTURES OF TRAINED REGRESSION CURVESMODELS FOR HANDRITTEN ARABIC CHARACTER R...
MIXTURES OF TRAINED REGRESSION CURVESMODELS FOR HANDRITTEN ARABIC CHARACTER R...
 
research on journaling
research on journalingresearch on journaling
research on journaling
 
MLU_DTE_Lecture_2.pptx
MLU_DTE_Lecture_2.pptxMLU_DTE_Lecture_2.pptx
MLU_DTE_Lecture_2.pptx
 
LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATION
LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATIONLOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATION
LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATION
 
A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression Data
 
Lecture-1-Algorithms.pptx
Lecture-1-Algorithms.pptxLecture-1-Algorithms.pptx
Lecture-1-Algorithms.pptx
 
A computational method for system of linear fredholm integral equations
A computational method for system of linear fredholm integral equationsA computational method for system of linear fredholm integral equations
A computational method for system of linear fredholm integral equations
 
Design of predictive controller for smooth set point tracking for fast dynami...
Design of predictive controller for smooth set point tracking for fast dynami...Design of predictive controller for smooth set point tracking for fast dynami...
Design of predictive controller for smooth set point tracking for fast dynami...
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria
 
MACHINE LEARNING.pptx
MACHINE LEARNING.pptxMACHINE LEARNING.pptx
MACHINE LEARNING.pptx
 
Py data19 final
Py data19   finalPy data19   final
Py data19 final
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Quality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionQuality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint Compression
 
BU_FCAI_SCC430_Modeling&Simulation_Ch05-P2.pptx
BU_FCAI_SCC430_Modeling&Simulation_Ch05-P2.pptxBU_FCAI_SCC430_Modeling&Simulation_Ch05-P2.pptx
BU_FCAI_SCC430_Modeling&Simulation_Ch05-P2.pptx
 
PATTERN SYNTHESIS OF NON-UNIFORM AMPLITUDE EQUALLY SPACED MICROSTRIP ARRAY AN...
PATTERN SYNTHESIS OF NON-UNIFORM AMPLITUDE EQUALLY SPACED MICROSTRIP ARRAY AN...PATTERN SYNTHESIS OF NON-UNIFORM AMPLITUDE EQUALLY SPACED MICROSTRIP ARRAY AN...
PATTERN SYNTHESIS OF NON-UNIFORM AMPLITUDE EQUALLY SPACED MICROSTRIP ARRAY AN...
 
Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...
 
PATTEM JAGADESH_21mt0269_research proposal presentation.pptx
PATTEM JAGADESH_21mt0269_research proposal presentation.pptxPATTEM JAGADESH_21mt0269_research proposal presentation.pptx
PATTEM JAGADESH_21mt0269_research proposal presentation.pptx
 
Symmetric quadratic tetration interpolation using forward and backward opera...
Symmetric quadratic tetration interpolation using forward and  backward opera...Symmetric quadratic tetration interpolation using forward and  backward opera...
Symmetric quadratic tetration interpolation using forward and backward opera...
 

Recently uploaded

Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 

Recently uploaded (20)

Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 

PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach

  • 1. PREDICTION MODELS BASED ON MAX-STEMS (or harnessing imbalanced data) Episode Two: A Combinatorial Approach Ahmet Furkan EMREHAN (matahmet@gmail.com)
  • 2. MOTIVATION  This study is an extension of previous study (chapter one) with combinatiorial approach.  In chapter one, I examine five models using distribution of stems separately. Combination of stems with s-elements have a potential to help efficient prediction. Because documents including comination of stems may be semantically closer than one stem based prediction (defined in chapter one).
  • 3. ADAPTATION OF COMPONENTS OF MODELS  𝑝, 𝐿𝑎𝑏𝑒𝑙𝑝 , 𝑛, 𝑑𝑜𝑐𝑖 𝑎𝑛𝑑 Σ𝑝 , components of general parameters, are defined in Chapter One−Slide 8  𝑐𝑜𝑚𝑏𝑜𝑖𝑗 𝑠 ∶ 𝑐𝑜𝑚𝑏𝑖𝑛𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑠𝑡𝑒𝑚𝑠 𝑖𝑛𝑑𝑒𝑥𝑒𝑑 𝑗 𝑤𝑖𝑡ℎ 𝑠 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠 𝑜𝑓𝑑𝑜𝑐𝑖 𝑠𝑡𝑒𝑚 𝑐𝑎𝑛 𝑏𝑒 𝑐ℎ𝑜𝑠𝑒𝑛 𝑎𝑠 𝑚𝑎𝑥 − 𝑠𝑡𝑒𝑚 𝑚𝑒𝑛𝑡𝑖𝑜𝑛𝑒𝑑 𝑝𝑟𝑒𝑣𝑖𝑜𝑢𝑠 𝑠𝑙𝑖𝑑𝑒𝑠.  𝑚𝑖 𝑠 : 𝑐𝑜𝑢𝑛𝑡𝑠 𝑜𝑓𝑐𝑜𝑚𝑏𝑜𝑖𝑗 𝑠  Σ𝑖𝑗 𝑝,𝑠 : 𝑐𝑜𝑢𝑛𝑡𝑠 𝑜𝑓 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡𝑠, 𝑤ℎ𝑖𝑐ℎ 𝑖𝑛𝑐𝑙𝑢𝑑𝑒 𝑎𝑙𝑙 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠 𝑜𝑓𝑐𝑜𝑚𝑏𝑜𝑖𝑗 𝑠 , 𝑙𝑎𝑏𝑒𝑙𝑙𝑒𝑑 𝑤𝑖𝑡ℎ 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦 𝑤𝑖𝑡ℎ 𝑖𝑛𝑑𝑒𝑥 𝑝 𝑖𝑛 𝑡𝑟𝑎𝑖𝑛 𝑠𝑒𝑡  Λ𝑖𝑗 𝑠 : = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg max 𝑝 Σ𝑖𝑗 𝑝,𝑠  Λ𝑖 𝑝,𝑠 : counts of Λ𝑖𝑗 𝑠 𝑤ℎ𝑖𝑐ℎ 𝑒𝑞𝑢𝑎𝑙𝑠 𝑡𝑜 𝐿𝑎𝑏𝑒𝑙𝑝
  • 4. ADAPTATION OF COMPONENTS OF MODELS  𝜆𝑖𝑗 𝑠 : 𝑠𝑢𝑚 𝑜𝑓 𝑙𝑒𝑛𝑔𝑡ℎ 𝑜𝑓 𝑠𝑡𝑒𝑚𝑠 𝑖𝑛 𝑐𝑜𝑚𝑏𝑜𝑖𝑗 𝑠  𝜌𝑖 𝑝,𝑠 ≔ 𝑗=1 𝑚𝑖 Σ𝑖𝑗 𝑝,𝑠 Σ𝑝 *  ∗ 𝑖𝑛 𝑐𝑎𝑠𝑒 𝑡ℎ𝑎𝑡 Σ𝑝=0, 𝜌𝑖 𝑝 =0  Π𝑖𝑗 𝑝,𝑠 ≔ Σ𝑖𝑗 𝑝,𝑠 𝑞=1 𝑛 Σ𝑖𝑗 𝑞 (𝑖𝑡 𝑐𝑎𝑛 𝑏𝑒 𝑐𝑜𝑛𝑠𝑖𝑑𝑒𝑟𝑒𝑑 𝑎𝑠 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓𝑐𝑜𝑚𝑏𝑜𝑖𝑗 𝑠 , 𝑙𝑎𝑏𝑒𝑙𝑙𝑒𝑑 𝑤𝑖𝑡ℎ 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦 𝑤𝑖𝑡ℎ 𝑝 𝑖𝑛𝑑𝑒𝑥)  Π𝑖 𝑝,𝑠 : = 𝑎𝑣𝑒𝑟𝑎𝑔𝑒𝑗∗ Π𝑖𝑗∗ 𝑝,𝑠 𝑠𝑢𝑐ℎ 𝑡ℎ𝑎𝑡 𝑎𝑙𝑙 "j∗"s 𝑚𝑒𝑒𝑡 𝑡ℎ𝑒 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛 Π𝑖𝑗∗ 𝑝,𝑠 > 0 * ∗ in case that Σ𝑖𝑗 𝑝,𝑠 = 0 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑝 = 1,2, … 𝑛, Π𝑖 𝑝,𝑠 =0  Π𝑖 𝑝,𝑠 ≔ max 𝑗(Π𝑖𝑗 𝑝,𝑠 )
  • 5. General Scheme for Prediction Models with Combinatorial Approach X_test 𝑑𝑜𝑐𝑖 𝑑𝑜𝑐𝑖 = 𝑙𝑜𝑤𝑒𝑟(𝑑𝑜𝑐𝑖) 𝑡𝑜𝑘𝑒𝑛𝑖 = 𝑐𝑙𝑒𝑎𝑛(𝑑𝑜𝑐𝑖) 𝑑𝑜𝑐𝑖 = 𝑐𝑙𝑒𝑎𝑛(𝑑𝑜𝑐𝑖) 𝑠𝑡𝑒𝑚𝑠𝑖 = max _𝑠𝑡𝑒𝑚𝑠(𝑡𝑜𝑘𝑒𝑛𝑖) 𝑐𝑜𝑚𝑏𝑜𝑖𝑖1 𝑐𝑜𝑚𝑏𝑜𝑖𝑖2 𝑐𝑜𝑚𝑏𝑜𝑖𝑖𝑚𝑖 𝑠 . . . Σ𝑖1 𝑝,𝑠 , Π𝑖1 𝑝,𝑠 , Λ𝑖1 𝑠 ,𝜆𝑖1 𝑠 Σ𝑖2 𝑝,𝑠 , Π𝑖2 𝑝,𝑠 , Λ𝑖2 𝑠 ,𝜆𝑖2 𝑠 Σ𝑖𝑚𝑖 𝑝,𝑠 , Π𝑖𝑚𝑖 𝑝,𝑠 , , Λ𝑖𝑚𝑖 𝑠 𝑠 , 𝜆𝑖𝑚𝑖 𝑠 𝑠 . . . Analyze_word_s Predictions With Combinatiorial Approach X_train, y_train Evaluations (accuracy, confusion etc.) y_test 𝑐𝑜𝑚𝑏𝑜𝑖 𝑠 = get_combo(𝑠𝑡𝑒𝑚𝑠𝑖)
  • 6. Model 1  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏1 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑖𝑓 Σ𝑖𝑗 𝑝,𝑠 = 0 < Σ𝑖𝑗 𝑞,𝑠 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑝 ≠ 𝑞 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 2𝑛𝑑 𝑚𝑎𝑥𝑝 Λ𝑖 𝑝,𝑠 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 ∗ ∗ 𝑖𝑛 𝑐𝑎𝑠𝑒 𝑡ℎ𝑎𝑡 𝑞 𝑖𝑠 𝑛𝑜𝑡 𝑢𝑛𝑖𝑞𝑒, 𝑞 𝑖𝑠 𝑐ℎ𝑜𝑠𝑒𝑛 𝑎𝑠 𝑡ℎ𝑒 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑖𝑛𝑑𝑒𝑥 𝑚𝑒𝑒𝑡𝑖𝑛𝑔 𝑡ℎ𝑒 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛 4/6/2022 EMREHAN 6
  • 7. Model 2  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏2 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 𝑚𝑎𝑥𝑝 𝜌𝑖 𝑝,𝑠 4/6/2022 EMREHAN 7
  • 8. Model 3  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏3 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 𝑚𝑎𝑥𝑝Π𝑖 𝑝,𝑠 4/6/2022 EMREHAN 8
  • 9. Model 4  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏4 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 𝑚𝑎𝑥𝑝Π𝑖 𝑝,𝑠 ∗ 𝑖𝑛 𝑐𝑎𝑠𝑒 𝑡ℎ𝑎𝑡 𝑞 𝑖𝑠 𝑛𝑜𝑡 𝑢𝑛𝑖𝑞𝑒, 𝑞 𝑖𝑠 𝑐ℎ𝑜𝑠𝑒𝑛 𝑎𝑠 𝑡ℎ𝑒 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑖𝑛𝑑𝑒𝑥 𝑚𝑒𝑒𝑡𝑖𝑛𝑔 𝑡ℎ𝑒 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛 4/6/2022 EMREHAN 9
  • 10. Model 5  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏5 𝑑𝑜𝑐𝑖, 𝑠 = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg 𝑚𝑎𝑥𝑝 (𝑚𝑎𝑥𝑗 𝜆𝑖𝑗 ∗ Σ𝑖𝑗 𝑝 Σ𝑝 ) 4/6/2022 EMREHAN 10
  • 11. A Trivial Result  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏𝑘 𝑑𝑜𝑐𝑖, 1 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑘 𝑑𝑜𝑐𝑖 𝑓𝑜𝑟 𝑘 = 1,2, … , 5  Moreover all parameters in case s = 1, equal to corresponding parameters in chapter one (slides 8-10) For example 𝑚𝑖 𝑠 = 𝑚𝑖, Σ𝑖𝑗 𝑝,1 = Σ𝑖𝑗 𝑝 and Π𝑖 𝑝,1 = Π𝑖 𝑝 .
  • 12. Application (introduction)  We use data of «nayn.co» a news portal in Turkish Language. Data is imported by url «"https://raw.githubusercontent.com/naynco/nayn.data/master/classification_clean.c sv"» as done in chapter one.  Head of data is presented below There are 11622 documents («Title» column) with label («DÜNYA» (World), «SPOR» (Sports), «SANAT»(Art) and «Teknoloji»(Technology)). But data is imbalanced in favor of category «DÜNYA» such that the counts [and percentages] of categories 9226[%79] ,1967 [%17],285 [%2] and 144 [%1] respectively.
  • 13. General Scheme for Application of Prediction Models with Combinatorial Approach X_test 𝑑𝑜𝑐𝑖 𝑑𝑜𝑐𝑖 = 𝑙𝑜𝑤𝑒𝑟(𝑑𝑜𝑐𝑖) 𝑡𝑜𝑘𝑒𝑛𝑖 = 𝑐𝑙𝑒𝑎𝑛(𝑑𝑜𝑐𝑖) 𝑑𝑜𝑐𝑖 = 𝑐𝑙𝑒𝑎𝑛(𝑑𝑜𝑐𝑖) 𝑠𝑡𝑒𝑚𝑠𝑖 = max _𝑠𝑡𝑒𝑚𝑠(𝑡𝑜𝑘𝑒𝑛𝑖) 𝑐𝑜𝑚𝑏𝑜𝑖𝑖1 𝑐𝑜𝑚𝑏𝑜𝑖𝑖2 𝑐𝑜𝑚𝑏𝑜𝑖𝑖𝑚𝑖 𝑠 . . . Σ𝑖1 𝑝,𝑠 , Π𝑖1 𝑝,𝑠 , Λ𝑖1 𝑠 ,𝜆𝑖1 𝑠 Σ𝑖2 𝑝,𝑠 , Π𝑖2 𝑝,𝑠 , Λ𝑖2 𝑠 ,𝜆𝑖2 𝑠 Σ𝑖𝑚𝑖 𝑝,𝑠 , Π𝑖𝑚𝑖 𝑝,𝑠 , , Λ𝑖𝑚𝑖 𝑠 𝑠 , 𝜆𝑖𝑚𝑖 𝑠 𝑠 . . . Analyze_word_s Predictions With Combinatiorial Approach X_train, y_train Evaluations (accuracy, confusion etc.) y_test 𝑐𝑜𝑚𝑏𝑜𝑖 𝑠 = get_combo(𝑠𝑡𝑒𝑚𝑠𝑖) Turkish Stem List (32001 stems) (for application) Step2_Preprocess_word.ipynb Step3_Classifier_V10.ipynb Step4_Prediction_Models_cb.ipynb Step6_Run_Integrated_Model_cb.ipynb 13
  • 14. Application (computations)  Pandas and Sklearn libraries in Python is used for application of methods. Test size is chosen as 0.2 and random_state parameter for partition as 57.  Values of parameters in model are computed below  Counts of categories: 𝑛 = 4  Indexes and name of categories: 𝑝 = 1,2,3 𝑎𝑛𝑑 4 , 𝐿𝑎𝑏𝑒𝑙𝑝 = "DÜNYA","SPOR","SANAT and "Teknoloji", 𝑟𝑒𝑠𝑝𝑒𝑐𝑡𝑖𝑣𝑒𝑙𝑦  Counts of categories in train set : Σ1 = 7384, Σ2 = 1568, Σ3 = 229 𝑎𝑛𝑑 Σ4 = 116 4/6/2022 EMREHAN 14
  • 15. Application (computations)  Now Let’s show an example and compute its parameters (or compounds of models). We apply models to document, used in chapter one, with index number 𝑖 = 38296 , 𝑟𝑎𝑛𝑘 𝑖𝑛 𝑡𝑒𝑠𝑡 𝑠𝑒𝑡 = 1356 (index number may not be related to rank). We examine case 𝑠 = 2, because set of 𝑐𝑜𝑚𝑏𝑜𝑖𝑗 𝑠 of the document is empty for s > 2 .  𝑑𝑜𝑐𝑖: 2 𝑘𝑒𝑑𝑖 2 yıldır sanat müzesine girmeye çalışıyor. (𝑒𝑛: 2 𝑐𝑎𝑡𝑠 𝑡𝑟𝑦 𝑡𝑜 𝑒𝑛𝑡𝑒𝑟 𝑎𝑟𝑡 𝑚𝑢𝑠𝑒𝑢𝑚 𝑓𝑜𝑟 2 𝑦𝑒𝑎𝑟𝑠)  𝑙𝑎𝑏𝑒𝑙𝑖: 𝐷Ü𝑁𝑌𝐴 (𝑒𝑛: 𝑤𝑜𝑟𝑙𝑑)  𝑠𝑡𝑒𝑚𝑠: [′𝑘𝑒𝑑𝑖′, ′𝑦𝚤𝑙𝑑𝚤𝑟′, ′𝑠𝑎𝑛𝑎𝑡′, ′𝑚ü𝑧𝑒′, ′𝑔𝑖𝑟′, ′ç𝑎𝑙𝚤′]  Output of analyze_doc(doc,X_train,y_train): 𝑗 = 1, … , 𝑚𝑖 2 = 7  Output of analyze_doc_s(doc,X_train,y_train,2):𝑗 = 1, … , 7 𝑚𝑖 = 6 𝑐𝑜𝑚𝑏𝑜𝑖𝑗 2 𝜆𝑖𝑗 2 Σ𝑖𝑗 𝑝,2 Π𝑖𝑗 𝑝,2 Λ𝑖 𝑝,2 Category of 2nd max Σ𝑖𝑗 𝑝,2 (not used) 4/6/2022 EMREHAN 15
  • 16. Application (computations)  𝑖 = 38296, 𝑠 = 2  𝑐𝑜𝑚𝑏𝑜: comboi1 2 = [′sanat′,′müze′], comboi2 2 = [′kedi′,′gir′], comboi3 2 = [′müze′,′çalı′], comboi4 2 = [′kedi′,′yıldır′], comboi5 2 = [′gir′,′çalı′], comboi6 2 = [′yıldır′,′çalı′], comboi7 2 = [′sanat′,′çalı′]  𝑠𝑢𝑚 𝑜𝑓 𝑙𝑒𝑛𝑔𝑡ℎ 𝑜𝑓 𝑠𝑡𝑒𝑚𝑠 𝑖𝑛 𝑐𝑜𝑚𝑏𝑜𝑖𝑗 𝑠 ∶ 𝜆𝑖1 2 = 9, 𝜆𝑖2 2 = 7, 𝜆𝑖3 2 = 8, 𝜆𝑖4 2 = 9, 𝜆𝑖5 2 = 7, 𝜆𝑖6 2 = 10, 𝜆𝑖7 2 = 9  Σ𝑖𝑗 𝑝,𝑠 : 𝑐𝑜𝑢𝑛𝑡𝑠 𝑜𝑓 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡𝑠, 𝑤ℎ𝑖𝑐ℎ 𝑖𝑛𝑐𝑙𝑢𝑑𝑒 𝑎𝑙𝑙 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠 𝑜𝑓𝑐𝑜𝑚𝑏𝑜𝑖𝑗 𝑠 , 𝑙𝑎𝑏𝑒𝑙𝑙𝑒𝑑 𝑤𝑖𝑡ℎ 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦 𝑤𝑖𝑡ℎ 𝑖𝑛𝑑𝑒𝑥 𝑝 𝑖𝑛 𝑡𝑟𝑎𝑖𝑛 𝑠𝑒𝑡  𝑓𝑜𝑟 𝑗 = 3 𝑎𝑛𝑑 𝑗 = 7 : Σ𝑖3 1,2 = 2,: Σ𝑖3 2,2 = 0, Σ𝑖3 3,2 = 1, Σ𝑖3 ,4,2 = 0 , Σ𝑖7 1,2 = 1, Σ𝑖7 1,2 = 0, Σ𝑖7 1,2 = 0, Σ𝑖7 1,2 = 0  Λ𝑖𝑗: = 𝐿𝑎𝑏𝑒𝑙𝑞 𝑤ℎ𝑒𝑟𝑒 𝑞 = arg max 𝑝 Σ𝑖𝑗 𝑝 : Λi1 = "SANAT", Λi2 = "DÜNYA", Λi3 = "DÜNYA" , Λi4="DÜNYA", Λi5="DÜNYA", Λi6 = "SPOR", Λi7="DÜNYA"  Λ𝑖 𝑝 : counts of Λ𝑖𝑗 𝑤ℎ𝑖𝑐ℎ 𝑒𝑞𝑢𝑎𝑙𝑠 𝑡𝑜 𝐿𝑎𝑏𝑒𝑙𝑝 , Λ𝑖 1 = 5, Λ𝑖 2 = 1, Λ𝑖 3 = 1, Λ𝑖 4 = 0  Π𝑖𝑗 𝑝 for j = 1 and j = 5 ∶ Π𝑖1 1,2 = 0, Π𝑖1 2,2 = 0, Π𝑖1 3,2 = 1, Π𝑖1 4,2 = 0, Π𝑖5 1,2 = 0.75, Π𝑖5 2,2 = 0.25, Π𝑖5 3,2 = 0, Π𝑖5 4,2 = 0 4/6/2022 EMREHAN 16
  • 17. Application (computations)  𝜌𝑖 𝑝 : 𝜌𝑖 1 = 8 7384 = 0.001, 𝜌𝑖 2 = 3 1568 = 0.002, 𝜌𝑖 3 = 3 229 = 0.013, 𝜌𝑖 4 = 0 116 = 0  Π𝑖 𝑝 : Π𝑖 1 = 1+0.67+1+0.75+1 4 = 0.884, Π𝑖 2 = 0.25+1 2 = 0.625, Π𝑖 3 = 1+0.33 2 = 0.665, Π𝑖 4 = 0  Π𝑖 𝑝 : Π𝑖 1 = 1, Π𝑖 2 = 1, Π𝑖 3 = 1, Π𝑖 4 = 0 4/6/2022 EMREHAN 17
  • 18. Application (Predictions)  𝑓𝑜𝑟 𝑖 = 38296  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏1 𝑑𝑜𝑐𝑖, 2 = SPOR  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏2 𝑑𝑜𝑐𝑖, 2 = SANAT  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏3 𝑑𝑜𝑐𝑖, 2 = DÜNYA  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏4 𝑑𝑜𝑐𝑖, 2 = DÜNYA  𝑝𝑟𝑒𝑑𝑖𝑐𝑡_𝑐𝑏5 𝑑𝑜𝑐𝑖, 2 = SANAT
  • 19. Application (Results) Confusion Matrix for Model 1 (count) s= 2 Confusion Matrix for Model 1 (percentage) Prediction Prediction (rounded to 2 digits) DÜNYA SANAT SPOR Teknoloji No Prediction Total DÜNYA SANAT SPOR Teknoloji No Prediction Total Observed DÜNYA 1204 47 519 47 25 1842 0.65 0.03 0.28 0.03 0.01 1 SANAT 15 23 15 0 3 56 0.27 0.41 0.27 0 0.05 1 SPOR 201 3 182 0 13 399 0.5 0.01 0.46 0 0.03 1 Teknoloji 18 0 8 2 0 28 0.64 0 0.29 0.07 0 1 Accuracy Rate For Model 1 0.61 Confusion Matrix for Model 2 (count) s= 2 Confusion Matrix for Model 2 (percentage) Prediction Prediction (rounded to 2 digits) DÜNYA SANAT SPOR Teknoloji No Prediction Total DÜNYA SANAT SPOR Teknoloji No Prediction Total Observed DÜNYA 1286 174 130 227 25 1842 0.7 0.09 0.07 0.12 0.01 1 SANAT 3 41 9 0 3 56 0.05 0.73 0.16 0 0.05 1 SPOR 30 18 324 14 13 399 0.08 0.05 0.81 0.04 0.03 1 Teknoloji 14 0 2 12 0 28 0.5 0 0.07 0.43 0 1 Accuracy Rate For Model 2 0.72
  • 20. Application (Results) Confusion Matrix for Model 3 (count) s= 2 Confusion Matrix for Model 3 (percentage) Prediction Prediction (rounded to 2 digits) DÜNYA SANAT SPOR Teknoloji No Prediction Total DÜNYA SANAT SPOR Teknoloji No Prediction Total Observed DÜNYA 1755 17 25 20 25 1842 0.95 0.01 0.01 0.01 0.01 1 SANAT 39 8 6 0 3 56 0.7 0.14 0.11 0 0.05 1 SPOR 159 8 214 5 13 399 0.4 0.02 0.54 0.01 0.03 1 Teknoloji 28 0 0 0 0 28 1 0 0 0 0 1 Accuracy Rate For Model 3 0.85 Confusion Matrix for Model 4 (count) s= 2 Confusion Matrix for Model 4 (percentage) Prediction Prediction (rounded to 2 digits) DÜNYA SANAT SPOR Teknoloji No Prediction Total DÜNYA SANAT SPOR Teknoloji No Prediction Total Observed DÜNYA 1805 3 7 2 25 1842 0.98 0 0 0 0.01 1 SANAT 47 2 4 0 3 56 0.84 0.04 0.07 0 0.05 1 SPOR 284 0 102 0 13 399 0.71 0 0.26 0 0.03 1 Teknoloji 28 0 0 0 0 28 1 0 0 0 0 1 Accuracy Rate For Model 4 0.82
  • 21. Application (Results) Confusion Matrix for Model 5 (count) s= 2 Confusion Matrix for Model 5 (percentage) Prediction Prediction (rounded to 2 digits) DÜNYA SANAT SPOR Teknoloji No Prediction Total DÜNYA SANAT SPOR Teknoloji No Prediction Total Observed DÜNYA 977 289 189 362 25 1842 0.53 0.16 0.1 0.2 0.01 1 SANAT 3 40 10 0 3 56 0.05 0.71 0.18 0 0.05 1 SPOR 25 27 308 26 13 399 0.06 0.07 0.77 0.07 0.03 1 Teknoloji 10 0 3 15 0 28 0.36 0 0.11 0.54 0 1 Accuracy Rate For Model 5 0.58 A Note: All predictions of 25,2,13 documents labelled with «DÜNYA», «SANAT» and «SPOR» respectively are «No prediction». Because no combinations, with s = 2 stems,of those documents in test set are covered by a document in train set. Trivially prediction based combinations of stems of these documents, with s > 2 stems, are «No Predidiction». End of Chapter Two