2. INTRODUCTION
As time progresses, technology gets more complicated. Demands, supplies, objects and
necessities get more sophisticated due to this progression. This fact over-diversifies the names
or titles of these concepts. Over-diversified naming creates a taxonomy problem difficulty for
handling (or managing) concepts.
3. MOTIVATION
My approach is based on ยซnamesยป (or ยซtitlesยป in some concepts) and ยซdescriptionsยป. If we
want to reduce the high number of overdiversified titles to a manageable level, we need a
classification algorithm. Obviously, ยซdescriptionsยป have more information than ยซnamesยป have. I
use a token set obtained from ยซdescriptionยป to reduce the number of the names.
4. STEP 1
First of all, for all objects we can focus two features, ยซtitlesยป (or ยซnameยป) and ยซdescriptionsยป
๐ก๐ = ๐ก๐๐ก๐๐ (๐๐ ๐๐๐๐) ๐๐ ๐๐๐๐๐๐ก ๐
๐๐ = ๐๐๐ ๐๐๐๐๐ก๐๐๐ ๐๐ ๐๐๐๐๐๐ก ๐
๐ก๐๐๐๐๐ = ๐ ๐๐ก ๐๐ ๐ก๐๐๐๐๐ ๐๐๐ก๐๐๐๐๐ ๐๐๐๐ ๐๐
๐ = 1,2,3, โฆ ๐ ๐๐๐๐๐ฅ ๐๐ ๐๐๐๐๐๐ก๐
๐; ๐๐ข๐๐๐๐ ๐๐ ๐กโ๐ ๐๐๐๐๐๐ก๐
๐๐ข; ๐๐ข๐๐๐๐ ๐๐ ๐ข๐๐๐๐ข๐ ๐ก๐๐ก๐๐๐ (๐๐ ๐๐๐๐๐ )
5. Step 2
We can define the taxonomy problem in the light of Step 1. ๐๐ข can be very high and then is not
manageable. Our goal is to create main categories to be easily managed. We can choose a
number m and focus on "top m" objects and their descriptions.
๐ = 1,2,3, โฆ ๐ ๐๐๐๐๐ฅ ๐๐ ๐ก๐๐ก๐๐๐ ๐๐ ๐ก๐๐ ๐ ๐๐๐ ๐ก
๐๐; ๐๐๐๐๐๐ ๐๐ ๐ก๐๐ก๐๐ ๐ ๐๐ ๐ก๐๐ก๐๐๐ ๐๐ ๐ก๐๐ ๐ ๐๐๐ ๐ก
Now we can token clouds
๐๐๐๐ข๐๐: = (๐ก๐==๐๐) ๐ก๐๐๐๐๐
6. Step 3
Now we can give new taxonomy "๐๐" for object with i index
๐๐๐ = ๐๐โ ๐คโ๐๐๐ ๐โ
: = arg ๐๐๐ฅ๐ #(๐๐๐๐ข๐๐ โฉ ๐ก๐๐๐๐๐)
#(๐ ๐๐ก) ๐๐ ๐๐ข๐๐๐๐ ๐๐ ๐๐๐๐๐๐๐ก๐ ๐๐ ๐กโ๐ ๐ ๐๐ก
โ๐ = 1,2,3, โฆ ๐, #(๐๐๐๐ข๐๐ โฉ ๐ก๐๐๐๐๐)=0 then ๐๐๐ = "NA"
๐๐ ๐๐๐ = ๐ก๐ (๐๐ ๐กโ๐๐ ๐๐๐ ๐, ๐๐๐ก๐๐๐๐๐ฆ ๐๐ข๐๐๐๐๐ ๐๐๐ฃ๐๐๐ข๐ ๐๐ฆ ๐๐๐๐๐๐๐ ๐. )
7. Remarks
1) This model failed its initial experience. That is why some clouds tend to be very large, then
the bias unavoidably occurs. To overcome this problem, emaciated clouds can be employed
instead of ๐๐๐๐ข๐๐ alternatively.
๐๐๐๐ข๐๐
๐๐
: = ๐๐๐๐ข๐๐ (๐โ ๐) ๐๐๐๐ข๐๐
2) The clouds can be created by using domain knowledge if we have it.
3) Data structures of Python are suitable for the model.