which property goes to which section. So, for each data point we will have a property number and a section number. This information will be stored in the computer so that we can use it when we need to recognize test data. Because this method uses ranges instead of exact values, we may get multiple matches for a test datum, but that number will be very small. If we have multiple candidates, we can use the Euclidean distance to find the closest one. Say, after matching the section lists, we have found ten records that have the same section list as the test data. We can then compute the Euclidean distance to each of these ten records and pick the closest one. This way, we are not computing Euclidean distances against millions of records, but only against a few, which saves a lot of unnecessary calculation.
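The refinement step above can be sketched as follows. This is a minimal illustration, not the author's implementation; the candidate set is assumed to be the handful of records that survived the section-list match.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two property vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def refine_candidates(test, candidates):
    # Among the few records whose section lists matched the test
    # datum, return the one closest in Euclidean distance.
    return min(candidates, key=lambda c: euclidean(test, c))

# hypothetical example: three candidates survived the section match
cands = [(1.0, 2.0), (0.9, 2.1), (3.0, 4.0)]
print(refine_candidates((1.0, 2.0), cands))  # -> (1.0, 2.0)
```

Because the distance is computed only over the surviving candidates, the expensive part of nearest-neighbor search is avoided for the bulk of the training set.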
The drawback of this method is that it is very slow. When the data set holds 100 million records, finding a match will take a long time. Computing the section list is fast, but comparing the section list of every training record against the test data is not. It would be much faster to use a tree to search for the match. We will discuss that in the next method.
Method two:
This method is the same as method one; the only difference is that we will build a tree from the section data and use it to find a match. Searching a well-defined tree is very efficient. For example, if each record has one hundred properties, we create a tree with one hundred levels. For each level, we use the dynamic range information to create the branches, since we do not know the range beforehand. Each level has its own sections based on the given data set. We can use the same number of sections for every level, or a different number per level.
For example, suppose level one has the range 0-100 and level two has the range 0-200. If we decide every level gets ten branches, then each section of level one is 10 wide and each section of level two is 20 wide. Alternatively, we can fix the section width at 10, in which case level one gets 10 sections and level two gets 20. We can try both ways and see which one works better for a particular training set. Using a different number of sections per level is obviously the more natural solution.
Once the tree is created, our first task is to populate it by inserting the training data. As every record has the same number of properties, all records end up in leaf nodes. This tree lets us find a match quickly: after descending from the top of the tree to the bottom, we have the match. If multiple records sit at the leaf node reached by a test datum, we can use the Euclidean distance to pick the appropriate one.
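A minimal sketch of this tree, under the assumption that each level corresponds to one property, ranges are derived from the training data, and every level is split into a fixed number of equal sections (the names and ten-branch default are illustrative, not from the source):

```python
import math

def build_levels(data, branches=10):
    # For each property (tree level), derive the dynamic range
    # from the training data and split it into equal sections.
    levels = []
    for i in range(len(data[0])):
        lo = min(row[i] for row in data)
        hi = max(row[i] for row in data)
        levels.append((lo, (hi - lo) / branches or 1.0))
    return levels

def section(value, level, branches=10):
    # Map a property value to its section index at this level.
    lo, width = level
    return min(int((value - lo) // width), branches - 1)

def insert(root, levels, row):
    # Walk one level per property; every record ends at a leaf list.
    node = root
    for i, level in enumerate(levels):
        node = node.setdefault(section(row[i], level), {})
    node.setdefault('leaf', []).append(row)

def search(root, levels, test):
    node = root
    for i, level in enumerate(levels):
        node = node.get(section(test[i], level))
        if node is None:
            return None
    # Multiple records at the leaf: fall back to Euclidean distance.
    return min(node.get('leaf', []), key=lambda r: math.dist(r, test),
               default=None)

train = [(5.0, 20.0), (6.0, 22.0), (95.0, 180.0)]
levels = build_levels(train)
root = {}
for row in train:
    insert(root, levels, row)
print(search(root, levels, (5.9, 21.8)))  # -> (6.0, 22.0)
```

The search touches one node per level, so lookup cost depends on the number of properties rather than the number of training records.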
Method three:
This method is the same as method two, with a small difference in the implementation of the tree. Implementing a tree is a demanding task, especially because it can take a lot of memory. We can reduce the memory use with a simple trick: do not create every node of the whole tree, only the ones actually used by the training data. There is no need to create nodes that are never visited, and many paths will never be used by the training data. For example, suppose the first property has ten sections over the range 0 to 100, and no training record has a value between 70 and 80 for that property. In that case, we do not create a node for that section. We could even drop the section entirely, since no data will go there, but it is wiser to keep the slot if we plan to add data to the tree dynamically at runtime. Keep it null until we have data for it; it only takes memory once we initialize it, after we know it will be used.
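The lazy-allocation trick can be sketched like this: every node keeps a null slot per section, and a child object is only created when a training record first uses that section. The section indices are assumed precomputed per record.

```python
class Node:
    # Children slots exist conceptually for every section, but we
    # allocate a child object only when training data actually uses it.
    def __init__(self, branches):
        self.children = [None] * branches   # None = section unused so far
        self.records = []                   # filled only at the leaf level

def lazy_insert(root, sections_per_row, row, branches=10):
    # sections_per_row: the record's section index at each level.
    node = root
    for s in sections_per_row:
        if node.children[s] is None:        # create on first use only
            node.children[s] = Node(branches)
        node = node.children[s]
    node.records.append(row)

root = Node(10)
lazy_insert(root, [0, 3], ("a",))
lazy_insert(root, [0, 5], ("b",))
print(root.children[1] is None)   # unused section: no node allocated
```

Keeping the null slots (rather than deleting unused sections) is what allows records to be added dynamically at runtime, exactly as the text recommends.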
Method four:
This method proposes placing multiple properties in each level, with an equal number of properties per level. If we build a tree with one property per level, we face two problems. The first problem is that we do not know which properties should sit in the top levels. The top-level properties are encountered first while traversing the tree, and usually not all features carry the same weight in identifying a record. Suppose a record has ten properties and three of them are very important. We want to place those three at the top of the tree so that we do not miss them; in other words, we want to give them more priority. So the properties in the top levels should be the important ones. If all features were equally important for identifying the object, the previous methods would be fine, but in reality some features matter more than others. The second problem is that if we miss one property, we end up in the wrong branch of the tree, which leads us to the wrong record, or to no record at all. For example, suppose we are looking for a person and have populated a tree with 30 properties per person. If our test person is missing one property near the top of the tree for some practical reason, the program cannot find him. Real-time data is always a little off from the training data because of the many kinds of disturbance in the real world. For example, we might take someone's picture at home and keep it in the training set; when we later capture his picture outdoors, in the sun with dirt and dust, his face data will be a little off.
So it is a good idea to keep a couple of properties in each level, so that the weight of each level becomes higher. Then, if we miss one property, we are not immediately directed down the wrong path, or at least we have a chance to stay on the main path. The next question is which properties should go in the top levels. Given a random training set, we do not know which properties are important and should be kept at the top. For a particular task we might know this and build it into the classifier, but in the general case we will not. If we want this classifier to work for any type of data, we will face situations about which we have no prior knowledge, and then we will not know which properties are important. For example, a robot might use it to train itself on data seen for the first time, say Mars terrain data. Then it needs to discover the important properties by working with the data. If we do not
more workable fashion. All this information is dynamic and computed in real time. This way, we can use it to train on any type of data that needs k-nearest-neighbor. We do not need to guide it; it finds the best solution by itself. Which property mixes with which one, how many properties go in each level, which property mixture works best for the given training data: all these questions can be answered properly by using VBCOM.
If we decide to use a genetic algorithm, we will evaluate many of the VBCOMs, keep the best-performing 10%, produce children from them to fill the new population, and run the process again. We keep running for a predefined number of generations and take the best performer.
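The select-and-refill loop can be sketched generically. The fitness function and the crossover are placeholders (the real candidates would be VBCOM tree structures, scored by how well they classify, and the real child-creation methods are described later); here a toy bit-list problem stands in so the skeleton is runnable.

```python
import random

def crossover(a, b):
    # Toy stand-in: splice two parents at a random cut point. A real
    # version would flip levels or reshuffle property assignments.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def run_ga(population, fitness, generations=50, elite_frac=0.10):
    # Keep the best 10% each generation, refill the population with
    # their children, repeat for a fixed number of generations.
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:max(2, int(size * elite_frac))]
        children = [crossover(*random.sample(elites, 2))
                    for _ in range(size - len(elites))]
        population = elites + children
    return max(population, key=fitness)

random.seed(7)
# toy candidates: bit lists scored by their sum
pop = [[random.randint(0, 1) for _ in range(12)] for _ in range(30)]
best = run_ga(pop, fitness=sum)
print(sum(best) >= max(sum(p) for p in pop))  # elitism never loses ground
```

Because the elites are carried into every new generation, the best structure found so far is never lost.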
Method eight:
This method is the same as method six, except that we will use k-means clustering to form the branches in each level. First, for each level we add up the values of all its properties (or apply some other mathematical combination) to get a total value. Then, instead of taking the highest and lowest values and dividing the range by some number to form the branches, we apply k-means clustering to the totals. This gives us the cluster centers and the radius of each cluster. When a test datum is provided, at each level we find which cluster it belongs to and follow that branch. This way of branching a level's data is very dynamic and effective. We do not need to decide a section length or a number of branches per level; clustering the data is the more natural solution. With this method, each level gets a different number of branches according to the variation in the data available for that level.
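Since each level is reduced to a single total value, one-dimensional k-means suffices per level. The sketch below, with illustrative data and a fixed k, shows how the cluster centers and radii would define a level's branches:

```python
import random

def kmeans_1d(values, k, iters=100):
    # Simple 1-D k-means: cluster the per-level totals so that the
    # clusters, not fixed-width sections, define a level's branches.
    centers = random.sample(values, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    # Radius: the farthest member of each cluster from its center.
    radii = [max((abs(v - c) for v in g), default=0.0)
             for g, c in zip(groups, centers)]
    return centers, radii

def branch_for(total, centers):
    # A test datum follows the branch of its nearest cluster center.
    return min(range(len(centers)), key=lambda i: abs(total - centers[i]))

random.seed(1)
totals = [3.0, 3.2, 2.9, 50.0, 51.5, 49.0]  # per-level totals of training data
centers, radii = kmeans_1d(totals, k=2)
print(branch_for(3.1, centers) == branch_for(2.9, centers))  # same cluster
```

In a full implementation, k itself could differ per level, which is what gives each level its own natural number of branches.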
Method nine:
This method is the same as method seven, except that we will use k-means clustering to form the branches in each level. As before, for each level we add up the values of all its properties (or apply some other mathematical combination) to get a total value. Then, instead of dividing the range between the highest and lowest values by some number, we cluster the totals with k-means. We get the cluster centers and the radius of each cluster. When a test datum is provided, at each level we find which cluster it belongs to and follow that branch.
Method ten:
This method is the same as method eight, except that we will not use a tree. As before, for each level we add up the values of all its properties (or apply some other mathematical combination) to get a total value, and instead of dividing the range into fixed sections, we cluster the totals with k-means to get the cluster centers and the radius of each cluster. If the data has one hundred levels, then each level has its own set of clusters. For a test datum, we first break it down into the level structure, then find the most appropriate cluster at each level. Once we know that, we find the training record that has the same cluster result, or the closest one. That record is the match for the test data.
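The tree-less matching can be sketched as comparing "cluster signatures": each record's nearest-cluster index at every level, with the match being the training record whose signature agrees with the test signature on the most levels. The per-level cluster centers below are illustrative, assumed already computed by k-means.

```python
def signature(row, levels_centers):
    # levels_centers[i] holds the cluster centers for level i;
    # a record's signature is its nearest-center index at every level.
    return tuple(
        min(range(len(centers)), key=lambda j: abs(row[i] - centers[j]))
        for i, centers in enumerate(levels_centers)
    )

def match(test, train, levels_centers):
    # No tree: compare cluster signatures directly and return the
    # training record agreeing with the test signature on most levels.
    t_sig = signature(test, levels_centers)
    def agreement(row):
        return sum(a == b for a, b in zip(signature(row, levels_centers), t_sig))
    return max(train, key=agreement)

# hypothetical two-level data with precomputed cluster centers
centers = [[2.0, 10.0], [1.0, 5.0]]
train = [(2.1, 0.9), (9.8, 5.2), (2.2, 5.1)]
print(match((2.0, 1.0), train, centers))  # -> (2.1, 0.9)
```

In practice the training signatures would be computed once and stored, so matching a test datum costs one signature computation plus a comparison pass.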
Method eleven:
This method is the same as method nine, except that we will not use a tree. As before, for each level we add up the values of all its properties (or apply some other mathematical combination) to get a total value, and instead of dividing the range into fixed sections, we cluster the totals with k-means to get the cluster centers and radii. When a test datum is provided, at each level we find which cluster it belongs to and use that result.
Clustering also tells us which levels have more variation in their data and which have less. We can keep the low-variation levels at the top and the high-variation levels at the bottom: we want the test data to face fewer ways to be divided at the top and more at the bottom. This gives us less chance of going astray when the test data differs slightly from the training data. As we pass each level, fewer records remain in consideration for a match. At the top level we have the whole training set, but after each level the candidate set shrinks, so we can afford to become more specific near the bottom, close to the leaf nodes.
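One simple way to realize this ordering (using standard deviation as a stand-in for the cluster-based variation measure, which is an assumption, not the source's exact recipe) is to score each property's spread over the training set and sort levels by it:

```python
import statistics

def order_levels_by_variation(data):
    # Compute each property's spread over the training set and
    # return property indices sorted low-variation first, so the
    # tree asks its "safest" questions near the root.
    n = len(data[0])
    spread = [statistics.pstdev(row[i] for row in data) for i in range(n)]
    return sorted(range(n), key=lambda i: spread[i])

train = [(1.0, 50.0), (1.1, 10.0), (0.9, 90.0)]
print(order_levels_by_variation(train))  # -> [0, 1]
```

Reversing the sort gives the opposite layout (high-variation levels on top) for the tasks where the text says that is preferable.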
While searching the tree, if we do not find a match at a certain level, we can go back to the parent node, try another path, and record a mismatch for that level. That way we can still find a match even when one level's data mismatches, which will happen often in real situations. Say we have a tree of ten levels. After passing five levels we hit a mismatch, meaning no similar data at the sixth level. In that case we try the child nodes of the fifth level, continue without a match for the sixth level, and see how many matches we can get by following the child nodes down to a leaf. Whenever there is a mismatch, we mark it, use it later to compute the percentage of match, and try alternative paths to reach a leaf node. If a sixth-level node does not lead on to the next level, we back up further and try the child nodes of the fourth level. Basically, we try to reach a leaf node with the maximum number of matching levels, taking alternative paths when a match is not available. When we reach a leaf node, we count in how many levels we found a match. For example, a match in five out of ten levels is a fifty percent match.
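The backtracking search can be sketched recursively: at each level, every branch is explored, a branch whose section differs from the test datum's section costs one mismatch, and the leaf reached with the fewest mismatches wins. The dict-of-dicts tree shape below is an assumption for illustration.

```python
def search_with_backtracking(node, sections, depth=0):
    # Returns (mismatch_count, records) for the best reachable leaf.
    # If the test datum's section is missing at this level, the other
    # branches are tried instead, charging one mismatch for the level.
    if 'leaf' in node:
        return 0, node['leaf']
    want = sections[depth]
    candidates = []
    for key, child in node.items():
        miss, recs = search_with_backtracking(child, sections, depth + 1)
        candidates.append((miss + (key != want), recs))
    return min(candidates, key=lambda c: c[0],
               default=(len(sections) - depth, []))

# toy tree: two levels, section indices as keys, records at leaves
tree = {0: {0: {'leaf': ['A']}, 1: {'leaf': ['B']}},
        2: {0: {'leaf': ['C']}}}
miss, recs = search_with_backtracking(tree, [0, 3])  # level-2 section 3 absent
print(recs, 1 - miss / 2)   # best leaf and its match percentage
```

A production version would prune branches once their mismatch count exceeds the best leaf found so far, rather than exploring every path.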
How to create children:
Method 1: flipping the levels is one way to create a child from a parent. In this method we permute the levels randomly; say, level one becomes level three and level three becomes level one. As we are working with dynamic data, this alone might give a better result. We cannot know beforehand whether it will work; we just have to try it and see.
Method 2: flip the levels and also change the features, meaning we recreate the feature list for each level. This is the same as the previous method, except that besides flipping the levels we also change the data in each level. For example, suppose level one has three properties and level three has five. After level three becomes level one, it still has five properties, but different ones: we reassign them by choosing randomly from the properties that are available and not taken by other levels.
Method 3: in this method we keep some levels unchanged and rebuild the rest from the remaining features. For example, with ten levels, keep five unchanged and change the structure and properties of the other five. Say level six previously had three properties; we make a random selection and decide, for instance, to give level six four properties, drawn randomly from the remaining properties.
Method 4: in this method we keep the number of levels and the number of features per level the same as before, but change the feature combination: every level keeps its size while its properties are reassigned. For example, level one has three properties, numbers 3, 5, and 7. In the new structure we choose level one's properties randomly; say the random selection gives 4, 6, and 10, then those three become the properties of level one.
Method 5: randomly choose what percentage of the tree remains unchanged and rebuild the rest. For example, if the random choice says to keep 50%, we keep half of the levels and rebuild the remaining half the same way we built the tree in the first place: randomly choose how many properties each rebuilt level holds and which ones, using only the properties not already used by the unchanged part.
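Two of these child-creation methods can be sketched directly, representing a structure as a list of levels, each level a list of property indices (this encoding is an assumption for illustration):

```python
import random

def flip_levels(parent):
    # Method 1: a child is the parent's level list in a new random order.
    child = parent[:]
    random.shuffle(child)
    return child

def reshuffle_properties(parent, n_properties):
    # Method 4: keep the level count and each level's size, but deal
    # out a fresh random assignment of property indices to levels.
    pool = list(range(n_properties))
    random.shuffle(pool)
    child, pos = [], 0
    for level in parent:
        child.append(pool[pos:pos + len(level)])
        pos += len(level)
    return child

random.seed(0)
parent = [[3, 5, 7], [0, 1], [2, 4, 6, 8, 9]]   # property indices per level
child = reshuffle_properties(parent, 10)
print([len(lv) for lv in child])  # -> [3, 2, 5] sizes preserved
```

Methods 2, 3, and 5 combine these two moves: permute some or all levels, then redeal properties into the rebuilt portion from the unused pool.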
How to test:
We can carry out the testing in many ways. One way is to change a portion of the test sample, keep the rest the same, and see which structures can still recognize it. For example, randomly select which parts of the data get changed, alter up to 10% of the original values, and check which structures still recognize the sample. We select the changed portions randomly, since we do not know beforehand which portions will be disturbed in practice. To thin out the number of successful structures, we can test again with a larger percentage changed, for example 20%, and keep going until we get a good result. This is how we can perform automatic testing, and it would be a great capability for robots to test sample data on their own.

In our sleep, our brain probably does the same thing: it simulates events to get better weights for its classifiers and to reorganize the links it created in a rush. While awake and working, the brain does not always select the best possible way to perform a task; it just finds a workable way in a hurry. When we sleep, it fixes those links, improves the classifier weights, and finds better weights and links for them. That is why we need sleep so frequently. It is like two friends who went shopping: one friend kept buying things and handing the bags to the other, who just piled them up in a rush until he got tired. He asked his friend to wait a moment while he reorganized the bags so he could carry them more efficiently.
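The automatic testing idea can be sketched as follows. The two classifiers at the bottom are hypothetical stand-ins (a brittle exact matcher versus a tolerant one); the perturbation bound and trial count are illustrative parameters.

```python
import random

def perturb(sample, fraction=0.10, scale=0.05):
    # Randomly choose which positions change, and nudge only those,
    # touching up to `fraction` of the sample's values.
    out = list(sample)
    k = max(1, int(len(out) * fraction))
    for i in random.sample(range(len(out)), k):
        out[i] += random.uniform(-scale, scale) * out[i]
    return tuple(out)

def stress_test(classifiers, sample, truth, fraction=0.10, trials=20):
    # Keep only the structures that still recognize the perturbed
    # sample every time; rerun with a larger fraction to thin further.
    survivors = []
    for clf in classifiers:
        if all(clf(perturb(sample, fraction)) == truth for _ in range(trials)):
            survivors.append(clf)
    return survivors

# hypothetical classifiers: exact match vs. tolerant match
exact = lambda s: 'A' if s == (10.0, 20.0) else '?'
tolerant = lambda s: 'A' if abs(s[0] - 10.0) < 2 and abs(s[1] - 20.0) < 2 else '?'
random.seed(3)
print(len(stress_test([exact, tolerant], (10.0, 20.0), 'A')))  # tolerant survives
```

Raising `fraction` on successive rounds implements the 10%, then 20%, thinning process the text describes.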
It is better to keep the sections larger at the top so that we do not slip because of a slight difference between the training data and the test data. Large top levels mean we consider many properties at the beginning rather than asking about specific ones. Asking a direct, specific question early gives a good chance of taking the wrong path when the data is slightly different; it is better to become more specific toward the bottom. We can also measure the variation in the data using clustering and use that information when building the tree structure: before creating the tree, we cluster each feature's values in the training set to find out how much that feature varies, then keep the low-variation features at the top and the high-variation features at the bottom. In some scenarios, depending on the demands of the task, we might instead want the high-variation features at the top.
Using these techniques, k-nearest-neighbor becomes an intelligent classifier. It finds the tree structure that works best for many types of training data across various fields of artificial intelligence, and it works for large data sets of more than a million records.