This breaking down of the image can itself act as a classifier; we do not need an HMM or a neural net for it. Each image will have different information in each block. We will use genetic algorithms on it. Meaning, we will take different combinations and weights, train on the data, test those, and keep only the best-performing 10%, producing the rest of the new population from that 10%. We will keep doing this until we have a satisfactory result, eventually moving toward the patterns in the image. A sketch of this loop follows below.
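As a rough illustration, here is a minimal Python sketch of that selection loop, assuming hypothetical `fitness` (accuracy of a block combination on held-out data) and `breed` (crossover of two combinations) helpers:

```python
import random

def evolve(population, fitness, breed, generations=100, keep_frac=0.10):
    """Generic GA loop: each generation keeps the best-performing 10% and
    refills the population with children bred from those survivors."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[:max(1, int(len(population) * keep_frac))]
        children = [breed(random.choice(survivors), random.choice(survivors))
                    for _ in range(len(population) - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)
```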
Method three:
In this method we will discuss a single layer with VBCOM. VBCOM means variable-length block combination. It is the same as BCOM, with the one difference that each block can have a different size. For example, one block may be 16 by 16, another 32 by 32, another 64 by 64, or any other size. Using different sizes is more powerful because it finds patterns dynamically. We do not know the size of any particular pattern, nor the area where a specific pattern lies; both are dynamic and change frequently in natural or real-time situations. That is why VBCOM will produce more accurate results: as we run GA on VBCOM, it will slowly move toward the structure of the pattern. In GA, any structure close to the real pattern will produce a better result than others, so in every generation we move closer to the real patterns in any type of dataset, including computer vision and speech recognition. In BCOM, we restrict the size by assuming all patterns in the data occupy the same area, which will be wrong for most real-world data. In VBCOM, we have the option of a pattern of any size, and GA will help us find the real patterns. It is like a blind person finding the best route somewhere by himself; it is a natural way to find logic or patterns inside a given data set. We will use genetic algorithms on it: we will take different combinations, train on the data, test those, keep the best 10%, and produce the remaining population from that 10%. We will keep doing that until we have a satisfactory result, eventually moving toward the patterns in the image. A sketch of one way to generate variable-size blocks follows below.
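One way this could look, as a sketch only: a greedy row tiling with randomly chosen block sizes. The layout strategy here is an assumption; the method itself leaves it open.

```python
import random

def random_vbcom(width, height, sizes=(16, 32, 64, 128)):
    """Tile an image with blocks of randomly chosen sizes; each row of
    blocks gets a random height and each block a random width."""
    blocks, y = [], 0
    while y < height:
        row_h = random.choice(sizes)
        x = 0
        while x < width:
            w = random.choice(sizes)
            blocks.append((x, y, min(w, width - x), min(row_h, height - y)))
            x += w
        y += row_h
    return blocks

print(len(random_vbcom(1024, 1024)))  # number of variable-size blocks
```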
Method four:
In this method, we will discuss two layers: the first layer for BCOM and the second layer for a least-squares search over classifiers. That means a genetic algorithm will be used in the first layer and the least-squares method in the second. In the first layer, BCOM will be used to get the best block combination for the training data. In the second layer, the least-squares method will be used with one of these classifiers: HMM, neural net, SVM, or K nearest neighbor. When a combination is chosen in the first layer, the second layer will use the least-squares method on that BCOM to get the best result out of a predefined number of different choices. In the second layer, we will try different structures for the NN, HMM, SVM, or K nearest neighbor; if we choose a NN, we will try different input layer sizes, hidden layer sizes, and numbers of hidden layers. The first layer will use GA, keep the best-performing 10%, produce children to fill the population, and run the process again with the new population. Finally, it will get the best BCOM along with the best HMM, neural net, SVM, or K nearest neighbor. When we are done processing the data, we will have the best BCOM from the first layer and also the best classifier structure, with the best weights, from the second layer. For any BCOM, we will have the best classifier structure with weights, since we get that after trying many of them and choosing the best performer.
Method 5:
In this method, we will discuss two layers: BCOM in the first layer and a neural net in the second. That means two layers of genetic algorithm will be used. In the first layer, BCOM will be used to get the best block combination for the training data. In the second layer, GA will be used to get the best neural net for a particular BCOM. When a combination is chosen in the first layer, the second layer will use that BCOM to get the best neural net for it. The first layer will use GA, keep the best-performing 10%, produce children to fill the population, and run the process again with the new population. On receiving a BCOM, the second layer will use GA to get the best NN: it will keep the best 10%, produce the remainder of the population from that 10%, and run the process again. So each BCOM gets a run of GA in the second layer (a sketch of this nesting follows below). Finally, we will get the best BCOM along with the best neural net.
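A sketch of this nesting, reusing the `evolve` loop from the earlier sketch; `make_nns`, `breed_bcom`, `breed_nn`, and `accuracy` are assumed helpers, not part of the method's specification:

```python
def two_layer_ga(bcoms, make_nns, breed_bcom, breed_nn, accuracy):
    """Outer GA over BCOMs; for each candidate BCOM an inner GA searches
    NN structures, and the inner winner's accuracy scores the BCOM."""
    def bcom_fitness(bcom):
        best_nn = evolve(make_nns(bcom),
                         fitness=lambda nn: accuracy(bcom, nn),
                         breed=breed_nn)
        return accuracy(bcom, best_nn)
    return evolve(bcoms, fitness=bcom_fitness, breed=breed_bcom)
```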
Method 6:
In this method we will discuss two layers: BCOM in the first layer and HMM in the second. That means two layers of genetic algorithm will be used. In the first layer, BCOM will be used to get the best block combination for the training data. In the second layer, GA will be used to get the best HMM for that BCOM. When a combination is chosen in the first layer, the second layer will use that BCOM to get the best HMM for it. The first layer will use GA and create a new population from the best performers of the previous generation. On receiving a BCOM, the second layer will use GA to get the best HMM: it will keep the best 10%, produce the remainder of the population from that 10%, and run the process again. So each BCOM gets a run of GA in the second layer. Finally, it will get the best BCOM along with the best HMM.
Method 7:
In this method we will discuss two layers: BCOM in the first layer and SVM in the second. That means two layers of genetic algorithm will be used. In the first layer, BCOM will be used to get the best block combination for the training data. In the second layer, GA will be used to get the best SVM for that BCOM. When a combination is chosen in the first layer, the second layer will use that BCOM to get the best SVM for it. The first layer will use GA and create a new population from the best performers of the previous generation. On receiving a BCOM, the second layer will use GA to get the best SVM: it will keep the best 10%, produce the remainder of the population from that 10%, and run the process again
layer will use that structure to get the best VBCOM for it. The first layer will use GA, keep the best-performing 10%, produce children to fill the population, and run the process again with all new structures in the new population. On receiving the structure, the second layer will use GA to get the best VBCOM: it will create a new population from the best performers of the previous generation and run the process again with the new population. So each structure gets a run of GA in the second layer. Finally, it will get the best VBCOM along with the best classifier.
Method 19:
In this method we will discuss two layers: a neural network as classifier in the first layer and VBCOM in the second. That means two layers of genetic algorithm will be used. In the first layer, a neural network will be used to get the best NN structure for the training data, and in the second layer, GA will be used to get the best VBCOM for that particular NN. When a NN structure is chosen in the first layer, the second layer will use that structure to get the best VBCOM for it. The first layer will use GA, create a new population from the best performers of the previous generation, and run the process again with all structures in the new population. The second layer, on receiving the NN structure, will use GA to get the best VBCOM for it: it will create a new population from the best performers of the previous generation and run the process again. So each NN gets a run of GA in the second layer. Finally, it will get the best VBCOM along with the best neural net.
Method 20:
In this method, we will discuss two layers: HMM in the first layer and VBCOM in the second. That means two layers of genetic algorithm will be used. In the first layer, HMM will be used to get the best HMM structure for the training data. In the second layer, GA will be used to get the best VBCOM for that HMM. When an HMM is chosen in the first layer, the second layer will use it to get the best VBCOM for that HMM. The first layer will use GA, create a new population from the best performers of the previous generation, and run the process again with all HMM structures in the new population. The second layer, on receiving the HMM, will use GA to get the best VBCOM: it will create a new population from the best performers of the previous generation and run the process again. So each HMM gets a run of GA in the second layer. Finally, it will get the best VBCOM along with the best HMM.
Method 21:
In this method we will discuss two layers: SVM in the first layer and VBCOM in the second. That means two layers of genetic algorithm will be used. In the first layer, SVM will be used to get the best SVM structure for the training data. In the second layer, GA will be used to get the best VBCOM for that SVM. When an SVM is chosen in the first layer, the second layer will use it to get the best VBCOM for that SVM. The first layer will use GA, create a new population from the best performers of the previous generation, and run the process again with all SVM structures in the new population. The second layer, on receiving the SVM, will use GA to get the best VBCOM: it will create a new population from the best performers of the previous generation and run the process again. So each SVM gets a run of GA in the second layer. Finally, it will get the best VBCOM along with the best SVM.
Method 22:
In this method, we will discuss two layers: K nearest neighbor in the first layer and VBCOM in the second. That means two layers of genetic algorithm will be used. In the first layer, K nearest neighbor will be used to get the best K nearest neighbor structure for the training data. In the second layer, GA will be used to get the best VBCOM for that K nearest neighbor. When a K nearest neighbor is chosen in the first layer, the second layer will use it to get the best VBCOM for that K nearest neighbor. The first layer will use GA, create a new population from the best performers of the previous generation, and run the process again with all K nearest neighbor structures in the new population. The second layer, on receiving the K nearest neighbor, will use GA to get the best VBCOM: it will create a new population from the best performers of the previous generation and run the process again. So each K nearest neighbor gets a run of GA in the second layer. Finally, it will get the best VBCOM along with the best K nearest neighbor.
Once an image is given, create inner blocks and outer blocks using VBCOM or BCOM. The whole image will be divided into many outer blocks, and each outer block will have many inner blocks. In a tree, we can keep one outer block per level. As many images will share the same outer block, we will have clusters for these data; each cluster becomes a branch for that level, with a cluster center and radius.
We can also keep a couple of outer blocks per level. In that case, cluster the data using the combined result over all the outer blocks of a level. While searching for a match in the tree, test data will take the appropriate branches and reach a leaf node where it finds the match. To do two-layer GA with BCOM or VBCOM on top, on receiving each BCOM from the first layer, do GA to get the best tree structure for that BCOM. To do two-layer GA with BCOM or VBCOM on the bottom, different tree structures need to be created in the top layer; when each tree structure is provided to the second layer, it will create many BCOMs for that tree and do GA to find the best BCOM for that tree.
Different techniques for neural net:
We can try new and different types of structure for a NN. Usually, every node in a hidden layer is connected to every node in the input layer. Try this: while training (fixing weights), the input layer is not connected to every node in the hidden layer, but only to a portion of it, and each input node randomly selects its connections into the hidden layer; the same goes for hidden to output. This is good for maze-like problems where, for a given input, we want a different output every time. It could be used in games, or in online tests like the GRE. Each node in the input layer is randomly connected to a portion of the hidden-layer nodes. It will create more variation and will help with pattern recognition, because different objects will not produce the same result unless they are the same or close.
In another method, every node in the input layer is not connected to every node in the hidden layer; it has a list of nodes to which it is connected, and we create that list randomly. For example, if we have 100 nodes in the input layer and 100 in the hidden layer, then each input node is connected to 10 hidden nodes. The same goes for hidden to output. Another idea we could implement is that the length of the connection list varies randomly: the first input node is connected to 10 hidden nodes, the second to 15, and so on. This way, we are creating more variation or patterns. A sketch of this masked connectivity follows below.
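A sketch of that masked connectivity with NumPy; the fan-out of 10 per input node follows the 100-to-100 example above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_layer(n_in, n_out, fan_out):
    """Connect each input node to a random subset of the next layer via a
    0/1 mask; masked-out weights stay zero during training."""
    mask = np.zeros((n_in, n_out))
    for i in range(n_in):
        # fan_out could also vary per node, e.g. rng.integers(10, 16)
        mask[i, rng.choice(n_out, size=fan_out, replace=False)] = 1.0
    return rng.standard_normal((n_in, n_out)) * mask, mask

# 100 input nodes, 100 hidden nodes, each input feeds 10 random hidden nodes.
weights, mask = sparse_layer(100, 100, 10)
```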
In this method, every node in the input layer has a list of values. Each value is connected to different nodes in the hidden layer, and the number of connections also varies across the list: the first value connects to 10 hidden nodes, the second to 15 different hidden nodes (selected randomly). This way, we do not need a recursive NN to recognize a shape; we give the image of the shape all at once and it is recognized with one big NN. Each row of the image is given to one input node with a list. We could use 8 by 8 blocks in the image. The structure might be, for example, 10 nodes in the input layer, 100 nodes in the 1st hidden layer, 100 nodes in the 2nd hidden layer, 50 nodes in the 3rd hidden layer, 25 nodes in the 4th hidden layer, and 2 nodes in the output layer. Each hidden-layer node has a different number of connections to the next layer. For example, a node in the 2nd hidden layer may be connected to 10 nodes in the 1st hidden layer and to 5 nodes in the 3rd hidden layer.
In this method, different ranges of values in the input list connect to different nodes in the hidden layer. Say, values between 10 and 20 connect to 10 hidden nodes, and values between 20 and 30 connect to 10 different hidden nodes. So every input node has a list of hidden nodes to connect to, and different value ranges connect to different hidden nodes from that connection list. For example, the 1st input node has a list of 10 items and connects to hidden nodes (1, 5, 9, 7, 53, 62, 14, 16, 25, 31); items valued from 10 to 20 connect to (1, 7, 9, 14) and items valued from 20 to 30 connect to (53, 31, 5, 25).
For back propagation, take random weights, tune them with back propagation, and store the result. Then take new random weights and fine-tune again; keep doing this a predefined number of times, like 1000, and take the weight set that gives the best result. This way, it will not get stuck in a single local minimum. The result will not be the global optimum, but it is still better than a single try. A sketch follows below.
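A sketch of these random restarts, where `train_once` (random init plus back propagation) and `score` (accuracy on the training set) are assumed helpers:

```python
def best_of_restarts(train_once, score, tries=1000):
    """Run back propagation from many random initialisations and keep the
    weight set that scores best, avoiding commitment to one local minimum."""
    best_weights, best_score = None, float("-inf")
    for _ in range(tries):
        weights = train_once()     # fresh random init, then back propagation
        s = score(weights)
        if s > best_score:
            best_weights, best_score = weights, s
    return best_weights
```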
Instead of having one NN for each row, we can also go by column, with one NN per column; going by column might give good results in some scenarios. We can also go by diagonals. A scene is always rectangular, so start from one diagonal, then decrease toward the top left and increase toward the bottom right; at the end, the top left will be the bottom left and the bottom right will be the top right. Another idea is to divide the screen into four parts (or any number of parts) and select a random number of values from each part to gather input for the NN. Say we take 5 values from each of the parts to make 20, and the NN input takes 20. We take values from the same positions in each part for the whole training set and for testing. We can take a mixed number of values from each part, or randomly take 20 values. We have to remember the locations of the values for each NN. These values will not be the same even if the image is the same or similar. We take values randomly for the first training image; for the rest of the training set, we take the values at the same locations. To recognize an image from a training set, we can also select the pixel value or block value
Instead of creating blocks of the same size, like 64 by 64, we can also create blocks of varying size. In areas where we have found more variation, we can create small blocks like 32 by 32, and in areas with less variation, large blocks like 128 by 128. Find the variation by analyzing the training data: start with same-size blocks, like 64 by 64, and do the clustering, then analyze the number of clusters and the number of items in them. If there is a large number of clusters, decrease the block size, i.e., break the block. If there are few clusters, combine the block with its most appropriate neighbor; if the neighbors are not suitable to combine, meaning they are already in standard form, do not combine. Or the number of clusters could be normal but some clusters have many items in them: break down those blocks. And if some have very few items, try to combine them. Using varying block sizes is a very good idea. For clustering, we can use the mean and standard deviation of the input values of the block; while clustering one block's data, use the mean and standard deviation from each image to create the clusters. Or divide the input numbers by a big number, add the values, and then create clusters. Alternatively, assign a random weight to each spot in the 64-value input array. Say, for position 1 it is 100, for position 2 it is 1000, for position 3 it is 50, and so on. Then multiply each weight by its value, add everything up, and create clusters on the totals, as sketched below.
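A sketch of that weighted-sum key; the weight range here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
position_weights = rng.uniform(10, 1000, size=64)  # e.g. 100, 1000, 50, ...

def block_key(block_values):
    """Collapse 64 block inputs into one scalar by a fixed random position
    weighting; clustering is then run on these keys."""
    return float(np.dot(block_values, position_weights))
```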
Instead of one NN per block, we can cluster the block data from the whole training set. In that case, we will have one NN per cluster, so we might have multiple NNs per block, but at least one. If the block data varies too much, we would otherwise have to train one NN to handle all of it. While clustering, find how many similar sets of data are available for the block. Say the training set has 100 images, so we have 100 blocks for position 1. From these 100 blocks' data, find the mean and standard deviation of each block. The strategy is that close ones combine into one cluster and get one NN (set of weights) for them. As their values are similar, it will be easy to create a set of weights that can handle or recognize them.
If we can separate all the patterns in the image, then use one NN per pattern. One way to go is to save all the inputs for each NN; when a test input is given, look for it among the saved inputs, and if a match is found, then yes, the NN has seen it. Let's say the input is 1234567890 in an input layer of 10 values: save all these values. It will be very expensive. Say training has 50,000 data points and each image has 256 blocks; for each block we have 50,000 inputs stored (we can reduce this by keeping only unique values). In the worst case, we have to check 12.8 million inputs. But this will for sure recognize an image if it is in the training set. Create something with these inputs to make it less expensive: keep them sorted and grouped, or store them in a number tree where each level has 0-9 branches (as sketched below). We can store many patterns there; this makes it a little less expensive, e.g., by grouping them by starting digit. A NN has a limit on how many varieties of patterns it can handle, but this structure can take as many patterns as needed with wide variety. It is good as a recognizer, but it will not handle variation, and our goal is to create something that recognizes the training data with slight variation in it, because real data gives us variation instead of exact values. What could be done with those 64 sequential numbers? Do mathematical operations: apply arithmetic, create a circle and save the center and radius, or create a bounding box. When more precision is needed, use all the NNs: NNs by row, by column, by diagonal, and by block. If the image has noise or deformation in a certain area, the blocks in that area will not match, so we need to match with some variety.
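A sketch of the number tree (a digit trie) mentioned above, with 0-9 branches per level:

```python
class DigitTrie:
    """Store training inputs digit by digit so an exact lookup costs
    O(sequence length) instead of a scan over every stored input."""
    def __init__(self):
        self.children = {}
        self.terminal = False

    def insert(self, digits):
        node = self
        for d in digits:
            node = node.children.setdefault(d, DigitTrie())
        node.terminal = True

    def contains(self, digits):
        node = self
        for d in digits:
            node = node.children.get(d)
            if node is None:
                return False
        return node.terminal

tree = DigitTrie()
tree.insert("1234567890")
print(tree.contains("1234567890"))  # True: this input was seen in training
```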
In another method, we can take a single block-formation info, train and test a NN for that block, and return the success rate. To get the best weights for a training set, we can use GA: say, for a training set of 1000 images, create around 10000 sets of weights and take the weight set that gives the best results on the training set, using GA to find the best one, i.e., using genetic mutation and the other concepts of GA on the weights. After getting the best set, we can also use back propagation on it to fine-tune it. Keep the best one of the current population.
To recognize a face with different facial expressions, poses, lighting conditions, and other issues, we need to create one NN per face. Each face has a NN of its own. Train the NN with different poses of the same face, at different angles, with different facial expressions, lighting conditions, and so on. This is a good way to recognize a face with different expressions and poses. Similarly, for shape recognition, we can create one NN per shape covering different angles, poses, and views, and the same for other objects. For example, to recognize a bucket, we can train a NN with different pictures of buckets from different angles, views, and poses. It will not solve for size; we have to make different NNs for different sizes. For human organs or real-world materials, size does not change. While creating the NN, input from one node's list can be combined with some random element from another input node's list. For example, the 1st element of the first input node and the 5th element of the 12th input node both connect to the 7th hidden node. This is good for exact matching rather than shapes like faces; it is a little like an alternative to nearest neighbor.
Another important task is to find out how many clusters there are in the given data and to cluster them. Usually, the number of clusters is given before clustering. If that number is not given beforehand, the strategy is: start with some number of clusters and run the clustering on the data. If the data is clustered properly, without any empty clusters, keep increasing the number of clusters until we find a cluster with no elements in it. The final number of clusters is then the count of clusters that still have elements in them. A sketch follows below.
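A sketch of that search using scikit-learn's k-means (any clustering routine would do); an "empty" cluster shows up as a label that no point received:

```python
import numpy as np
from sklearn.cluster import KMeans

def find_cluster_count(data, max_k=100):
    """Increase k until some cluster ends up with no elements; the last k
    where every cluster had members is taken as the cluster count."""
    best_k = 1
    for k in range(2, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(data)
        if len(np.unique(labels)) < k:     # a cluster got no elements
            break
        best_k = k
    return best_k
```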
Use a RANSAC-like scheme for initial weights: try around 1000 or more random initial weight sets and use the one that gives correct results most often, then run back propagation on those weights. In this way, the chance of getting stuck in a local minimum is reduced.
Instead of one NN per block, we can cluster the block data from the whole training set; then we can use one NN per cluster. So we might have multiple NNs per block, but at least one. If the block data varies too much, we would otherwise need to train one NN to handle all of it; it is better to use one NN per pattern. While clustering, find how many similar sets of data are available for the block. Say the training set has 100 images, so we have 100 blocks for position 1. From these 100 blocks' data, for each block, find the number of gradient points, their values, angles, directions, and the mean and standard deviation of the pixel values. Close data combine into one cluster and get one NN (set of weights) for them. As their values are similar, it will be easy to create a set of weights that can handle or recognize them. If we can separate all the patterns in the image, then use one NN per pattern.
In another method, take a fixed structure and a fixed BCOM or VBCOM info, create basic block nets, and return the success rate; the weights are obtained by training with back propagation. To get the best weights for a training set, we can use GA: say, for a training set of 1000 images, create around 10000 sets of weights and take the weight set that gives the best results on the training set, using GA to find the best one, i.e., genetic mutation and the other concepts of GA on the weights. After getting the best set, we can also use back propagation on it to fine-tune it. Keep the best one of the current population. We can replace the whole population with children or replace only a portion of it; different choices will be useful for different applications. Replace half of the population with children from a couple of the best parents and take new random weights as the other half, as sketched below. That way we do not get stuck in local minima, since new random weights keep coming into the population.
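A sketch of that replacement scheme; `breed` and `random_individual` are assumed helpers, and `ranked` is the current population sorted best-first:

```python
import random

def next_generation(ranked, breed, random_individual):
    """Keep the single best individual, fill half the population with
    children of the best parents, and the rest with fresh random weights."""
    n = len(ranked)
    elite = ranked[:1]                     # keep the best of the population
    parents = ranked[:max(2, n // 10)]
    children = [breed(random.choice(parents), random.choice(parents))
                for _ in range(n // 2)]
    fresh = [random_individual() for _ in range(n - len(children) - 1)]
    return elite + children + fresh
```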
We can also search for the best function to calculate the input and output weights. Usually, people use a predefined function for this, trying one function after another to see which fits the training data best; we could use GA to get the best function for the given data. A genetic algorithm could also be used to find an object. Say each object has a NN, and given an image, we need to find out which object it is. Instead of checking every object one by one, we can use a genetic algorithm: choose some objects from all the object NNs in the DB and apply the genetic algorithm to those; if the object is not found, keep taking objects randomly from the population and discarding failed ones. While checking a NN, we can use partial checking, like randomly checking some of its layers; if a significant match is found it stays, otherwise it is discarded. This is good for searching a big space. Instead of searching the whole DB, it is a good idea to search in a random collection drawn from the DB; there is a good chance the desired data will be found in that collection. If not, we collect some more and search again in the new collection. In the worst case, we find it in the last collection, but since the collections are selected randomly, there is a good chance of finding the data in an early one. Say we have one million data points to search for a matching object: select five thousand at random and search in them; if not found, take another five thousand from the remainder and search again, as sketched below.
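A sketch of that batched random search; `matches` is an assumed predicate (e.g. run the item's NN on the query image):

```python
import random

def search_in_batches(database, matches, batch_size=5000):
    """Search random batches instead of the whole DB; with random sampling
    the target is often found in an early batch, and in the worst case the
    last batch still contains it."""
    remaining = list(database)
    random.shuffle(remaining)
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for item in batch:
            if matches(item):
                return item
    return None
```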
Different techniques for HMM:
For an HMM, we first need a state sequence; HMMs deal with sequences. It is very difficult to deal with every individual number, which is why we can treat a range of numbers as a single state. For example, if we have only gradient points and normal points in the block, we can introduce two states, gradient and normal, and consider values from 0 to 124 as normal and 125 to 255 as gradient points. It is like fuzzy logic: we do not have to work with exact values, we use approximate ones. We can introduce as many states as we want and give each state the same priority; for example, we can create a state for each range of 10, so 1-10 is state 1, 10-20 is state 2, and so on. We can also introduce overlap, meaning we give some states more preference than others: for example, 0-30 is state 1 and 30-40 is state 2, so state 1 gets a bigger range and thus more preference than state 2. We can decide that based on the training data we get. A sketch follows below.
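A sketch of this quantization with uniform ranges, plus the transition-probability counting that the next paragraph describes; uneven "overlap" ranges would need an explicit lookup table instead of the division:

```python
from collections import Counter

def to_state(value, state_width=10):
    """Uniform ranges: 0-9 -> state 0, 10-19 -> state 1, and so on."""
    return int(value) // state_width

def transition_probs(state_sequences):
    """Estimate P(next=b | current=a) by counting a -> b steps across all
    training sequences."""
    pair_counts, from_counts = Counter(), Counter()
    for seq in state_sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return {(a, b): c / from_counts[a] for (a, b), c in pair_counts.items()}
```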
So, block data will be some array of doubles or integers. First we have to convert it to a state sequence, converting each number to a state; then we get a sequence of states from the block information. After that, we have to find the state transition probabilities, meaning: what is the probability of going from state 1 to state 2? In the training data, count how many times it goes from state 1 to state 2; that gives the transition probability. We will have to use location information as well, i.e., what is the probability of going from state 1 to state 2 while the current position number is 1. It is a sequence, so we have to consider position, and the emission probability: the probability of any particular number in a specific position of the sequence. For example, what is the probability of getting 100 in position two? Then, when we give it the test sequence, it can tell us whether it has seen this pattern or not. If
proper weight set for that test data. If the test data stays within the range, that means it is similar to the training data; otherwise it is not.
It trains recursively with all rows sequentially; we have one SVM for each row. For each row's data, train weights using positive and negative training data. Use different types of blocks: by row, by column, by diagonal, or by small parts (BCOM, VBCOM). The weights should give a positive result on positive training data and a negative result on negative training data, so the training data must have both positive and negative examples. For example, to recognize a hand, the training data should have images of hands of different people, of different sizes and colors, male and female; similarly, we also need some images which are not hands to train the weights. The weights will push the result positive for positive examples and negative for negative ones. We can use 8 by 8 blocks, so create 8 by 8 blocks in the image; after that, each row is trained with different weights.
We can also do this: get some weights, like one per block, for the negative case and for the positive case. The weights for the positive case should give a value within some range, say between 5 and 15, and the same for the negative case. So when a test image is given, try the positive weights and see the value, then try the negative weights and examine the value. If the value falls in the positive range, it is a positive image, and vice versa. Use clustering if it is hard to find a single weight set for a block: from the training images, create as many clusters as needed and create weights for each cluster. When testing, try all positive cluster weights to recognize it as positive, and do the same for negative.
Given two sets of images for binary classification, find the points where we get two different or distinguishing results using a radial basis function or sigmoidal function (kernel function). It means that, at these points, the training images give two different results, i.e., the two image sets differ there. A binary classifier can recognize objects by using these points; they are called support vectors. Find all support vectors in the training images. This is for a binary classifier; for a multi-class classifier, like character recognition, find the points where all the values of the training images stay within a limit. It would be difficult to go to every pixel, so create blocks and weights: multiply the block data by the weights, then send it through the kernel function, and find the blocks which are support vectors. Use random blocks with a genetic algorithm to create children: keep the best 10% of the last population and create 90% new children. Randomly choose a combination from the best 10%, then randomly select how much of the combination to keep unchanged and how much to change, and randomly get a new combination for the changing part. Say we got 10 combinations as the best ones and we have to create 90. Each time, select one of the 10 at random and randomly select how much to keep; say it comes out at 50%, so keep 50% of the selected combination and change the other 50%. Say a combination has 256 blocks' creation info: keep the first 128 blocks' creation info unchanged, and for the remaining 128, randomly get the block creation info from the inner blocks still available to be used. Some of the blocks are already used for the first 128 blocks' creation info, so use the remaining ones, as sketched below.
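A sketch of that child-creation step; here a combination is modeled as a list of outer blocks, each a tuple of inner-block ids, and the group size is illustrative:

```python
import random

def make_child(parent, inner_per_outer=4):
    """Keep a random prefix of the parent's outer blocks unchanged and
    rebuild the rest from the inner blocks the kept part did not use."""
    keep = random.randint(0, len(parent))   # e.g. 50% -> first 128 blocks
    kept = list(parent[:keep])
    used = {i for outer in kept for i in outer}
    free = [i for outer in parent[keep:] for i in outer if i not in used]
    random.shuffle(free)
    rebuilt = [tuple(free[i:i + inner_per_outer])
               for i in range(0, len(free), inner_per_outer)]
    return kept + rebuilt[:len(parent) - keep]
```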
Different techniques for KNN:
We can use one outer block for each level of the tree and use clustering to get the branches for a particular level; add all the values of that block and cluster on this total value. Another option is, instead of using one feature per tree level, to use a couple of features at the top levels and decrease the number of features per level as the levels go down. Let's say we have a 128-feature vector: use 10 features for the first level, 9 for the 2nd, 8 for the 3rd. The tree then gets the number of partitions from a configuration file or from clustering and creates that many partitions for that level. Once we have the 128-feature vector, find out which features have more variety in the training data set. More-varied features could go at the top or the bottom and get many partitions, while less-varied features get fewer partitions; in some cases placing them at the top is appropriate, and in some cases at the bottom.
Another idea is having multiple features per level: say 5 features for the first level, 4 for the 2nd, 3 for the 3rd, and so on. That way we do not go astray when the data misses a feature. We could have the same data with a change in just one feature; if that feature is at the top, we will be directed to the wrong branches of the tree and miss the match. Combining a couple of features at the top reduces that type of error. This type of tree could be of two kinds: one is normal, with more features at the top levels and fewer lower down but the number of partitions per level staying the same; in the other, the number of partitions varies, with fewer partitions at the top and more at the bottom. How do we know which features should go together in a level? It could be sequential, like the first 5 features in the 1st level, the next 4 in the 2nd level, and so on. Or it could be the less-varied features together at the top. Or it could be found by checking which features have the same items in their partitions: partition the data uniformly, then see which items land in the same partition under different features, and combine in one level the features that share the most items across partitions. For example, item A is in the 1st partition of features 1 and 2, and item B is also in the 1st partition of features 1 and 2. Let's say that out of 10 items, 6 behave like A and B; then it is more reasonable to combine features 1 and 2 at the top level. Put the largest combination of features at the top and decrease downward.
Think about the tree structure. In a real tree, the number of branches is not fixed; it changes with the environment, which in our case is the data. Every level has a different number of branches. This is fine if we have the data to populate the tree beforehand. What about online data, where we do not know which data will come and the tree gets populated as new data arrives? That is a different scenario, but in most AI applications where people train first, the data is available beforehand; image processing, speech recognition, natural language processing, and others work that way. Alternatively, we could use more branches at the top and fewer at the bottom.
Let's think about clustering for KNN instead of tree search. Say we are given training data with a 128-feature vector, or 128 outer blocks per image. A simple way to classify is to create 128 clusterings, one per feature, and find which items fall in the same clusters as the test item. Then find the items with the most clusters in common with the test item; say 100 items are in exactly the same clusters as the test one. Then compute the squared or Euclidean distance between the test item and each of these 100 items (summing the distance over each feature) and choose the closest one. If 100 items share 50 clusters with the test item and another 100 items share 40, go with the 50-cluster items. Use k-means clustering and save the centers for each clustering; when a test item is given, see where it belongs in each feature's clustering by finding the closest among the centers at that level, because clustering gives a proper partition for each level. Keep few items per cluster to reduce cluster size. A sketch follows below.
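A sketch of the shortlist-then-distance step; `assign_cluster(f, v)` is an assumed helper returning the cluster id of value `v` under feature `f`'s saved k-means centers:

```python
import numpy as np

def knn_via_clusters(test_vec, train_vecs, assign_cluster):
    """Shortlist the training items sharing the most feature clusters with
    the test item, then return the index of the closest by Euclidean distance."""
    test_clusters = [assign_cluster(f, v) for f, v in enumerate(test_vec)]
    overlaps = [sum(assign_cluster(f, v) == test_clusters[f]
                    for f, v in enumerate(vec))
                for vec in train_vecs]
    best = max(overlaps)
    shortlist = [i for i, o in enumerate(overlaps) if o == best]
    dists = [np.linalg.norm(np.asarray(train_vecs[i]) - np.asarray(test_vec))
             for i in shortlist]
    return shortlist[int(np.argmin(dists))]
```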
We can treat each block of the image as a feature. For example, say we are given 1024 by 1024 images and divide each image into 64 by 64 blocks. Gather the same block position from all training images and create as many clusters as needed for each block. Say the training set has 1,000,000 face images: create 256 blocks per image, gather the same block from all images, and create clusters for that block's data, so after training we have 256 sets of clusters. When testing an image, create the 256 blocks of the test image and see which block belongs to which cluster. Find the items whose cluster list per level is most similar to the test image's, then use the shortest squared or Euclidean distance (summed over all features) to get the match: the one with the shortest distance is the match. If not enough cluster matches are found, the result is no match found.
Another method is to find the variation of the given data on a particular feature, i.e., find out how close the data is. Start with a small number of partitions and see how the data spreads over them: do all the partitions have a similar number of items? If so, increase the number of partitions, repartition, and check again. Keep doing this until we find weak partitions, i.e., partitions with very few items compared to the others.
Use clustering for each feature and use a KD tree. Clustering tells us how many partitions we have per level and the partition boundaries. We should use a couple of features for the top levels: say the 1st level has 5 features and each feature has 5 clusters; when a data point comes, use the 5 features together to place it in a cluster for the 1st level. That way we will not miss it if one feature does not match, because the cluster is produced from the combined effort of 5 features. Do this for all combined-feature levels, and use low-variation clusters for the top. At first, cluster by each feature and find which features have fewer clusters; sort them by cluster count and use the less-clustered features at the top level. Also try re-clustering them with multiple features together.
We can use a genetic algorithm for this. We do not know which features to combine together, nor how many to combine at which level. Create many KD trees with different random combinations and choose the best one. A genetic algorithm or the least-squares method will both work in this kind of situation: try some 1000 combinations and choose the best performer. But GA will be more powerful. Start with 1000 or 5000 random combinations, the same as for least squares. Separate the training set into two parts, one for training and the other for testing the intermediate steps; the testing part should be tagged.
Randomly select how many features will be used at each level and which features go to which level. For example, for the 1st level, randomly select between 1 and 128 (128 being the total number of features in this example) as the number of features, and say 5 comes out of the random number generator; now choose 5 features randomly from the 128. Selected features are removed from further selection, and the number of features for the next level is drawn from the remaining feature list; keep an unassigned-feature list (a sketch follows below). Use clustering for each feature and use a KD tree; clustering tells us how many partitions are available per level and the partition boundaries. Use a couple of features for the top levels: say the 1st level has 5 features and each feature has 5 clusters, so when a data point comes, use the 5 features together to cluster it for the 1st level. That way we will not miss it if one feature does not match, since the cluster comes from 5 features. Do this for all combined-feature levels and use low-variation clusters at the top. At first, cluster by each feature and find which features have fewer clusters; sort them by cluster count and use the less-clustered features at the top level. Re-cluster them with multiple features together if needed.
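A sketch of that random level/feature assignment:

```python
import random

def random_tree_structure(n_features=128):
    """Randomly decide how many features each level gets and which ones,
    drawing without replacement until the unassigned list is empty."""
    unassigned = list(range(n_features))
    random.shuffle(unassigned)
    levels = []
    while unassigned:
        k = random.randint(1, len(unassigned))  # feature count for this level
        levels.append(unassigned[:k])
        unassigned = unassigned[k:]
    return levels       # top level first, e.g. [[5 ids], [3 ids], ...]
```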
How to create population for BCOM:
Inner blocks are simple, small blocks of an image, for example 8 by 8 blocks, and outer blocks are created from inner blocks. We can use them differently in different situations. We can treat the image as sequential data, with each row and column coming sequentially from top to bottom; or we could take each row, column, or diagonal line as an inner block and combine those into outer blocks; or we could pick random points for inner blocks and random combinations of inner blocks for outer blocks. Outer blocks could be of the same size or of different sizes. While collecting data for an inner block, we could use only pixel data, only gradient data, or a mixture of the two. We can also use the mean and standard deviation of the pixel values of the inner block's points. In shape recognition, or wherever we use silhouettes, we only need the outer contour of an object, so in that case we are interested only in a few gradient points within some range of values, and we can use only those gradient points. Instead of feeding all the gradient points into a classifier, we can divide them into blocks and feed those to the classifiers. In some cases we are more interested in the exact data of the image, for example face, fingerprint, or palm recognition; in those cases we can use inner and outer blocks as BCOM or VBCOM. Randomly choose which points go in which block and create many block combinations this way. We will test which combinations go well with the training data, and since we run GA, we will eventually get a very strong combination that works well for recognition.
We will use inner blocks and outer blocks in BCOM. An inner block is a simple small block, for example an 8 by 8 block of the image, and an outer block consists of inner blocks: if we have 64 points in an outer block, each of them represents one inner block. To represent an inner block, we can use various information available inside it, for example the mean of the pixel values of the points inside the block, their standard deviation, the number of gradient points, the values of those gradient points, and other information. Using all of this, we can mathematically derive a single number to represent the block (a sketch follows below). In some scenarios we only need gradient information; in some cases we need pixel information, if we are trying to match exact color values; and in some cases we need both. Whichever it is, we can use the appropriate info to represent the inner block. We can even take the inner-block info from adjacent rows and columns of that block, or select the points randomly while using BCOM or VBCOM. It totally depends on the situation: in some cases direct block info will be helpful, and in some cases random data for the inner block will be helpful. We should keep all these options open while programming, so that we can choose any of them by just changing values in configuration files.
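One possible reduction, as a sketch only; the gradient threshold borrows the 125-255 range used in the HMM section, and the mixing weights are illustrative, not prescribed by the method:

```python
import numpy as np

def inner_block_summary(block, grad_threshold=125):
    """Collapse an 8x8 inner block into one representative number built
    from its mean, standard deviation and gradient-point count."""
    pixels = np.asarray(block, dtype=float).ravel()
    n_gradient = int((pixels >= grad_threshold).sum())
    return pixels.mean() + 0.5 * pixels.std() + 2.0 * n_gradient
```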
If the training data has too many variations, then clustering the data is necessary to produce better results. This means we have to find where the data has separate clusters in it, divide the data accordingly into many clusters, and consider them separately. So when a test image is given, we find out which cluster it belongs to and use the particular classifier, or the separate weights, for that cluster. This is for huge training sets where the data varies a lot; that is why we keep separate clusters and check which cluster the test sample belongs to. We can use k-means clustering to cluster large amounts of training data: it simply finds proper cluster centers for the training data and assigns each data point to the nearest center.
The cluster method takes one BCOM's or one VBCOM's data: it takes one block's data from each image in the training set, gathers them, and finds out how much these data vary. A training set with more than a million items should have some good clusters in the data, but it depends on the nature of the training set; we cannot tell beforehand, we have to find out dynamically. If we know that our training data will be very large, or that the data will vary a lot, then the cluster method is appropriate. Also, if the non-cluster version does not produce the desired result, the cluster version should be used. In the non-cluster version there is one classifier per BCOM, but in the cluster version there are many classifiers per BCOM. First we have to find out which cluster of the same BCOM's training data a specific BCOM of the test data belongs to, and then use the proper classifier for that cluster. If we have a lot of data in the training set and find it difficult to fit one classifier per BCOM, then we should use the cluster method to separate the training data into many clusters and train those separately, choosing for each BCOM which cluster to use. This way, even if we have millions of items in the training set, we will still have less difficulty finding patterns. If there is not much variation in the training data, we can use the non-cluster version.
Use the mean and standard deviation for the inner block. Usually we will provide one weight per inner block; use it together with the standard deviation and mean of the block. If we have clusters in the outer block, then there is one weight per inner block per cluster. If we use only gradient points or feature points in the outer block, then there is one weight per point. For large training sets, we will have clusters in the outer-block info; use the mean, standard deviation, and gradient-point info for the inner block, together with the weight. In cluster mode, each cluster has its own weights.
How to create children for BCOM:
Keep a portion unchanged and change the remaining. Randomly choose what percentage to keep unchanged, then change the rest. Only the available points, those not used in the unchanged part, are considered for the changing blocks.
How to create population for VBCOM:
VBCOM also has inner and outer blocks, but here the outer block's size differs from one outer block to another. Randomly choose which points go in which block and create many block combinations this way. We will test which combinations go well with the training data, and since we run GA, we will eventually get a very strong combination that works well for recognition.
How to create children for VBCOM:
Keep a portion unchanged and change the remaining. Randomly choose what percentage to keep unchanged, then change the rest. Only the available points are considered for the changing blocks.
How to create population for SVM:
Create weights for each block, one weight for each member of the block, so we need a weight set for each particular outer block. Get the range of values by multiplying each value of the outer block by its weight and adding them up to get a total. Now apply this weight set to all the same blocks from all images in the training set, take the highest and lowest totals, and subtract them to get the range for that weight set. We can try many weight sets and select the one that gives the best result (a sketch follows below). Now examine how many data points fit in the range. If it becomes very hard to fit the block data into a small range with one set of weights, then we should cluster the block data, create as many clusters as needed, and use separate weights for each cluster. We can use different mathematical functions to get the single value for the inner-block data.
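A sketch of scoring one weight set by the spread of its totals across the training set; a tighter range means the weights fit the block data better:

```python
import numpy as np

def weight_set_range(weights, blocks):
    """blocks: the same outer block from every training image. Returns the
    max-min spread of the weighted totals; smaller is better."""
    totals = [float(np.dot(block, weights)) for block in blocks]
    return max(totals) - min(totals)

def best_weight_set(candidate_weight_sets, blocks):
    return min(candidate_weight_sets,
               key=lambda w: weight_set_range(w, blocks))
```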
How to create children for SVM:
Keep some part unchanged and make changes to the remaining part, getting new weights for the changing parts. We can also change the block combination of the changing part. If we are using clustering, we can run the clustering again for the newly created parts.
How to create population for K nearest neighbor:
We need to create random KD-tree structures with different combinations of levels and features. If we are using BCOM or VBCOM, we can provide each BCOM or VBCOM (one outer block) as one level of the tree, or we can combine a couple of outer blocks per level; this is for when we want to use KNN as a classifier. If we are just finding a match for an array of data, we can use the feature vector of each data point and create a block for each level using a portion of the feature vector. We can use each feature as a level, or combined features as one level. We can also introduce more features per level at the top and fewer at the bottom: features with less variation at the top and features with more variation at the bottom. We can find out about variation using clustering, which tells us how much variation we have for a particular feature, outer block, or combination of features.
How to create children for K nearest neighbor:
Method 1: flipping the levels is one way to create children from a parent. In this method, we flip the levels randomly: say, level one becomes level three and level three becomes level one. As we are working with dynamic data, we might find a better result this way; we cannot know beforehand whether it will work, we just have to try it and see the result.
Method 2: flip the levels with feature changes, meaning, recreate the feature list for each level. This is the same as the previous method, but this time we not only flip the levels but also change the data in each level. For example, level one has three properties and level three has five. Once level three becomes level one, it will still have five properties, but different ones; we change the properties for this level by randomly choosing from the properties that are available and not taken by other levels.
Method 3: in this method we keep some levels unchanged while changing the rest with the remaining features. For example, we have ten levels: keep five unchanged and change the structure and properties of the remaining levels. Say level six had three properties; now do a random selection and say we randomly decide to keep four properties for level six: get these four properties randomly from the remaining ones.
Method 4: in this method we keep the number of levels and the number of features per level the same as before, but change the feature combination, i.e., the properties of each level. For example, level one has three properties: property numbers 3, 5, and 7. In the new structure we choose the properties for level one randomly; say we get 4, 6, and 10 in the random selection, then these three numbers become the properties for level one.
Method 5: randomly choose what percentage of the tree remains unchanged and change the rest. For example, say we randomly decide to keep 50%: keep half of the levels of the tree and rebuild the remainder. For the levels being restructured, build them the same way as when the tree was first created: randomly choose how many properties each level holds and which ones. Only the properties not used by the unchanged part are used in the new levels.
How to create population for neural net:
Create NNs with many layers and different numbers of nodes in each layer; randomly create different layers with different node counts. We will test to find out which ones work best and also run GA so that we eventually get to the best one. In the NN, the input layer will match the block size and the output layer will have two nodes, for yes and no; we can create as many layers in between as we like. Create random weights for each node at the beginning so that we can start the process. We will create some models, test them, and find out which work better. We can do the training by the back-propagation method or a weight-based method: create some random weights and see which work better.
Every node in the left layer is connected to every node in the right layer, and every node on the right has a connection to every node on the left; every connection has a weight. So every node on the right has many input connections from the left nodes and an output connection to every node in the next layer to its right. It combines all its input connections with their weights and decides an output value, and it also holds weights for the nodes to its right; so each node has input weights and output weights, as in the sketch below.
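A minimal sketch of that fully connected forward pass; the tanh activation is an illustrative choice:

```python
import numpy as np

def forward(x, layers):
    """Fully connected pass: every left node feeds every right node, and
    each connection carries its own weight (entries of W)."""
    for W, b in layers:           # W: (n_left, n_right), b: (n_right,)
        x = np.tanh(x @ W + b)
    return x
```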