The document discusses using binary classifiers to improve speech recognition of unknown persons. It proposes using a series of binary classifiers that sequentially reduce variations in speech data by distinguishing factors like gender, age, location, etc. This prepares the data for a main classifier by narrowing the variations, making recognition of words and conversion to text easier compared to directly training a single classifier on all variations. Binary classifiers are also suggested for other pattern recognition problems involving large datasets.
Speech recognition is a challenging task. It is not difficult if we can train a computer on one person’s
voice data and then recognize that same person. But that is not the normal situation. Normally we talk to
many people we do not know and still understand what they are saying. We need to bring a computer or
robot up to this level, so that it can understand anybody’s voice: male or female, young or old. For
example, if we want to convert all the voice data of a mobile company to text so that it takes less space
to store, we need to understand speech from any random person. Likewise, a voice-enabled computer in a
public place, such as a subway or cybercafé, will have different users all the time, and the program
needs to understand all of them.
To recognize speech, we need to train a classifier that understands it. We need to find the patterns in
speech data and train the classifier with them. For example, to train the computer on a single person’s
voice, we first need to collect his speech data. We need to know how he pronounces different letters, his
tone in different situations, and other information. A person’s voice does not stay the same all the time.
Many people’s voices change when they are sick, when they wake up early in the morning, or when they are
tired. We need to collect all these data and train the computer with them so that it can understand his
voice whenever he speaks. We break the sound file data down into small patterns and train the classifier
with these small patterns.
HMMs, neural networks, SVMs, K nearest neighbor, and other classifiers work well with small data
variations. That is, they work only if the training data are not vastly diversified. In speech
recognition, the data vary far more than in other scenarios. For example, if we want to recognize a short
phrase like “hello world”, men and women, people of different age groups, and people from different
locations will pronounce it differently because of their local accents and other differences. Background
noise is also a big factor. People can talk from inside a building or from outside, where there is much
more noise, so the background noise differs from one situation to the next. If we train a single
classifier to recognize this phrase with so many variations, the classifier will be overloaded and will
not work properly. We can use a smart trick to resolve this issue.
Training one classifier on people of all ages, genders, environments, and locations is a difficult task,
and it might not work properly either. We can resolve this by using binary classifiers before the main
classifier to remove these large variations. For example, suppose we want to convert recorded phone
conversations to text. Mobile companies collect a huge amount of phone voice data in a single day, and it
is very difficult to store. They might have to delete some of the data after a few months. In that case,
they might want to store the voice data as text files, which take little space on a hard disk, so that
they can keep the data for many years on ordinary hard disks of normal capacity.
To accomplish that, we use some binary classifiers before the main classifier. Our aim is to reduce the
load on the main classifier, by significantly reducing the variation in the data it sees, so that it
becomes easy for it to recognize people’s voices. Each binary classifier divides the variation in two, and
eventually this leads to a position where the job of the main classifier is very easy. For example, the
first question we might ask is: is it a human voice or machine noise? This is answered by a binary
classifier trained with human voice and machine noise. As its only function is to answer that one
question, we can train it well with only two types of data. If we know that it is a human voice, the next
question is: is it the voice of a man or a woman? A binary classifier trained with both men’s and women’s
voices can recognize which type of voice it is. If the answer is male, we can then ask which age group the
data belongs to: is it a young person’s voice or an old one? If the answer is female, we ask the same
question at the next step. After that, we can ask which location it comes from: the north part of the
country or the south. After passing through all these classifiers, we arrive at a very small variation of
data, which is left for the main classifier. For example, we will know that this data belongs to a woman
from the 0-20 age group from the north part of the country. We can easily train a classifier with this
small data variation, recognize the words, and convert them to text.
Binary classifiers are trained with two types of training data. For example, if we want to distinguish
between male and female images, we have to train the classifier with male and female images. We have to
find the common female feature points that are not present in male images, and vice versa. We have to find
the gradient points where the difference is very high. Basically, we have to find the difference between
the two types. The classifier only needs to answer one question: is it male or female? With a proper
training set, we can make it very strong, so that the chance of a mistake becomes very small.
This is the flow of data:
Speech data -> noise or human voice -> male or female -> young or old -> main classifier for that category
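The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three stage functions stand in for trained binary classifiers, and the feature names (`voicing`, `pitch_hz`, `jitter`) and their thresholds are hypothetical placeholders, not values from this document.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A binary classifier is modelled as a function that maps a sample to one of two labels.
BinaryClassifier = Callable[[dict], str]

def route(sample: dict, stages: List[BinaryClassifier],
          reject_label: str = "noise") -> Optional[Tuple[str, ...]]:
    """Pass the sample through each binary stage in order and collect the answers.

    Returns None as soon as a stage answers `reject_label` (not a human voice).
    """
    path = []
    for stage in stages:
        answer = stage(sample)
        if answer == reject_label:
            return None
        path.append(answer)
    return tuple(path)

# Stand-ins for trained binary classifiers (hypothetical features and thresholds).
def human_or_noise(s: dict) -> str:
    return "human" if s["voicing"] > 0.5 else "noise"

def male_or_female(s: dict) -> str:
    return "female" if s["pitch_hz"] > 165 else "male"

def young_or_old(s: dict) -> str:
    return "young" if s["jitter"] < 0.02 else "old"

STAGES = [human_or_noise, male_or_female, young_or_old]

def make_main_classifier(category: str) -> Callable[[dict], str]:
    """Placeholder for a main classifier trained only on one narrow category."""
    def recognise(sample: dict) -> str:
        return f"text from the '{category}' classifier"
    return recognise

# One main classifier per leaf: two binary questions after the noise filter give 2 * 2 = 4 leaves.
MAIN: Dict[Tuple[str, ...], Callable[[dict], str]] = {
    ("human", gender, age): make_main_classifier(f"{age} {gender}")
    for gender in ("male", "female")
    for age in ("young", "old")
}

def recognise(sample: dict) -> Optional[str]:
    """Route the sample to the main classifier of its category, or None for noise."""
    path = route(sample, STAGES)
    return None if path is None else MAIN[path](sample)
```

Adding a fourth question, such as north or south, would double the number of leaves again, so each main classifier sees a correspondingly narrower slice of the data.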
Say we want to find a face in a database of 100 million people’s faces. We can use many binary classifiers
before the main classifiers, so that the search becomes much easier and requires less calculation.
Every type will have its own classifier, trained with a very specific group of people with little data
variation. For example, young females from the north part of the country will have their own dataset and
a separate classifier. Similarly, elderly males from the south will have their own dataset and classifier.
We can use binary classifiers in this way not only for speech recognition but also in other areas of
artificial intelligence. Whenever we see that a classifier is loaded with too much data and is not working
properly, or that we need many calculations to get it working, binary classifiers are worth a try. If we
place them before the main classifier and separate the training data to be more specific, the main task
becomes easier than before, and so does pattern recognition. The binary classifiers lead the data to the
appropriate classifier, which sees less variation.
A binary classifier used in computer vision has two types of images, positive and negative, and we need to
train it on both. Binary classifiers could be used to recognize male and female, Chinese and American,
black and white people, left and right hand, left and right leg, all left and right body parts, male voice
and female voice, and many other cases as well. Let us say we are doing male vs. female. Collect around
1000 images of male faces and 1000 images of female faces. Convert each image to 8 by 8 blocks. Take one
male image, compare it with all the female images, and save the locations where we see a large difference
in values between the two images. Do this for all male images and find the points that are common to all
the difference lists, or at least close. Use the close ones when there are not enough common points. Give
each point a range of values for male and for female. Say, at point (112,204), a value of 0-100 means male
and 200-256 means female. When given any image to recognize, convert it to 8 by 8 blocks, find its values
at the common points, and decide whether it is male, female, or cannot be distinguished.
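The steps above could look roughly like the following sketch. Several details are assumptions made for illustration: "8 by 8 blocks" is read as averaging each 8 by 8 tile of pixels, `min_gap` and `min_fraction` are hypothetical parameters for "large difference" and "common or at least close", the range of each point is taken from the smallest and largest training values, and the tiny 2 by 2 images are made-up toy data.

```python
from itertools import product
from statistics import mean

def to_blocks(image, block=8):
    """Average each block x block tile and return the tile means as a flat list.

    Assumes the image height and width are multiples of `block`.
    """
    rows, cols = len(image), len(image[0])
    return [
        mean(image[r + i][c + j] for i in range(block) for j in range(block))
        for r in range(0, rows, block)
        for c in range(0, cols, block)
    ]

def common_points(males, females, min_gap=100, min_fraction=1.0):
    """Indices where male/female pairs differ by at least `min_gap`
    in at least `min_fraction` of all pairs."""
    pairs = list(product(males, females))
    points = []
    for p in range(len(males[0])):
        hits = sum(1 for m, f in pairs if abs(m[p] - f[p]) >= min_gap)
        if hits >= min_fraction * len(pairs):
            points.append(p)
    return points

def point_ranges(males, females, points):
    """For each kept point, the (low, high) range of values seen in each class."""
    return {
        p: {
            "male": (min(m[p] for m in males), max(m[p] for m in males)),
            "female": (min(f[p] for f in females), max(f[p] for f in females)),
        }
        for p in points
    }

def classify(blocks, ranges):
    """Each point votes for the class whose range contains the value."""
    votes = {"male": 0, "female": 0}
    for p, by_class in ranges.items():
        for label, (low, high) in by_class.items():
            if low <= blocks[p] <= high:
                votes[label] += 1
    if votes["male"] == votes["female"]:
        return "could not distinguish"
    return max(votes, key=votes.get)

# Toy data: 2 x 2 "images" with block=1, so every pixel is its own block.
male_images = [[[10, 50], [50, 50]], [[20, 50], [50, 50]]]
female_images = [[[220, 50], [50, 50]], [[240, 50], [50, 50]]]
males = [to_blocks(img, block=1) for img in male_images]
females = [to_blocks(img, block=1) for img in female_images]
points = common_points(males, females)
ranges = point_ranges(males, females, points)
```

With real face images, `block=8` and a lower `min_fraction` would correspond to using the "close" points when too few points are common to every difference list.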
We can also associate a random weight with each data item, compute L (the value of the summation
function), examine which set of weights gives a very close L for all training examples, and select those
weights. Use a radial basis function as the kernel if the data are not linearly separable. We can also use
this idea in the above case; just do not include the points where both classes have the same value. The
method above is strictly a binary classifier, but the latter can be used to recognize shapes, as in text
recognition. Create one class for each letter (upper case, lower case, digits). Train the files of the
same object into one class and get some weights, so each class has its own weight list. Given any new
image, compare it to all classes and see which class it is closest to.
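The last step, comparing a new image to every class and picking the closest, can be illustrated as follows. The text does not specify how the weights are searched for, so this sketch swaps in a simpler stand-in: the weight list of a class is assumed to be the per-feature mean of its training examples, and closeness is assumed to be squared Euclidean distance. The feature vectors are made-up toy data.

```python
def class_template(samples):
    """Average the feature vectors of one class into a single list (assumed 'weight list')."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def closest_class(templates, features):
    """Return the label whose template has the smallest squared distance to `features`."""
    def squared_distance(template):
        return sum((t - x) ** 2 for t, x in zip(template, features))
    return min(templates, key=lambda label: squared_distance(templates[label]))

# Toy training data: two letter classes, three features per example.
training = {
    "A": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "B": [[0.0, 1.0, 1.0], [0.0, 0.8, 1.0]],
}
templates = {label: class_template(samples) for label, samples in training.items()}
```

Calling `closest_class(templates, [0.9, 0.1, 0.0])` compares the new vector against both templates and returns the label of the nearer one.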
Another idea is to find a, b, c such that ax + by > c for positive and ax + by < c for negative examples,
in order to recognize a male or female voice. We can use an SVM first to tell human voice from other
sound. In the next step, if it is a human voice, we can classify it as male or female. It is like
recursive recognition. In my opinion, much of human pattern recognition is done this way: from the top
down, layered, done recursively, unfolding one mystery at a time. Binary classifiers are easy and accurate.
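As a minimal sketch of the a, b, c idea: the text does not say how the three numbers are found, so the perceptron update below is an assumption (an SVM, as mentioned above, would be another option). The two features x and y and the toy points are hypothetical, and labels are +1 for positive and -1 for negative.

```python
def predict(a, b, c, x, y):
    """Apply the linear rule: positive (+1) when a*x + b*y > c, otherwise negative (-1)."""
    return 1 if a * x + b * y > c else -1

def train_perceptron(points, labels, epochs=100, lr=0.1):
    """Find a, b, c with the perceptron rule; settles only if the data are linearly separable."""
    a = b = c = 0.0
    for _ in range(epochs):
        errors = 0
        for (x, y), label in zip(points, labels):
            if predict(a, b, c, x, y) != label:
                # Move the line towards the misclassified point's correct side.
                a += lr * label * x
                b += lr * label * y
                c -= lr * label
                errors += 1
        if errors == 0:  # every training point is on the correct side
            break
    return a, b, c

# Toy, linearly separable data: two features per sample.
points = [(2, 3), (3, 3), (3, 2), (0, 0), (1, 0), (0, 1)]
labels = [1, 1, 1, -1, -1, -1]
a, b, c = train_perceptron(points, labels)
```

After training, `predict(a, b, c, x, y)` gives the side of the line for any new pair of feature values.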
This method will work when we need to recognize an unknown person’s voice. There are numerous areas of
artificial intelligence where it could be used. Naturally, we need computers or robots to understand what
people are saying if we want to use them in public places. Placing binary classifiers in front of the main
classifiers will make pattern recognition much easier.