Computer Vision: Optical Character Recognition
Mark Kurtz
Fontbonne University
Department of Mathematics and Computer Science
ABSTRACT
Humans are very visual creatures. Our visual system is involved in nearly everything we do or think about, in activities spanning everything from reading to driving. Computer Vision is the study of how to implement the human visual system, and the visual tasks we perform, in machines and programs. My studies and this paper revolve around this topic, specifically Optical Character Recognition (OCR), which tries to recognize characters or words inside of images. My focus was on implementing OCR through custom algorithms I wrote after studying techniques used in the computer vision field. I successfully created a program implementing a few of the algorithms; the rest remain untested but are documented. The results for the tested algorithms are documented in this paper.
1.1 Introduction
Computer Vision has slowly worked its way into our lives in limited aspects, in fields such as automotive drive assistance, eye and head tracking, film and video analysis, gesture recognition, industrial automation and inspection, medical analysis, object recognition, photography, security, 3D modeling, etc. (Lowe) These applications and the algorithms behind them are extremely specific. Because the programs created from the algorithms are so specific, much of the code does not successfully transfer among different applications, and so there is no master code or algorithm for the computer vision field. Hence even the visual system of a two-year-old cannot be replicated; computer programs still cannot successfully find all the animals in a picture. The reasons for this are many, but they simplify down to one point: the human visual system is extremely hard to understand and replicate. The process becomes an inverse problem in which we try to form a solution that resembles what our eyes process. This seems easy, but there are many hidden layers between the input images on our eyes and what we perceive. With so many unknowns, much of the focus in the computer vision field has shifted to physics-based or statistical models to determine potential solutions. (Szeliski)
Because of how well our visual system handles images, I underestimated the complexity of computer vision. In our everyday world it seems easy to pick out the different objects in an image. Looking around, you can readily distinguish objects in your surroundings, what they are, how far away they are, and their three-dimensional shape. To determine all of this, our brains transform the images taken in by our eyes through many different steps. As an example, we may perceive colors as darker or lighter than their actual values. This is how we see the same shade of red
on an apple throughout the day despite the changing colors of light reflecting through the
atmosphere. An example is in the picture that follows. The cells A and B are the exact same
color, but our visual system changes the color we perceive based on the surrounding colors:
Current algorithms have yet to effectively replicate the human color perception system.
(McCann) Other visual tricks our eyes perform range from reconstructing a three-dimensional
reality from two-dimensional images in our retinas to perceiving lines and edges from missing
image data. An article titled What Visual Perception Tells Us about Mind and Brain explains this
in more detail. “What we see is actually more than what is imaged on the retina. For example,
we perceive a three-dimensional world full of objects despite the fact that there is a simple
two-dimensional image on each retina. … Thus perception is inevitably an ambiguity-solving process. The perceptual system generally reaches the most plausible global interpretation of the retinal input by integrating local cues.” (Shimojo, Paradiso, and Fujita)
Despite the complexity of the human visual system, many people have tried to replicate it and implement it in machines and robotics. Many algorithms have been developed for specific applications across different fields and disciplines. One of the most prevalent in consumer applications is facial recognition. In fact, it has nearly become part of everyday life through images in social networks such as Facebook, images processed in digital
cameras, and images processed in photo editing software such as Picasa. The most successful
algorithm creates Eigenfaces, which are a set of eigenvectors, and then looks for these inside of
an image. The eigenvectors are derived from statistical analysis of many different pictures of
faces. (Szeliski) In other words, it creates a template from other pre-labeled faces and searches
through an image to see where they occur. This approach seems surprising to me since the algorithm never tries to break apart the constituents of the image or even determine what is contained within the image. The algorithms never process corresponding shapes, three-dimensional figures, edges, etc. from the image data. With no processing performed other than a simple template search, the program has no clues about context, leading it to label faces within
an image that we may consider unimportant. For example, I used Picasa to sort my images by
who was in each picture. It does this through facial recognition. However, in some images it
recognized faces that were far off in the background, faces that were out of focus, and even a
face on the cover of a magazine someone was reading in the background of the image.
Not only does facial recognition suffer from a lack of context, but the exact template matching with Eigenfaces also makes the algorithm extremely specific. This speaks volumes about the methods used in computer vision. The methods and algorithms for facial recognition cannot readily be used to identify animals in an image without completely retraining the algorithm. By extension, the algorithms developed for one application in computer vision are too specific to transfer over to another application. I see this as a huge problem, since we cannot possibly have thousands of different image analysis processes running in our brain all at once, each looking for a specific kind of object. I believe the way our brain determines there is a person in an image is the same as how it determines there are animals in
an image. The current field of computer vision is moving towards this template matching. While
this may work in specific situations, I cannot see how this will ever replicate human vision
because of the reasons stated before. To break away from the specificity of the computer vision field, I took a few ideas from lower-level processing techniques, developed my own algorithms in place of the ones commonly used, and then outlined a new matching technique.
1.2 New Techniques Needed in Computer Vision
Over the decades that computer vision has been studied and developed, it has not
progressed as well as most predicted or hoped. In fact, it has led some who contributed to and pioneered the field to the extreme view that computer vision is dead. (Chetverikov) While
I do not believe the study of computer vision is dead, I think new algorithms are needed in the
field. The algorithms need to move away from the specific applications and involve more ideas
from neuroscience and biological processes.
Jeff Hawkins, a pioneer of PDAs (Personal Digital Assistants, essentially the precursors to the smartphone) and now an artificial intelligence researcher, spoke at a TED (Technology, Entertainment, and Design) conference in 2003. His speech focused on artificial intelligence. In it he states that not only computer vision but the entire field of artificial intelligence is moving in the wrong direction. He explains that this comes from an incorrect but strongly held belief that intelligence is defined by behavior: we are intelligent because of the way we do things. He counters that our intelligence comes from experiencing the world through a sequence of patterns that we store and recall to match with reality. In this way we make predictions about what will happen next, or what should happen next. An example he gives is our recognition of
faces. As has been observed through study, when humans view a person we first look at one eye, then the other eye, then at the nose, and finally at the mouth. This is simply explained by predictions happening and then being confirmed as we observe our world. We expect an eye, then an eye next to it, then a nose, then a mouth. If we see something different, it will not match up with our predictions, and learning or more concentrated analysis will occur. (How Brain Science Will Change Computing)
I developed my research around the direction of prediction as explained by Jeff
Hawkins. In this way patterns, sequences, and predictions can be used in object recognition and
classification. Logically pattern matching seems to make more sense than specific template
matching. By looking for exact matches to templates, we will never be able to reproduce visual
tricks such as when we see shapes in clouds. Also, template matching often produces false
results if an object is shaded differently or partially hidden. (Ritter and Wilson)
Pattern matching seems to offer a fix for the different lighting conditions that occur in images.
Also, pattern matching may have a better chance of identifying objects which are slightly
obscured or are missing some data. This is something that must be studied further through
experimentation, however.
1.3 Simplifying Things to Optical Character Recognition
In my beginning research, I hoped to apply the ideas I developed to full images for
classification and recognition of objects. Unfortunately, time constraints limited my research and implementation, so I decided to focus on OCR (Optical Character Recognition). This field
is a subset of computer vision which focuses on text recognition in images. In OCR there is a
limited number of objects that can occur within the images. Also, the images processed in OCR
are much simpler than the full images processed in computer vision. In most cases the input is a two-dimensional binary image, such as an image of a page within a book. While OCR is simpler than general computer vision, the algorithms I created can still be implemented within it since it is a field within computer vision.
OCR is used by many industries and businesses. It is often packaged with Adobe PDF
software and document scanning software. Also, the United States Post Office has heavily
implemented OCR to recognize addresses written on packages and letters. OCR is generally divided into two methods. The first is matrix matching, where characters are matched pixel by pixel to a template. The second is feature extraction, where the program looks for general features such as open areas, closed shapes, diagonal lines, etc. Feature extraction is applied far more often than matrix matching and is much more successful. (What's OCR?)
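As a point of reference, here is a minimal sketch of the matrix matching idea described above (an illustration only, not code from this project); the binarized glyph and template arrays are hypothetical inputs:

    import numpy as np

    def matrix_match_score(glyph, template):
        # Matrix matching: compare a binarized character image to a stored
        # template pixel by pixel and return the fraction of matching pixels.
        glyph = np.asarray(glyph, dtype=bool)
        template = np.asarray(template, dtype=bool)
        if glyph.shape != template.shape:
            raise ValueError("glyph and template must be the same size")
        return float(np.mean(glyph == template))

    # The recognized character would be the stored template with the highest score:
    # best = max(templates, key=lambda name: matrix_match_score(glyph, templates[name]))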
Feature extraction has become the default for OCR software. It works by analyzing
characters to determine strokes, lines of discontinuity, and the background. From here the
program builds up from the character to the word and then assigns a confidence level after
comparing to a dictionary of words. This works well for images converted straight from computer documents, with accuracy rates around 99%. For older, scanned papers the accuracy rate drops and varies widely, from 71% to 98%. (Holley)
The method I have developed follows the feature extraction method. The ideas of
feature extraction have worked very well within OCR so far and seem to resemble the idea of
prediction matching described earlier. The problem is defining what features are and how to
decode them inside of an image.
1.4 Describing the Overall Idea
The general idea of my algorithms in computer vision and OCR is to separate an image into constituent objects and then define those objects by their surprising features. In doing this, the algorithm builds a definition of an object rather than a template for an object. It seems more natural to describe objects by a definition rather than a template, so I pursued algorithms that build such definitions. For example, when we describe a face we do not build an
exact template from thousands of faces we have seen in the past. Instead we describe a face as
having two eyes, a nose, and a mouth. These are features that distinguish a face from anything
else. If there were no features that protruded from the interior of the face, we would have a
hard time distinguishing it. Definitions work the same for outlines of an object, too. We can
easily draw the outline of a dolphin because we know the border points that are most
memorable and stick out from a regular oval or other shape. For a dolphin it is the tail, dorsal
fin, and the mouth.
While definition building seems to be a more natural way of understanding images, it also may offer reasons as to why we see objects in clouds or ink blots. I believe we see images in these shapes because certain features resemble patterns we have seen before in other objects. We may see a face in an ink blot because there are two dots to represent eyes and a
nose all in the correct relation to each other. We may see a dolphin in the clouds because a part
of it resembles the dorsal fin and the nose. In both of these examples general objects portray
specific objects because certain key features match up.
1.5 Lower Level Processing
Lower-level processing in computer vision means finding key points, features, edges, lines, etc. in an image. At the lower level no matching takes place; the focus is to decode the image into basic points, lines, and shapes for later processing. (Szeliski) One of the many lower-level processes is edge detection. Edges define boundaries between regions in an image and occur where there is a distinct change in the intensity of an image. There are tens if not hundreds of algorithms written for edge detection, ranging from the simple to the extremely complex. (Nadernejad)
The most widely used algorithms for edge detection are the Marr-Hildreth edge detector and the Canny edge detector. (Szeliski) The Marr-Hildreth edge detector first applies a Gaussian smoothing operator (a matrix which approximates a bell-shaped curve) and then a two-dimensional Laplacian (another matrix, equivalent to taking the second derivative of the image). The Gaussian reduces the amount of noise in the image simply by blurring it, though this has the unwanted effect of losing fine detail. The idea behind the Laplacian is that a step difference in the intensity of an image is represented by a zero crossing in the second derivative. The Canny edge detector also begins by applying a Gaussian smoothing operator. It then finds the gradient of the image at each point to indicate the presence of edges while suppressing any points that are not maximum gradient values in a small region. After all this has been performed, thresholding is applied using hysteresis, which applies a high and a low threshold. Again, the Gaussian loses detail as it reduces the amount of noise in the image. Both the Marr-Hildreth and the Canny edge detectors
are very expensive in terms of computation time because of the operations that are involved. I
stepped away from these approaches and tried to look at edge detection in a simpler way that
could be reproduced by artificial neural networks.
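For comparison with the approach developed below, here is a minimal sketch of the Marr-Hildreth (Laplacian-of-Gaussian) idea just described, using SciPy; the sigma value is illustrative, and this is not the code written for this project:

    import numpy as np
    from scipy.ndimage import gaussian_laplace

    def marr_hildreth_edges(gray, sigma=2.0):
        # Smooth with a Gaussian and take the Laplacian in one operator,
        # then mark the zero crossings of the result as edges.
        log = gaussian_laplace(gray.astype(float), sigma)
        edges = np.zeros(gray.shape, dtype=bool)
        # A zero crossing occurs where the sign changes between neighbors.
        edges[:, :-1] |= np.signbit(log[:, :-1]) != np.signbit(log[:, 1:])
        edges[:-1, :] |= np.signbit(log[:-1, :]) != np.signbit(log[1:, :])
        return edges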
Artificial neural networks were inspired by the way biological nervous systems process
information. The neural networks are composed of a large number of processing elements that
work together to solve specific problems. Neurons inside the neural networks are
interconnected and feed inputs into each other. If an operation applied to all of the inputs into a neuron exceeds a certain threshold, then the neuron sends an input on to other neurons. The way the neurons are connected, the thresholds set for each, and the operations performed on each can all be changed and adapted to improve the performance of an algorithm. This replicates
the learning process in biological brains and nervous systems. Artificial neural networks have
been effectively applied in pattern recognition and other applications because of their ability to
change and adapt to new inputs. (Stergiou, Christopher, and Dimitrios Siganos)
Neural networks have a downside, though: they need many sets of training data in order to achieve accurate results, and these training sets take a lot of time to build. So, instead of abandoning neural networks, I tried to combine the best of traditional computing and neural networks in my edge detection algorithms. I used arrays as inputs and outputs to form the neurons of the network, predetermined what the operations would be, and then applied a threshold which could be changed to maximize the accuracy of the algorithm.
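A minimal sketch of the kind of thresholded unit this describes (the weights and threshold here are illustrative, not values from my program):

    def threshold_neuron(inputs, weights, threshold):
        # Fire (output 1) only when the weighted sum of the inputs exceeds
        # the threshold; otherwise stay silent (output 0).
        activation = sum(w * x for w, x in zip(weights, inputs))
        return 1 if activation > threshold else 0

    # Example: a unit that fires when the intensity jump between two
    # neighboring pixels is large (weights +1 and -1, threshold 30).
    print(threshold_neuron([200, 120], [1, -1], 30))   # -> 1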
My first idea built upon the previously described combination of neural networks and predefined operations. I explored the idea that edges are step (non-algebraic) changes in light
intensity in an image. I figured out that I could calculate the change in light intensity along a
specific direction in the image. By doing this, I could approximate the derivative at each point in
an image. After this I could approximate the second derivative, the third derivative, and so on.
With the derivatives approximated, the algorithm can then work backwards and figure out
what the next pixel value should be for an algebraic equation. The following image explains this
further:
This system works perfectly for predicting the next value in algebraic equations such as y = 2x + 20 or y = 5x^3 + 3x - 10. If the array is expanded to include more pixel values, it can work with
even higher order equations. The algorithm is able to do this because eventually the
approximate derivative is a constant value or 0. Surprisingly, the algorithm also was able to
reasonably approximate the next value in y = cos(x) and y = ln(x). I then designed a program to
apply the algorithm to images. For each pixel it would calculate a predicted value from the
surrounding pixels. If the predicted value and the actual value differed by more than a certain threshold,
then the pixel would be marked as an edge. The algorithm did not work as well in practice,
though.
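Before describing those problems, here is a minimal sketch of the prediction idea, applied to a single row of grayscale pixels; the window size and threshold are illustrative, and the real program works on full images:

    import numpy as np

    def predict_next(values):
        # Approximate successive derivatives with finite differences, then
        # work backwards to the value an algebraic (polynomial) curve
        # through these points would take next.
        diffs = np.array(values, dtype=float)
        lasts = [diffs[-1]]
        while len(diffs) > 1:
            diffs = np.diff(diffs)      # next higher approximate derivative
            lasts.append(diffs[-1])
        return sum(lasts)

    def mark_edges_1d(row, window=4, threshold=30):
        # A pixel is marked as an edge when the prediction made from the
        # preceding pixels misses the actual value by more than the threshold.
        edges = np.zeros(len(row), dtype=bool)
        for i in range(window, len(row)):
            predicted = predict_next(row[i - window:i])
            edges[i] = abs(predicted - row[i]) > threshold
        return edges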
Small variations in light intensity in an image would create large changes in values
higher up in the array. This led to extreme predicted values which did not match with what a
human might expect the next value to be. Another problem appeared whenever edges
occurred between the values used for the prediction. For example, if an edge occurred at Pixel
2 in the image shown above, then the value of PR4 became extreme. Noise such as a bad pixel or speck in the image also created the same problem as an edge occurring among the values
used for the prediction. All three of these problems created false edges and spots inside of the
generated images. Here are the results of this algorithm (the top pictures are gray scale, and
the bottom are the edge detection):
After days of trying and several algorithms, I derived an algorithm based on the previously described combination of a neural network and predefined operations. Essentially it approximates the derivative of order n on each side of the pixel being tested. This brings local pixels into the reasoning for finding edges; the benefit of including these local pixels is to reduce the effect of noise that might already be present in the image. It also approximates the first
derivative on each side of the pixel to be tested. The purpose of the first derivative is to make
sure the edge occurred at the pixel being tested instead of in the general region of the pixel
being tested. Next, instead of predicting the next values, my algorithm simply compared the
values of the approximated derivatives. The following picture explains the algorithm in a visual
way.
The new algorithm seemed to work better, and fairly quickly, since all operations are performed on integer values. I believe it is faster than the Canny and Marr-Hildreth edge detectors, but that
is only speculation. To speed up the algorithm and use less memory, I figured out the relation of
each successive approximate derivative. It follows Pascal’s Triangle with alternating signs. For
example, to approximate only the first derivative you take PL1 – P1 (when referring back to the
image above). To approximate the second derivative you would normally calculate the first and
then find the difference between these first derivatives. This equation can be simplified from
DLA2-DLA3 to PL2 – 2*PL1 + P1 (again when referring to the image above). To approximate the
third derivative, the equation simplifies to PL3 – 3*PL2 + 3*PL1 – P1 (when referring to the
image above).
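A minimal sketch of these one-sided derivative approximations and the comparison between the two sides, using the Pascal's Triangle coefficients above; the derivative order and the thresholds are illustrative and would be tuned in practice:

    from math import comb

    def side_derivative(pixels, center, order, direction):
        # Approximate the derivative of the given order on one side of the
        # tested pixel using binomial coefficients with alternating signs,
        # e.g. order 3 to the left: PL3 - 3*PL2 + 3*PL1 - P1.
        total = 0
        for k in range(order + 1):
            idx = center + direction * k        # step outward from the pixel
            total += (-1) ** (order - k) * comb(order, k) * pixels[idx]
        return total

    def is_edge(pixels, i, order=3, diff_threshold=40, slope_threshold=20):
        # Compare the approximations from the two sides of pixel i; a large
        # disagreement suggests a step there, and the first-derivative check
        # helps make sure the edge sits at i rather than somewhere nearby.
        left_n = side_derivative(pixels, i, order, -1)
        right_n = side_derivative(pixels, i, order, +1)
        left_1 = side_derivative(pixels, i, 1, -1)
        right_1 = side_derivative(pixels, i, 1, +1)
        return (abs(left_n - right_n) > diff_threshold
                and abs(left_1 - right_1) > slope_threshold)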
After the program finds the necessary approximate derivatives, it then calculates the
difference between these derivatives. If the difference is more than a specific threshold, then it
records the results as an edge. The thresholds are adjustable, but I was not able to experiment
with many thresholds or design the program to choose the appropriate thresholds. Despite the
limited testing, the results seem promising. Here are the results of this algorithm (the top
pictures are gray scale, and the bottom are the edge detection):
As shown in the above image, the algorithm seems to work fairly well for single objects
within a landscape. The algorithm has problems decoding edges when there are a lot of objects
within the same image such as with the picture of trees. However, I believe all edge detectors
have this problem. It also has problems with texture such as the water in the dolphin picture or
the fur on the kangaroo. Some of these problems with texture occurred because the algorithm
only uses the gray scale version of the images.
To fix the problem of only being able to analyze gray scale images, I decided to create a three-dimensional color mapping that plots every possible color combination as a point. I
worked on the color mapping using two main facts. The first is that the lowest possible color is black and the highest possible color is white; all other colors lie between these two extremes. Having two colors that all others build from gave me two limiting points, one at each end of the map, to anchor the mapping. This gave me a basis for placing colors along the Z-axis of the three-dimensional mapping. The sum of the red,
blue, and green pixel values would determine where the color was on the Z-axis. Black (with
RGB pixel values 0,0,0) occurs at Z=0. White (with RGB pixel values of 255,255,255) occurs at
Z=765. Red, green and blue then occur at Z=255 (Red has an RGB value of 255,0,0 for example)
and yellow, cyan, and magenta occur at Z=510 (yellow has an RGB value of 255,255,0 for
example). The second fact I used comes from color theory where the colors red, blue, and
green are all separated by an angle of 120 degrees on a color wheel. By using this color wheel
and the separation of the three primary colors by 120 degrees, I created an equilateral triangle
mapping for color values with the maximum red values at the top corner of the triangle, the
maximum green values in the left corner of the triangle, and the maximum blue values in the
right corner of the triangle. Using this triangle, I mapped an XY axis over it with the Y axis
aligned with the red corner of the triangle and the origin at the centroid of the triangle. Also,
the Z-axis was shifted so that 382 (the approximate center of the range 0 to 765) occurs at Z =
0. With all of this explained, here are the equations for the mapping and a pictorial diagram:
X = Blue * cos(-π/6) + Green * cos(7π/6)
Y = Red + Blue * sin(-π/6) + Green * sin(7π/6)
Z = (Red + Green + Blue) - 382
Computation time was decreased by rounding and scaling up the values so that everything was kept as an integer. The formulas then became:
X = (Blue - Green) * 173
Y = Red * 100 - (Blue + Green) * 50
Z = (Red + Green + Blue - 382) * 100
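A minimal sketch of the mapping, in both the floating-point and the integer-scaled form given above:

    import math

    def color_map(red, green, blue):
        # Floating-point version of the mapping equations above.
        x = blue * math.cos(-math.pi / 6) + green * math.cos(7 * math.pi / 6)
        y = red + blue * math.sin(-math.pi / 6) + green * math.sin(7 * math.pi / 6)
        z = (red + green + blue) - 382
        return x, y, z

    def color_map_int(red, green, blue):
        # Integer-scaled version used to avoid floating-point work.
        x = (blue - green) * 173
        y = red * 100 - (blue + green) * 50
        z = (red + green + blue - 382) * 100
        return x, y, z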
The first algorithm written with this new color mapping worked very well despite its simplicity. The algorithm compared the distances between the color values of the pixels on either side of the pixel being tested. It did this for all pixels within a predetermined distance of the tested pixel along the horizontal and vertical directions of the image. If the calculated distance between the color values of the pixels on either side exceeded a threshold, then the tested pixel was marked as an edge. Here is an image diagram of the algorithm, the results, and the results compared with other edge detectors:
(Comparison images from Ehsan Nadernejad's "Edge Detection Techniques: Evaluations and Comparisons")
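Here is a minimal sketch of the comparison just described, reusing the color_map_int helper from the earlier sketch; the image layout (pixels[y][x] as an RGB tuple), the reach, and the threshold are all illustrative assumptions:

    import math

    def color_edge(pixels, x, y, reach=2, threshold=15000):
        # Mark (x, y) as an edge when the mapped colors on opposite sides of
        # it, along the horizontal or vertical direction, are far apart.
        for dx, dy in ((1, 0), (0, 1)):             # horizontal, then vertical
            for d in range(1, reach + 1):
                a = color_map_int(*pixels[y - dy * d][x - dx * d])
                b = color_map_int(*pixels[y + dy * d][x + dx * d])
                if math.dist(a, b) > threshold:     # distance in mapped space
                    return True
        return False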
The edge detection algorithms listed so far are all the ones that have been developed
and tested. I still have two more I would like to develop and test, so the general ideas behind
the algorithms are included here. The first builds further on the color mapping I developed. The
algorithm uses the color mapping to parse through an image and group pixels together
according to which pixels are closest in value to each other. It does this by recursively matching every pixel in an image to the neighboring pixels that are closest in value. I created an
algorithm to attempt this methodology, but it took several minutes to go through one image
and did not produce accurate results. The second algorithm yet to be developed and tested can
be added onto any of the algorithms listed in this paper. Essentially the algorithm checks to
make sure a pixel marked as an edge is part of a line and not just random values or single pixels
in an image. Also, the algorithm will be able to fill in missing pixels based on the presence of
surrounding edge values.
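A minimal sketch of the first of these two ideas, the color-based grouping, is shown below; it reuses the color_map_int helper from the earlier sketch, uses an iterative flood fill instead of literal recursion to avoid deep call stacks, and the similarity threshold is illustrative:

    import math
    from collections import deque

    def grow_region(pixels, start, max_dist=5000):
        # Group a pixel with the neighboring pixels whose mapped colors are
        # closest to it, spreading outward from the starting pixel.
        height, width = len(pixels), len(pixels[0])
        region, queue = {start}, deque([start])
        while queue:
            y, x = queue.popleft()
            current = color_map_int(*pixels[y][x])
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < height and 0 <= nx < width and (ny, nx) not in region:
                    neighbor = color_map_int(*pixels[ny][nx])
                    if math.dist(current, neighbor) <= max_dist:
                        region.add((ny, nx))
                        queue.append((ny, nx))
        return region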
The results of the edge detection were easiest to see in regular images, but the
algorithms are fully applicable in OCR. Here is an example of text in an image that has been run
through the color mapping edge detection algorithm:
The edge detection algorithms will help with recognition in the next section.
1.6 Dopamine Maps
With the lower-level processing resolved, I still needed a way to match characters and
objects. The general ideas for the algorithms to match characters and objects come from a
prediction and pattern approach. Generally humans define objects by their features. For
example, a face contains two eyes, a nose, and a mouth. The letter “A” is made up of two
slanted lines which meet at a point at the top with a horizontal line in the middle. I followed
this methodology rather than trying to match objects by templates like many algorithms before.
My way of matching assumes smooth transitions between areas and lines. In this way it can
predict what the next value would be much like the first algorithm I described for edge
detection. If the predictions the algorithm makes do not match up, then it marks the points as
surprises or unexpected values. In the same way that dopamine in the brain is released by unexpected events, the algorithm marks points that it did not expect. I will explain these
algorithms in detail, but the programs behind the algorithms are not working yet.
There are three types of dopamine maps I have developed and plan to implement. The
first map is called the boundary dopamine map which describes the outside shape of an object.
The algorithm traces along the boundary of an object, predicting where the next point should be based on previous points and their slopes. If the predicted point is not part of the boundary of the object, then that point is marked in the boundary dopamine map. A
good example is the letter “A” again. Here the algorithm explores the boundary of the letter
and marks anything it did not expect. In this case it marks the ends of the two slanted lines, since it expected the lines to continue. It also marks the point at the top and the intersections of the horizontal and the slanted lines, because the change in slope there is extreme. It does not mark those points because they are corners as such; any extreme change in slope will create a surprise
point according to the algorithm. The image below shows the final output from this type of
algorithm.
The points marked by the boundary dopamine algorithm are defined only relative to each other. In other words, these points can show up in any orientation and will still be matched as long as their distances and orientations relative to each other are the same.
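A simplified sketch of this boundary tracing, where the prediction is reduced to "continue in the current direction"; the boundary is assumed to be an ordered list of (x, y) points and the turn threshold is illustrative:

    import math

    def boundary_surprises(boundary, turn_threshold=math.radians(40)):
        # Walk the ordered boundary points and mark every point where the
        # local direction changes sharply: the surprise points that make up
        # the boundary dopamine map.
        surprises = []
        for i in range(1, len(boundary) - 1):
            (x0, y0), (x1, y1), (x2, y2) = boundary[i - 1], boundary[i], boundary[i + 1]
            incoming = math.atan2(y1 - y0, x1 - x0)
            outgoing = math.atan2(y2 - y1, x2 - x1)
            turn = abs((outgoing - incoming + math.pi) % (2 * math.pi) - math.pi)
            if turn > turn_threshold:
                surprises.append(boundary[i])       # extreme slope change
        return surprises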
The next map is called the interior dopamine map. Again, the algorithm makes predictions about what it expects to see and marks anything that does not match up. The interior dopamine map algorithm assumes smooth transitions between surfaces. It tracks
along the image and continues predicting what the next values should be based on the previous
values. For example, we defined a face earlier as being made up of two eyes, a nose, and a
mouth. Those objects distinguish a face, and that is exactly what this type of algorithm would
mark. The features that show up inside of an object and their relation to each other define
what an object is. It only makes sense to design an algorithm to try this. Also, once the algorithm has been trained on what a nose, eyes, and mouth are, it can define a face in the same way. So, instead of saying there should be feature points in this position relative to that
position, it can build from the bottom up. If the algorithm finds an eye, it would expect another
eye and a nose to be present for a face.
The final map is called the background dopamine map. It is built using the same border and interior methods; the only difference is that every object also stores the types of backgrounds it can be found on. This is to avoid confusion between objects that look like other objects. If the
algorithm saw a marble that looked like an eye, the above algorithms would try to label it as an
eye. By including the background, the algorithm can figure out that it is not an eye since it does
not occur on a face.
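A toy sketch of this background check; the object names and background labels are hypothetical:

    # Each object definition stores the backgrounds it is expected to occur on.
    OBJECT_BACKGROUNDS = {
        "eye": {"face"},
        "marble": {"table", "floor"},
    }

    def accept_match(candidate, background, allowed=OBJECT_BACKGROUNDS):
        # Reject a candidate label when the background it was found on is not
        # one of the backgrounds stored for that object.
        return background in allowed.get(candidate, set())

    print(accept_match("eye", "table"))   # -> False (an eye-like blob on a table)
    print(accept_match("eye", "face"))    # -> True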
1.7 Generating Dopamine Maps and their Importance
The maps are generated from a pre-labeled data set. Dopamine maps would be
generated for each image in the pre-labeled data set. An algorithm would then find the
correlation among the maps and create a new map taking into account the differences between
the pictures. If a dopamine map is sufficiently different from the existing maps, then learning will take place and a new map will be created alongside the old one. The need for learning becomes apparent when we look at the letter "a" in different fonts, which may be rendered in a double-story or a single-story form. These correlation maps give a general description of an object that is self-relational and definition-based, so the program can recognize versions of objects it has not seen before.
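A minimal sketch of this correlation-and-learning step, treating each dopamine map as a set of surprise points; the distance measure and the newness threshold are illustrative assumptions:

    import math

    def map_distance(map_a, map_b):
        # Average distance from each surprise point in one map to its nearest
        # point in the other map, made symmetric by averaging both directions.
        def one_way(a, b):
            return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
        return (one_way(map_a, map_b) + one_way(map_b, map_a)) / 2

    def learn_map(new_map, prototypes, newness_threshold=10.0):
        # Fold a new dopamine map into the closest stored prototype, or keep
        # it as a new prototype when it differs enough from all of them.
        if not prototypes:
            prototypes.append(list(new_map))
            return
        best = min(prototypes, key=lambda p: map_distance(new_map, p))
        if map_distance(new_map, best) > newness_threshold:
            prototypes.append(list(new_map))        # learning: a new map
        else:
            # Correlate: nudge each prototype point toward its nearest new point.
            for i, p in enumerate(best):
                q = min(new_map, key=lambda m: math.dist(p, m))
                best[i] = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)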
These maps seem to be a much more human way of thinking about things than template matching. When we are asked to describe what an object looks like, we are not drawing an exact object from a template stored in our brain. Instead we draw an object based on its definition. For the letter "E" we understand that it is a vertical line with horizontal lines that extend a certain distance from the top and bottom and another horizontal line that extends from the middle. These maps also allow learning and adjustments to be made within the program constantly.
1.8 Smart Algorithms
The focus so far has been on algorithms that allow for learning and prediction of values.
These are present in the human cognitive process and provide what should be a much more
dynamic object recognition process. Also, all the implementations have been simple
mathematical operations which can be easily and quickly performed. I intentionally stayed away from complex mathematics such as Gaussian smoothing, eigenvectors, and statistical analysis because it is hard to see how the neural cells in our brains could implement these operations.
Also, the methods I have written so far have all had customizable values or thresholds. This
allows the program to learn and adapt so that it runs efficiently and outputs the best results.
1.9 Conclusion and Results
I have very few results to report at this time because of time constraints. I can report
results on the edge detection algorithm, though. It appears to perform at the same level as
more complicated algorithms as shown in the included pictures. Also, the implementation in
OCR will help out greatly. I ran the program on sample text, and it performed better than I could
have expected. The text images were full of different text colors, text sizes, and background
colors. The program was able to successfully separate out the characters into a binary black and
white image. The next step will be to implement the border prediction algorithm.
I plan to continue this research outside of class since it looks promising, and I look forward to working on it. The working code used so far is attached.
2.0 Acknowledgments
I would like to thank Dr. Abkemeier and the Fontbonne Math Department for letting me
pursue this area of interest. This paper was submitted to the faculty of Fontbonne University’s
Department of Mathematics and Computer Science as partial requirement for the degree of
Bachelor of Science in Mathematics.
2.1 References
Chetverikov, Dmitry. Is Computer Vision Possible? Rep. N.p.: n.p., n.d. Print.
Holley, Rose. "How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale
Historic Newspaper Digitisation Programs." D-Lib Magazine N.p., n.d. Web. 5 Apr. 2013.
<http://www.dlib.org/dlib/march09/holley/03holley.html>.
How Brain Science Will Change Computing. Dir. Jeff Hawkins. TED, n.d. Web.
Lowe, David. The Computer Vision Industry. N.p., n.d. Web. 6 Apr. 2013.
<http://www.cs.ubc.ca/~lowe/vision.html>.
McCann, John J. Human Color Perception. Cambridge: Polaroid Corporation, 1973. Print.
Nadernejad, Ehsan. "Edge Detection Techniques: Evaluations and Comparisons." Mazandaran
Institute of Technology, n.d. Print.
Ritter, G. X., and Joseph N. Wilson. Handbook of Computer Vision Algorithms in Image Algebra.
Boca Raton: CRC, 1996. Print.
Shimojo, Shinsuke, Michael Paradiso, and Ichiro Fujita. What Visual Perception Tells Us about
Mind and Brain. Rep. N.p., n.d. Web. 5 Apr. 2013.
<http://www.pnas.org/content/98/22/12340.full>.
Stergiou, Christopher, and Dimitrios Siganos. "Neural Networks." Neural Networks. N.p., n.d.
Web. 5 Apr. 2013.
<http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html>.
Szeliski, Richard. Computer Vision: Algorithms and Applications. London: Springer, 2011. Print.
"What's OCR?" Data ID. N.p., n.d. Web. 5 Apr. 2013. <http://www.dataid.com/aboutocr.htm>.
More Related Content

What's hot

Elderly Assistance- Deep Learning Theme detection
Elderly Assistance- Deep Learning Theme detectionElderly Assistance- Deep Learning Theme detection
Elderly Assistance- Deep Learning Theme detectionTanvi Mittal
 
Ai complete note
Ai complete noteAi complete note
Ai complete noteNajar Aryal
 
Artificial Intelligence Research Topics for PhD Manuscripts 2021 - Phdassistance
Artificial Intelligence Research Topics for PhD Manuscripts 2021 - PhdassistanceArtificial Intelligence Research Topics for PhD Manuscripts 2021 - Phdassistance
Artificial Intelligence Research Topics for PhD Manuscripts 2021 - PhdassistancePhD Assistance
 
Emotinal Design
Emotinal DesignEmotinal Design
Emotinal DesignHamed Abdi
 
HUMAN FACE IDENTIFICATION
HUMAN FACE IDENTIFICATION HUMAN FACE IDENTIFICATION
HUMAN FACE IDENTIFICATION bhupesh lahare
 
Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...
Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...
Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...Willy Marroquin (WillyDevNET)
 
Facial Emotion Recognition using Convolution Neural Network
Facial Emotion Recognition using Convolution Neural NetworkFacial Emotion Recognition using Convolution Neural Network
Facial Emotion Recognition using Convolution Neural NetworkYogeshIJTSRD
 
Case study on deep learning
Case study on deep learningCase study on deep learning
Case study on deep learningHarshitBarde
 
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013TEST Huddle
 
Study on Different Human Emotions Using Back Propagation Method
Study on Different Human Emotions Using Back Propagation MethodStudy on Different Human Emotions Using Back Propagation Method
Study on Different Human Emotions Using Back Propagation Methodijiert bestjournal
 
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra MalikDeep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra MalikThe Hive
 
Case study on machine learning
Case study on machine learningCase study on machine learning
Case study on machine learningHarshitBarde
 
[TRANSCRIPT] Do we have a right to freedom of thought?
 [TRANSCRIPT] Do we have a right to freedom of thought?  [TRANSCRIPT] Do we have a right to freedom of thought?
[TRANSCRIPT] Do we have a right to freedom of thought? Jim Stroud
 
Robotic models of active perception
Robotic models of active perceptionRobotic models of active perception
Robotic models of active perceptionDimitri Ognibene
 
Meaning and the Semantic Web
Meaning and the Semantic WebMeaning and the Semantic Web
Meaning and the Semantic WebPhiloWeb
 

What's hot (20)

Elderly Assistance- Deep Learning Theme detection
Elderly Assistance- Deep Learning Theme detectionElderly Assistance- Deep Learning Theme detection
Elderly Assistance- Deep Learning Theme detection
 
Ai complete note
Ai complete noteAi complete note
Ai complete note
 
Artificial Intelligence Research Topics for PhD Manuscripts 2021 - Phdassistance
Artificial Intelligence Research Topics for PhD Manuscripts 2021 - PhdassistanceArtificial Intelligence Research Topics for PhD Manuscripts 2021 - Phdassistance
Artificial Intelligence Research Topics for PhD Manuscripts 2021 - Phdassistance
 
Emotinal Design
Emotinal DesignEmotinal Design
Emotinal Design
 
06 intelligence
06 intelligence06 intelligence
06 intelligence
 
Intelligence & Computers
Intelligence & ComputersIntelligence & Computers
Intelligence & Computers
 
HUMAN FACE IDENTIFICATION
HUMAN FACE IDENTIFICATION HUMAN FACE IDENTIFICATION
HUMAN FACE IDENTIFICATION
 
Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...
Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...
Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...
 
Facial Emotion Recognition using Convolution Neural Network
Facial Emotion Recognition using Convolution Neural NetworkFacial Emotion Recognition using Convolution Neural Network
Facial Emotion Recognition using Convolution Neural Network
 
Case study on deep learning
Case study on deep learningCase study on deep learning
Case study on deep learning
 
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
 
Study on Different Human Emotions Using Back Propagation Method
Study on Different Human Emotions Using Back Propagation MethodStudy on Different Human Emotions Using Back Propagation Method
Study on Different Human Emotions Using Back Propagation Method
 
Image recognition
Image recognitionImage recognition
Image recognition
 
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra MalikDeep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
 
SIXTH SENSE TECHNOLOGY
SIXTH SENSE TECHNOLOGYSIXTH SENSE TECHNOLOGY
SIXTH SENSE TECHNOLOGY
 
Case study on machine learning
Case study on machine learningCase study on machine learning
Case study on machine learning
 
[TRANSCRIPT] Do we have a right to freedom of thought?
 [TRANSCRIPT] Do we have a right to freedom of thought?  [TRANSCRIPT] Do we have a right to freedom of thought?
[TRANSCRIPT] Do we have a right to freedom of thought?
 
Robotic models of active perception
Robotic models of active perceptionRobotic models of active perception
Robotic models of active perception
 
Meaning and the Semantic Web
Meaning and the Semantic WebMeaning and the Semantic Web
Meaning and the Semantic Web
 
Mind reading-computer
Mind reading-computerMind reading-computer
Mind reading-computer
 

Viewers also liked

"Я - учитель!" эссе
"Я  - учитель!"  эссе"Я  - учитель!"  эссе
"Я - учитель!" эссеcdoarg01
 
Анель Хасенова + аренда квартир + клиенты
Анель Хасенова + аренда квартир + клиентыАнель Хасенова + аренда квартир + клиенты
Анель Хасенова + аренда квартир + клиентыAnel Khassenova
 
Volatilidad en el precio del petróleo
Volatilidad en el precio del petróleo Volatilidad en el precio del petróleo
Volatilidad en el precio del petróleo Carolina Lo
 
Анель Хасенова+переводческие услуги+конкуренты
Анель Хасенова+переводческие услуги+конкурентыАнель Хасенова+переводческие услуги+конкуренты
Анель Хасенова+переводческие услуги+конкурентыAnel Khassenova
 
Fircroft Oil & Gas Powerpoint
Fircroft Oil & Gas PowerpointFircroft Oil & Gas Powerpoint
Fircroft Oil & Gas PowerpointShaun Garrathy
 
Kim Ross-Smith Resume_February2016
Kim Ross-Smith Resume_February2016Kim Ross-Smith Resume_February2016
Kim Ross-Smith Resume_February2016Kimberly Ross-Smith
 
RITIKA CHOPRA-ACCOUNTS EXECUTIVE - Copy
RITIKA CHOPRA-ACCOUNTS EXECUTIVE - CopyRITIKA CHOPRA-ACCOUNTS EXECUTIVE - Copy
RITIKA CHOPRA-ACCOUNTS EXECUTIVE - CopyRitika Chopra
 

Viewers also liked (20)

Matrices
MatricesMatrices
Matrices
 
Reseña Contable
Reseña ContableReseña Contable
Reseña Contable
 
"Я - учитель!" эссе
"Я  - учитель!"  эссе"Я  - учитель!"  эссе
"Я - учитель!" эссе
 
Intro to bus
Intro to busIntro to bus
Intro to bus
 
Анель Хасенова + аренда квартир + клиенты
Анель Хасенова + аренда квартир + клиентыАнель Хасенова + аренда квартир + клиенты
Анель Хасенова + аренда квартир + клиенты
 
Oca 3
Oca 3Oca 3
Oca 3
 
Trabajo de el romanticismo de jose luis
Trabajo de el romanticismo de jose luisTrabajo de el romanticismo de jose luis
Trabajo de el romanticismo de jose luis
 
Volatilidad en el precio del petróleo
Volatilidad en el precio del petróleo Volatilidad en el precio del petróleo
Volatilidad en el precio del petróleo
 
Анель Хасенова+переводческие услуги+конкуренты
Анель Хасенова+переводческие услуги+конкурентыАнель Хасенова+переводческие услуги+конкуренты
Анель Хасенова+переводческие услуги+конкуренты
 
Matt's Resumes_r
Matt's Resumes_rMatt's Resumes_r
Matt's Resumes_r
 
Syed_Khaja_Nooruddin.04042016
Syed_Khaja_Nooruddin.04042016Syed_Khaja_Nooruddin.04042016
Syed_Khaja_Nooruddin.04042016
 
Los ecosistemas terrestres
Los ecosistemas terrestresLos ecosistemas terrestres
Los ecosistemas terrestres
 
Poder
PoderPoder
Poder
 
MATTS VISUAL AID
MATTS VISUAL AIDMATTS VISUAL AID
MATTS VISUAL AID
 
Poder
PoderPoder
Poder
 
Title
TitleTitle
Title
 
first_assignment_Report
first_assignment_Reportfirst_assignment_Report
first_assignment_Report
 
Fircroft Oil & Gas Powerpoint
Fircroft Oil & Gas PowerpointFircroft Oil & Gas Powerpoint
Fircroft Oil & Gas Powerpoint
 
Kim Ross-Smith Resume_February2016
Kim Ross-Smith Resume_February2016Kim Ross-Smith Resume_February2016
Kim Ross-Smith Resume_February2016
 
RITIKA CHOPRA-ACCOUNTS EXECUTIVE - Copy
RITIKA CHOPRA-ACCOUNTS EXECUTIVE - CopyRITIKA CHOPRA-ACCOUNTS EXECUTIVE - Copy
RITIKA CHOPRA-ACCOUNTS EXECUTIVE - Copy
 

Similar to Senior Project Paper

0-1--Introduction FPCV-0-1.pdf
0-1--Introduction FPCV-0-1.pdf0-1--Introduction FPCV-0-1.pdf
0-1--Introduction FPCV-0-1.pdfPatrickMatthewChan
 
Everything You Need to Know About Computer Vision
Everything You Need to Know About Computer VisionEverything You Need to Know About Computer Vision
Everything You Need to Know About Computer VisionKavika Roy
 
The relationship between artificial intelligence and psychological theories
The relationship between artificial intelligence and psychological theoriesThe relationship between artificial intelligence and psychological theories
The relationship between artificial intelligence and psychological theoriesEr. rahul abhishek
 
The Magic Behind AI
The Magic Behind AIThe Magic Behind AI
The Magic Behind AIOthman Gacem
 
Paper on Computer Vision
Paper on Computer VisionPaper on Computer Vision
Paper on Computer VisionSanjayS117
 
Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0Cameron Aaron
 
Applied Computer Vision - a Deep Learning Approach
Applied Computer Vision - a Deep Learning ApproachApplied Computer Vision - a Deep Learning Approach
Applied Computer Vision - a Deep Learning ApproachJose Berengueres
 
What is Computer Vision?
What is Computer Vision?What is Computer Vision?
What is Computer Vision?Kavika Roy
 
IRJET- ATM Security using Machine Learning
IRJET- ATM Security using Machine LearningIRJET- ATM Security using Machine Learning
IRJET- ATM Security using Machine LearningIRJET Journal
 
Computer vision lightning talk castaway week
Computer vision lightning talk castaway weekComputer vision lightning talk castaway week
Computer vision lightning talk castaway weekChristopher Decker
 
Face Recognition Human Computer Interaction
Face Recognition Human Computer InteractionFace Recognition Human Computer Interaction
Face Recognition Human Computer Interactionines beltaief
 
scene description
scene descriptionscene description
scene descriptionkhushi2551
 
Dragos_Papava_dissertation
Dragos_Papava_dissertationDragos_Papava_dissertation
Dragos_Papava_dissertationDragoș Papavă
 
Mind reading computer
Mind reading computerMind reading computer
Mind reading computerJudy Francis
 
Intellectual Person Identification Using 3DMM, GPSO and Genetic Algorithm
Intellectual Person Identification Using 3DMM, GPSO and Genetic AlgorithmIntellectual Person Identification Using 3DMM, GPSO and Genetic Algorithm
Intellectual Person Identification Using 3DMM, GPSO and Genetic AlgorithmIJCSIS Research Publications
 
AI Therapist – Emotion Detection using Facial Detection and Recognition and S...
AI Therapist – Emotion Detection using Facial Detection and Recognition and S...AI Therapist – Emotion Detection using Facial Detection and Recognition and S...
AI Therapist – Emotion Detection using Facial Detection and Recognition and S...ijtsrd
 
Machine learning introduction
Machine learning introductionMachine learning introduction
Machine learning introductionathirakurup3
 

Similar to Senior Project Paper (20)

0-1--Introduction FPCV-0-1.pdf
0-1--Introduction FPCV-0-1.pdf0-1--Introduction FPCV-0-1.pdf
0-1--Introduction FPCV-0-1.pdf
 
Everything You Need to Know About Computer Vision
Everything You Need to Know About Computer VisionEverything You Need to Know About Computer Vision
Everything You Need to Know About Computer Vision
 
The relationship between artificial intelligence and psychological theories
The relationship between artificial intelligence and psychological theoriesThe relationship between artificial intelligence and psychological theories
The relationship between artificial intelligence and psychological theories
 
The Magic Behind AI
The Magic Behind AIThe Magic Behind AI
The Magic Behind AI
 
Paper on Computer Vision
Paper on Computer VisionPaper on Computer Vision
Paper on Computer Vision
 
Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0
 
Applied Computer Vision - a Deep Learning Approach
Applied Computer Vision - a Deep Learning ApproachApplied Computer Vision - a Deep Learning Approach
Applied Computer Vision - a Deep Learning Approach
 
1. The Game Of The Century
1. The Game Of The Century1. The Game Of The Century
1. The Game Of The Century
 
What is Computer Vision?
What is Computer Vision?What is Computer Vision?
What is Computer Vision?
 
IRJET- ATM Security using Machine Learning
IRJET- ATM Security using Machine LearningIRJET- ATM Security using Machine Learning
IRJET- ATM Security using Machine Learning
 
Computer vision lightning talk castaway week
Computer vision lightning talk castaway weekComputer vision lightning talk castaway week
Computer vision lightning talk castaway week
 
Beekman5 std ppt_14
Beekman5 std ppt_14Beekman5 std ppt_14
Beekman5 std ppt_14
 
Face Recognition Human Computer Interaction
Face Recognition Human Computer InteractionFace Recognition Human Computer Interaction
Face Recognition Human Computer Interaction
 
scene description
scene descriptionscene description
scene description
 
Computer vision
Computer visionComputer vision
Computer vision
 
Dragos_Papava_dissertation
Dragos_Papava_dissertationDragos_Papava_dissertation
Dragos_Papava_dissertation
 
Mind reading computer
Mind reading computerMind reading computer
Mind reading computer
 
Intellectual Person Identification Using 3DMM, GPSO and Genetic Algorithm
Intellectual Person Identification Using 3DMM, GPSO and Genetic AlgorithmIntellectual Person Identification Using 3DMM, GPSO and Genetic Algorithm
Intellectual Person Identification Using 3DMM, GPSO and Genetic Algorithm
 
AI Therapist – Emotion Detection using Facial Detection and Recognition and S...
AI Therapist – Emotion Detection using Facial Detection and Recognition and S...AI Therapist – Emotion Detection using Facial Detection and Recognition and S...
AI Therapist – Emotion Detection using Facial Detection and Recognition and S...
 
Machine learning introduction
Machine learning introductionMachine learning introduction
Machine learning introduction
 

Senior Project Paper

  • 1. Kurtz1 Computer Vision: Optical Character Recognition Mark Kurtz Fontbonne University Department of Mathematics and Computer Science ABSTRACT Humans are very visual creatures. With everything we do or think about, our visual system is generally involved. The activities that incorporate our visual systemspan everything from reading to driving. Computer Vision is the study of how to implement the human visual system and how we perform visual tasks into machines and programs. My studies and this paper revolve around this topic, specifically Optical Character Recognition. Here OCR (Optical Character Recognition) tries to recognize characters or words inside of images. My focus revolved around implementing OCR through custom algorithms I wrote after studying techniques used in the computer vision field. I successfully created a program with a few algorithms with the rest of the algorithms untested but documented. The results for the tested algorithms are documented in this paper.
  • 2. Kurtz2 1.1 Introduction Computer Vision has slowly implemented itself into our lives in limited aspects in the fields of automotive drive assistance, eye and head tracking, filmand video analysis, gesture recognition, industrial automation and inspection, medical analysis, object recognition, photography, security, 3D modeling, etc. (Lowe) These applications and the algorithms behind them are extremely specific. Because the programs created from the algorithms are so specific, much of the code does not successfully transfer among different applications. Since much of the code does not transfer among applications, there is no master code or algorithm for the computer vision field. Hense even the visual systemof a two-year old cannot be replicated— computer programs still cannot successfully find all the animals in a picture. The reasons for this are many which simplify down to one point: the human visual systemis extremely hard to understand and replicate. The process becomes an inverse problem where we try to form a solution that resembles what our eyes process. This seems easy, but there are many different hidden layers between the input images on our eyes to what we perceive. With numerous unknowns, much of the focus in the computer vision field has resorted to physics-based or statistical models to determine potential solutions. (Szeliski) Because of how well our visual systemhandles images, I underestimated the complexity of computer vision. It seems easy to select different objects in an image in our everyday world. Looking around you can readily distinguish objects in your surroundings, what they are, how far away they are, and their three-dimensional shape. To determine all of this our brains transform the images taken in by our eyes in many different steps. As an example, we may perceive colors darker or lighter than what the actual color value is. This is how we see the same shade of red
  • 3. Kurtz3 on an apple throughout the day despite the changing colors of light reflecting through the atmosphere. An example is in the picture that follows. The cells A and B are the exact same color, but our visual systemchanges the color we perceive based on the surrounding colors: Current algorithms have yet to effectively replicate the human color perception system. (McCann) Other visual tricks our eyes perform range from reconstructing a three-dimensional reality from two-dimensional images in our retinas to perceiving lines and edges from missing image data. An article titled What Visual Perception Tells Us about Mind and Brain explains this in more detail. “What we see is actually more than what is imaged on the retina. For example, we perceive a three-dimensional world full of objects despite the fact that there is a simple two-dimensional image on each retina. … Thus perception is inevitably an ambiguity-solving process. The perceptual systemgenerally reaches the most plausible global interpretation of the retinal input by integrating local cues”. Despite the complexity of the human visual system, many people have tried to replicate it and implement it in machines and robotics. Many algorithms have been developed for specific applications across different fields and disciplines. One of the most prevalent in consumer applications is facial recognition. In fact, it has nearly become a part of everyday use because of its use in images in social networks such as Facebook, images processed in digital
  • 4. Kurtz4 cameras, and images processed in photo editing software such as Picasa. The most successful algorithm creates Eigenfaces, which are a set of eigenvectors, and then looks for these inside of an image. The eigenvectors are derived from statistical analysis of many different pictures of faces. (Szeliski) In other words, it creates a template from other pre-labeled faces and searches through an image to see where they occur. This way seems surprising to me since the algorithm never tries to break apart the constituents of the image or even determine what is contained within the image. The algorithms never process corresponding shapes, three-dimensional figures, edges, etc. from the image data. With no other processing performed other than a simple template search, the program has no clues about context leading it to label faces within an image that we may consider unimportant. For example, I used Picasa to sort my images by who was in each picture. It does this through facial recognition. However, in some images it recognized faces that were far off in the background, faces that were out of focus, and even a face on the cover of a magazine someone was reading in the background of the image. Not only does facial recognition suffer from a lack of context, but also the exact method of template matching by using Eigenfaces in facial recognition makes the algorithm extremely specific. This speaks volumes about the methods used in computer vision. The methods and algorithms for facial recognition cannot readily be used to identify animals in an image without completely retraining the algorithm. By extension, the specificity of algorithms developed for each application in computer vision cannot transfer over to another application. I see this as a huge problem since we cannot possibly have thousands of different image analysis processes in our brain running all at once to look specifically for certain objects. I believe the way our brain determines there is a person in an image is the same as how it determines there are animals in
an image. The current field of computer vision is moving towards this kind of template matching. While it may work in specific situations, I cannot see how it will ever replicate human vision for the reasons stated above. To break away from the specificity of the computer vision field, I took a few ideas from lower level processing techniques, developed my own algorithms in place of the ones commonly used, and then explained a new matching technique.

1.2 New Techniques Needed in Computer Vision

Over the decades that computer vision has been studied and developed, it has not progressed as well as most predicted or hoped. In fact, it has led some who contributed to and pioneered the field to form the extreme view that computer vision is dead. (Chetverikov) While I do not believe the study of computer vision is dead, I think new algorithms are needed in the field. The algorithms need to move away from specific applications and involve more ideas from neuroscience and biological processes.

Jeff Hawkins, a pioneer of PDAs (Personal Digital Assistants, essentially the precursors to the smartphone) and now an artificial intelligence researcher, spoke at a TED (Technology, Entertainment, and Design) conference in 2003. His speech focused on artificial intelligence. In it he states that not only computer vision but the entire field of artificial intelligence is moving in the wrong direction. He explains that this comes from an incorrect but strongly held belief that intelligence is defined by behavior: we are intelligent because of the way we do things. He counters that our intelligence comes from experiencing the world through a sequence of patterns that we store and recall to match against reality. In this way we make predictions about what will happen next, or what should happen next. An example he gives is our recognition of
faces. As observed in studies, when humans view a person we first look at one eye, then the other eye, then the nose, and finally the mouth. This is simply explained by predictions being made and then confirmed as we observe our world. We expect an eye, then an eye next to it, then a nose, then a mouth. If we see something different, it will not match our predictions, and learning or more concentrated analysis will occur. (How Brain Science Will Change Computing)

I developed my research around the direction of prediction as explained by Jeff Hawkins. In this way patterns, sequences, and predictions can be used in object recognition and classification. Logically, pattern matching seems to make more sense than specific template matching. By looking for exact matches to templates, we will never be able to reproduce visual tricks such as seeing shapes in clouds. Also, template matching often produces false results if an object is shaded differently or partially hidden. (Ritter and Wilson) Pattern matching seems to offer a fix for the different lighting conditions that occur in images, and it may have a better chance of identifying objects that are slightly obscured or missing some data. This must be studied further through experimentation, however.

1.3 Simplifying Things to Optical Character Recognition

When I began my research, I hoped to apply the ideas I developed to full images for classification and recognition of objects. Unfortunately, time constraints limited my research and implementation, so I decided to focus on OCR (Optical Character Recognition). This field is a subset of computer vision which focuses on text recognition in images. In OCR there is a
limited number of objects that can occur within the images. Also, the images processed in OCR are much simpler than the full images processed in computer vision; in most cases the input is a two-dimensional binary image, such as a scan of a page from a book. While OCR is simpler than general computer vision, the algorithms I created can still be implemented within it since it is a field within computer vision.

OCR is used by many industries and businesses. It is often packaged with Adobe PDF software and document scanning software, and the United States Post Office has heavily implemented OCR to recognize addresses written on packages and letters. OCR is generally divided into two methods. The first is matrix matching, where characters are matched pixel by pixel to a template. The second is feature extraction, where the program looks for general features such as open areas, closed shapes, diagonal lines, etc. This method is applied much more often than matrix matching, and is much more successful. (What's OCR?)

Feature extraction has become the default for OCR software. It works by analyzing characters to determine strokes, lines of discontinuity, and the background. From here the program builds up from the character to the word and then assigns a confidence level after comparing against a dictionary of words. This works well for images converted straight from computer documents, with a 99% accuracy rate; for older, scanned papers the accuracy rate drops and varies widely, from 71% to 98%. (Holley)

The method I have developed follows the feature extraction approach. The ideas of feature extraction have worked very well within OCR so far and seem to resemble the idea of prediction matching described earlier. The problem is defining what features are and how to decode them inside an image.
1.4 Describing the Overall Idea

The general idea of my algorithms in computer vision and OCR is to separate an image into constituent objects and then define those objects by their surprising features. In doing this, the algorithm builds a definition of an object rather than a template for an object. It seems more natural to describe objects by a definition rather than a template, so I pursued algorithms that define objects this way. For example, when we describe a face we do not build an exact template from the thousands of faces we have seen in the past. Instead we describe a face as having two eyes, a nose, and a mouth. These are the features that distinguish a face from anything else; if no features protruded from the interior of the face, we would have a hard time distinguishing it. Definitions work the same way for the outlines of an object. We can easily draw the outline of a dolphin because we know the border points that are most memorable and stick out from a regular oval or other shape. For a dolphin these are the tail, the dorsal fin, and the mouth.

While definition building seems to be a more natural way of understanding images, it may also explain why we see objects in clouds or ink blots. I believe we see images in these objects because certain features resemble patterns we have seen before in other objects. We may see a face in an ink blot because there are two dots to represent eyes and a nose, all in the correct relation to each other. We may see a dolphin in the clouds because a part of it resembles the dorsal fin and the nose. In both of these examples general objects portray specific objects because certain key features match up.
1.5 Lower Level Processing

Lower level processing in computer vision refers to finding key points, features, edges, lines, etc. in an image. At the lower level no matching takes place; the focus is to decode the image into basic points, lines, and shapes for processing later on. (Szeliski) One of the many lower level processes is edge detection. Edges define boundaries between regions in an image and occur where there is a distinct change in image intensity. There are tens if not hundreds of algorithms written for edge detection, ranging from the simple to the extremely complex. (Nadernejad)

The most widely used algorithms for edge detection are the Marr-Hildreth edge detector and the Canny edge detector. (Szeliski) The general Marr-Hildreth algorithm first applies a Gaussian smoothing operator (a matrix which approximates a bell-shaped curve) and then a two-dimensional Laplacian (another matrix, equivalent to taking the second derivative of the image). The Gaussian reduces the amount of noise in the image simply by blurring it, though this has the unwanted effect of losing fine detail. The Laplacian takes the second derivative of the image; the idea is that a step difference in image intensity shows up as a zero crossing in the second derivative. The Canny edge detector also begins by applying a Gaussian smoothing operator. It then finds the gradient of the image at each point to indicate the presence of edges while suppressing any points that are not maximum gradient values in a small region. After all this has been performed, thresholding with hysteresis is applied, using a high and a low threshold. Again, the Gaussian loses detail as it tries to reduce the amount of noise in the image.
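To make the zero-crossing idea concrete, here is a minimal Marr-Hildreth-style sketch in Python. It is not one of the detectors developed in this paper, and it assumes SciPy's Laplacian-of-Gaussian filter with an arbitrary sigma value:

```python
import numpy as np
from scipy import ndimage

def marr_hildreth_edges(gray, sigma=2.0):
    """Rough Marr-Hildreth sketch: smooth with a Gaussian, take the Laplacian,
    then mark pixels where the result changes sign (zero crossings)."""
    log = ndimage.gaussian_laplace(gray.astype(float), sigma=sigma)
    edges = np.zeros(log.shape, dtype=bool)
    # A zero crossing between horizontal or vertical neighbors indicates an edge.
    edges[:, :-1] |= np.sign(log[:, :-1]) != np.sign(log[:, 1:])
    edges[:-1, :] |= np.sign(log[:-1, :]) != np.sign(log[1:, :])
    return edges
```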
Both the Marr-Hildreth and the Canny edge detectors are very expensive in terms of computation time because of the operations involved. I stepped away from these approaches and tried to look at edge detection in a simpler way that could be reproduced by artificial neural networks.

Artificial neural networks were inspired by the way biological nervous systems process information. They are composed of a large number of processing elements that work together to solve specific problems. Neurons inside the network are interconnected and feed inputs into each other: if the result of an operation applied to all of a neuron's inputs is above a certain threshold, the neuron passes its output on to other neurons. The way the neurons are connected, the threshold set for each, and the operation performed by each can all change and are adapted to improve the performance of the algorithm. This replicates the learning process in biological brains and nervous systems. Artificial neural networks have been effectively applied in pattern recognition and other applications because of their ability to change and adapt to new inputs. (Stergiou and Siganos)

Neural networks have a downside, though: they need many sets of training data in order to achieve accurate results, and those training sets take a lot of time to create. So, instead of abandoning neural networks, I tried to combine the best of traditional computing and neural networks for my edge detection algorithms. I used arrays as the inputs and outputs that form the neurons, predetermined what the operations would be, and then applied a threshold which could be adjusted to maximize the accuracy of the algorithm.
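As a sketch of this hybrid idea, a "neuron" here is nothing more than an array of inputs, a predefined operation, and an adjustable threshold; the operation and threshold below are placeholders for illustration, not the values used in my programs:

```python
import numpy as np

def neuron(inputs, operation, threshold):
    """Apply a predefined operation to an array of inputs and fire (pass the
    value on) only if the result exceeds an adjustable threshold."""
    value = operation(np.asarray(inputs, dtype=float))
    return value if value > threshold else 0.0

# Example: a neuron that fires when neighboring pixel values differ sharply.
step_size = lambda pixels: abs(pixels[-1] - pixels[0])
print(neuron([118, 120, 119, 240], step_size, threshold=50))  # prints 122.0
```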
My first idea built upon this combination of neural networks and predefined operations. I explored the idea that edges are step (non-algebraic) changes in light intensity in an image. I realized that I could calculate the change in light intensity along a specific direction in the image and, by doing so, approximate the derivative at each point. From there I could approximate the second derivative, the third derivative, and so on. With the derivatives approximated, the algorithm can then work backwards and figure out what the next pixel value should be for an algebraic equation. The following image explains this further:

This system works perfectly for predicting the next value of algebraic equations such as y = 2x + 20 or y = 5x³ + 3x – 10. If the array is expanded to include more pixel values, it can work with even higher order equations. The algorithm is able to do this because, for a polynomial, the approximate derivative eventually becomes a constant value or 0. Surprisingly, the algorithm was also able to reasonably approximate the next value of y = cos(x) and y = ln(x). I then designed a program to apply the algorithm to images. For each pixel it would calculate a predicted value from the surrounding pixels; if the predicted value and the actual value were off by more than a certain threshold, the pixel would be marked as an edge.
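A minimal sketch of this prediction step on a one-dimensional run of values might look like the following. It leaves out the fixed pixel window and the image-scanning code, and simply assumes the highest-order approximate derivative stays constant:

```python
import numpy as np

def predict_next(values):
    """Predict the next value in a sequence by assuming its highest-order
    finite difference (approximate derivative) stays constant."""
    levels = [np.asarray(values, dtype=float)]
    while len(levels[-1]) > 1:
        levels.append(np.diff(levels[-1]))   # successive approximate derivatives
    # Work backwards: add the last entry of each level to rebuild the next value.
    prediction = 0.0
    for level in reversed(levels):
        prediction += level[-1]
    return prediction

print(predict_next([20, 22, 24, 26]))         # y = 2x + 20        -> 28.0
print(predict_next([-10, -2, 36, 134, 322]))  # y = 5x^3 + 3x - 10 -> 630.0
```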
The algorithm did not work as well in practice, though. Small variations in light intensity in an image would create large changes in the values higher up in the array, which led to extreme predicted values that did not match what a human would expect the next value to be. Another problem appeared whenever an edge occurred between the values used for the prediction; for example, if an edge occurred at Pixel 2 in the image shown above, then the value of PR4 became extreme. Noise such as a bad pixel or a speck in the image created the same problem as an edge occurring among the values used for the prediction. All three of these problems created false edges and spots inside the generated images. Here are the results of this algorithm (the top pictures are gray scale, and the bottom are the edge detection):

After days of trying and several algorithms, I derived an algorithm based on the previously described combination of a neural network and predefined operations. Essentially it approximates the derivative of order n on each side of the pixel being tested. This brings local pixels into the reasoning for finding edges, and the benefit of including these local pixels is to eliminate noise that might already be present in the image. It also approximates the first
derivative on each side of the pixel being tested. The purpose of the first derivative is to make sure the edge occurred at the pixel being tested rather than somewhere in the general region around it. Then, instead of predicting the next values, my algorithm simply compares the values of the approximated derivatives. The following picture explains the algorithm in a visual way.

The new algorithm seemed to work better, and it ran fairly quickly since all operations are performed on integer values. I believe it is faster than the Canny and Marr-Hildreth edge detectors, but that is only speculation. To speed up the algorithm and use less memory, I worked out the relation between each successive approximate derivative: it follows Pascal's Triangle with alternating signs. For example, to approximate only the first derivative you take PL1 – P1 (referring back to the image above). To approximate the second derivative you would normally calculate the first derivatives and then find the difference between them; this simplifies from DLA2 – DLA3 to PL2 – 2*PL1 + P1 (again referring to the image above). To approximate the
third derivative, the equation simplifies to PL3 – 3*PL2 + 3*PL1 – P1 (referring to the image above). After the program finds the necessary approximate derivatives, it calculates the difference between them; if the difference is more than a specific threshold, it records the pixel as an edge. The thresholds are adjustable, but I was not able to experiment with many thresholds or design the program to choose appropriate thresholds on its own. Despite the limited testing, the results seem promising. Here are the results of this algorithm (the top pictures are gray scale, and the bottom are the edge detection):

As shown in the above image, the algorithm seems to work fairly well for single objects within a landscape. It has problems decoding edges when there are many objects within the same image, such as in the picture of trees; however, I believe all edge detectors have this problem. It also has problems with texture, such as the water in the dolphin picture or the fur on the kangaroo. Some of these problems with texture occurred because the algorithm only uses the gray scale version of the images.
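Before moving to color, here is a minimal sketch of the derivative-comparison test above for a single gray scale row. The derivative order, the exact way the first-derivative check is combined, and the thresholds are illustrative assumptions rather than the values used in my program:

```python
from math import comb

def side_derivative(pixels, order):
    """Approximate the order-th derivative from a run of pixels using
    Pascal's Triangle coefficients with alternating signs
    (order 2: p[2] - 2*p[1] + p[0]; order 3: p[3] - 3*p[2] + 3*p[1] - p[0])."""
    return sum((-1) ** k * comb(order, k) * pixels[order - k] for k in range(order + 1))

def is_edge(row, x, order=3, threshold=40):
    """Test an interior pixel x of a gray scale row: compare the approximate
    derivatives built from the pixels to its left and to its right."""
    left = [row[x - i] for i in range(order + 1)]    # tested pixel plus pixels to the left
    right = [row[x + i] for i in range(order + 1)]   # tested pixel plus pixels to the right
    high_order_gap = abs(side_derivative(left, order) - side_derivative(right, order))
    first_order_gap = abs((left[1] - left[0]) - (right[1] - right[0]))
    # Both the order-n and the first-derivative approximations must disagree sharply,
    # so an edge a few pixels away does not mark the tested pixel.
    return high_order_gap > threshold and first_order_gap > threshold
```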
To fix the problem of only being able to analyze gray scale images, I decided to make a three-dimensional color mapping that plots the points of every possible color combination. I built the color mapping using two main facts. The first is that the lowest possible color is black and the highest possible color is white; every other color falls somewhere between these two extremes. Having two colors that all others build from gave me two limiting points, one at each end of the map, and a basis for placing colors along the Z-axis of the three-dimensional mapping. The sum of the red, green, and blue pixel values determines where a color sits on the Z-axis. Black (with RGB pixel values 0,0,0) occurs at Z = 0. White (with RGB pixel values 255,255,255) occurs at Z = 765. Red, green, and blue then occur at Z = 255 (red has an RGB value of 255,0,0, for example) and yellow, cyan, and magenta occur at Z = 510 (yellow has an RGB value of 255,255,0, for example).

The second fact comes from color theory, where the colors red, green, and blue are separated by an angle of 120 degrees on a color wheel. Using this separation of the three primary colors, I created an equilateral triangle mapping for color values, with the maximum red values at the top corner of the triangle, the maximum green values in the left corner, and the maximum blue values in the right corner. Over this triangle I placed an XY axis with the Y-axis aligned with the red corner and the origin at the centroid of the triangle. Also, the Z-axis was shifted so that 382 (the approximate center of the range 0 to 765) occurs at Z = 0. With all of this explained, here are the equations for the mapping and a pictorial diagram:

X = Blue * cos(−π/6) + Green * cos(7π/6)
Y = Red + Blue * sin(−π/6) + Green * sin(7π/6)
Z = (Red + Green + Blue) − 382
Computation time was decreased by rounding and scaling the values so that everything was kept as an integer. The formulas then became:

X = (Blue − Green) * 173
Y = Red * 100 − (Blue + Green) * 50
Z = (Red + Green + Blue − 382) * 100

The first algorithm written with this new color mapping worked very well despite its simplicity. The algorithm compared the distances, in the color mapping, between the color values of pixels on either side of the pixel being tested. It did this for all pixels within a predetermined distance of the pixel being tested, along the horizontal and vertical directions in the image. If the calculated distance between the color values on either side exceeded a threshold, the pixel being tested was marked as an edge.
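Here is a minimal sketch of the integer mapping and the distance-threshold test just described; the window radius and threshold are illustrative assumptions, and the image is assumed to be an RGB array:

```python
import numpy as np

def color_to_xyz(r, g, b):
    """Integer version of the color mapping for one RGB pixel."""
    r, g, b = int(r), int(g), int(b)
    return np.array([(b - g) * 173,
                     r * 100 - (b + g) * 50,
                     (r + g + b - 382) * 100])

def is_color_edge(image, row, col, radius=2, threshold=15_000):
    """Mark (row, col) as an edge if pixels on opposite sides of it, along the
    horizontal or vertical direction, lie far apart in the color mapping."""
    for d in range(1, radius + 1):
        horiz = np.linalg.norm(color_to_xyz(*image[row, col - d]) -
                               color_to_xyz(*image[row, col + d]))
        vert = np.linalg.norm(color_to_xyz(*image[row - d, col]) -
                              color_to_xyz(*image[row + d, col]))
        if horiz > threshold or vert > threshold:
            return True
    return False
```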
Here is an image diagram of the algorithm, its results, and a comparison with other edge detectors (images from Ehsan Nadernejad's "Edge Detection Techniques: Evaluations and Comparisons"):
The edge detection algorithms described so far are the ones that have been developed and tested. I have two more I would like to develop and test, so the general ideas behind them are included here. The first builds further on the color mapping I developed: the algorithm uses the color mapping to parse through an image and group pixels together according to which pixels are closest in value to each other. It does this recursively by matching every pixel in the image to the neighboring pixels that are closest in value (a rough sketch of this grouping idea appears at the end of this section). I created an algorithm to attempt this methodology, but it took several minutes to go through one image and did not produce accurate results. The second algorithm yet to be developed and tested can be added onto any of the algorithms listed in this paper. Essentially it checks that a pixel marked as an edge is part of a line and not just a random value or single pixel in the image. It will also be able to fill in missing pixels based on the presence of surrounding edge values.

The results of the edge detection were easiest to see in regular images, but the algorithms are fully applicable to OCR. Here is an example of text in an image that has been run through the color mapping edge detection algorithm:

The edge detection algorithms will help with recognition in the next section.
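As promised above, here is a rough sketch of how the pixel-grouping idea might be approached. This is not the implementation I attempted; the four-neighbor connectivity and the distance threshold are assumptions:

```python
import numpy as np
from collections import deque

def group_pixels(xyz_image, max_distance=8_000):
    """Group neighboring pixels whose mapped color values are close together.
    xyz_image is an (H, W, 3) array of already-mapped color coordinates."""
    height, width, _ = xyz_image.shape
    labels = np.full((height, width), -1, dtype=int)
    current = 0
    for start in np.ndindex(height, width):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:                       # breadth-first flood fill
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < height and 0 <= nc < width and labels[nr, nc] == -1:
                    diff = xyz_image[r, c].astype(float) - xyz_image[nr, nc]
                    if np.linalg.norm(diff) < max_distance:
                        labels[nr, nc] = current
                        queue.append((nr, nc))
        current += 1
    return labels
```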
1.6 Dopamine Maps

With the lower level processing resolved, I still needed a way to match characters and objects. The general ideas for these matching algorithms come from a prediction and pattern approach. Generally, humans define objects by their features. For example, a face contains two eyes, a nose, and a mouth. The letter “A” is made up of two slanted lines which meet at a point at the top, with a horizontal line in the middle. I followed this methodology rather than trying to match objects to templates like many algorithms before. My way of matching assumes smooth transitions between areas and lines; in this way it can predict what the next value should be, much like the first algorithm I described for edge detection. If the predictions the algorithm makes do not match up, it marks those points as surprises or unexpected values. In the same way that dopamine in your brain is released by unexpected events, the algorithm marks points it did not expect. I will explain these algorithms in detail, but the programs behind them are not working yet.

There are three types of dopamine maps I have developed and plan to implement. The first is called the boundary dopamine map, which describes the outside shape of an object. The algorithm traces along the boundary of an object, predicting where the next point should be based on the previous points and their slopes. If the point predicted by the algorithm is not part of the boundary of the object, that point is marked in the boundary dopamine map. A good example is the letter “A” again. Here the algorithm explores the boundary of the letter and marks anything it did not expect. In this case it marks the ends of the two slanted lines, since it expected the lines to continue. It also marks the point at the top and the intersections of the horizontal and slanted lines because the change in slope there is very extreme. It does not mark those points because they are corners as such; any extreme change in slope will create a surprise point according to the algorithm.
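A rough sketch of this boundary tracing idea follows. It assumes the boundary has already been extracted as an ordered list of points, and the window and tolerance values are illustrative assumptions:

```python
import numpy as np

def boundary_surprises(points, window=3, tolerance=2.0):
    """Walk an ordered boundary and return the indices of 'surprise' points:
    places where the actual next point is far from where the recent direction
    of travel predicted it would be."""
    pts = np.asarray(points, dtype=float)
    surprises = []
    for i in range(window, len(pts) - 1):
        step = (pts[i] - pts[i - window]) / window   # average recent direction
        predicted = pts[i] + step
        if np.linalg.norm(pts[i + 1] - predicted) > tolerance:
            surprises.append(i + 1)
    return surprises
```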
The image below shows the final output from this type of algorithm.

The points marked by the boundary dopamine algorithm are only relational to each other. In other words, these points can show up in any orientation and will still be matched as long as their distance and orientation relative to each other are the same.

The next map is called the interior dopamine map. Again, the algorithm makes predictions about what it expects to see and marks anything that does not match up. The interior dopamine map algorithm assumes smooth transitions between surfaces. It tracks along the image and keeps predicting what the next values should be based on the previous values. For example, we defined a face earlier as being made up of two eyes, a nose, and a mouth. Those features distinguish a face, and they are exactly what this type of algorithm would mark. The features that show up inside an object, and their relation to each other, define what the object is, so it only makes sense to design an algorithm to try this. Also, once the algorithm has been trained on what a nose, eyes, and a mouth are, it can define a face in the same way. So, instead of saying there should be feature points in this position relative to this
position, it can build from the bottom up: if the algorithm finds an eye, it would expect another eye and a nose to be present for a face.

The final map is called the background dopamine map. It is defined using the boundary and interior methods; the only difference is that every object stores the types of backgrounds it can be found on. This is to avoid confusion between objects that look like other objects. If the algorithm saw a marble that looked like an eye, the above algorithms would try to label it as an eye. By including the background, the algorithm can figure out that it is not an eye since it does not occur on a face.

1.7 Generating Dopamine Maps and Their Importance

The maps are generated from a pre-labeled data set. Dopamine maps would be generated for each image in the pre-labeled data set, and an algorithm would then find the correlation among the maps and create a new map that takes into account the differences between the pictures. If a dopamine map is sufficiently different from the other maps, learning takes place and a new map is created alongside the old one. The need for learning becomes apparent when we look at the letter “a” in different fonts, which may be represented as “a” or “a”. These correlation maps allow a general description of an object that is self-relational and definition based, so that the program can recognize types of objects it has not seen before.

These maps seem to be a much more human way of thinking about things than template matching. When we are asked to describe what an object looks like, we are not drawing an exact object from a template stored in our brain. Instead we draw an object based on its definition. For the letter “E” we understand that it is a vertical line with horizontal lines
that extend from the top and the bottom a certain distance, and another horizontal line that extends from the middle. These maps allow learning and adjustments to be made within the program constantly.

1.8 Smart Algorithms

The focus so far has been on algorithms that allow for learning and for prediction of values. Both are present in the human cognitive process and should provide a much more dynamic object recognition process. Also, all the implementations have been simple mathematical operations which can be performed easily and quickly. I intentionally stayed away from complex mathematics such as Gaussian smoothing, eigenvectors, or statistical analysis because it is hard to see how our brain's neural cells could implement these operations. Also, the methods I have written so far all have customizable values or thresholds. This allows the program to learn and adapt so that it runs efficiently and outputs the best results.

1.9 Conclusion and Results

I have very few results to report at this time because of time constraints. I can report results on the edge detection algorithm, though. It appears to perform at the same level as more complicated algorithms, as shown in the included pictures. Also, the implementation in OCR will help greatly. I ran the program on sample text and it performed better than I could have expected. The text images were full of different text colors, text sizes, and background colors, and the program was able to successfully separate out the characters into a binary black and white image. The next step will be to implement the border prediction algorithm.
I plan to continue this research outside of class since it looks promising, and I look forward to working on it. The working code used so far is attached.

2.0 Acknowledgments

I would like to thank Dr. Abkemeier and the Fontbonne Math Department for letting me pursue this area of interest. This paper was submitted to the faculty of Fontbonne University's Department of Mathematics and Computer Science as a partial requirement for the degree of Bachelor of Science in Mathematics.
2.1 References

Chetverikov, Dmitry. Is Computer Vision Possible? Rep. N.p.: n.p., n.d. Print.

Holley, Rose. "How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs." D-Lib Magazine, n.d. Web. 5 Apr. 2013. <http://www.dlib.org/dlib/march09/holley/03holley.html>.

How Brain Science Will Change Computing. Dir. Jeff Hawkins. TED, n.d. Web.

Lowe, David. The Computer Vision Industry. N.p., n.d. Web. 6 Apr. 2013. <http://www.cs.ubc.ca/~lowe/vision.html>.

McCann, John J. Human Color Perception. Cambridge: Polaroid Corporation, 1973. Print.

Nadernejad, Ehsan. "Edge Detection Techniques: Evaluations and Comparisons." Mazandaran Institute of Technology, n.d. Print.

Ritter, G. X., and Joseph N. Wilson. Handbook of Computer Vision Algorithms in Image Algebra. Boca Raton: CRC, 1996. Print.

Shimojo, Shinsuke, Michael Paradiso, and Ichiro Fujita. What Visual Perception Tells Us about Mind and Brain. Rep. N.p., n.d. Web. 5 Apr. 2013. <http://www.pnas.org/content/98/22/12340.full>.

Stergiou, Christopher, and Dimitrios Siganos. "Neural Networks." N.p., n.d. Web. 5 Apr. 2013. <http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html>.

Szeliski, Richard. Computer Vision: Algorithms and Applications. London: Springer, 2011. Print.

"What's OCR?" Data ID. N.p., n.d. Web. 5 Apr. 2013. <http://www.dataid.com/aboutocr.htm>.