Bachelor Thesis

This thesis describes a method for locating a piece of online handwriting data
(pen coordinates recorded on a tablet) in an offline document (a scanned
image). Given a set of offline documents, the method can find the document that
belongs to the online data, or vice versa. To optimize the mapping between the
online and the offline data, the rotation and resize factor of the online data
that give the best match are calculated. A better mapping is useful because it
makes several methods that are only applicable to online data available for
offline data, and vice versa.

Results show that the method can be used to find the offline document that
belongs to given online data: it succeeded in 98.07% of the cases (203 of 207
data pairs) for the dataset used. The results also show that computing the
optimal rotation and resize factor significantly improves the mapping between
online and offline data: the percentage of correct pixels rose by 6.56
percentage points (from 80.72% to 87.28%) for the dataset used.

Published in: Education, Technology, Business

  1. Mapping online data on offline documents
     Artificial handwriting recognition
     Stefan Kennedie
  2. Overview
       • Introduction
       • Method
       • Experiment
       • Discussion
       • Conclusion
  3. Introduction
       • Online data
           • Tablet
           • Sequence of coordinates and pressure
       • Offline data
           • Digital scanner
           • Bitmap with grey value or color
  4. Properties of online and offline data

     Property                              Online     Offline
     Order of writing                      Known      Unknown
     Velocity and acceleration             Known      Unknown
     Location of pen when not on paper     Known      Unknown
     Width of ink                          Unknown    Known
  5. Advantages of combining
       • Applying online techniques to offline data and vice versa
           • e.g. segmentation
       • Recognition improves
  6. Combining – what’s the problem?
       • Difference in resolution between tablet and digital scanner
       • Alignment of paper on tablet and digital scanner
       • Movement of paper (not investigated)
       • Pen angle (unsolved)
  7. Pen angle
       • Difference in contact point
       • Asymmetric magnetic field
  8. Goal
       • Find correct match between offline and online data
       • Create a match (mapping) between online and offline data that is as good as possible using:
           • Rotation
           • Resizing
  9. Method – Overview
       • Search for a query (created from online data) in the offline document
           • In low (12.5%) resolution
           • Mark locations with a good match
       • Investigate these locations in detail, find best match using:
           • Rotation
           • Resizing
  10. Method (1) – Query creation
       • Create a query from online data
           • Unique identification code
           • Bresenham’s line algorithm
           • 12.5% of original resolution of offline data
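
The query creation on slide 10 can be illustrated with a minimal sketch. It assumes the online data is available as strokes, each a list of (x, y) pen samples; pressure is ignored. The function names `bresenham` and `rasterize_strokes` and the `scale` parameter are hypothetical and not taken from the thesis.

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer grid points on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return points


def rasterize_strokes(strokes, scale=0.125):
    """Draw pen strokes into a binary bitmap (1 = ink, 0 = background).

    Coordinates are scaled first (0.125 corresponds to the 12.5% on the
    slide) and shifted so the bitmap is the tight bounding box of the ink.
    """
    scaled = [[(round(x * scale), round(y * scale)) for x, y in s] for s in strokes]
    xs = [x for s in scaled for x, _ in s]
    ys = [y for s in scaled for _, y in s]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x + 1, max(ys) - min_y + 1
    bitmap = [[0] * width for _ in range(height)]
    for stroke in scaled:
        if len(stroke) == 1:  # a single dot has no segment to draw
            x, y = stroke[0]
            bitmap[y - min_y][x - min_x] = 1
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            for x, y in bresenham(x0, y0, x1, y1):
                bitmap[y - min_y][x - min_x] = 1
    return bitmap
```

Connecting consecutive samples with line segments fills the gaps that appear when the pen moves faster than the tablet's sampling rate, so the query looks like continuous ink.
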
  11. Method (1) – preprocessing offline data
       • Upper half of first quadrant
       • Remove noise
           • Using Otsu Threshold algorithm
       • Resize to 12.5% of original resolution
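
As a rough illustration of the preprocessing on slide 11, the sketch below computes an Otsu threshold from a grey-value histogram, binarizes the scan, and shrinks it by a factor of 8 (12.5%). The function names are hypothetical, 8-bit grey values are assumed, and the "any ink in the block" downscaling rule is an assumption; the thesis may resize differently.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of grey values in the dark class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Mark dark pixels (<= threshold) as ink (1), everything else as paper (0)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


def downscale(bitmap, factor=8):
    """Shrink a binary bitmap; a block becomes ink if any pixel in it is ink."""
    height, width = len(bitmap), len(bitmap[0])
    return [
        [
            1 if any(
                bitmap[y][x]
                for y in range(by, min(by + factor, height))
                for x in range(bx, min(bx + factor, width))
            ) else 0
            for bx in range(0, width, factor)
        ]
        for by in range(0, height, factor)
    ]
```

Thresholding removes light scanner noise because faint grey speckles end up in the paper class rather than the ink class.
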
  12. Method (1) – searching for query
       • Find location with optimal match between query and offline document
       • Compare query with offline document at all possible locations using:
           • Sliding window
           • Euclidean Distance Mapping for Matching
       • Output: Match error
           • Good match → small error
               • Visualized as light pixel
           • Bad match → big error
               • Visualized as dark pixel
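
The sliding-window search on slide 12 can be sketched as follows. Note the substitution: the thesis uses Euclidean distance mapping to compute the match error, whereas this sketch uses a simpler stand-in, the fraction of pixels that differ between window and query (an XOR count). All function names are hypothetical; both inputs are assumed to be binary bitmaps given as lists of rows.

```python
def xor_error(window, query):
    """Fraction of pixels where window and query disagree (0.0 = identical)."""
    mismatches = sum(
        w ^ q
        for w_row, q_row in zip(window, query)
        for w, q in zip(w_row, q_row)
    )
    return mismatches / (len(query) * len(query[0]))


def error_map(document, query):
    """Slide the query over every position where it fits entirely.

    Returns a 2-D list where errors[top][left] is the match error of the
    query placed with its top-left corner at (top, left).
    """
    doc_h, doc_w = len(document), len(document[0])
    q_h, q_w = len(query), len(query[0])
    errors = []
    for top in range(doc_h - q_h + 1):
        row = []
        for left in range(doc_w - q_w + 1):
            window = [r[left:left + q_w] for r in document[top:top + q_h]]
            row.append(xor_error(window, query))
        errors.append(row)
    return errors


def best_location(errors):
    """Return (error, top, left) for the position with the smallest error."""
    return min(
        (e, top, left)
        for top, row in enumerate(errors)
        for left, e in enumerate(row)
    )
```

The cost grows with the product of document size and query size, which is one reason for searching at 12.5% resolution first.
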
  13. Method (1) – Euclidean distance mapping example
       • Offline document
       • Query
       • XOR image
  14. Method (1) – processing match errors
       • Remove locations with high error
       • Remove error values
       • Add surrounding locations
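
One possible reading of the three steps on slide 14 is sketched below: keep only positions whose error is below a cut-off, drop the error values so that only positions remain, and then add the neighbours of each remaining position. The `max_error` cut-off and the `radius` of the neighbourhood are assumptions; the slides do not state the values or the exact rule used.

```python
def candidate_locations(errors, max_error, radius=1):
    """Turn an error map into a sorted list of (top, left) candidates."""
    height, width = len(errors), len(errors[0])
    # Steps 1 and 2: keep low-error positions and forget the error values.
    good = {
        (top, left)
        for top, row in enumerate(errors)
        for left, e in enumerate(row)
        if e <= max_error
    }
    # Step 3: add every position within `radius` of a good position.
    candidates = set()
    for top, left in good:
        for dt in range(-radius, radius + 1):
            for dl in range(-radius, radius + 1):
                t, l = top + dt, left + dl
                if 0 <= t < height and 0 <= l < width:
                    candidates.add((t, l))
    return sorted(candidates)
```

Adding neighbours guards against the true location falling between low-resolution grid positions when the search moves back to full resolution.
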
  15. Method (2) – investigation in detail
       • Find location, rotation and resize factor with best match.
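
The detailed investigation on slide 15 amounts to a search over transformation parameters. The sketch below shows a plain grid search over rotation angle and resize factor only; the search over location is left out for brevity. The score, the fraction of transformed online points that land on an ink pixel, is an assumed stand-in for the thesis's "% correct pixels" measure, and all names are hypothetical.

```python
import math


def transform(points, angle_deg, scale):
    """Rotate points about their centroid by angle_deg, then scale by `scale`."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [
        (cx + scale * (cos_a * (x - cx) - sin_a * (y - cy)),
         cy + scale * (sin_a * (x - cx) + cos_a * (y - cy)))
        for x, y in points
    ]


def fraction_on_ink(points, bitmap):
    """Fraction of points whose rounded position is an ink pixel (value 1)."""
    height, width = len(bitmap), len(bitmap[0])
    hits = 0
    for x, y in points:
        col, row = round(x), round(y)
        if 0 <= row < height and 0 <= col < width and bitmap[row][col]:
            hits += 1
    return hits / len(points)


def best_rotation_and_scale(points, bitmap, angles, scales):
    """Try every (angle, scale) pair; return (score, angle, scale) of the best."""
    best = (-1.0, None, None)
    for angle in angles:
        for scale in scales:
            score = fraction_on_ink(transform(points, angle, scale), bitmap)
            if score > best[0]:
                best = (score, angle, scale)
    return best
```

This one-sided score rewards shrinking the online data onto any blob of ink, so a practical measure would also penalize offline ink that the online data leaves uncovered.
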
  16. Experiments
       • Retrieval experiment – mapping at document level
       • Optimal rotation and resizing experiment – mapping at pixel level
       • Both done for three writers
  17. Experiment 1
       • Find correct match between the online and offline data
           • Uses only part one of the method, no rotation and resizing
       • Compare all online data with all offline data (of one writer)
           • Pairs with smallest match error belong to each other?
               • Visual exploration of the results
  18. Experiment 1 – results
       • In total 207 data pairs
       • Algorithm found 203 correct matches (98.07%)
       • % correct pixels for ‘match’ group: 81.08%
       • % correct pixels for ‘no match’ group: 38.15%
       • This difference of 42.93 percentage points is significant (p < .001)
  19. Experiment 2
       • Compute the maximum % of correct pixels over all rotations and resizing factors, for all data pairs.
  20. Experiment 2 – results
       • % of correct pixels without rotating and resizing: 80.72%
       • % of correct pixels with optimal rotation and resizing factor: 87.28%
       • Difference of 6.56 percentage points is significant (p < .001)
  21. Discussion
       • The method is very slow
           • Alternative approach like the Fast Fourier transform
       • The significance of the result depends on the data
       • Large black regions in offline data
           • Edge detection
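
To illustrate why a Fast Fourier transform could speed up the search mentioned on slide 21, the sketch below computes, for every shift at once, how many ink pixels of a pattern overlap ink pixels of a signal. It is a one-dimensional toy version under stated assumptions: the slides do not say how the FFT would be applied, the recursive radix-2 `fft` and the `overlap_counts` helper are hypothetical, and a real implementation would work on 2-D images with a numerical library.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    The inverse transform is returned unnormalized (divide by n afterwards).
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def overlap_counts(signal, pattern):
    """overlap[s] = sum over i of signal[s + i] * pattern[i], for all valid s."""
    size = 1
    while size < len(signal) + len(pattern):
        size *= 2
    # Correlation equals convolution with the reversed pattern.
    a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    b = fft([complex(v) for v in reversed(pattern)] + [0j] * (size - len(pattern)))
    product = fft([x * y for x, y in zip(a, b)], inverse=True)
    conv = [round(v.real / size) for v in product]
    return conv[len(pattern) - 1:len(signal)]
```

For binary data the number of differing pixels at a shift equals the ink in the window plus the ink in the pattern minus twice the overlap, so the overlap counts are enough to recover an XOR-style error without visiting every window pixel by pixel.
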
  22. Conclusion
       • The method is useful for finding online-offline data pairs
           • Successful in 98.07% of cases
       • The method is able to provide a better mapping between online and offline data
           • 6.56 percentage point improvement
           • Better results can be achieved using the pen angle
  23. Questions?