
Trivandrum

Venu-talk at TCS

  1. 1. Unlocking the Handwritten Content in Document Images Venu Govindaraju [email_address]
  2. 2. Handwritten Documents [figure: Notes, Forms, Letters; Scanner; Storage; OCR; Noisy Text; Query ("Newton Kinematics"); Relevance]
  3. 3. Outline: Recognition (Postal Applications, Paradigms, Fusion); Search (IR Models, Word Spotting)
  4. 4. Challenge of Handwriting
  5. 5. Handwriting Recognition [figure: Input image; Output "20187 + 2246"]
  6. 6. Postal Context (138 million records)
     - ZIP Code: 30% of ZIP Codes contain a single street name, 5% contain a single primary number, and 2% contain a single add-on.
     - <ZIP Code, primary number>: the maximum number of records returned is 3,071.
     - <ZIP Code, add-on>: the maximum number of records returned is 3,070.
     - LDR accuracy (%) by lexicon size (Top 1 / Top 2):
       - 10: 96.5 / 98.7
       - 100: 89.2 / 94.1
       - 1000: 75.3 / 86.3
  7. 7. Paradigms [figure: Lexicon Driven OCR (LDR) and Lexicon Free OCR (LFR); stages Segmentation, Recognition, Post-processing; Context; Ranked Lexicon]
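The postal slide's point is that recognized context (the ZIP Code, optionally with the primary number) shrinks the lexicon handed to the lexicon-driven recognizer, and the LDR accuracies above fall as the lexicon grows. A minimal Python sketch of that lookup, assuming a hypothetical in-memory record list (the real database has about 138 million records and its schema is not described in the talk):

```python
from collections import defaultdict

# Hypothetical toy records: (ZIP Code, primary number, street name).
RECORDS = [
    ("14260", "12", "MAIN ST"),
    ("14260", "12", "ELM ST"),
    ("14260", "45", "MAIN ST"),
    ("14221", "7", "SUNSET AVE"),
]

def build_index(records):
    """Index street names by ZIP Code and by (ZIP Code, primary number)."""
    by_zip = defaultdict(set)
    by_zip_num = defaultdict(set)
    for zip_code, number, street in records:
        by_zip[zip_code].add(street)
        by_zip_num[(zip_code, number)].add(street)
    return by_zip, by_zip_num

def street_lexicon(by_zip, by_zip_num, zip_code, number=None):
    """Candidate street lexicon for a recognized ZIP (and optional number).

    Falls back to the ZIP-only lexicon when the number is missing or unknown.
    """
    if number is not None and (zip_code, number) in by_zip_num:
        return sorted(by_zip_num[(zip_code, number)])
    return sorted(by_zip.get(zip_code, set()))

by_zip, by_zip_num = build_index(RECORDS)
print(street_lexicon(by_zip, by_zip_num, "14260"))        # ['ELM ST', 'MAIN ST']
print(street_lexicon(by_zip, by_zip_num, "14260", "45"))  # ['MAIN ST']
```

The smaller the returned list, the smaller the lexicon the LDR recognizer has to rank.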
  8. 8. Lexicon Free (LFR) [figure: segmentation graph over segments 1 to 8 with character hypotheses i[.8], l[.8], u[.5], v[.2], w[.6], m[.3], w[.7], i[.7], u[.3], m[.2], m[.1], r[.4], d[.8], o[.5]]
     - The image from segment 1 to 3 is a 'u' with 0.5 confidence.
     - The image from segment 1 to 4 is a 'w' with 0.7 confidence.
     - The image from segment 1 to 5 is a 'w' with 0.6 confidence and an 'm' with 0.3 confidence.
     - Find the best path in the graph from segment 1 to 8.
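Finding "the best path in the graph" can be sketched as dynamic programming over a directed acyclic graph whose nodes are segmentation points and whose edges carry character hypotheses. This sketch assumes a path is scored by the product of its edge confidences (summed as logarithms); the slide does not state the scoring rule, and all edges beyond the slide's example are made up:

```python
import math

def best_path(edges, start, end):
    """Most confident character string through a segmentation graph.

    edges: (i, j, char, conf) tuples with i < j, meaning the image between
    segmentation points i and j reads as `char` with confidence `conf` > 0.
    A path is scored by the product of its confidences (summed as logs).
    """
    best = {start: (0.0, "")}  # point -> (best log score, string so far)
    # Every edge points forward (i < j), so visiting edges in order of their
    # source point finalizes best[i] before any edge leaving i is relaxed.
    for i, j, char, conf in sorted(edges):
        if i not in best:
            continue
        score = best[i][0] + math.log(conf)
        if j not in best or score > best[j][0]:
            best[j] = (score, best[i][1] + char)
    if end not in best:
        return None
    score, text = best[end]
    return text, math.exp(score)

# Toy graph: the first edges are modeled on the slide's example, the rest are made up.
EDGES = [
    (1, 3, "u", 0.5), (1, 3, "v", 0.2),
    (1, 4, "w", 0.7),
    (1, 5, "w", 0.6), (1, 5, "m", 0.3),
    (3, 6, "n", 0.3), (4, 6, "o", 0.5), (5, 7, "o", 0.2),
    (6, 7, "r", 0.4), (7, 8, "d", 0.8),
]
text, p = best_path(EDGES, 1, 8)
print(text, round(p, 3))  # word 0.112
```

Because no lexicon constrains the search, the returned string can be any character sequence the graph supports; ranking against a lexicon would happen in post-processing.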
  9. 9. Lexicon Driven (LDR): find the best way of accounting for the characters 'w', 'o', 'r', 'd' by consuming all segments 1 to 8. The distance between the first character 'w' of the lexicon entry "word" and the image between segments 1 and 4 is 5.0, between segments 1 and 3 is 7.2, and between segments 1 and 2 is 7.6. [figure: matching graph over segmentation points 1 to 9 with character distances such as w[7.6], w[7.2], w[5.0], o[6.0], r[3.8], d[4.4], …]
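The LDR search can be sketched as dynamic programming that assigns each character of one lexicon entry to a run of consecutive segments and minimizes the total distance. This is a minimal sketch, not the recognizer from the talk: the `max_span` limit of 4 segments per character is an assumption, points 1 to 9 bound the 8 segments, and only the three 'w' distances come from the slide:

```python
import math

def match_word(word, n_points, dist, max_span=4):
    """Lexicon-driven matching of one lexicon entry by dynamic programming.

    Segmentation points are numbered 1..n_points. Each character of `word`
    must account for 1..max_span consecutive segments, and together the
    characters must consume every segment from point 1 to point n_points.
    dist(char, i, j) is the distance between `char` and the image between
    points i and j (math.inf if that pairing is not allowed).
    Returns (total distance, [(char, i, j), ...]).
    """
    INF = math.inf
    n_chars = len(word)
    # cost[k][j]: smallest distance for the first k characters ending at point j
    cost = [[INF] * (n_points + 1) for _ in range(n_chars + 1)]
    back = [[0] * (n_points + 1) for _ in range(n_chars + 1)]
    cost[0][1] = 0.0
    for k in range(1, n_chars + 1):
        ch = word[k - 1]
        for j in range(2, n_points + 1):
            for i in range(max(1, j - max_span), j):
                c = cost[k - 1][i] + dist(ch, i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    total = cost[n_chars][n_points]
    if total == INF:
        return INF, []
    spans, j = [], n_points
    for k in range(n_chars, 0, -1):  # walk the back-pointers from the end
        i = back[k][j]
        spans.append((word[k - 1], i, j))
        j = i
    return total, spans[::-1]

# The three 'w' distances are the slide's; every other entry is made up.
DIST = {
    ("w", 1, 2): 7.6, ("w", 1, 3): 7.2, ("w", 1, 4): 5.0,
    ("o", 4, 5): 6.6, ("o", 4, 6): 6.0,
    ("r", 6, 7): 3.8, ("r", 6, 8): 5.8,
    ("d", 7, 9): 4.4, ("d", 8, 9): 4.9,
}

def dist(ch, i, j):
    return DIST.get((ch, i, j), math.inf)

total, spans = match_word("word", 9, dist)
print(round(total, 2), spans)
# 19.2 [('w', 1, 4), ('o', 4, 6), ('r', 6, 7), ('d', 7, 9)]
```

Running `match_word` for every lexicon entry and sorting by total distance yields a ranked lexicon, which is why LDR cost and accuracy both depend on lexicon size.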
  10. 10. Grapheme Models (LFR): writer-specific modeling with holistic features. Example grapheme table (grapheme, pos, orientation angle): Down cusp, 3.0, -90°; Up loop; Down arc.
  11. 11. Interactive Models (LDR) [figure: word level (ABLE, TRIP, TRAP), letter level (A, T, N) and feature level; 1-way activation [McClelland and Rumelhart 1981] vs. 2-way interaction; example choices: Amherst, Buffalo, Boston, None of the above]
  12. 12. Interactive Models (LDR), Phrase Level: image features (T-crossings, loops, ascenders, descenders, length) feed the interactive model through 2-way interaction. Example lexicons:
      - Lexicon 1: West Central Street, West Main Street, Sunset Avenue
      - Lexicon 2: West Central Street, East Central Street, Sunset Avenue
      - Lexicon 3: West Central Street, West Central Avenue, Sunset Avenue
  13. 13. Interactive Models, Character Recognition: adaptive feature selection, adaptive number of features, adaptive resolutions. Gradient (4) and moment (5) features [figure: feature bits 0 1 0 1 1 1 0 0 1] [Park and Govindaraju, IEEE CVPR 2000]
  14. 14. Active Recognition
  15. 15. Results
      - 10-class digit recognition, 25,656 training and 12,242 test samples (Postal + NIST), Active Model / Neural Net / KNN:
        - Top 1: 95.7% / 96.4% / 95.7%
        - Temp: 612 / 976 / 3,777
        - Msec: 1.45 / 11.5 / 384
        - Training hrs: 1 / 24 / 1
      - Accuracy (%) by lexicon size, LDR / GM:
        - 10: 96.86 / 96.56
        - 100: 91.36 / 89.12
        - 1000: 79.58 / 75.38 (Top 50: 98.00 / 98.40)
        - 20000: 62.43 / 58.14 (Top 100: 93.59 / 93.39)
  16. 16. Fusion [figure: LDR and LFR combined for the Identification Task and the Verification Task]
  17. 17. Fusion of Recognizers (Type III): LDR scores 5.6, 7.4, …; LFR scores .52, .81, … Identification task: choose among Amherst, Buffalo, … Verification task: accept or reject "Amherst" given the score pair (5.6, .52). Question: if we find optimal […] and […], is it necessarily […]?
  18. 18. Traditional Fusion Rules: sum rule, weighted sum rule, product rule, max rule, rank-based methods
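The traditional rules combine, class by class, the scores that different recognizers assign to the same lexicon entry. A minimal sketch, assuming the scores are already normalized to a common "higher is better" scale (LDR emits distances, so it would first need converting; the normalization choice is an assumption) and using Borda count as one example of a rank-based method; all numbers are made up:

```python
import math

# Per-class rules: `scores` holds one score per recognizer for a single class.
def sum_rule(scores):
    return sum(scores)

def weighted_sum_rule(scores, weights):
    return sum(w * s for w, s in zip(weights, scores))

def product_rule(scores):
    return math.prod(scores)

def max_rule(scores):
    return max(scores)

def fuse(score_matrix, rule):
    """score_matrix[m][c]: recognizer m's score for class c. Returns best class index."""
    n_classes = len(score_matrix[0])
    fused = [rule([row[c] for row in score_matrix]) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: fused[c])

def borda_count(score_matrix):
    """Rank-based fusion: each recognizer awards n_classes - 1 - rank points."""
    n_classes = len(score_matrix[0])
    totals = [0] * n_classes
    for row in score_matrix:
        order = sorted(range(n_classes), key=lambda c: row[c], reverse=True)
        for rank, c in enumerate(order):
            totals[c] += n_classes - 1 - rank
    return totals

# Hypothetical normalized scores for the lexicon [Amherst, Buffalo, Boston].
S = [
    [0.90, 0.60, 0.20],  # recognizer 1
    [0.52, 0.81, 0.10],  # recognizer 2
]
print(fuse(S, sum_rule))                                    # 0 (Amherst)
print(fuse(S, product_rule))                                # 1 (Buffalo)
print(fuse(S, max_rule))                                    # 0 (Amherst)
print(fuse(S, lambda s: weighted_sum_rule(s, [0.3, 0.7])))  # 1 (Buffalo)
print(borda_count(S))                                       # [3, 3, 0]
```

The toy numbers show that the rules can disagree on the same trial: sum and max pick Amherst, while product and this weighted sum pick Buffalo.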
  19. 19. Likelihood Ratio for Verification Tasks
      - 2 classes: impostor and genuine
      - A pattern classification task
      - Minimum-risk criteria: the optimal decision boundaries coincide with the contours of the likelihood ratio function $LR(s_1, s_2) = p(s_1, s_2 \mid \text{genuine}) / p(s_1, s_2 \mid \text{impostor})$
      - Metaclassification with NN, SVM, etc. is also possible [Prabhakar, Jain 02] [Nandakumar, Jain, Dass 08]
      [figure: impostor and genuine score distributions, Recognizer score 1 vs. Recognizer score 2]
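Likelihood-ratio fusion needs density estimates for genuine and impostor score pairs. A minimal pure-Python sketch, assuming a single 2-D Gaussian per class (a simplifying assumption; more flexible density estimates would normally be preferred) and made-up training scores; the threshold on log LR sets the operating point on the ROC:

```python
import math

def fit_gaussian_2d(points):
    """Maximum-likelihood mean and 2x2 covariance of (s1, s2) score pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def log_pdf_2d(x, mean, cov):
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form d^T Sigma^{-1} d via the closed-form 2x2 inverse.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * q - 0.5 * math.log(det) - math.log(2 * math.pi)

def log_likelihood_ratio(x, genuine_model, impostor_model):
    return log_pdf_2d(x, *genuine_model) - log_pdf_2d(x, *impostor_model)

# Made-up (recognizer 1, recognizer 2) training scores.
genuine = [(0.8, 0.7), (0.9, 0.8), (0.7, 0.9), (0.85, 0.75), (0.75, 0.85)]
impostor = [(0.2, 0.3), (0.3, 0.2), (0.4, 0.35), (0.25, 0.4), (0.35, 0.25)]
gen_model, imp_model = fit_gaussian_2d(genuine), fit_gaussian_2d(impostor)

def verify(scores, log_threshold=0.0):
    """Accept the claimed identity when log LR exceeds the threshold."""
    return log_likelihood_ratio(scores, gen_model, imp_model) > log_threshold

print(verify((0.85, 0.85)), verify((0.3, 0.3)))  # True False
```

Sweeping `log_threshold` traces the ROC; every accept/reject boundary is a contour of the likelihood ratio, as the slide states.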
  20. 20. Optimal Combination Functions. Identification task results (top choice correct rate): LFR correct 54.8%, LDR correct 77.2%, both correct 48.9%, either correct 83.0%, likelihood ratio 69.8%, weighted sum 81.6%. Verification task results: ROC [figure]. The LR combination (69.8%) is worse than the single LDR matcher (77.2%).
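  21. 21. Independence of Scores in a Single Trial [figure: score matrix with lexicon entries as columns and recognizers as rows: Amherst (LDR 5.6, LFR .52), Buffalo (LDR 7.4, LFR .81), …]
  22. 22. Independence of Scores in a Single Trial [figure: score matrix with Lexicon 1 … Lexicon i … Lexicon N as columns and Recognizer 1 … Recognizer M as rows, annotated "Independent?", "Dependent", "Dependent"] (Tulyakov & Govindaraju, TIFS 2009)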
  23. 23. Optimal Combination? Scores are correlated and dependent on the input signal.
      - Top choice correct, out of a set of 6147: LFR 3366 (54.8%), LDR 4744 (77.2%), both correct 3005 (48.9%), either correct 5105 (83.0%), LR 4293 (69.8%), weighted sum 5015 (81.6%)
      - LFR: 2nd choice .4359, 3rd choice .4755, 4th choice .4771, Mean .1145
      - LDR: 2nd choice .7885, 3rd choice .7825, 4th choice .7673, Mean .5685
  24. 24. Optimal Trainable Combination Function: minimizing misclassification cost, classify as […] rather than […]. Assume that scores assigned to different classes are independent: […] (Tulyakov & Govindaraju, IJPRAI 2009)
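The formulas on this slide did not survive extraction. Under the stated independence assumption, a standard way to write the argument is sketched below; the notation is ours, and it additionally assumes equal class priors, a 0-1 misclassification cost, and genuine and impostor densities $f_{\text{gen}}$, $f_{\text{imp}}$ that are the same for every class. Here $s_j$ is the vector of the $M$ recognizer scores assigned to class $C_j$ in one trial with $N$ classes:

$$
\begin{aligned}
p(s_1,\dots,s_N \mid C_i)
  &= f_{\text{gen}}(s_i)\prod_{j\neq i} f_{\text{imp}}(s_j)
   = \frac{f_{\text{gen}}(s_i)}{f_{\text{imp}}(s_i)}\prod_{j=1}^{N} f_{\text{imp}}(s_j),\\
\hat{\imath} &= \arg\max_{i}\, p(s_1,\dots,s_N \mid C_i)
   = \arg\max_{i}\, \frac{f_{\text{gen}}(s_i)}{f_{\text{imp}}(s_i)} .
\end{aligned}
$$

The product over all $N$ classes is the same for every candidate $i$, so it drops out, and the likelihood ratio of each class's own scores becomes the optimal identification rule (assuming $f_{\text{imp}}(s_i) > 0$). When the scores within a trial are dependent, as slide 23 indicates, this argument no longer holds, which is consistent with the likelihood ratio (69.8%) trailing the single LDR recognizer (77.2%).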
  25. 25. Combination Methods for Identification Tasks: No! Traditional training mixes the genuine and impostor scores from different trials. [figures: genuine and impostor scores plotted over Recognizer score 1 vs. Recognizer score 2]
  26. 26. Combination Methods for Identification Tasks: model training MUST process the scores from one identification trial as a single training sample. [figures: genuine and impostor scores plotted over Recognizer score 1 vs. Recognizer score 2]
  27. 27. Iterative Methods
      - Initialize a combination function.
      - Get the scores from the same identification trial (for all trials).
      - Update the function so that the genuine score is better than any impostor score.
      - Example functions: Best Impostor Function; Sum of Logistic Functions.
      - Results (%), Likelihood Ratio / Weighted Sum / Best Impostor Likelihood Ratio / Logistic Sum / Neural Network:
        - LFR & LDR: 69.84 / 81.58 / 80.07 / 81.43 / 81.67
        - li & C: 97.24 / 97.23 / 97.01 / 97.34 / 97.39
        - li & G: 95.90 / 95.47 / 95.99 / 96.17 / 96.29
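The three steps can be illustrated with a simple perceptron-style rule on a weighted-sum combination: each training sample is a whole identification trial, and the weights move only when the genuine class fails to beat the strongest impostor of that same trial. This is a hypothetical sketch of the idea, not the authors' Best Impostor, Logistic Sum, or neural network trainers; the learning rate, epoch count, and scores are made up:

```python
def score(w, s):
    """Weighted-sum combination of one class's recognizer scores."""
    return sum(wi * si for wi, si in zip(w, s))

def train_best_impostor(trials, n_recognizers, epochs=20, lr=0.1):
    """Perceptron-style training on whole identification trials.

    trials: list of (genuine, impostors); genuine is the score vector of the
    true class and impostors are the score vectors of the other classes,
    all taken from the SAME identification trial.
    """
    w = [1.0 / n_recognizers] * n_recognizers          # 1. initialize
    for _ in range(epochs):
        for genuine, impostors in trials:               # 2. one trial at a time
            best = max(impostors, key=lambda s: score(w, s))
            if score(w, genuine) <= score(w, best):     # 3. genuine must win
                w = [wi + lr * (g - b) for wi, g, b in zip(w, genuine, best)]
    return w

def top_choice_rate(trials, w):
    """Fraction of trials whose genuine class strictly outscores every impostor."""
    hits = sum(score(w, g) > max(score(w, s) for s in imps) for g, imps in trials)
    return hits / len(trials)

# Toy trials with two recognizers (made-up scores, higher is better).
TRIALS = [
    ((0.9, 0.2), [(0.3, 0.8), (0.2, 0.1)]),
    ((0.8, 0.3), [(0.4, 0.9)]),
    ((0.7, 0.6), [(0.5, 0.4), (0.1, 0.7)]),
]
w0 = [0.5, 0.5]
w = train_best_impostor(TRIALS, 2)
print(top_choice_rate(TRIALS, w0), top_choice_rate(TRIALS, w))
```

On these toy trials the trained weights rank the genuine class first in all three trials, while the initial equal weights do not; the key design point is that genuine and impostor scores are only ever compared within one trial.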
  28. 28. Outline: Recognition (Postal Applications, Paradigms, Fusion); Search (Lexicon Reduction, Word Spotting, IR Models)
  29. 29. Search for Handwritten Documents
      - Lexicons are typically large: >5K
      - Need around 70% accuracy
      - Strategy: reduce the lexicon size using topic categorization (DAS 06; 08) and use the Top-N choices returned by OCR (ICDAR 07)
      - Top-N accuracy (%) by data set and lexicon size (Top 1 / Top 3 / Top 10):
        - Good quality, 10K: 57 / 69 / 74
        - Good quality, 1K: 67 / 72 / 75
        - Historical, 10K: 12 / 22 / 32
        - Historical, 1K: 28 / 44 / 72
        - Medical, 4K: 20 / 27 / 42
  30. 30. Search Engine for Handwritten Forms
      - Pre-Hospital Care Report (PCR): WNY: 250,000 filed a year; NYC: 50,000 filed in a day; PDAs not popular
      - OHR issues: loosely constrained writing style, large lexicons, heterogeneous data
      - 6,700 carbon forms stored at 300 DPI
      - 1,000 PCR forms ground-truthed
  31. 31. Search Engine for Medical Forms: example queries
      - Find all people who reported asthma problems in NY.
      - How many people with high blood pressure are on medication X?
      - Is an epidemic breaking out?
  32. 32. Topic Categorization for Lexicon Reduction [figure: Handwritten Medical Documents; Lex Free recognition with a Large Lexicon (> 5K) produces ICR Features; Topic Categorization selects a Reduced Lexicon (~2.5K) for Lex Driven recognition; ~33% word recognition rate (10 points gain)]
  33. 33. ICR Features Index
  34. 34. Topic Features for DIGESTIVE-SYSTEM (FQ, CHSN, PHRASE):
      - 30, 0.72, PAIN INCIDENT
      - 5, 0.31, PAIN TRANSPORTED
      - 42, 0.54, PAIN CHEST
      - 52, 0.81, STOMACH PAIN
      - 9, 0.25, HOME PAIN
      - 6, 0.43, VOMITING ILLNESS
  35. 35. Topic Categorization (Chu-Carroll et al., 1999)
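A minimal sketch of categorization followed by lexicon reduction, in the spirit of a vector-space approach: the document's recognized phrases form a term vector, each category has its own term vector, cosine similarity picks the category, and only that category's word lexicon is kept. The category vectors, lexicons, weights, and `top_k=1` are hypothetical, and this is not claimed to be the exact method used in the talk:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_categories(doc_terms, category_vectors):
    """Categories sorted by cosine similarity to the document's term counts."""
    doc = Counter(doc_terms)
    return sorted(category_vectors,
                  key=lambda c: cosine(doc, category_vectors[c]), reverse=True)

def reduced_lexicon(doc_terms, category_vectors, category_lexicons, top_k=1):
    """Union of the word lexicons of the top_k categories."""
    best = rank_categories(doc_terms, category_vectors)[:top_k]
    return sorted(set().union(*(category_lexicons[c] for c in best)))

# Hypothetical category vectors and lexicons; phrase names echo the slide,
# weights and words are made up.
CATEGORY_VECTORS = {
    "DIGESTIVE-SYSTEM": {"STOMACH PAIN": 1.0, "VOMITING ILLNESS": 0.8, "PAIN CHEST": 0.4},
    "RESPIRATORY": {"SHORTNESS BREATH": 1.0, "ASTHMA ATTACK": 0.9},
}
CATEGORY_LEXICONS = {
    "DIGESTIVE-SYSTEM": {"ABDOMEN", "NAUSEA", "STOMACH", "VOMITING"},
    "RESPIRATORY": {"ASTHMA", "BREATH", "INHALER", "WHEEZING"},
}
doc_terms = ["STOMACH PAIN", "VOMITING ILLNESS", "HOME PAIN"]  # noisy ICR phrases
print(reduced_lexicon(doc_terms, CATEGORY_VECTORS, CATEGORY_LEXICONS))
# ['ABDOMEN', 'NAUSEA', 'STOMACH', 'VOMITING']
```

Raising `top_k` trades a larger lexicon for robustness against a wrong category decision.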
  36. 36. Results. C: complete lexicon; R: reduced lexicon; A: category given; S: synthetic features; T: truth present.
      - CLT to RLT: HR 7.48%, error rate 10.78%
      - CL to RL: HR 7.42%, error rate 10.88%
      - CLT to ALT: HR 17.58%, error rate 24.53%
      - CLT to SLT: HR 7.42%, error rate 10.21%
  37. 37. Outline: Recognition (Postal Applications, Paradigms, Fusion); Search (Lexicon Reduction, Word Spotting, IR Models)
  38. 38. Urgent Issue of Our Times
      - Vast, irreplaceable, culturally vital legacy collections of historical documents are competing ineffectively for attention with billions of digital documents.
      - Thus historical archives are threatened with neglect, perceived irrelevance, … and eventually, oblivion?
      - Threat: 'If it's not in Google, it doesn't exist!' (Baird 2003)
  39. 39. What is possible today? View document images.
  40. 40. Document Enhancement [Shi, Setlur, and Govindaraju 2008]
  41. 41. Transcript Mapping: 1787 Thomas Jefferson letter and its transcript [figure: Image + Transcript]
  42. 42. What is not possible today?
  43. 44. Crosslingual Retrieval [figure: Multilingual Document Corpus; English, Hindi and Sanskrit translations of "strength"; Retrieved Documents]
  44. 45. Search Handwritten Documents: image-based (use image-based features) or OCR-based (use OCR recognition results) [figure label: Query rendered]
  45. 46. Image-Based Methods (Rath, IJDAR 2007): poor performance in multiple-writer scenarios
  46. 47. Search Handwritten Documents: image-based (use image-based features) or OCR-based (use OCR recognition results)
  47. 48. Handwriting Recognition, Indexing, Retrieval [figure]
  48. 49. Vector IR Model (TF-IDF) [Baeza-Yates99]
      - Set of terms {t_i}
      - Set of documents {d_j} of lengths {L_j}
      - Term Frequency (TF)
      - Inverse Document Frequency (IDF)
      - Query TF
      - Similarity
  49. 50. Modifications to the VM
      - Classic VM: computes tf and idf from the OCR'ed text (top-1)
      - Modified VM: computes tf and idf from the top-n choices of word recognition
  50. 51. Required inputs: the word segmentation result and the word recognition likelihoods. Estimation: the word images of document d_j, each with a recognition likelihood for the term (0.02, 0.01, 0.2, 0.01, 0.01, …) [Rath 04, Howe 05]
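The slide's TF, IDF and similarity formulas did not survive extraction. One common instantiation, assumed here rather than taken from the slides, uses tf = (count of t_i in d_j) / L_j, idf = log(N / number of documents containing t_i), and cosine similarity between query and document weight vectors; [Baeza-Yates99] discusses several variants. A minimal sketch over made-up token lists:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(docs):
    """docs: token lists. Returns per-document TF-IDF vectors and the IDF table."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    idf = {t: math.log(n_docs / n) for t, n in df.items()}
    vectors = []
    for doc in docs:
        length = len(doc) or 1                          # L_j
        vectors.append({t: (c / length) * idf[t] for t, c in Counter(doc).items()})
    return vectors, idf

def search(query, docs):
    """Return document indices ranked by cosine similarity to the query."""
    vectors, idf = build_index(docs)
    q_counts = Counter(t for t in query if t in idf)
    q_len = sum(q_counts.values()) or 1
    q = {t: (c / q_len) * idf[t] for t, c in q_counts.items()}
    return sorted(range(len(docs)), key=lambda j: cosine(q, vectors[j]), reverse=True)

DOCS = [
    ["newton", "kinematics", "notes", "velocity"],
    ["letter", "to", "jefferson"],
    ["kinematics", "velocity", "acceleration", "velocity"],
]
print(search(["newton", "kinematics"], DOCS))  # [0, 2, 1]
```

In the classic VM the token lists would be the top-1 OCR output, so every recognition error directly corrupts tf and idf.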
  51. 52. Estimating Term Frequency
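The modified VM replaces counts taken from the top-1 OCR text with expectations over the top-n recognition choices, using the word segmentation and the word recognition likelihoods as inputs. A minimal sketch, assuming each word image's candidate likelihoods are already normalized and treating word images as independent when estimating whether a term occurs in a document (both are assumptions, not details from the slides); the candidate lists are made up:

```python
import math

def expected_tf(doc, term):
    """Expected number of occurrences of `term` in one document.

    doc: one dict per segmented word image, mapping each top-n recognition
    candidate to its normalized likelihood.
    """
    return sum(cands.get(term, 0.0) for cands in doc)

def prob_present(doc, term):
    """P(term occurs at least once), treating word images as independent."""
    p_absent = 1.0
    for cands in doc:
        p_absent *= 1.0 - cands.get(term, 0.0)
    return 1.0 - p_absent

def expected_idf(corpus, term):
    """log(N / expected document frequency); 0.0 if the term is never a candidate."""
    edf = sum(prob_present(doc, term) for doc in corpus)
    return math.log(len(corpus) / edf) if edf > 0 else 0.0

# Made-up top-n lists for two short documents.
CORPUS = [
    [{"asthma": 0.6, "asthmatic": 0.3, "ashore": 0.1},
     {"pain": 0.7, "rain": 0.3},
     {"asthma": 0.2, "astute": 0.8}],
    [{"pain": 0.9, "paid": 0.1},
     {"chest": 0.8, "crest": 0.2}],
]
print(round(expected_tf(CORPUS[0], "asthma"), 2))   # 0.8
print(round(prob_present(CORPUS[0], "asthma"), 2))  # 0.68
print(round(expected_idf(CORPUS, "pain"), 3))       # 0.223
```

A term that the top-1 OCR output misses entirely (here "asthma" is only a second-tier candidate for the third word image) still contributes to tf and idf in proportion to its likelihood.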
  52. 53. Estimating Segmentation
      - Word segmentation: a gap d between adjacent connected components above a threshold D (d > D) separates words.
      - Generate multiple hypotheses with multiple values of D [figure: 3 hypotheses].
      - If hypothesis I_w overlaps m other hypotheses, then …
  53. 54. Word Recognition: Top-Rank (Top-S candidates involved), Weighted Top-Rank, Empirical
  54. 55. Thank you!
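Generating segmentation hypotheses from several gap thresholds, and counting how many other hypotheses each one overlaps, can be sketched as follows. The slide's formula that turns m into a weight is not recoverable, so this sketch stops at computing m; treating "overlap" as sharing a positive-length horizontal extent is an assumption, and the component extents are made up:

```python
def word_hypotheses(components, thresholds):
    """Word hypotheses from several gap thresholds D.

    components: (x_start, x_end) extents of the connected components on one
    text line, sorted left to right. For each D, a gap d > D between adjacent
    components starts a new word. Returns the distinct (x_start, x_end) spans.
    """
    spans = set()
    for D in thresholds:
        start, end = components[0]
        for x0, x1 in components[1:]:
            if x0 - end > D:  # gap above threshold: close the current word
                spans.add((start, end))
                start = x0
            end = max(end, x1)
        spans.add((start, end))
    return sorted(spans)

def overlap_counts(spans):
    """m for each hypothesis: how many OTHER hypotheses share part of its extent."""
    return {s: sum(1 for t in spans if t != s and t[0] < s[1] and s[0] < t[1])
            for s in spans}

# Made-up component extents; the gaps between them are 4, 15 and 2 pixels.
COMPONENTS = [(0, 10), (14, 30), (45, 60), (62, 80)]
spans = word_hypotheses(COMPONENTS, thresholds=[3, 10, 20])
print(spans)  # [(0, 10), (0, 30), (0, 80), (14, 30), (45, 80)]
print(overlap_counts(spans))
# {(0, 10): 2, (0, 30): 3, (0, 80): 4, (14, 30): 2, (45, 80): 1}
```

Each threshold yields one complete segmentation of the line; pooling them gives the competing word hypotheses whose overlaps the slide uses to weight term-frequency estimates.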
