NLP in a Bank: Automated Document Reading: Yevgen Kolesnyk / Patrik Zatko / Daniel Bekesi

Despite the rapid digitalization of the modern world, core banking processes still rely largely on printed documents. Document processing therefore consumes significant manpower and processing time, and because it is prone to human error it also raises the bank's operational risk. In this session, you will learn how automated document processing can modernize and simplify the way banks work, lower the associated operational risk, and reduce the time and cost spent in a given process area.

Published in: Data & Analytics

  1. 1. NLP in a Bank: Automated Document Reading. Yevgen Kolesnyk / Patrik Zatko / Daniel Bekesi. Vienna Data Science Group Meetup, 29.10.2019
  2. 2. Bank as process organization (29.10.2019, Advanced Analytics (AA) Tribe). 900+ end-to-end processes: CRM, account maintenance, payment transactions, lending operations, trade finance operations, human resources. 600+ external document types: customer documents, legal contracts, financial reports, extracts from registry, payment slips, request forms. Typical tasks per document: register document, validate document, understand context, retype data, trigger downstream process.
  3. 3. (no text on slide)
  4. 4. NLP potential in a bank. Immediate benefits: reduction of process time, reduction of process cost, reduction of operational risk. Full potential: automated product sales, automated customer service, automated operations.
  5. 5. Use case overview. WE HAVE: scanned documents; expected data (LohnHub). WE WANT: extract customer & property information. WE NEED TO DO: create training data; train ML models. Steps: image recognition, data tagging, relevant line classification, named entity recognition.
  6. 6. DATA PREPARATION
  7. 7. Image pre-processing. Open Source Computer Vision Library (OpenCV): state-of-the-art tool. Used by: Google, Yahoo, Microsoft, Intel, IBM, Sony, Honda, … Contains: a comprehensive set of both classic and state-of-the-art computer vision algorithms. Applied in current use case: rescaling, RGB-to-gray conversion, denoising to remove scanning artifacts, thresholding to remove background shadows, color weighting, …
  8. 8. Optical text recognition. Tesseract Optical Character Recognition Engine: state-of-the-art tool. Developed by Hewlett Packard, sponsored by Google. Contains: multi-lingual support, multi-column text recognition, stamp recognition, equations support. Applied in current use case: the document is recognized as text by an LSTM neural network pre-trained on the Slovak language; the engine keeps track of the syntactic structure of the text and of word positions.
  9. 9. Data annotation challenges. #1: No exact match between what we search for and the document: typos (“Yevgen” vs. “Evgen”), OCR artifacts (“128/003” vs. “12%/003”), naming conventions (“31.12.2018” vs. “2018-Dec-31”), number rounding (“501,15 m2” vs. “501 m2”), … #2: Same words are used in different contexts within the document: different word roles (“Kolesnyk Yevgen” as recipient vs. “Kolesnyk Yevgen” as owner), different entities (“Land area, parcel no 123” vs. “Flat #10 on land area, parcel no 123”).
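
One way to reduce the "naming convention" and "number rounding" mismatches listed above is to normalize both sides before comparing them. The following is a minimal sketch under stated assumptions, not the presenters' method: the list of date formats, the rounding tolerance, and the helper names `normalize_date`, `parse_area` and `same_area` are all hypothetical.

```python
import re
from datetime import datetime

# Assumption: only these date layouts occur; extend the tuple as needed.
# "%b" relies on English month abbreviations (the default C locale).
DATE_FORMATS = ("%d.%m.%Y", "%Y-%b-%d", "%Y-%m-%d")


def normalize_date(text):
    """Return the date as ISO 'YYYY-MM-DD', or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_area(text):
    """Extract an area in m2 as a float, accepting ',' or '.' as decimal mark."""
    match = re.search(r"(\d+(?:[.,]\d+)?)\s*m2", text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def same_area(a, b, tolerance=1.0):
    """Treat two areas as equal if they differ by less than `tolerance` m2.

    Assumption: a difference below 1 m2 is attributed to rounding.
    """
    x, y = parse_area(a), parse_area(b)
    return x is not None and y is not None and abs(x - y) < tolerance
```

With these helpers, `normalize_date("31.12.2018")` and `normalize_date("2018-Dec-31")` yield the same ISO string, so an exact comparison succeeds where the raw strings differ.
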
  10. 10. Data annotation algorithm. Target: cadastral_area = Zilina, list_id = 6179, type: podiel, ownership_share = 690/33387, parcel_no = 7975, parcel_area = 611 m2. Searching algorithm. Matching algorithm: based on Damerau-Levenshtein distance.
  11. 11. PRE-CLASSIFICATION
  12. 12. Overview of modelling. WE NEED TO DO: link data to documents; train statistical models. Steps: image recognition, data tagging, relevant line classification, named entity recognition.
  13. 13. Overview of modelling. WE NEED TO DO: link data to documents; train statistical models. Steps: image recognition, data tagging, relevant line classification, named entity recognition.
  14. 14. Labels
  15. 15. fastText. Developed by Facebook's AI Research team. Open-source library for learning text representations and text classifiers. Pretrained embeddings for 157 languages can be downloaded. Pros: fast (classifies 0.5M sentences among 312K classes in < 1 min*); covers CEE languages; can handle words not seen in the training data. Cons: works only on CPU. *Joulin, A., et al. (2016). Bag of tricks for efficient text classification. https://arxiv.org/abs/1607.01759
  16. 16. Word embeddings. A representation understandable for machines. E.g. vlastnik (owner): 0.0126 0.0350 -0.0260 -0.0084 -0.0316 0.0076 0.0137 0.0152 -0.0299 -0.0081 0.0037 0.0287 -0.0362 -0.0010 0.0579 -0.0131 -0.0029 -0.0244 0.0586 0.0291 -0.0505 0.0212 -0.0055 -0.0321 0.0322 0.0132 0.0264 0.0191 0.0363 0.0145 -0.0145 0.0469 -0.0317 0.0445 0.0187 0.0354 -0.0524 0.0390 0.0426 0.0102 -0.0857 -0.0353 -0.0044 -0.0112 -0.0511 0.0042 0.0426 0.0036 -0.0061 -0.0251 -0.0626 0.0101 -0.0366 -0.0465 -0.0099 -0.0311 -0.0092 -0.0355 -0.0124 0.0151 -0.0034 -0.0283 0.0126 0.0142 0.0096 0.0158 -0.0635 0.0271 0.0036 0.0085 0.0400 -0.0144 -0.0457 0.0132 -0.0201 0.0260 0.0014 -0.0177 0.0532 0.0108 -0.0508 0.0485 -0.0373 -0.0292 -0.0107 0.0408 -0.0059 0.0030 -0.0228 -0.0114 0.0031 0.0297 -0.0388 0.0274 0.0085 -0.0205 0.0083 -0.0247 -0.0010 0.0308 -0.0116 0.0215 -0.0192 0.0009 -0.0727 0.0473 0.0129 0.0360 0.0529 -0.0077 -0.0326 0.0434 -0.0063 0.0192 0.0192 -0.0240 -0.0140 -0.0222 0.0578 0.0361 -0.0042 -0.0216 -0.0157 0.0270 0.0087 0.0106 0.0046 -0.0099 -0.0313 -0.0150 -0.0037 0.0501 0.0106 0.0015 0.0279 -0.0177 0.1055 0.0024 0.0226 -0.0281 -0.0819 -0.0091 0.0177 -0.0009 -0.0705 -0.0076 -0.0153 -0.0165 0.0210 0.0083 -0.0212 0.0213 0.0609 0.0094 0.0035 -0.0095 -0.0224 -0.0100 0.0032 -0.0006 -0.0422 0.0207 0.0268 0.0217 -0.0101 -0.0192 0.0025 0.0074 -0.0159 0.0112 -0.0188 -0.0085 0.0217 -0.0163 0.0016 0.0073 0.0047 0.0477 0.0033 -0.0176 -0.0269 -0.0247 0.0209 -0.0616 -0.0217 -0.0008 -0.0194 0.0480 0.0415 0.0097 -0.0055 -0.0333 0.0129 0.0995 -0.0098 0.0466 0.0177 -0.0202 -0.0275 -0.0536 -0.0323 -0.0124 0.0053 0.0498 0.0247 0.0372 0.0030 -0.0385 0.0077 0.0307 -0.0180 -0.0082 0.0197 0.0187 -0.0044 -0.0489 0.0330 -0.0142 -0.0208 0.0126 0.0064 -0.0038 -0.0058 -0.0307 0.0234 -0.0106 0.0241 0.0534 0.0184 -0.0142 0.0062 -0.0456 0.0246 -0.0187 0.0078 -0.0115 0.0260 0.0016 0.0182 -0.0469 -0.0037 0.0287 -0.0268 -0.0424 -0.0059 -0.0292 -0.0190 0.0221 0.0063 -0.0859 0.0184 0.0140 0.0159 0.0233 0.0164 0.0356 -0.0518 -0.0250 0.0321 0.0294 0.0082 0.0088 0.0059 0.0004 -0.0325 -0.0219 0.0080 -0.0196 -0.0384 0.0121 0.0151 0.0536 0.0077 -0.0001 -0.0127 -0.0021 -0.0036 0.0588 -0.0165 -0.0077 -0.0096 -0.0150 -0.0421 -0.0015 -0.0319 0.0055 -0.0471 -0.0709 0.0114 0.0133 -0.0212 -0.0235 0.0194 -0.0188 0.0740 0.0130 0.0227 -0.0015 0.0234 0.0195
  17. 17. Word embeddings
  18. 18. Text embeddings. *Zolotov, V., Kung, D. (2017). Analysis and Optimization of fastText Linear Text Classifier. https://arxiv.org/abs/1702.05531
  19. 19. Model results - relevant lines
  20. 20. Document classification. Process of assigning tags or categories to text according to its content. Documents and their categories: Document 1: Cadaster decision; Document 2: Cadaster decision; Document 3: Loan Application.
  21. 21. NAMED ENTITY RECOGNITION
  22. 22. Challenges of NER. Named Entity Recognition (NER) is the process of identifying specific groups of words which share common semantic characteristics. ➢ Words can have multiple meanings: • “I go to the bank.” vs. “I sit on the river bank.” • Thus context matters. ➢ Slovak is a small language. ➢ The model encounters new words it has never seen. ➢ Non-ML approaches, like regex, were already tried by TBSK • need to evolve from a rule-based system to ML models
  23. 23. Language Modelling approaches (previous vs state-of-the-art)
  24. 24. Proposed solution (pretraining + downstream classification). BERT (Bidirectional Encoder Representations from Transformers), Google AI (2018)*: pretrained for 104 languages (incl. all of RBI). *Devlin, J., et al. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. https://arxiv.org/abs/1810.04805 http://jalammar.github.io/illustrated-bert
  25. 25. BERT's Masked Language Modelling (MLM). Masked LM essentially means predicting the words missing from blanks (Cloze task). ➢ The Transformer is a sequence model that forgoes RNNs. ➢ Predictions are jointly conditioned on both left and right context. ➢ Self-attention encodes a token as the weighted sum of its context. http://jalammar.github.io/illustrated-bert https://jalammar.github.io/illustrated-transformer
  26. 26. Classification: downstream fine-tuning. The word representations can then be fed into a classifier. • Task-agnostic (NER, question answering, sentence pair comparison) • The classifier and BERT can be fine-tuned jointly. • A single-layer NN might be sufficient. Source: Devlin, J., et al. (2018)
  27. 27. Model validation. http://nlpprogress.com/english/named_entity_recognition.html
  28. 28. Demo
  29. 29. Thanks a lot!
  30. 30. References
     • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
     • Grave, E., Bojanowski, P., Gupta, P., Joulin, A., & Mikolov, T. (2018). Learning Word Vectors for 157 Languages. arXiv:1802.06893.
     • Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv:1607.01759.
     • Vaswani, A., et al. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
     • Zolotov, V., & Kung, D. (2017). Analysis and Optimization of fastText Linear Text Classifier. arXiv:1702.05531.
     • https://jalammar.github.io/illustrated-bert/
     • https://jalammar.github.io/illustrated-transformer/
  31. 31. APPENDIX
  32. 32. Appendix: Damerau-Levenshtein Distance. It is the minimum number of operations* required to change one string into the other. *Operations include insertion, deletion, substitution, and transposition of two adjacent characters. For example: distance(“Yevgen”, “Yevgen”) = 0; distance(“Yevgen”, “Evgen”) = 1; distance(“128/001”, “12%/001”) = 1; distance(“Peter”, “Peterzalka”) = 5.
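
The definition above can be sketched in a few lines of dynamic programming. This is an illustrative implementation of the restricted variant (optimal string alignment, where each adjacent pair is transposed at most once), not necessarily the one used in the talk. The function names are hypothetical, and the case-insensitive wrapper is an assumption: it is what reproduces the slide's value of 1 for “Yevgen” vs. “Evgen”, since a case-sensitive comparison would give 2.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def distance(a: str, b: str) -> int:
    """Assumption: compare case-insensitively, as the slide's examples imply."""
    return osa_distance(a.casefold(), b.casefold())
```

In a matching step, a candidate string from the OCR output could be accepted when its distance to the target value stays below a chosen threshold; the threshold itself is a tuning decision the slides do not specify.
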
  33. 33. Appendix: FastText Performance. Source: https://arxiv.org/pdf/1607.01759.pdf
  34. 34. Appendix: Skipgram
  35. 35. Appendix: CBOW
  36. 36. Line classification: model results
  37. 37. Appendix: Cosine Similarity. It is the normalized dot product of two vectors; this ratio is the cosine of the angle between them. Vectors with the same orientation have a similarity of 1, orthogonal (independent) vectors at 90° have a similarity of 0, and diametrically opposed vectors have a similarity of -1. Given two vectors of attributes, A and B, the cosine similarity, cos(θ), is expressed using the dot product and magnitudes as: cos(θ) = (A · B) / (‖A‖ ‖B‖)
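
As a small worked check of the cosine similarity definition, the sketch below computes it for plain Python lists. The function name `cosine_similarity` and the toy vectors are illustrative assumptions; real word embeddings would simply be longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Normalized dot product of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])  # same orientation
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # 90 degrees apart
opposed = cosine_similarity([1.0, 0.0], [-1.0, 0.0])        # opposite direction
```

Because the result depends only on direction, scaling either vector leaves the similarity unchanged, which is why it is a common choice for comparing embeddings of different magnitudes.
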
  38. 38. Appendix: Attention Mechanism
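
The deck describes self-attention as encoding a token as the weighted sum of its context. The sketch below illustrates only that idea with scaled dot-product attention in pure Python. It is a simplification under stated assumptions: it uses the token vectors directly as queries, keys and values, and omits the learned projections and multiple heads that BERT uses. The function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Return (outputs, weights) for a list of equal-length token vectors.

    Assumption: queries, keys and values are the raw token vectors.
    """
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # similarity of this token to every token in the sequence
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # new representation: weighted sum of all token vectors
        mixed = [
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` shows how much every token in the sequence contributes to one token's new representation, which is how context from both the left and the right enters the encoding.
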
