
Building Named Entity Recognition Models Efficiently using NERDS

Named Entity Recognition (NER) is foundational for many downstream NLP tasks such as Information Retrieval, Relation Extraction, Question Answering, and Knowledge Base Construction. While many high-quality pre-trained NER models exist, they usually cover a small subset of popular entities such as people, organizations, and locations. But what if we need to recognize domain-specific entities such as proteins, chemical names, or diseases? The Open Source Named Entity Recognition for Data Scientists (NERDS) toolkit, from the Elsevier Data Science team, was built to address this need.

NERDS aims to speed up development and evaluation of NER models by providing a set of NER algorithms callable through a familiar scikit-learn style API. The uniform interface allows data ingestion and evaluation code to be reused, resulting in cleaner and more maintainable NER pipelines. In addition, customizing NERDS by adding new and more advanced NER models is easy: it is just a matter of implementing a standard NER Model class.

Our presentation will describe the main features of NERDS, then walk through a demonstration of developing and evaluating NER models that recognize biomedical entities. We will also describe a Neural Network based NER algorithm (a Bi-LSTM seq2seq model written in PyTorch) and show how to integrate it into the NERDS NER pipeline.

We believe NERDS addresses a real need for building domain-specific NER models quickly and efficiently. NER is an active field of research, and we hope this presentation will spark community interest and contributions of new NER algorithms and Data Adapters that can, in turn, help move the field forward.


Transcript

1. Building Named Entity Recognition Models Efficiently using NERDS. Sujit Pal, Elsevier Labs, December 2019.
2. About me • Work at Elsevier Labs. • (Mostly self-taught) data scientist. • Mostly work with Deep Learning, Machine Learning, Natural Language Processing, and Search. • Got interested in Named Entity Recognition (NER) and NERDS as part of Search and Knowledge Graph development. • I am NOT the author or maintainer of NERDS! • Originally built by Panagiotis Eustratiadis. • See CONTRIBUTORS.md for the list of contributors. • Open sourced by Elsevier on July 3, 2018.
3. Agenda • What can NER do for you? • Evolution of NER techniques • NERDS Architecture • NERDS Usage • Future Work
4. Agenda • What can NER do for you? • Evolution of NER techniques • NERDS Architecture • NERDS Usage • Future Work
5. What can NER do for you? • In general… • Foundational task for NLP pipelines. • Good NERs available out of the box for “standard” named entities. • Useful for Topic Modeling, Co-reference Resolution, etc. • Information Retrieval (IR) • Chunk entities into meaningful multi-word phrases. • Understand query intent. • Automated Knowledge Base Construction (AKBC) • NER extracts entities from incoming text. • Relationship Extraction extracts relationships between entity pairs. • Entity-relationship triples are inserted into the Knowledge Graph.
6. Agenda • What can NER do for you? • Evolution of NER techniques • NERDS Architecture • NERDS Usage • Future Work
7. Evolution of NER Techniques • Traditional: Rules, Regular Expressions, Gazetteers. • Statistical: Word-based models (PMI, log-likelihood); Sequence models (Conditional Random Fields). • Neural: Bi-LSTM, Bi-LSTM+CRF, Transformer-based models.
8. Input Format – BIO Tagging • BIO – Begin, In, Out. • Barack/B-PER Obama/I-PER is/O 44th/O United/B-LOC States/I-LOC President/O ./O • BILOU – a tagging variant: • U – Unit token (for single-token entities). • L – Last token in sequence, e.g. Barack/B-PER Obama/L-PER.
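Not on the slide, but a worked example helps show what the tagging buys: a few lines of Python turn BIO tags back into entity spans (the function name is illustrative).

    def bio_to_spans(tokens, tags):
        """Collect (entity_text, label) spans from BIO-tagged tokens."""
        spans, current, label = [], [], None
        for token, tag in zip(tokens, tags):
            if tag.startswith("B-"):
                if current:
                    spans.append((" ".join(current), label))
                current, label = [token], tag[2:]
            elif tag.startswith("I-") and current:
                current.append(token)
            else:  # an "O" tag closes any open entity
                if current:
                    spans.append((" ".join(current), label))
                current, label = [], None
        if current:
            spans.append((" ".join(current), label))
        return spans

    tokens = ["Barack", "Obama", "is", "44th", "United", "States", "President", "."]
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O", "O"]
    print(bio_to_spans(tokens, tags))  # [('Barack Obama', 'PER'), ('United States', 'LOC')]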
9. Gazetteer – Aho-Corasick • Create in-memory data structure from dictionary. • Stream content against the data structure. • Multiple matches in a single pass. • Reference: Aho, A.V. and Corasick, M.J., 1975. Efficient String Matching: An Aid to Bibliographic Search.
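A minimal sketch of the single-pass matching idea using the pyahocorasick package (the dictionary entries and labels below are illustrative):

    import ahocorasick  # pip install pyahocorasick

    automaton = ahocorasick.Automaton()
    for term, label in [("Barack Obama", "PER"), ("United States", "LOC")]:
        automaton.add_word(term, (term, label))
    automaton.make_automaton()  # build the in-memory trie plus failure links

    text = "Barack Obama is 44th United States President ."
    # one pass over the text yields every dictionary match
    for end_index, (term, label) in automaton.iter(text):
        start_index = end_index - len(term) + 1
        print(start_index, end_index, term, label)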
10. Sequence Modeling – CRF • Sequence version of logistic regression. • Computes optimum labeling l = (y_0, …, y_n) over the entire sentence s. • Builds multiple feature functions f on each token, each returning a real value in the range 0..1. Function parameters: • sentence s with tokens (x_0, …, x_n) – a feature can use any token, the entire sentence, or functions computed over the sentence (e.g. POS), • current position i, • previous and next labels y_{i-1} and y_{i+1}. • Optimum labeling computed as shown below; probability computed using softmax. • Weights w_j learned using gradient descent.
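The formula image did not survive the transcript; a standard linear-chain CRF formulation consistent with the bullets above (the slide's features may also consult the next label y_{i+1}) is:

    \mathrm{score}(l \mid s) = \sum_{j} \sum_{i=1}^{n} w_j \, f_j(s, i, l_i, l_{i-1})

    p(l \mid s) = \frac{\exp(\mathrm{score}(l \mid s))}{\sum_{l'} \exp(\mathrm{score}(l' \mid s))}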
11. Neural Model – BiLSTM • Input is a sequence of tokens, output is a sequence of BIO tags. • Weights trained end-to-end, no feature engineering needed. • Bidirectional LSTM gets signal from neighboring words on both sides.
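A minimal PyTorch sketch of this tokens-in, tags-out architecture (the abstract mentions a PyTorch Bi-LSTM; the class name and dimensions here are illustrative, not the actual NERDS code):

    import torch.nn as nn

    class BiLstmTagger(nn.Module):
        """Token ids in, per-token BIO tag scores out."""
        def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)
            self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2,
                                  bidirectional=True, batch_first=True)
            self.tag_proj = nn.Linear(hidden_dim, num_tags)

        def forward(self, token_ids):              # (batch, seq_len)
            embedded = self.embedding(token_ids)   # (batch, seq_len, emb_dim)
            hidden, _ = self.bilstm(embedded)      # (batch, seq_len, hidden_dim)
            return self.tag_proj(hidden)           # per-token tag logits

Training is the usual per-token cross-entropy over the logits; the next slide's CRF layer replaces that independent per-token decision with a sequence-level one.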
12. Neural Model – BiLSTM-CRF • Same as the previous model, with an additional CRF layer. • No feature engineering for the CRF, unlike the CRF-only NER model. • Pre-trained embeddings observed to improve performance.
13. Neural Model – adding char embeddings • Concatenate char embedding + word embedding and feed to the Bi-LSTM-CRF. • All weights learned end-to-end. • Handles rare / unknown words; exploits signal in prefixes/suffixes.
14. Neural Model – ELMo preprocessing • (Diagram: contextualized word embeddings from ELMo are concatenated with char LSTM/CNN embeddings and fed to the Bi-LSTM-CRF.)
15. Neural Model – Transformer based • BERT = Bidirectional Encoder Representations from Transformers. • Source of embeddings, similar to ELMo, in standard BiLSTM+CRF models, OR • Fine-tune LM-backed NERs such as HuggingFace’s BertForTokenClassification.
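A sketch of the fine-tuning route using HuggingFace's BertForTokenClassification; the checkpoint name and label count are placeholders, and the predicted ids would still need mapping back from wordpieces to tokens:

    import torch
    from transformers import BertTokenizerFast, BertForTokenClassification

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
    model = BertForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=5)  # e.g. B-PER, I-PER, B-LOC, I-LOC, O

    inputs = tokenizer("Barack Obama is 44th United States President .",
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits   # (1, num_wordpieces, num_labels)
    pred_ids = logits.argmax(dim=-1)      # per-wordpiece tag ids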
16. More Info on NER Techniques • High-level overview of NER in a series of blog posts by Tobias Sterbak (https://bit.ly/2pNdgPG). • Traditional NER techniques covered in the paper by Rahul Sharnagat (2014) – Named Entity Recognition: A Literature Survey (https://bit.ly/2NRaCAg). • Introduction to neural models in the paper by Ronan Collobert and Jason Weston (2008) – A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning (https://bit.ly/32rRYnO). • Others (more modern papers) mentioned in slides.
17. Agenda • What can NER do for you? • Evolution of NER techniques • NERDS Architecture • NERDS Usage • Future Work
18. NERDS Overview • Framework that provides easy-to-use NER capabilities to Data Scientists. • Wraps various popular third-party NER models. • Extensible: new third-party NER tools can be added as needed. • Software Engineering tooling to boost Data Science productivity. • Looking for support, bug reports, contributions, and ideas.
19. Unification through I/O Format • All wrapped models (pyAhoCorasick, CRFSuite, SpaCy NER, Anago BiLSTM) read and write a common representation:

    AnnotatedDocument(
        doc: Document("Barack Obama is 44th United States President ."),
        annotations: [
            Annotation(start_offset: 0, end_offset: 12, text: "Barack Obama", label: "PER"),
            Annotation(start_offset: 22, end_offset: 35, text: "United States", label: "LOC")
        ])
20. Benefits of Unification • Consistent API – all models are subclasses of NERModel. • Data prep done once per project and reused across multiple models. • Reusable training and evaluation code. • Familiar Scikit-Learn like API, and access to Scikit-Learn utility functions. • Duck typing allows us to build ensembles of NERs. • Easy to benchmark NER models on labeled data.
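A sketch of what the consistent API buys in practice. The model class names appear in the slides, but the exact import paths and the load_data_and_labels utility are assumptions about the fork's layout:

    from nerds.models import CrfNER, SpacyNER      # import paths assumed
    from nerds.utils import load_data_and_labels   # utility name assumed

    # data preparation done once, reused across every model
    x_train, y_train = load_data_and_labels("train.iob")
    x_test, y_test = load_data_and_labels("test.iob")

    for model in [CrfNER(), SpacyNER()]:
        model.fit(x_train, y_train)      # identical scikit-learn style API
        y_pred = model.predict(x_test)   # token lists in, BIO tag lists out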
21. Can we do better? • Replace AnnotatedDocument with plain token/tag lists; DictionaryNER and SpacyNER convert to their native formats internally (I/O Convert), while CrfNER and BiLstmCrfNER consume the lists directly.

    Data: [["Barack", "Obama", "is", "44th", "United", "States", "President", "."]]
    Labels and Predictions: [["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O", "O"]]
22. ELMo NER Model from Anago • ElmoNER drops into the same pipeline alongside DictionaryNER, CrfNER, SpacyNER, and BiLstmCrfNER, consuming the same token/tag lists (with I/O conversion only where a model needs its native format).
23. Agenda • What can NER do for you? • Evolution of NER techniques • NERDS Architecture • NERDS Usage • Future Work
24. Dataset • Bio Entity recognition task from BioNLP 2004. • Training and Test sets provided in BIO format. • 511,097 training examples. • 104,895 test examples. • Entity distribution (training set): • 25,307 DNA • 2,481 RNA • 11,217 cell_line • 15,466 cell_type • 55,117 protein
25. Dictionary NER • Wraps the pyAhoCorasick Automaton. • Improvements in this fork: • Supports dictionary loading as well as fit(X, y) like other NER models. • Handles multiple entity classes.
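A hedged usage sketch: fit(X, y) is named on the slide, while the import path and constructor defaults are assumptions:

    from nerds.models import DictionaryNER   # import path assumed

    model = DictionaryNER()
    model.fit(x_train, y_train)     # gazetteer terms harvested from BIO-tagged data
    y_pred = model.predict(x_test)  # exact-match lookups, emitted as BIO tags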
27. CRF NER • Wraps sklearn_crfsuite CRF. • Improvements in this fork: • Removes the NLTK dependency, replacing it with SpaCy. • Allows non-default features to be passed in.
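Under the hood, sklearn_crfsuite exposes the same fit/predict pattern over per-token feature dicts; a minimal sketch (the feature set shown is illustrative, not the NERDS defaults):

    import sklearn_crfsuite

    def token_features(tokens, i):
        # tiny illustrative feature dict; NERDS lets callers pass in richer ones
        return {"word": tokens[i].lower(),
                "is_title": tokens[i].istitle(),
                "suffix3": tokens[i][-3:]}

    sentences = [["Barack", "Obama", "is", "44th", "United", "States", "President", "."]]
    labels = [["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O", "O"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(X, labels)
    print(crf.predict(X))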
29. SpaCy NER • Wraps the NER provided by the SpaCy toolkit. • Improvements in this fork: • More robust to large data sizes; uses mini-batches for training.
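The mini-batch training the fork adds corresponds roughly to spaCy 2.x's documented training loop (current at the time of the talk); a sketch, with an illustrative entity label and training example:

    import random
    import spacy
    from spacy.util import minibatch

    nlp = spacy.blank("en")
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)
    ner.add_label("protein")   # one add_label call per entity type

    TRAIN_DATA = [("p53 activates transcription.",
                   {"entities": [(0, 3, "protein")]})]  # illustrative example

    optimizer = nlp.begin_training()
    for epoch in range(10):
        random.shuffle(TRAIN_DATA)
        for batch in minibatch(TRAIN_DATA, size=32):  # mini-batches bound memory use
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, losses={})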
31. BiLSTM CRF NER • Wraps Anago BiLSTMCRF. • Improvements in this fork: • Works against the latest release (1.0.5) of Anago. • No more intermittent failures due to time step mismatches.
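For reference, a sketch of Anago's documented 1.x interface that this wrapper adapts; exact signatures may vary across releases:

    import anago

    # x_train/y_train: token lists and BIO tag lists, as on slide 21
    model = anago.Sequence()                 # BiLSTM-CRF under the hood
    model.fit(x_train, y_train, epochs=15)
    print(model.score(x_test, y_test))       # entity-level F1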
33. ELMo NER • Wraps Anago ELModel. • New in this fork; available in the current (dev) version of Anago. • Needs a (mandatory) base embedding for the ELMo preprocessor.
35. Ensemble NER • Max Voting. • Improvements in this fork: • Unifies Max Voting and Weighted Max Voting NERs into a single model.
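The unified (weighted) max-voting idea is easy to sketch over per-token predictions; this is a self-contained illustration, not the NERDS implementation:

    from collections import Counter

    def max_vote(model_predictions, weights=None):
        """model_predictions: one BIO tag sequence per model, same sentence."""
        weights = weights or [1] * len(model_predictions)
        voted = []
        for position_tags in zip(*model_predictions):
            votes = Counter()
            for tag, weight in zip(position_tags, weights):
                votes[tag] += weight   # weight 1 everywhere = plain max voting
            voted.append(votes.most_common(1)[0][0])
        return voted

    print(max_vote([["B-PER", "I-PER", "O"],
                    ["B-PER", "O", "O"],
                    ["B-PER", "I-PER", "O"]]))  # ['B-PER', 'I-PER', 'O']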
37. Results (OOTB) • Comparison across models: • The ELMo-based CRF has the best performance. • SpaCy and BiLSTM have comparable performance, but CRF is competitive. • Model-based NERs outperform gazetteers. • F1-scores range from 0.65 to 0.80. • Comparison across entity types: • Some correlation observed between data volume and F1-scores. • F1-scores range from 0.61 to 0.81.
38. Agenda • What can NER do for you? • Evolution of NER techniques • NERDS Architecture • NERDS Usage • Future Work
39. Future Work • Current API is only superficially Scikit-Learn like; convert models to fully conform to the Scikit-Learn Classifier API. • Eliminate serialization issues reported by joblib.Parallel. • Eliminate EnsembleNER in favor of Scikit-Learn’s VotingClassifier. • Leverage Scikit-Learn’s Model Selection classes (RandomizedSearchCV and GridSearchCV). • Add FLAIR and BERT based NERs to the supported model collection. • BRAT annotation adapter.
40. Thank you • https://github.com/sujitpal/nerds • sujit.pal@elsevier.com
