MedSpaCy
Presented By:
Siddharth Mamania
Introduction
• MedSpaCy is a library for performing clinical NLP and text-processing tasks.
• It brings together a number of other packages, each of which implements specific functionality for common clinical text-processing tasks such as sentence segmentation, contextual analysis and attribute assertion, and section detection.
• MedSpaCy is modular: each component can be used independently.
• All of MedSpaCy is designed to be used as part of a spaCy processing pipeline.
Modules of MedSpaCy
• medspacy.preprocess
• medspacy.sentence_splitting
• medspacy.ner
• medspacy.context
• medspacy.section_detection
• medspacy.postprocess
• medspacy.visualization
Components of the MedSpaCy Model
• By default, it is built on a spaCy English model and includes:
1. Tokenizer
2. Sentencizer
3. TargetMatcher
4. ConText
• Components can be specified when the model is loaded, or added individually to an existing model.
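To make the idea of a component pipeline concrete, here is a toy sketch in plain Python. It is not spaCy's or MedSpaCy's API: ToyPipeline, its add_pipe method, and the three component functions are hypothetical. It only shows the general pattern of named components running in order over a shared document, and of adding a component to an existing pipeline at a chosen position.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a spaCy Doc: a dict that each component reads and extends.
Doc = Dict[str, object]
Component = Callable[[Doc], Doc]


class ToyPipeline:
    """Runs named components in order, loosely mirroring a spaCy processing pipeline."""

    def __init__(self) -> None:
        self._components: List[Tuple[str, Component]] = []

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self._components]

    def add_pipe(self, name: str, component: Component, before: Optional[str] = None) -> None:
        # Append at the end by default, or insert in front of an existing component.
        if before is None:
            self._components.append((name, component))
        else:
            self._components.insert(self.pipe_names.index(before), (name, component))

    def __call__(self, text: str) -> Doc:
        doc: Doc = {"text": text}
        for _, component in self._components:
            doc = component(doc)
        return doc


def tokenize(doc: Doc) -> Doc:
    doc["tokens"] = str(doc["text"]).split()
    return doc


def lowercase(doc: Doc) -> Doc:
    doc["tokens"] = [token.lower() for token in doc["tokens"]]
    return doc


def count_tokens(doc: Doc) -> Doc:
    doc["n_tokens"] = len(doc["tokens"])
    return doc


nlp = ToyPipeline()
nlp.add_pipe("tokenizer", tokenize)
nlp.add_pipe("counter", count_tokens)
# Add a component to the existing pipeline, ahead of "counter".
nlp.add_pipe("lowercase", lowercase, before="counter")

doc = nlp("No Acute Distress")
print(nlp.pipe_names)  # ['tokenizer', 'lowercase', 'counter']
print(doc["tokens"])   # ['no', 'acute', 'distress']
```

In MedSpaCy itself the components are real spaCy pipeline components, added through spaCy's own nlp.add_pipe; the library documentation gives the exact component names.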
Tokenization
• Clinical language is very different from general English.
• Abbreviations and punctuation are used irregularly, so tokenizers trained on general English perform poorly on clinical text.
• For this reason, MedSpaCy provides a custom tokenizer.
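The sketch below illustrates why a general-purpose approach struggles with clinical shorthand. The regular expression is a hypothetical rule set written for this example, not MedSpaCy's actual tokenizer rules: it splits runs of letters, numbers (with an optional decimal part), and each punctuation mark into separate tokens, so glued units and missing spaces no longer hide tokens.

```python
import re
from typing import List

# Hypothetical rule set (not MedSpaCy's actual tokenizer rules): runs of letters,
# numbers with an optional decimal part, and every other non-space character on its own.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|[^\sA-Za-z\d]")


def clinical_tokenize(text: str) -> List[str]:
    """Split text so that glued units and punctuation become separate tokens."""
    return TOKEN_PATTERN.findall(text)


note = "Pt took 1.5mg ASA;HR 88bpm,BP 120/80"
print(note.split())
# ['Pt', 'took', '1.5mg', 'ASA;HR', '88bpm,BP', '120/80']
print(clinical_tokenize(note))
# ['Pt', 'took', '1.5', 'mg', 'ASA', ';', 'HR', '88', 'bpm', ',', 'BP', '120', '/', '80']
```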
Sentence Segmentation
• Sentence segmentation in MedSpaCy can be performed in two ways:
1. with a standard POS tagger/dependency parser
2. with PyRuSH, a rule-based sentence splitter
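As a rough illustration of the rule-based approach, the sketch below splits on sentence-final punctuation and on line breaks, but not after a small list of abbreviations. The abbreviation list and boundary rules are hypothetical and far simpler than PyRuSH's; the point is only that clinical notes need rules for abbreviations and for headers and line breaks that carry no punctuation.

```python
import re
from typing import List

# Hypothetical rules for illustration only; PyRuSH uses its own, much richer rule set.
ABBREVIATIONS = {"dr.", "pt.", "e.g.", "i.e.", "b.i.d."}
# Candidate boundaries: sentence-final punctuation followed by whitespace, or a line break.
BOUNDARY = re.compile(r"[.?!]\s+|\n+")


def split_sentences(text: str) -> List[str]:
    sentences: List[str] = []
    start = 0
    for match in BOUNDARY.finditer(text):
        is_punct = text[match.start()] in ".?!"
        # Keep sentence-final punctuation attached to its sentence.
        end = match.start() + 1 if is_punct else match.start()
        chunk = text[start:end].strip()
        if is_punct and chunk.split()[-1].lower() in ABBREVIATIONS:
            continue  # a period right after a known abbreviation is not a boundary
        if chunk:
            sentences.append(chunk)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


note = "Seen by Dr. Smith today. No fever.\nPLAN\nStart ASA 81 mg daily"
print(split_sentences(note))
# ['Seen by Dr. Smith today.', 'No fever.', 'PLAN', 'Start ASA 81 mg daily']
```

The header PLAN becomes its own sentence even though it has no punctuation, while "Dr." does not end a sentence.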
Target Extraction
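Target extraction is handled by the TargetMatcher listed among the default components. The plain-Python sketch below shows only the underlying idea of rule-based matching: each rule pairs a literal phrase with a category, and every case-insensitive token match becomes an entity. ToyTargetRule, Entity, and match_targets are hypothetical names, not MedSpaCy classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyTargetRule:
    literal: str   # phrase to find, matched case-insensitively token by token
    category: str  # label given to each match, e.g. "PROBLEM"


@dataclass
class Entity:
    start: int  # index of the first matched token
    end: int    # index one past the last matched token
    text: str
    label: str


def match_targets(tokens: List[str], rules: List[ToyTargetRule]) -> List[Entity]:
    lowered = [token.lower() for token in tokens]
    entities: List[Entity] = []
    for rule in rules:
        phrase = rule.literal.lower().split()
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == phrase:
                entities.append(Entity(i, i + n, " ".join(tokens[i:i + n]), rule.category))
    # Overlapping matches are kept as-is; a real matcher needs a policy for them.
    return sorted(entities, key=lambda entity: entity.start)


tokens = "No evidence of pneumonia ; mother with Breast Cancer".split()
rules = [ToyTargetRule("pneumonia", "PROBLEM"), ToyTargetRule("breast cancer", "PROBLEM")]
for ent in match_targets(tokens, rules):
    print(ent)
# Entity(start=3, end=4, text='pneumonia', label='PROBLEM')
# Entity(start=7, end=9, text='Breast Cancer', label='PROBLEM')
```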
Adding Custom Attributes
ConText
• Clinical text often mentions concepts that the patient did not actually experience, for example:
• “There is no evidence of pneumonia” (negated)
• “Mother with breast cancer” (family history)
• “Patient presents r/o COVID-19” (uncertain; r/o = rule out)
• ConText links target entities, such as problems, to semantic modifiers such as negation, family history, and uncertainty.
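A heavily simplified, hypothetical version of the ConText idea is sketched below. Each modifier rule has a trigger phrase, a category, and a direction: a FORWARD modifier applies to targets after it and a BACKWARD modifier to targets before it, with the scope running to the edge of the sentence. The rule class, category names, and triggers are chosen for illustration; the real algorithm also handles scope-terminating terms, pseudo-triggers, and more.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start token index, end token index), end exclusive


@dataclass
class ToyModifierRule:
    literal: str    # trigger phrase, e.g. "no evidence of"
    category: str   # e.g. "NEGATED_EXISTENCE", "FAMILY", "POSSIBLE_EXISTENCE"
    direction: str  # "FORWARD": targets after the trigger; "BACKWARD": targets before it


def find_phrase(tokens: List[str], phrase: str) -> List[Span]:
    words = phrase.lower().split()
    lowered = [token.lower() for token in tokens]
    return [(i, i + len(words)) for i in range(len(tokens) - len(words) + 1)
            if lowered[i:i + len(words)] == words]


def apply_context(tokens: List[str], targets: List[Span],
                  rules: List[ToyModifierRule]) -> Dict[Span, Set[str]]:
    """Attach modifier categories to target spans; the scope is the whole sentence."""
    result: Dict[Span, Set[str]] = {span: set() for span in targets}
    for rule in rules:
        for mod_start, mod_end in find_phrase(tokens, rule.literal):
            for t_start, t_end in targets:
                after = t_start >= mod_end
                before = t_end <= mod_start
                if (rule.direction == "FORWARD" and after) or (rule.direction == "BACKWARD" and before):
                    result[(t_start, t_end)].add(rule.category)
    return result


RULES = [
    ToyModifierRule("no evidence of", "NEGATED_EXISTENCE", "FORWARD"),
    ToyModifierRule("mother", "FAMILY", "FORWARD"),
    ToyModifierRule("r/o", "POSSIBLE_EXISTENCE", "FORWARD"),
    ToyModifierRule("was ruled out", "NEGATED_EXISTENCE", "BACKWARD"),
]


def modifiers_for(sentence: str, target: str) -> List[str]:
    tokens = sentence.split()
    span = find_phrase(tokens, target)[0]
    return sorted(apply_context(tokens, [span], RULES)[span])


print(modifiers_for("There is no evidence of pneumonia", "pneumonia"))  # ['NEGATED_EXISTENCE']
print(modifiers_for("Mother with breast cancer", "breast cancer"))      # ['FAMILY']
print(modifiers_for("Patient presents r/o COVID-19", "COVID-19"))       # ['POSSIBLE_EXISTENCE']
print(modifiers_for("Pneumonia was ruled out", "pneumonia"))            # ['NEGATED_EXISTENCE']
```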
Adding Pipeline Component
• MedSpaCy also provides other components, which can be instantiated and added to an existing pipeline:
• Sectionizer
• Preprocessor
• Postprocessor
Section Detection
• Sometimes the default rules do not catch some section titles.
• Additional patterns can be added by passing two keys:
• “section_title”
• “pattern”
• The sectionizer adds attributes through which the detected sections can be accessed.
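The sketch below shows how rules built from the two keys on the slide could drive a very simple sectionizer: every header match starts a new section that runs until the next header. It is a hypothetical, plain-Python illustration, not MedSpaCy's Sectionizer; the section titles, header strings, and the literal, case-insensitive matching are assumptions for this example.

```python
import re
from typing import Dict, List, Optional, Tuple

# Each pattern uses the two keys from the slide; the titles and header strings are hypothetical.
SECTION_RULES: List[Dict[str, str]] = [
    {"section_title": "past_medical_history", "pattern": "Past Medical History:"},
    {"section_title": "medications", "pattern": "Medications:"},
    {"section_title": "family_history", "pattern": "FHx:"},  # e.g. a header the defaults miss
]


def sectionize(text: str, rules: List[Dict[str, str]]) -> List[Tuple[Optional[str], str]]:
    """Split text into (section_title, body) pairs; each header starts a new section."""
    headers: List[Tuple[int, int, str]] = []
    for rule in rules:
        # Literal, case-insensitive header matching (a simplification for this sketch).
        for m in re.finditer(re.escape(rule["pattern"]), text, flags=re.IGNORECASE):
            headers.append((m.start(), m.end(), rule["section_title"]))
    headers.sort()

    sections: List[Tuple[Optional[str], str]] = []
    first_start = headers[0][0] if headers else len(text)
    if text[:first_start].strip():
        sections.append((None, text[:first_start].strip()))  # text before any header
    for i, (_, end, title) in enumerate(headers):
        next_start = headers[i + 1][0] if i + 1 < len(headers) else len(text)
        sections.append((title, text[end:next_start].strip()))
    return sections


note = ("Pt seen in clinic. Past Medical History: HTN, DM2. "
        "FHx: mother with breast cancer. Medications: metformin")
for title, body in sectionize(note, SECTION_RULES):
    print(title, "->", body)
# None -> Pt seen in clinic.
# past_medical_history -> HTN, DM2.
# family_history -> mother with breast cancer.
# medications -> metformin
```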
Preprocessing
• Rules are defined using the PreprocessingRule class.
• Rules take these arguments:
• pattern: a compiled regular expression defining the text to match
• repl: an optional replacement for the matched text. By default the match is replaced with an empty string, i.e., removed. This can be either a string or a function to pass to re.sub
• callback: a callback function that takes the match object as an argument and returns the replacement text. This can be used for more complex modifications than just replacing the matched text
• desc: an optional description of the rule
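A minimal sketch of how such rules could be applied is shown below. ToyPreprocessRule is a hypothetical class that mirrors the four arguments listed above; it is not MedSpaCy's implementation. Here a callback receives the match object and returns the replacement for that match, and it takes precedence over repl when both are given; both behaviors are assumptions of this sketch. The de-identification tag format in the example note is also only illustrative.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class ToyPreprocessRule:
    """Hypothetical rule mirroring the arguments on the slide (not MedSpaCy's class)."""
    pattern: re.Pattern                                  # compiled regex defining the text to match
    repl: Union[str, Callable[[re.Match], str]] = ""     # default "" removes the match
    callback: Optional[Callable[[re.Match], str]] = None  # match object -> replacement text
    desc: Optional[str] = None                           # optional description of the rule

    def apply(self, text: str) -> str:
        # In this sketch a callback, when given, takes precedence over repl.
        replacement = self.callback if self.callback is not None else self.repl
        return self.pattern.sub(replacement, text)


def preprocess(text: str, rules: List[ToyPreprocessRule]) -> str:
    # Rules run in order; each one sees the output of the previous one.
    for rule in rules:
        text = rule.apply(text)
    return text


rules = [
    ToyPreprocessRule(re.compile(r"\s*\[\*\*[^*]*\*\*\]"), desc="Remove de-identification tags"),
    ToyPreprocessRule(re.compile(r"\bh/o\b", re.IGNORECASE), repl="history of", desc="Expand h/o"),
    ToyPreprocessRule(re.compile(r"\b(\d+)x daily\b"),
                      callback=lambda m: f"{m.group(1)} times daily", desc="Spell out dosing"),
]
note = "Pt has h/o DM2 [**Name**]. Takes metformin 2x daily."
print(preprocess(note, rules))
# Pt has history of DM2. Takes metformin 2 times daily.
```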
Postprocessing
• The Postprocessor iterates through each entity and checks a series of conditions on each.
• A postprocessing rule is designed as follows:
• A PostprocessingRule contains a list of patterns and an action to take if all of the patterns evaluate to True
• Each PostprocessingPattern takes a condition, which evaluates to True or False. If all patterns return True, the action is taken
• Each pattern can take optional condition_args to pass to the condition check, and each rule takes optional action_args
• The module postprocessing_functions offers utility functions for the condition and action arguments.
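The rule design above can be sketched in plain Python as follows. ToyPattern, ToyRule, Ent, and the helper functions are hypothetical stand-ins for PostprocessingPattern, PostprocessingRule, spaCy entities, and the postprocessing_functions module; they are not MedSpaCy's classes. Passing condition_args and action_args positionally is an assumption of this sketch. The example drops negated problems and relabels family-history mentions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(eq=False)  # eq=False: entities compare by identity, so list.remove() drops the right one
class Ent:
    """Hypothetical stand-in for a spaCy entity with ConText-style attributes."""
    text: str
    label: str
    is_negated: bool = False
    is_family: bool = False


@dataclass
class ToyPattern:
    condition: Callable[..., bool]  # called as condition(ent, *condition_args)
    condition_args: Tuple = ()

    def check(self, ent: Ent) -> bool:
        return self.condition(ent, *self.condition_args)


@dataclass
class ToyRule:
    patterns: List[ToyPattern]
    action: Callable[..., None]     # called as action(ent, ents, *action_args)
    action_args: Tuple = ()
    description: Optional[str] = None

    def apply(self, ent: Ent, ents: List[Ent]) -> None:
        # The action fires only if every pattern's condition evaluates to True.
        if all(pattern.check(ent) for pattern in self.patterns):
            self.action(ent, ents, *self.action_args)


# Hypothetical condition and action helpers.
def has_label(ent: Ent, label: str) -> bool:
    return ent.label == label


def is_negated(ent: Ent) -> bool:
    return ent.is_negated


def is_family(ent: Ent) -> bool:
    return ent.is_family


def remove_ent(ent: Ent, ents: List[Ent]) -> None:
    ents.remove(ent)


def set_label(ent: Ent, ents: List[Ent], new_label: str) -> None:
    ent.label = new_label


def postprocess(ents: List[Ent], rules: List[ToyRule]) -> List[Ent]:
    # Iterate over a copy so that actions can safely remove entities from `ents`.
    for ent in list(ents):
        for rule in rules:
            if ent in ents:  # skip entities that an earlier rule already removed
                rule.apply(ent, ents)
    return ents


rules = [
    ToyRule([ToyPattern(has_label, ("PROBLEM",)), ToyPattern(is_negated)], remove_ent,
            description="Drop negated problems"),
    ToyRule([ToyPattern(is_family)], set_label, ("FAMILY_HISTORY",),
            description="Relabel family-history mentions"),
]
ents = [
    Ent("pneumonia", "PROBLEM", is_negated=True),
    Ent("breast cancer", "PROBLEM", is_family=True),
    Ent("COVID-19", "PROBLEM"),
]
result = postprocess(ents, rules)
print([(e.text, e.label) for e in result])
# [('breast cancer', 'FAMILY_HISTORY'), ('COVID-19', 'PROBLEM')]
```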
Using a Pretrained Model
• The model used here was trained on data from i2b2 2012.
Visualizer
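• Being able to visualize the results of an NLP model is extremely useful both for development and for sharing results.
• MedSpaCy provides visualization capabilities based on spaCy’s displacy module.
• MedSpaCy’s visualizers are imported from medspacy.visualization.
• There are two types of visualization in MedSpaCy:
• ent (visualize_ent): highlights target entities and their ConText modifiers in the text
• dep (visualize_dep): draws arcs linking target entities to their ConText modifiers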
THANK YOU!!!
