An introduction to MedSpaCy

MedSpaCy
Presented By:
Siddharth Mamania

Introduction
• MedSpaCy is a library for performing clinical NLP and text processing tasks.
• It brings together a number of other packages, each of which implements specific
functionality for common clinical text processing tasks, such as sentence segmentation,
contextual analysis and attribute assertion, and section detection.
• MedSpaCy is modularized.
• All of MedSpaCy is designed to be used as part of a spaCy processing pipeline.

Modules of MedSpaCy
• medspacy.preprocess
• medspacy.sentence_splitter
• medspacy.ner
• medspacy.context
• medspacy.section_detection
• medspacy.postprocess
• medspacy.visualization

Components of MedSpaCy Model
• By default it is built off spaCy’s en_core_web_sm model and will include:
1. Tokenizer
2. Sentencizer
3. TargetMatcher
4. ConText
Specifying components
Adding a specific component to an existing model

Tokenization
• Clinical language is very different from general natural language.
• Abbreviations and punctuation are used irregularly, and tokenizers trained on general English
perform poorly on clinical text.
• For this reason, MedSpaCy has a custom tokenizer.
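
To illustrate why general-purpose tokenization rules struggle here, the following is a minimal sketch using only Python's standard library. It is not MedSpaCy's tokenizer: the abbreviation list and both function names are hypothetical, chosen only to show how a rule that protects clinical abbreviations keeps "r/o" as one token while a naive rule splits it.

```python
import re

# Hypothetical list of clinical abbreviations to keep as single tokens.
CLINICAL_ABBREVIATIONS = ["r/o", "h/o", "s/p", "q.d.", "b.i.d."]


def naive_tokenize(text):
    """Split into runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def clinical_tokenize(text):
    """Same as naive_tokenize, but try the protected abbreviations first."""
    protected = "|".join(re.escape(a) for a in CLINICAL_ABBREVIATIONS)
    pattern = re.compile(rf"(?:{protected})(?!\w)|\w+|[^\w\s]", re.IGNORECASE)
    return pattern.findall(text)


text = "Patient presents to r/o COVID-19."
naive_tokens = naive_tokenize(text)        # "r/o" becomes "r", "/", "o"
clinical_tokens = clinical_tokenize(text)  # "r/o" stays a single token
```

A real clinical tokenizer needs far more rules than this (units, dates, dosage strings), which is the motivation for shipping a custom one.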

Sentence Segmentation
• Sentence segmentation in MedSpaCy can be performed in two ways:
1. The standard spaCy POS tagger/dependency parser
2. PyRuSH, a rule-based sentence splitter

ConText
• Clinical text often contains mentions of concepts that the patient did not actually experience:
• “There is no evidence of pneumonia”
• “Mother with breast cancer”
• “Patient presents to r/o COVID-19”
• ConText links target entities, such as problems, with semantic modifiers, such as negation or family history.
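
The idea of linking a target to a modifier can be sketched in a few lines of plain Python. This is a toy stand-in, not the MedSpaCy ConText component: the `ModifierRule` class, the rule list, the category labels, and the `link_modifiers` function are all hypothetical, and the scope is simplified to "anything in the sentence on the stated side of the modifier".

```python
from dataclasses import dataclass


@dataclass
class ModifierRule:
    literal: str    # phrase to look for (lower-case)
    category: str   # semantic category assigned to linked targets
    direction: str  # "FORWARD" or "BACKWARD" relative to the phrase


# Hypothetical rules covering the three example sentences above.
RULES = [
    ModifierRule("no evidence of", "NEGATED_EXISTENCE", "FORWARD"),
    ModifierRule("mother with", "FAMILY", "FORWARD"),
    ModifierRule("r/o", "POSSIBLE_EXISTENCE", "FORWARD"),
]


def link_modifiers(sentence, targets):
    """Return {target: [categories]} for modifiers whose scope covers the target."""
    lowered = sentence.lower()
    links = {target: [] for target in targets}
    for rule in RULES:
        mod_start = lowered.find(rule.literal)
        if mod_start == -1:
            continue  # modifier phrase not present in this sentence
        mod_end = mod_start + len(rule.literal)
        for target in targets:
            target_start = lowered.find(target.lower())
            if target_start == -1:
                continue
            if rule.direction == "FORWARD" and target_start >= mod_end:
                links[target].append(rule.category)
            elif rule.direction == "BACKWARD" and target_start < mod_start:
                links[target].append(rule.category)
    return links


negated = link_modifiers("There is no evidence of pneumonia", ["pneumonia"])
family = link_modifiers("Mother with breast cancer", ["breast cancer"])
possible = link_modifiers("Patient presents to r/o COVID-19", ["COVID-19"])
```

Each result maps the target to the categories of the modifiers that reach it, which is the information a downstream step needs to decide whether the patient actually has the condition.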

Adding Pipeline Component
• MedSpaCy also provides other components which can be instantiated and added to
an existing pipeline:
• Sectionizer
• Preprocessor
• Postprocessor

• Sometimes the default rules do not catch some section titles.
• We can add a pattern by passing a dictionary with two keys:
• “section_title”
• “pattern”

• The sectionizer adds attributes that give us access to the section data.
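
As a rough illustration of section detection with title/pattern pairs, here is a standard-library sketch. It is not the MedSpaCy Sectionizer: the `split_sections` function, the pattern values, and the sample note are hypothetical, and the dictionary keys are assumed to be `section_title` and `pattern`.

```python
import re

# Pattern dictionaries with a section title and a header pattern;
# the titles and header strings are made-up examples.
DEFAULT_PATTERNS = [
    {"section_title": "past_medical_history", "pattern": "Past Medical History:"},
    {"section_title": "medications", "pattern": "Medications:"},
]


def split_sections(text, patterns):
    """Return (section_title, header, body) tuples in document order."""
    headers = []
    for p in patterns:
        for m in re.finditer(re.escape(p["pattern"]), text):
            headers.append((m.start(), m.end(), p["section_title"]))
    headers.sort()
    sections = []
    for i, (start, end, title) in enumerate(headers):
        # A section body runs until the next header or the end of the text.
        body_end = headers[i + 1][0] if i + 1 < len(headers) else len(text)
        sections.append((title, text[start:end], text[end:body_end].strip()))
    return sections


note = (
    "Past Medical History: Diabetes. "
    "Medications: Metformin. "
    "Visit Information: Seen in clinic."
)

# With the default patterns, "Visit Information:" is not recognized as a header.
before = split_sections(note, DEFAULT_PATTERNS)

# Adding one more title/pattern pair fixes that.
extended = DEFAULT_PATTERNS + [
    {"section_title": "visit_information", "pattern": "Visit Information:"}
]
after = split_sections(note, extended)
```

Before the extra pattern is added, the visit text is swallowed into the medications section; afterwards it becomes its own section with a title, header, and body.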

Preprocessing
• Rules are defined using the PreprocessingRule class.
• Rules take these arguments:
• pattern: A compiled regular expression defining the text to match
• repl: An optional replacement for the matched text. The default is an empty string,
meaning that the matched text will be removed. This can be either a string or a function to
pass in to re.sub
• callback: A callback function which takes the match object as an argument and returns the
replaced text. This can be used for more complex modifications to the text than just
replacing the matched text
• desc: An optional description for the rule
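
The four arguments can be mimicked with a small standard-library class to show how they interact. This is a sketch under stated assumptions, not MedSpaCy's implementation: `ToyPreprocessingRule`, `preprocess`, and the sample rules are hypothetical, and in this sketch the callback receives the full text along with the match object so that it can return the whole modified text.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class ToyPreprocessingRule:
    pattern: re.Pattern                                   # compiled regex to match
    repl: Union[str, Callable[[re.Match], str]] = ""      # passed to re.sub
    callback: Optional[Callable[[str, re.Match], str]] = None
    desc: str = ""                                        # human-readable note

    def apply(self, text: str) -> str:
        match = self.pattern.search(text)
        if match is None:
            return text  # nothing to change
        if self.callback is not None:
            # Assumption of this sketch: callback gets (text, match).
            return self.callback(text, match)
        return self.pattern.sub(self.repl, text)


rules = [
    ToyPreprocessingRule(
        re.compile(r"dx'd", re.IGNORECASE),
        repl="diagnosed",
        desc="Expand an abbreviation",
    ),
    ToyPreprocessingRule(
        re.compile(r"\s*\[page \d+\]", re.IGNORECASE),
        desc="Remove page markers (default repl deletes the match)",
    ),
    ToyPreprocessingRule(
        re.compile(r"electronically signed by", re.IGNORECASE),
        callback=lambda text, match: text[: match.start()].rstrip(),
        desc="Drop everything from the signature line onward",
    ),
]


def preprocess(text, rules):
    """Apply each rule in order to the running text."""
    for rule in rules:
        text = rule.apply(text)
    return text


raw = "Pt dx'd with pneumonia. [Page 2] Follow up in 2 weeks. Electronically signed by Dr. X"
clean = preprocess(raw, rules)
```

Because these rules modify the text itself, character offsets in the cleaned text no longer line up with the original note, which is worth keeping in mind when results have to be mapped back.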

Postprocessing
• The Postprocessor iterates through each entity and checks a series of conditions on each.
• The design of a postprocessing rule is as follows:
• A PostprocessingRule contains a list of patterns and an action to take if all of the patterns
evaluate as True
• Each PostprocessingPattern takes a condition, which evaluates as True or False. If all
patterns return True, the action is taken
• Each pattern can take optional condition_args to pass into the condition check, and each rule
takes optional action_args
• The module postprocessing_functions offers utility functions for the condition and
action arguments.
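
The rule design described above (patterns that must all be True, then an action) can be sketched with plain Python. Everything here is a hypothetical stand-in rather than MedSpaCy's classes: `ToyEntity`, `ToyPattern`, `ToyRule`, the helper functions, and the sample entities exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyEntity:
    text: str
    label: str
    is_negated: bool = False


@dataclass
class ToyPattern:
    condition: Callable[..., bool]
    condition_args: Tuple = ()

    def __call__(self, ent):
        # Extra arguments are forwarded to the condition check.
        return self.condition(ent, *self.condition_args)


@dataclass
class ToyRule:
    patterns: List[ToyPattern]
    action: Callable[..., None]
    action_args: Tuple = ()


def has_label(ent, label):
    return ent.label == label


def negated(ent):
    return ent.is_negated


def remove_ent(ent, ents):
    # Remove by identity so equal-looking duplicates are left alone.
    ents[:] = [e for e in ents if e is not ent]


def postprocess(ents, rules):
    """Check every rule against every entity; act when all patterns are True."""
    ents = list(ents)
    for ent in list(ents):
        for rule in rules:
            if all(pattern(ent) for pattern in rule.patterns):
                rule.action(ent, ents, *rule.action_args)
            if not any(e is ent for e in ents):
                break  # entity was removed; skip the remaining rules
    return ents


rules = [
    ToyRule(
        patterns=[ToyPattern(has_label, ("PROBLEM",)), ToyPattern(negated)],
        action=remove_ent,
    )
]

entities = [
    ToyEntity("pneumonia", "PROBLEM", is_negated=True),
    ToyEntity("diabetes", "PROBLEM"),
    ToyEntity("aspirin", "TREATMENT", is_negated=True),
]
kept = postprocess(entities, rules)
```

Only the negated PROBLEM entity is dropped; the negated TREATMENT entity survives because one of the two patterns evaluates as False.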

Using Pretrained Model
• The model used is trained with data from i2b2 2012.

Visualizer
• Being able to visualize the results of an NLP model is extremely useful for both
development and sharing results.
• MedSpaCy provides visualization capabilities based on spaCy’s displacy module.
• MedSpaCy’s visualizers are available in medspacy.visualization.
• There are 2 types of visualization in MedSpaCy:
• ent
• dep
