Enabling semantic search in a bio-specimen repository
July 9th, 2013
ICBO 2013
Shahim Essaid, Carlo Torniai, and Melissa Haendel
OHSU’s Biolibrary Search Engine
• Data aggregated from four repositories, with plans to add more
• A web-based search engine over de-identified data
• Our goal was to develop a controlled application ontology to support search capabilities
OHSU Biolibrary system
Search application
Two search interfaces (with no data integration):
• Limited free-text search
• Search through anatomy and histology lists, using multiple wizard-like forms
Example: coded data vs. pathology report
(Available structured data from one case)
However, the pathology report also includes:
• Low grade pancreatic intraepithelial neoplasia
• Extensive perineural invasion
• Acute and chronic cholecystitis
• Bile duct tissue with chronic inflammation
• Chronic pancreatitis
• Acute gastric serositis
Entity recognition with MetaMap
Selected mapping examples
(the same report from earlier)
Final Pathologic Diagnosis:
A: Gallbladder, cholecystectomy:
- Acute and chronic cholecystitis
- Negative for malignancy
B: Bile ductular tissue, biopsy:
- Bile duct tissue with chronic inflammation
- Negative for malignancy
C: Superior mesenteric vein margin, biopsy:
- Vascular tissue with no diagnostic abnormality
- Negative for malignancy
D: Portal vein margin, biopsy:
- Fibroconnective tissue with no diagnostic abnormality
- Negative for malignancy
E: Pancreas, stomach, duodenum, pancreaticogastroduodenectomy:
- Pancreatic ductal adenocarcinoma, grade 2/3, invading peripancreatic fat
- Size: 3 cm in greatest dimension
- Pancreatic neck margin positive for invasive carcinoma (please see comment)
- Superior mesenteric artery margin negative at 0.2 cm from invasive tumor, deep pancreatic margin negative at 0.6 cm from invasive tumor
- Extensive perineural invasion present
- No angiolymphatic invasion identified
- Metastatic pancreatic ductal adenocarcinoma present in two of ten peripancreatic lymph nodes (2/10)
Deriving an OWL ontology for DL queries
Adding relationships
(developing an application ontology to support search)
• “Subclass of” axioms were generated from the UMLS hierarchy table
• Mapped entities were augmented with the transitive closure of their parents
• “Part of” axioms were generated by aggregating many mereological relationships from the UMLS relationship table
• Anatomy, pathology, and disease entities were related using SNOMED-CT disorder/disease definitions
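The closure step can be sketched in Python as below. This is a minimal illustration, not the project's code: it assumes the UMLS hierarchy has already been loaded into a dictionary mapping each concept to its direct parents (loading is not shown), and the concept labels in the toy hierarchy are illustrative placeholders, not UMLS data. Because the traversal keeps a visited set, it terminates even on cyclic input, and the same helper can flag concepts that are their own ancestors, i.e. members of a subsumption cycle.

```python
from typing import Dict, Iterable, Set

# Hypothetical input: concept -> set of direct "subclass of" parents,
# e.g. as loaded from a UMLS hierarchy export (loading not shown).
Hierarchy = Dict[str, Set[str]]


def ancestors(parents: Hierarchy, concept: str) -> Set[str]:
    """Return every concept reachable from `concept` via parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # visited set keeps cycles from looping forever
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def augment(mapped: Iterable[str], parents: Hierarchy) -> Set[str]:
    """Augment mapped concepts with the transitive closure of their parents."""
    result: Set[str] = set()
    for concept in mapped:
        result.add(concept)
        result |= ancestors(parents, concept)
    return result


def cyclic_concepts(parents: Hierarchy) -> Set[str]:
    """Concepts that are their own ancestor, i.e. sit on a subsumption cycle."""
    return {c for c in parents if c in ancestors(parents, c)}


# Toy hierarchy (illustrative labels, not UMLS CUIs).
toy: Hierarchy = {
    "acute cholecystitis": {"cholecystitis"},
    "cholecystitis": {"gallbladder disease"},
    "gallbladder disease": {"disease"},
}
print(sorted(augment(["acute cholecystitis"], toy)))
# ['acute cholecystitis', 'cholecystitis', 'disease', 'gallbladder disease']
```

Reporting cycles rather than silently breaking them fits a workflow in which problematic inheritance is reviewed and resolved by hand.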
• Problematic multiple and cyclic inheritance was resolved manually
• The result is an OWL ontology that supports useful DL queries along the “subclass of” and “part of/has part” axes. Examples:
  • Retrieve all pathologies (limited to a type if needed) that affect an anatomical site (± all parts)
  • Retrieve all anatomical sites with a specific type of pathology
  • List all pathologies/sites for a disease
  • Etc.
• The MetaMap mappings were saved in a database table. After relevant concepts are identified with a DL query, a database query finds the actual reports.
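The two-step retrieval above (select concepts, then look up reports) can be illustrated with the following sketch. It rests on several assumptions. Plain Python graph traversal stands in for the OWL reasoner that would actually answer the DL query. The `has_site` and `part_of` dictionaries and the `mapping(report_id, concept)` table layout are hypothetical. The toy data only borrows terms from the example report.

```python
import sqlite3
from typing import Dict, List, Set

Relation = Dict[str, Set[str]]  # subject -> set of objects


def site_and_parts(site: str, part_of: Relation) -> Set[str]:
    """Return `site` plus everything transitively asserted to be part of it."""
    result = {site}
    frontier = [site]
    while frontier:
        whole = frontier.pop()
        for part, wholes in part_of.items():
            if whole in wholes and part not in result:
                result.add(part)
                frontier.append(part)
    return result


def pathologies_at(site: str, has_site: Relation, part_of: Relation,
                   include_parts: bool = True) -> Set[str]:
    """Pathologies whose site is `site` (and optionally any of its parts)."""
    sites = site_and_parts(site, part_of) if include_parts else {site}
    return {p for p, p_sites in has_site.items() if p_sites & sites}


def reports_for(conn: sqlite3.Connection, concepts: Set[str]) -> List[str]:
    """Look up reports annotated with any of `concepts` in the mapping table."""
    if not concepts:
        return []
    placeholders = ",".join("?" * len(concepts))  # e.g. "?,?" for two concepts
    rows = conn.execute(
        "SELECT DISTINCT report_id FROM mapping "
        f"WHERE concept IN ({placeholders}) ORDER BY report_id",
        sorted(concepts),
    )
    return [row[0] for row in rows]


# Toy data (hypothetical, loosely based on the example report).
part_of: Relation = {"pancreatic duct": {"pancreas"}, "pancreatic neck": {"pancreas"}}
has_site: Relation = {
    "pancreatic ductal adenocarcinoma": {"pancreatic duct"},
    "chronic pancreatitis": {"pancreas"},
    "cholecystitis": {"gallbladder"},
}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (report_id TEXT, concept TEXT)")
conn.executemany("INSERT INTO mapping VALUES (?, ?)", [
    ("R1", "chronic pancreatitis"),
    ("R2", "pancreatic ductal adenocarcinoma"),
    ("R3", "cholecystitis"),
])
print(reports_for(conn, pathologies_at("pancreas", has_site, part_of)))  # ['R1', 'R2']
```

Passing `include_parts=False` corresponds to the "−" case of "(± all parts)": only pathologies asserted directly on the named site are returned.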
SNOMED-CT examples of disorder definitions
(used to relate anatomy to pathology in the application ontology)
Application integration
• Integration with the existing application was limited to appending the annotations to the text of pathology reports:
  | C1521733 C0332144 0:26 | C0016976 32:44 | C0205178 63:70 | …
• Annotations (CUIs and locations) are then indexed in Solr and can be searched with the existing free-text search form, after a DL query on the OWL file.
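The appended annotation string can be parsed back into concept/location pairs, and the CUIs selected by a DL query can be turned into a free-text query string, roughly as sketched below. Two details are assumptions, not facts from the slides: that each `start:end` pair gives character offsets into the report text, and that the annotations are indexed in a Solr field named `annotations` (a hypothetical name). Only standard-library parsing and string building are shown; no Solr client is involved.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

CUI_RE = re.compile(r"^C\d{7}$")    # UMLS CUIs: "C" followed by seven digits
SPAN_RE = re.compile(r"^(\d+):(\d+)$")  # assumed "start:end" offsets


class Annotation(NamedTuple):
    cuis: Tuple[str, ...]
    start: int
    end: int


def parse_annotations(text: str) -> List[Annotation]:
    """Parse '| CUI [CUI ...] start:end | ...' segments into Annotations."""
    annotations: List[Annotation] = []
    for segment in text.split("|"):
        tokens = segment.split()
        if not tokens:
            continue
        span = SPAN_RE.match(tokens[-1])
        cuis = tokens[:-1]
        if span is None or not cuis or not all(CUI_RE.match(c) for c in cuis):
            continue  # skip malformed segments, e.g. a trailing "…"
        annotations.append(Annotation(tuple(cuis), int(span.group(1)), int(span.group(2))))
    return annotations


def solr_query(cuis: Iterable[str], field: str = "annotations") -> str:
    """Build a Solr standard-syntax query matching any of the given CUIs."""
    return f"{field}:(" + " OR ".join(sorted(set(cuis))) + ")"


example = "| C1521733 C0332144 0:26 | C0016976 32:44 | C0205178 63:70 | …"
for ann in parse_annotations(example):
    print(ann)
print(solr_query({"C0205178", "C0016976"}))  # annotations:(C0016976 OR C0205178)
```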
A simple DL query for anatomy
(linked to actual report in the mapping table)
Difficulties and limitations
• “Structured” text in pathology reports is not natural language, so MetaMap performs less well on it
• Named entity recognition helps with document retrieval, but extraction of structured data is more valuable
• Negation detection is poor but very important
• Significant multiple inheritance and subsumption cycles (inappropriate equivalences) arise when several UMLS vocabularies are used to derive an OWL representation
• Short project, no access to full reports, limited computational resources
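To make the negation problem concrete, the sketch below implements a much-simplified check in the spirit of NegEx: a concept counts as negated when a negation trigger appears within a few words before it. The trigger list and window size are illustrative assumptions, not values from this project. The full NegEx algorithm also handles post-concept triggers, pseudo-negations, and scope terminators, all omitted here.

```python
import re
from typing import Iterable

# Illustrative pre-concept negation triggers (assumed, not from the project).
PRE_TRIGGERS = ("no", "not", "without", "negative for")


def is_negated(sentence: str, concept: str,
               triggers: Iterable[str] = PRE_TRIGGERS, window: int = 5) -> bool:
    """Return True if a trigger occurs in the `window` words preceding `concept`."""
    text = sentence.lower()
    idx = text.find(concept.lower())
    if idx < 0:
        raise ValueError(f"{concept!r} not found in sentence")
    preceding = re.findall(r"[a-z]+", text[:idx])[-window:]
    # Pad with spaces so triggers only match whole words ("no" not inside "node").
    context = " " + " ".join(preceding) + " "
    return any(f" {t} " in context for t in triggers)


for line, concept in [
    ("Negative for malignancy", "malignancy"),
    ("No angiolymphatic invasion identified", "angiolymphatic invasion"),
    ("Extensive perineural invasion present", "perineural invasion"),
]:
    print(concept, is_negated(line, concept))
```

Lines such as "Superior mesenteric artery margin negative at 0.2 cm from invasive tumor" show the limits of this approach: the trigger follows the concept, so a checker using only pre-concept triggers would not mark the margin as negative.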
Conclusions
• OHSU Biolibrary is adding many other specimen collections, so the need for better search will increase
• NER can be used to enrich the data with SNOMED-CT concepts
• There is interest in identifying references in pathology reports to specimen blocks and slides, so that these resources can be annotated as well
• Resources for supporting sophisticated terminology and semantic efforts are still limited…
Thanks
• Dr. Chris Corless
• Rob Schuff
• Medical Research Foundation of Oregon

GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 

Enabling semantic search in a bio-specimen repository - ICBO 2013

  • 1. Enabling semantic search in a bio-specimen repository July 9th, 2013 ICBO 2013 Shahim Essaid, Carlo Torniai, and Melissa Haendel
  • 2. OHSU’s Biolibrary Search Engine
      - Data aggregated from four repositories, with plans for additional repositories
      - A web-based search engine over de-identified data
      - Our goal was to develop a controlled application ontology to support search capabilities
  • 4. Search application Two search interfaces (with no data integration) Limited free text search
  • 5. Search application Search through anatomy and histology lists Multiple wizard-like forms
  • 6. Example coded data vs. pathology report (available structured data from one case)
      However, the pathology report also includes:
      • Low grade pancreatic intraepithelial neoplasia
      • Extensive perineural invasion
      • Acute and chronic cholecystitis
      • Bile duct tissue with chronic inflammation
      • Chronic pancreatitis
      • Acute gastric serositis
  • 8. Selected mapping examples (the same report from earlier)
      Final Pathologic Diagnosis:
      A: Gallbladder, cholecystectomy:
        - Acute and chronic cholecystitis
        - Negative for malignancy
      B: Bile ductular tissue, biopsy:
        - Bile duct tissue with chronic inflammation
        - Negative for malignancy
  • 9. Selected mapping examples (the same report from earlier)
      C: Superior mesenteric vein margin, biopsy:
        - Vascular tissue with no diagnostic abnormality
        - Negative for malignancy
      D: Portal vein margin, biopsy:
        - Fibroconnective tissue with no diagnostic abnormality
        - Negative for malignancy
  • 10. Selected mapping examples (the same report from earlier)
      E: Pancreas, stomach, duodenum, pancreaticogastroduodenectomy:
        - Pancreatic ductal adenocarcinoma, grade 2/3, invading peripancreatic fat
        - Size: 3 cm in greatest dimension
        - Pancreatic neck margin positive for invasive carcinoma (please see comment)
        - Superior mesenteric artery margin negative at 0.2 cm from invasive tumor, deep pancreatic margin negative at 0.6 cm from invasive tumor
        - Extensive perineural invasion present
        - No angiolymphatic invasion identified
        - Metastatic pancreatic ductal adenocarcinoma present in two of ten peripancreatic lymph nodes (2/10)
  • 11. Deriving an OWL ontology for DL queries
  • 12. Adding relationships (developing an application ontology to support search)
      - “Subclass of” axioms were generated from the UMLS hierarchy table
      - Mapped entities were augmented with the transitive closure of their parents
      - “Part of” axioms were generated by aggregating many mereological relationships from the UMLS relationship table
      - Anatomy, pathology, and disease entities were related through SNOMED-CT disorder/disease definitions
  • 13. Adding relationships (developing an application ontology to support search)
      - Problematic multiple and cyclic inheritance was resolved manually
      - The result is an OWL ontology that supports useful DL queries along the “subclass of” and “part of/has part” axes. Examples:
          • Retrieve all pathologies (limited to a type if needed) that affect an anatomical site (± all parts)
          • Retrieve all anatomical sites with a specific type of pathology
          • List all pathologies/sites for a disease
          • Etc.
      - The MetaMap mappings were saved in a database table. After the relevant concepts are identified with a DL query, a database query can find the actual reports.
  • 14. SNOMED-CT examples of disorder definitions (used to relate anatomy to pathology in the application ontology)
  • 15. Application integration
      - Integration with the existing application was limited to appending the annotations to the text of pathology reports: | C1521733 C0332144 0:26 | C0016976 32:44 | C0205178 63:70 | …
      - The annotations (CUIs and locations) are then indexed in Solr. After a DL query on the OWL file identifies the relevant CUIs, they can be searched with the existing free-text search form.
  • 16. A simple DL query for anatomy (linked to actual report in the mapping table)
  • 17. Difficulties and limitations
      - “Structured” text in pathology reports is not natural language, so MetaMap performs less well on it
      - Named entity recognition helps with document retrieval, but extraction of structured data is more valuable
      - Negation detection is poor but very important
      - Significant multiple inheritance and subsumption cycles (inappropriate equivalences) appear when several UMLS vocabularies are used to derive an OWL representation
      - Short project, no access to full reports, limited computational resources
  • 18. Conclusions
      - OHSU Biolibrary is adding many other specimen collections, so the need for better search will increase
      - NER can be used to enhance the data with SNOMED-CT concepts
      - There is interest in identifying references in pathology reports to specimen blocks and slides, so that these resources can be annotated as well
      - Still limited resources for supporting sophisticated terminology and semantic efforts…
  • 19. Thanks
      - Dr. Chris Corless
      - Rob Schuff
      - Medical Research Foundation of Oregon
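The ontology-building steps on slides 12, 13, and 17 (“subclass of” axioms from the UMLS hierarchy, augmentation with the transitive closure of parents, and spotting multiple inheritance and subsumption cycles for manual resolution) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `(child, parent)` pairs stand in for rows of the UMLS hierarchy table, the concept names are hypothetical labels rather than real CUIs, and all function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical (child, parent) rows standing in for the UMLS hierarchy table.
# The last two rows form a deliberate cycle, like the "inappropriate
# equivalences" mentioned on slide 17.
SAMPLE_PAIRS = [
    ("pancreatic_ductal_adenocarcinoma", "adenocarcinoma"),
    ("pancreatic_ductal_adenocarcinoma", "pancreatic_neoplasm"),
    ("adenocarcinoma", "carcinoma"),
    ("carcinoma", "neoplasm"),
    ("pancreatic_neoplasm", "neoplasm"),
    ("cholecystitis", "gallbladder_disease"),
    ("gallbladder_disease", "cholecystitis"),
]


def build_parent_map(pairs):
    """Group (child, parent) rows into a child -> set-of-direct-parents map."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    return parents


def ancestors(concept, parents):
    """All transitive parents of a concept (iterative depth-first walk).

    The `seen` set keeps the walk finite on cyclic data; on a cycle the
    concept itself ends up in the result.
    """
    seen = set()
    stack = list(parents.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def find_cycle_members(parents):
    """Concepts that are their own ancestor, i.e. lie on a subsumption cycle."""
    return {c for c in parents if c in ancestors(c, parents)}


def find_multi_parent(parents):
    """Concepts with more than one direct parent (multiple inheritance)."""
    return {c for c, ps in parents.items() if len(ps) > 1}


def subclass_axioms(mapped, parents):
    """Direct 'subclass of' pairs for the mapped concepts plus the transitive
    closure of their parents, sorted for stable output."""
    keep = set(mapped)
    for c in mapped:
        keep |= ancestors(c, parents)
    return sorted((c, p) for c in keep for p in parents.get(c, ()))


if __name__ == "__main__":
    parents = build_parent_map(SAMPLE_PAIRS)
    print(subclass_axioms(["adenocarcinoma"], parents))
    print(sorted(find_multi_parent(parents)))
    print(sorted(find_cycle_members(parents)))
```

On the sample data, `subclass_axioms(["adenocarcinoma"], parents)` keeps adenocarcinoma plus its ancestors carcinoma and neoplasm and emits the two direct pairs between them, while the cycle check flags the deliberately inserted cholecystitis/gallbladder_disease loop. Flagging rather than auto-repairing matches the slides, where such cases were resolved by hand.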

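Slides 13 and 15 describe two uses of the saved MetaMap output: a mapping table that turns the concepts found by a DL query into actual reports, and annotation text appended to each report so that Solr can index it. The sketch below illustrates both with Python's built-in `sqlite3` module as a stand-in for the project's database. The table name `mapping`, its columns, the report IDs, and the reading of `0:26` as start:end character offsets are assumptions; only the CUIs and offsets of report `R1` are copied from the slide 15 example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical MetaMap output rows: (report_id, cui, start, end).
# The CUIs and offsets for "R1" are copied from slide 15; the report IDs
# and the "R2" row are invented for illustration.
SAMPLE_ROWS = [
    ("R1", "C1521733", 0, 26),
    ("R1", "C0332144", 0, 26),
    ("R1", "C0016976", 32, 44),
    ("R1", "C0205178", 63, 70),
    ("R2", "C0016976", 5, 17),
]


def make_mapping_db(rows):
    """Load mappings into an in-memory table (the schema is an assumption)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mapping (report_id TEXT, cui TEXT, start INTEGER, end_pos INTEGER)"
    )
    con.executemany("INSERT INTO mapping VALUES (?, ?, ?, ?)", rows)
    return con


def reports_for_concepts(con, cuis):
    """Second search step: CUIs returned by a DL query -> matching report IDs."""
    cuis = sorted(cuis)
    if not cuis:
        return []
    marks = ", ".join("?" for _ in cuis)
    sql = (
        "SELECT DISTINCT report_id FROM mapping "
        f"WHERE cui IN ({marks}) ORDER BY report_id"
    )
    return [report_id for (report_id,) in con.execute(sql, cuis)]


def annotation_block(con, report_id):
    """Render a report's mappings as '| CUI ... start:end | ...' text.

    CUIs that share a span are grouped; insertion order (rowid) is kept
    within a span. Returns "" for a report with no mappings.
    """
    spans = defaultdict(list)
    rows = con.execute(
        "SELECT cui, start, end_pos FROM mapping WHERE report_id = ? "
        "ORDER BY start, end_pos, rowid",
        (report_id,),
    )
    for cui, start, end in rows:
        spans[(start, end)].append(cui)
    parts = [f"{' '.join(group)} {s}:{e}" for (s, e), group in spans.items()]
    return ("| " + " | ".join(parts) + " |") if parts else ""


if __name__ == "__main__":
    con = make_mapping_db(SAMPLE_ROWS)
    print(annotation_block(con, "R1"))
    print(reports_for_concepts(con, {"C0016976"}))
```

In the workflow the slides describe, the CUI set passed to `reports_for_concepts` would come from a DL query over the OWL file. Appending text like the `annotation_block` output to each report is what lets the existing Solr free-text index match on CUIs without a new search form.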
Editor's Notes

  1. I changed it to “four” because the next diagram shows four sources. Our goal was narrowed down to enhancing IR over the large free-text content. We limited the NLP to surgical pathology, since other types of reports have text that is even less natural.
  2. KCI has tissue-level information with sparse morphology and histology coding; CR has patient-level information.
  3. The key point, other than the free text (and the poor/limited use of terminology in the next slide), is that the data from the two systems is filtered for each top-level UI. I think the UIs were implemented for their corresponding clients, who perhaps didn’t want to see each other’s data. This slide shows the form for the free text search, with the other forms and the top two sections on the left.
  4. Numbers in parentheses are counts; even though there are 600K records, maybe only a few records were mapped from CR back to pathology data. Basically, all structured data is cancer data. You have to select all entries that have adrenal gland in the label; you cannot select its parts or synonyms, etc. The wizards on the left allow filtering/querying on some data fields.
  5. This is one report. The table shows the existing coded data (the site and diagnosis are from the cancer registry but the ICD9 code is from pathology). This is the only coded data for the content of the report. There are additional data fields (names, dates, treatments, etc.) but pathology is limited to this.
  6. This shows that the UMLS imports biomedical vocabularies into the three tables shown. There are many more tables that hold additional information, but these are the main ones. External vocabularies are analyzed (their labels) with the UMLS lexical tools and related resources to do an automated import (matching to existing content/CUIs in the UMLS) that is then manually curated. The expanding content of the UMLS is then used to enhance the lexical tools, because each new terminology brings in new lexical information, and the cycle continues. MetaMap uses the lexical tools to do its work and generates outputs that are coded/mapped with UMLS CUIs, with indications of which vocabulary and terms supported the mappings. MetaMap has many components, and they are listed in the diagram. Most components can take customized training content, but by default MetaMap is based on the full UMLS release.
  7. Columns: term, start location (by character), end location, score (lower is better; -1000 is better than -800). The table shows the mappings for the text shown.
  8. Same as before. Note the odd “blood group antigen D”, due to the “D:” bullet in the report. This is MetaMap attempting to recognize acronyms.
  9. E: is considered an acronym for “elementary charge”
  10. The small diagram tries to show the last bullet. The next slide has two examples.
  11. Two disorders/diseases. The definitions have groups that relate specific morphology to specific sites. Legend: F = fully qualified name; P = preferred name; D = defining relationship; Q = qualifier; C = current concept.
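Notes 8 and 9 show report section labels (`D:`, `E:`) being mapped as acronyms. One possible mitigation, not described by the authors, is to blank out those labels before the text is sent to MetaMap, overwriting them with spaces so that the character offsets in the mapping table (note 7) still line up with the original report. The sketch below assumes a label is a single capital letter plus a colon at the start of a line, as in the report on slides 8 to 10; other report formats may need a different pattern. It also includes a hypothetical helper that keeps the best candidate per span, using the `(term, start, end, score)` layout and the lower-is-better scoring described in note 7.

```python
import re

# Assumed label format: optional indentation, one capital letter, a colon,
# at the start of a line (e.g. "D: Portal vein margin, biopsy:").
SECTION_LABEL = re.compile(r"^([ \t]*)([A-Z]:)", re.MULTILINE)


def mask_section_labels(text):
    """Overwrite section labels with spaces; every character offset is preserved."""
    return SECTION_LABEL.sub(lambda m: m.group(1) + " " * len(m.group(2)), text)


def best_per_span(mappings):
    """Keep the best (lowest) scoring candidate for each (start, end) span.

    Each mapping is a (term, start, end, score) tuple; a score of -1000
    beats -800. Results are returned in span order.
    """
    best = {}
    for term, start, end, score in mappings:
        key = (start, end)
        if key not in best or score < best[key][3]:
            best[key] = (term, start, end, score)
    return [best[key] for key in sorted(best)]


if __name__ == "__main__":
    report = (
        "C: Superior mesenteric vein margin, biopsy:\n"
        "- Negative for malignancy\n"
        "D: Portal vein margin, biopsy:\n"
        "- Negative for malignancy"
    )
    print(mask_section_labels(report))
```

Replacing labels with spaces rather than deleting them is the key design choice: MetaMap offsets computed on the masked text can be used directly against the stored report, so the appended annotation spans stay valid.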