Hybrid methodology for information extraction
from tables in biomedical literature
Nikola Milošević, Cassie Gregson, Robert Hernandez, Goran Nenadić
Contact: nikola.milosevic@manchester.ac.uk
Literature growth
• MEDLINE contains more than 26 million citations
• Number of citations is growing exponentially
• 2100 new articles published daily in biomedicine
• Professionals can no longer keep up with the state of the art
Text mining
Source: https://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining
Table mining
• Current text mining efforts focus on main text of the article
• Usually ignore tables and figures
• Tables contain
• Settings of the experiment (patient characteristics, arms, dosages, etc.)
• Results of the experiment
• Definition of terms and quantitative scales
• Examples (e.g. questionnaires)
• …
• Article information is incomplete without tables (and figures)
Table complexity
One-dimensional (list) table
Two-dimensional (matrix) table
Table complexity (2)
Multi-dimensional (super-row) table
Multi-dimensional (multi-table) table
Challenges
• Dense content
• Variety of layouts
• Variety of value representation formats
• Misleading visualization markup
• Lack of resources (labelled datasets)
Aim and objectives
• Create a multi-layered approach to mining information from
tables
• to facilitate large-scale semi-automated extraction
• curation of data stored in tables
Table mining methodology overview
Functional processing
• Classifies cells into functional classes
• Header
• Super-row
• Stub
• Data
• Uses heuristics based on content and position
• Described in:
Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G.
Disentangling structure of tables in scientific literature.
In Proceedings of the 21st International Conference on Applications of Natural Language to
Information Systems (NLDB 2016) (2016), Springer.
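A minimal sketch of how content and position cues could assign functional classes to cells. The specific rules here (first row is a header, a row with only its first cell filled is a super-row, a non-numeric first-column cell is a stub) are illustrative assumptions, not the heuristics from the cited paper, and `classify_cells` and `is_numeric` are hypothetical names.

```python
import re
from typing import List

# Digits plus common numeric punctuation; an assumed definition of "numeric content".
NUMERIC = re.compile(r"^[\s\d.,%()±+\-/<>=–]+$")


def is_numeric(text: str) -> bool:
    """True when the cell contains at least one digit and no letters."""
    return bool(NUMERIC.match(text)) and any(ch.isdigit() for ch in text)


def classify_cells(table: List[List[str]]) -> List[List[str]]:
    """Label every cell as header, super-row, stub, data or empty."""
    labels = []
    for r, row in enumerate(table):
        filled = [cell for cell in row if cell.strip()]
        if r == 0:
            # Position cue (assumption): the first row holds the column headers.
            labels.append(["header" if cell.strip() else "empty" for cell in row])
            continue
        if len(filled) == 1 and row[0].strip():
            # Content cue (assumption): only the first cell is filled, so the
            # row is treated as a super-row that groups the rows below it.
            labels.append(["super-row"] + ["empty"] * (len(row) - 1))
            continue
        row_labels = []
        for c, cell in enumerate(row):
            if not cell.strip():
                row_labels.append("empty")
            elif c == 0 and not is_numeric(cell):
                row_labels.append("stub")
            else:
                row_labels.append("data")
        labels.append(row_labels)
    return labels


# Invented example table, for illustration only.
example = [
    ["Characteristic", "Drug A", "Placebo"],
    ["Demographics", "", ""],
    ["Age, years", "54.2 ± 8.1", "53.9 ± 7.7"],
]
for row_labels in classify_cells(example):
    print(row_labels)
```

Real tables need more cues than this (multi-row headers, spanning cells, indentation), which is why the sketch should be read only as an illustration of the content-plus-position idea.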
Structural processing
• Determines relationships between cells
• Uses cell functions and table structure to classify the
table into one of the structural table types:
• List
• Matrix
• Super-row
• Multi-table
• Based on the type, a set of rules resolves the relationships
• Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G.
Disentangling structure of tables in scientific literature.
In Proceedings of the 21st International Conference on Applications of Natural
Language to Information Systems (NLDB 2016) (2016), Springer.
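A sketch of what resolving cell relationships could look like for a super-row table: each data cell is linked to its column header, its stub and the nearest super-row above it. The layout assumptions (headers in the first row, stubs in the first column) and the name `resolve_relationships` are hypothetical and simplify the rule sets mentioned on the slide.

```python
from typing import Dict, List, Optional


def resolve_relationships(
    table: List[List[str]], labels: List[List[str]]
) -> List[Dict[str, Optional[str]]]:
    """Link each data cell to its header, stub and enclosing super-row."""
    headers = table[0]            # assumption: row 0 holds the column headers
    current_super: Optional[str] = None
    records = []
    for row, row_labels in zip(table[1:], labels[1:]):
        if row_labels[0] == "super-row":
            # Remember the super-row; it applies to the rows that follow.
            current_super = row[0]
            continue
        # Assumption: column 0 is the stub, so data cells start at column 1.
        for c in range(1, len(row)):
            if row_labels[c] == "data":
                records.append({
                    "super_row": current_super,
                    "stub": row[0],
                    "header": headers[c],
                    "value": row[c],
                })
    return records


# Invented table and hand-written functional labels, for illustration only.
table = [
    ["Characteristic", "Drug A", "Placebo"],
    ["Demographics", "", ""],
    ["Age, years", "54.2 ± 8.1", "53.9 ± 7.7"],
]
labels = [
    ["header", "header", "header"],
    ["super-row", "empty", "empty"],
    ["stub", "data", "data"],
]
for record in resolve_relationships(table, labels):
    print(record)
```

A list table or a multi-table would need a different rule set; the point of classifying the structural type first is to pick which rules to apply.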
Semantic tagging
• Semantically tags terms, phrases or words
• Knowledge sources (UMLS, DBPedia, WordNet)
• Used MetaMap for tagging with UMLS
• Helps with pragmatic classification and information extraction
Pragmatic processing
• Determines the purpose of the table
• Machine learning approach
• Naïve Bayes, Bayes Nets, SVM, Decision trees, random forests
• More specific classes -> better results
• Evidence based on 2 trials
• Settings, findings, support tables: ~80% F-score
• Baseline characteristics, adverse events, inclusion/exclusion, other: ~95% F-score
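To illustrate the machine learning step, here is a small multinomial Naïve Bayes classifier over caption words, written with the standard library only. The training captions are invented for the example, and using captions alone as features is an assumption; the slides do not state which features or toolkit were used.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naïve Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples: Iterable[Tuple[str, str]]) -> None:
        for text, label in samples:
            self.class_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:      # words never seen in training are skipped
                    score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training captions, for illustration only.
TRAINING = [
    ("Baseline characteristics of the study population", "baseline characteristics"),
    ("Demographic and baseline characteristics of patients", "baseline characteristics"),
    ("Adverse events reported during treatment", "adverse events"),
    ("Treatment-emergent adverse events by group", "adverse events"),
    ("Inclusion and exclusion criteria", "inclusion/exclusion"),
    ("Eligibility criteria for inclusion", "inclusion/exclusion"),
    ("Primary and secondary outcomes", "other"),
    ("Pharmacokinetic parameters", "other"),
]

model = NaiveBayes()
model.fit(TRAINING)
print(model.predict("Baseline demographic characteristics"))
```

With so little data the output only demonstrates the mechanics; the F-scores on the slide come from the authors' own experiments, not from anything like this toy model.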
Value identification and syntactic
processing
• Identifying the cell of interest:
• Looks at the navigational cells for lexical cues or for semantic types in tags
• Lexical cues in white and black lists
• Syntactic processing
• Uses a set of patterns to determine the semantics of the value
• Extracts the selected value
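The two steps above can be sketched as a cue check on the navigational text followed by pattern matching on the cell value. The cue lists, the three patterns and the function names are assumptions chosen for a patient-age example; they are not taken from the authors' rule files.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical cue lists for finding patient-age cells.
WHITE_LIST = {"age"}
BLACK_LIST = {"onset", "gestational"}

NUMBER = r"\d+(?:\.\d+)?"

# Hypothetical value presentation patterns, tried in order.
PATTERNS = [
    ("mean_sd", re.compile(rf"^\s*(?P<mean>{NUMBER})\s*(?:±|\+/-)\s*(?P<sd>{NUMBER})\s*$")),
    ("range", re.compile(rf"^\s*(?P<min>{NUMBER})\s*(?:-|–|to)\s*(?P<max>{NUMBER})\s*$")),
    ("single", re.compile(rf"^\s*(?P<value>{NUMBER})\s*$")),
]


def is_cell_of_interest(navigation_text: str) -> bool:
    """True if the header/stub text has a white-list cue and no black-list cue."""
    words = set(re.findall(r"[a-z]+", navigation_text.lower()))
    return bool(words & WHITE_LIST) and not (words & BLACK_LIST)


def parse_value(text: str) -> Optional[Dict[str, Union[str, float]]]:
    """Return the first matching pattern name with its numeric parts, else None."""
    for name, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            parts = {key: float(val) for key, val in match.groupdict().items()}
            return {"pattern": name, **parts}
    return None


# Invented cells, for illustration only.
for stub, value in [("Age, years", "54.2 ± 8.1"), ("Age at onset", "31 ± 4"), ("Age range", "18-65")]:
    if is_cell_of_interest(stub):
        print(stub, parse_value(value))
```

Cues are matched as whole words so that "dosage" or "stage" does not trigger the "age" cue. Parenthesised forms such as "54.2 (8.1)" are left out on purpose, because the same shape can mean mean (SD) or n (%) depending on the stub.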
Pragmatic classification results
• Pragmatic classification performs well with specific classes
• 4 classes – baseline characteristics, adverse events,
inclusion/exclusion, other
• Best performance - SVM
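For reference, the F-score used to compare the classifiers is the harmonic mean of precision and recall. The sketch below computes it from true positive, false positive and false negative counts; the counts in the example call are invented and are not results from this work.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall, 0.0 when undefined."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented counts, for illustration only.
print(f_score(tp=8, fp=2, fn=2))
```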
Information extraction results
• Extracted number of patients
• New tests on extracting patient age, adverse events (using
UMLS)
Patients’ age
Adverse reactions
Lessons learned
• Table mining requires multi-layered analysis
• Functional and structural analysis are crucial
• Semantics of value presentation patterns
• Semantic tagging helps
• Machine learning helps in certain steps (e.g. pragmatic analysis)
• Combination of heuristic-based and machine-learning-based
steps
• Availability:
• https://github.com/nikolamilosevic86/TableAnnotator
• https://github.com/nikolamilosevic86/TableInformationExtractionScripts
Future plans
• Develop an easy-to-use methodology
• Develop UI tool (wizard) for information extraction from tables
• Improve the methodology
• Compare heuristic-based vs machine-learning-based IE
• Examine methods for unbalanced datasets
Acknowledgements
Dr Michele Filannino
Dr Azad Dehghan
Nikola Milošević
Ruth Stoney
Maksim Belousov
Dr Goran Nenadić
Robert Hernandez
Cassie Gregson
Richard Boyce
Jodi Schneider
Steven DeMarco
nikola.milosevic@manchester.ac.uk

BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature
