Hybrid methodology for information extraction
from tables in biomedical literature
Nikola Milošević, Cassie Gregson, Robert Hernandez, Goran Nenadić
Contact: nikola.milosevic@manchester.ac.uk
Literature growth
• MEDLINE contains more than 26 million citations
• The number of citations is growing exponentially
• 2100 new articles published daily in biomedicine
• Professionals are no longer able to keep up with the state of the art
Text mining
Source: https://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining
Table mining
• Current text mining efforts focus on the main text of the article
• Usually ignore tables and figures
• Tables contain
• Settings of the experiment (patient characteristics, arms, dosages, etc.)
• Results of the experiment
• Definition of terms and quantitative scales
• Examples (e.g. questionnaires)
• …
• Article information is incomplete without tables (and figures)
Table complexity
• One-dimensional (list) table
• Two-dimensional (matrix) table
Table complexity (2)
Multi-dimensional (super-row) table
Multi-dimensional (multi-table) table
Challenges
• Dense content
• Variety of layouts
• Variety of value representation formats
• Misleading visualization markup
• Lack of resources (labelled datasets)
Aim and objectives
• Create a multi-layered approach to mining information from tables
• to facilitate large-scale semi-automated extraction
• and curation of data stored in tables
Table mining methodology overview
Functional processing
• Classifies cells into functional classes:
• Header,
• super-row,
• stub,
• data
• Uses heuristics based on content and position
• Described in:
Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G.
Disentangling structure of tables in scientific literature.
In Proceedings of the 21st International Conference on Applications of Natural Language to
Information Systems (NLDB 2016) (2016), Springer.
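A minimal sketch of content- and position-based heuristics of this kind. The three rules below (first row is the header, a row with only its first cell filled is a super-row, the first column is the stub) are illustrative assumptions and not the rules from the NLDB 2016 paper; `classify_cells` is a hypothetical name.

```python
from typing import List


def classify_cells(table: List[List[str]]) -> List[List[str]]:
    """Label every cell as 'header', 'super-row', 'stub', or 'data'.

    Illustrative rules only (assumptions, not the published heuristics):
      - row 0 is the header row
      - a later row whose only non-empty cell is the first one is a super-row
      - otherwise column 0 is the stub and the remaining cells are data
    """
    labels = []
    for r, row in enumerate(table):
        if not row:
            labels.append([])
            continue
        non_empty = [c for c in row if c.strip()]
        if r == 0:
            labels.append(["header"] * len(row))
        elif len(non_empty) == 1 and row[0].strip():
            labels.append(["super-row"] * len(row))
        else:
            labels.append(["stub"] + ["data"] * (len(row) - 1))
    return labels


# Hypothetical example table.
example = [
    ["Characteristic", "Drug", "Placebo"],
    ["Demographics", "", ""],
    ["Age (years)", "54.2", "53.9"],
    ["Male, n", "30", "28"],
]
labels = classify_cells(example)
```

Real tables would need more rules (multi-row headers, spanning cells, misleading markup), which is where the challenges listed earlier come in.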
Structural processing
• Determines relationships between cells
• Using cell functions and table structure, classifies the table into one of the structural table types:
• List
• Matrix
• Super-row
• Multi-table
• Based on the type, a set of rules resolves the relationships between cells
• Described in:
Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G.
Disentangling structure of tables in scientific literature.
In Proceedings of the 21st International Conference on Applications of Natural Language to
Information Systems (NLDB 2016) (2016), Springer.
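A rough sketch of what resolving relationships could look like for a super-row table: each data cell is linked to its column header, its row stub, and the most recent super-row above it. The function name, the record layout, and the assumption of a single header row are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def resolve_relationships(table: List[List[str]],
                          labels: List[List[str]]) -> List[Dict[str, str]]:
    """Link every data cell to the navigational cells that describe it.

    Assumes (for illustration) a single header row at index 0 and
    functional labels already assigned to each cell.
    """
    records = []
    header = table[0] if table else []
    current_super_row = ""
    for row, row_labels in zip(table[1:], labels[1:]):
        if row_labels and row_labels[0] == "super-row":
            # A super-row applies to all rows below it until the next one.
            current_super_row = row[0]
            continue
        for col, (cell, label) in enumerate(zip(row, row_labels)):
            if label != "data":
                continue
            records.append({
                "super_row": current_super_row,
                "stub": row[0],
                "header": header[col] if col < len(header) else "",
                "value": cell,
            })
    return records


# Hypothetical table with hand-written functional labels.
table = [
    ["Characteristic", "Drug", "Placebo"],
    ["Demographics", "", ""],
    ["Age (years)", "54.2", "53.9"],
    ["Male, n", "30", "28"],
]
labels = [
    ["header", "header", "header"],
    ["super-row", "super-row", "super-row"],
    ["stub", "data", "data"],
    ["stub", "data", "data"],
]
records = resolve_relationships(table, labels)
```

List, matrix, and multi-table types would each need their own rule set; only the super-row case is sketched here.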
Semantic tagging
• Semantically tags terms, phrases or words
• Knowledge sources (UMLS, DBpedia, WordNet)
• Used MetaMap for tagging with UMLS
• Helps with pragmatic classification and information extraction
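To illustrate the idea of semantic tagging without depending on MetaMap, the sketch below uses a tiny hand-made lexicon as a stand-in for a knowledge source. The lexicon, its type labels (modelled on UMLS semantic type names), and the `tag_cell` function are assumptions for illustration only; the actual pipeline used MetaMap with UMLS.

```python
import re
from typing import List, Tuple

# Toy stand-in for a knowledge source; NOT real MetaMap/UMLS output.
TOY_LEXICON = {
    "age": "Organism Attribute",
    "nausea": "Sign or Symptom",
    "headache": "Sign or Symptom",
    "aspirin": "Pharmacologic Substance",
}


def tag_cell(text: str) -> List[Tuple[str, str]]:
    """Return (token, semantic type) pairs for tokens found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, TOY_LEXICON[tok]) for tok in tokens if tok in TOY_LEXICON]


tags = tag_cell("Nausea or headache")
```

Tags like these can then be attached to navigational cells and reused by the later classification and extraction steps.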
Pragmatic processing
• Determines the purpose of the table
• Machine learning approach
• Naïve Bayes, Bayes Nets, SVM, Decision trees, random forests
• More specific classes give better results
• Evidence based on 2 trials
• Settings, findings, support tables: ~80% F-score
• Baseline characteristics, adverse events, inclusion/exclusion, other: ~95% F-score
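As an illustration of the machine learning step, here is a small pure-Python multinomial Naive Bayes classifier over table captions. The choice of caption words as features, the toy training captions, and the class names are assumptions made for this sketch; the slides do not specify the features or toolkit that were used.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train on a list of (text, label) pairs; returns the model as a tuple."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training captions.
training = [
    ("baseline characteristics of patients", "baseline"),
    ("demographic and baseline characteristics", "baseline"),
    ("adverse events reported during treatment", "adverse"),
    ("treatment emergent adverse events", "adverse"),
    ("inclusion and exclusion criteria", "inclusion_exclusion"),
    ("study inclusion criteria", "inclusion_exclusion"),
]
model = train_nb(training)
prediction = predict_nb(model, "baseline patient characteristics")
```

With narrowly defined classes such as these, a few distinctive words carry most of the signal, which is consistent with the observation that more specific classes give better results.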
Value identification and syntactic processing
• Identifying the cell of interest:
• Looks at the navigational cells for lexical cues or for semantic types in tags
• Lexical cues in white and black lists
• Syntactic processing
• Uses a set of patterns to determine the semantics of the value
• Extracts the selected value
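The two steps above can be sketched as follows: a white list and a black list of lexical cues select the cell of interest, and a small set of regular expressions determines what the value means. The cue lists, the patterns, and the reading of "x (y)" as mean (SD) are illustrative assumptions, not the rules shipped in the authors' tools.

```python
import re

# Hypothetical lexical cues for extracting patient age.
WHITE_LIST = ("age",)
# Cues that contain "age" as a substring but mean something else.
BLACK_LIST = ("stage", "dosage", "percentage")

NUM = r"(\d+(?:\.\d+)?)"
PATTERNS = [
    ("mean_sd", re.compile(rf"^\s*{NUM}\s*(?:±|\+/-)\s*{NUM}\s*$")),
    ("mean_sd", re.compile(rf"^\s*{NUM}\s*\(\s*{NUM}\s*\)\s*$")),
    ("range", re.compile(rf"^\s*{NUM}\s*(?:-|–|to)\s*{NUM}\s*$")),
    ("single", re.compile(rf"^\s*{NUM}\s*$")),
]


def is_cell_of_interest(navigational_text: str) -> bool:
    """True if a stub or header contains a white-list cue and no black-list cue."""
    text = navigational_text.lower()
    has_cue = any(cue in text for cue in WHITE_LIST)
    is_blocked = any(cue in text for cue in BLACK_LIST)
    return has_cue and not is_blocked


def parse_value(cell: str):
    """Return (pattern name, numeric parts) for the first matching pattern."""
    for name, pattern in PATTERNS:
        match = pattern.match(cell)
        if match:
            return name, tuple(float(g) for g in match.groups())
    return None


parsed = parse_value("54.2 ± 10.1")
```

The black list shows why substring cues alone are not enough: "Disease stage" contains "age" but is rejected.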
Pragmatic classification results
• Pragmatic classification performs well with specific classes
• 4 classes: baseline characteristics, adverse events, inclusion/exclusion, other
• Best performance: SVM
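For reference, the F-score reported here is the harmonic mean of precision and recall. A minimal sketch, with hypothetical counts chosen only to show the arithmetic:

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 score from true positives, false positives, and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct, 2 spurious, 2 missed.
score = f_score(tp=8, fp=2, fn=2)
```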
Information extraction results
• Extracted number of patients
• New tests on extracting patient age, adverse events (using UMLS)
Patients' age
Adverse reactions
Lessons learned
• Table mining requires multi-layered analysis
• Functional and structural analysis are crucial
• Semantics of value presentation patterns
• Semantic tagging helps
• Machine learning helps in certain steps (e.g. pragmatic analysis)
• Combination of heuristic-based and machine-learning-based steps
• Availability:
• https://github.com/nikolamilosevic86/TableAnnotator
• https://github.com/nikolamilosevic86/TableInformationExtractionScripts
Future plans
• Develop an easy-to-use methodology
• Develop a UI tool (wizard) for information extraction from tables
• Improve the methodology
• Compare heuristic-based vs machine-learning-based IE
• Examine methods for unbalanced datasets
Acknowledgements
Dr Michele Filannino
Dr Azad Dehghan
Nikola Milošević
Ruth Stoney
Maksim Belousov
Dr Goran Nenadić
Robert Hernandez
Cassie Gregson
Richard Boyce
Jodi Schneider
Steven DeMarco
nikola.milosevic@manchester.ac.uk

BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature

  • 1. Hybrid methodology for information extraction from tables in biomedical literature Nikola Milošević, Cassie Gregson, Robert Hernandez, Goran Nenadić Contact: nikola.milosevic@manchester.ac.uk
  • 2. Literature growth • MEDLINE contains more than 26 million citations • The number of citations is growing exponentially • 2100 new articles are published daily in biomedicine • Professionals can no longer keep up with the state of the art
  • 4. Table mining • Current text mining efforts focus on the main text of the article • They usually ignore tables and figures • Tables contain • Settings of the experiment (patient characteristics, arms, dosages, etc.) • Results of the experiment • Definitions of terms and quantitative scales • Examples (e.g. questionnaires) • … • Article information is incomplete without tables (and figures)
  • 5. Table complexity One dimensional (list) table Two dimensional (matrix) table
  • 6. Table complexity (2) Multi-dimensional (super-row) table Multi-dimensional (multi-table) table
  • 7. Challenges • Dense content • Variety of layouts • Variety of value representation formats • Misleading visualization markup • Lack of resources (labelled datasets)
  • 8. Aim and objectives • Create a multi-layered approach to mining information from tables • to facilitate large-scale semi-automated extraction • and curation of data stored in tables
  • 10. Functional processing • Classifies cells into functional classes • Header, • super-row, • stub, • data • Uses heuristics based on content and position • Described in: Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G. Disentangling structure of tables in scientific literature. In Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016) (2016), Springer.
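As a rough illustration of what content- and position-based heuristics for functional classes might look like, here is a minimal Python sketch. The three rules in it (first row is a header, a row with only its first cell filled is a super-row, the first column is a stub) and the function name `classify_cells` are assumptions made for this example; they are not the rules published in the NLDB 2016 paper.

```python
from typing import List


def classify_cells(table: List[List[str]]) -> List[List[str]]:
    """Label each cell as 'header', 'super-row', 'stub', or 'data'.

    Illustrative rules only (assumptions, not the published heuristics):
      - every cell in the first row is a header;
      - a later row whose only non-empty cell is the first one is a super-row;
      - otherwise the first column is a stub and the remaining cells are data.
    """
    labels = []
    for r, row in enumerate(table):
        # Content cue: only the first cell of the row carries text.
        only_first_filled = (
            len(row) > 1
            and row[0].strip() != ""
            and all(cell.strip() == "" for cell in row[1:])
        )
        row_labels = []
        for c in range(len(row)):
            if r == 0:                      # position cue: top row
                row_labels.append("header")
            elif only_first_filled:
                row_labels.append("super-row")
            elif c == 0:                    # position cue: left-most column
                row_labels.append("stub")
            else:
                row_labels.append("data")
        labels.append(row_labels)
    return labels


# Made-up example table (values are invented for illustration).
example = [
    ["Characteristic", "Drug", "Placebo"],
    ["Demographics", "", ""],
    ["Age (years)", "54.2", "53.8"],
    ["Male, n", "30", "28"],
]
labels = classify_cells(example)
```

Real tables need many more cues (spanning cells, formatting, repeated header rows), which is why the slides describe this step as heuristic.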
  • 11. Structural processing • Determines relationships between cells • Using cell functions and table structure, classifies the table into one of the structural table types: • List • Matrix • Super-row • Multi-table • Based on the type, a set of rules resolves the relationships • Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G. Disentangling structure of tables in scientific literature. In Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016) (2016), Springer.
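The two steps on this slide (pick a structural type, then resolve cell relationships with type-specific rules) can be sketched as follows. This is a simplified sketch under stated assumptions: the functional labels are given as input, the decision rules in `table_type` are invented for illustration, and `resolve` handles only the simple case where each data cell is linked to its column header, its row stub, and the nearest super-row above it.

```python
from typing import Dict, List, Optional


def table_type(labels: List[List[str]]) -> str:
    """Guess the structural type from functional labels (illustrative rules)."""
    flat = [lab for row in labels for lab in row]
    if "super-row" in flat:
        return "super-row"
    header_rows = [
        i for i, row in enumerate(labels)
        if row and all(lab == "header" for lab in row)
    ]
    # A header row that reappears after body rows suggests stacked sub-tables.
    if any(b - a > 1 for a, b in zip(header_rows, header_rows[1:])):
        return "multi-table"
    if "header" in flat and "stub" in flat:
        return "matrix"
    return "list"


def resolve(table: List[List[str]],
            labels: List[List[str]]) -> List[Dict[str, Optional[str]]]:
    """Link every data cell to its header, stub, and enclosing super-row."""
    records = []
    headers: List[str] = []
    current_super: Optional[str] = None
    for row, row_labels in zip(table, labels):
        if row_labels and all(lab == "header" for lab in row_labels):
            headers, current_super = row, None   # a new (sub-)table starts
            continue
        if row_labels and row_labels[0] == "super-row":
            current_super = row[0]
            continue
        for c, (cell, lab) in enumerate(zip(row, row_labels)):
            if lab == "data":
                records.append({
                    "super_row": current_super,
                    "stub": row[0],
                    "header": headers[c] if c < len(headers) else None,
                    "value": cell,
                })
    return records


# Made-up table with hand-written functional labels.
table = [
    ["Characteristic", "Drug", "Placebo"],
    ["Demographics", "", ""],
    ["Age (years)", "54.2", "53.8"],
    ["Male, n", "30", "28"],
]
labels = [
    ["header", "header", "header"],
    ["super-row", "super-row", "super-row"],
    ["stub", "data", "data"],
    ["stub", "data", "data"],
]
kind = table_type(labels)
records = resolve(table, labels)
```

Each record is a flat description of one value together with its navigational context, which is the form the later extraction steps need.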
  • 12. Semantic tagging • Semantically tags terms, phrases or words • Knowledge sources (UMLS, DBPedia, WordNet) • Used MetaMap for tagging with UMLS • Helps with pragmatic classification and information extraction
  • 13. Pragmatic processing • Determines the purpose of the table • Machine learning approach • Naïve Bayes, Bayes Nets, SVM, Decision trees, random forests • More specific classes -> better results • Evidence based on 2 trials • Settings, findings, support tables: ~80% F-score • Baseline characteristics, Adverse events, Inclusion/Exclusion, Other: ~95% F-score
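To make the machine learning step concrete, the sketch below trains a tiny multinomial Naïve Bayes classifier (one of the model families named on the slide) on bag-of-words features taken from table captions. The class `NaiveBayes`, the choice of captions as the only feature, and the eight training captions are all assumptions for illustration; the slides do not specify the feature set, and the toy data says nothing about the reported F-scores.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        words = doc.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy captions, two per class (not data from the study).
captions = [
    "baseline characteristics of patients age sex weight",
    "demographic and baseline characteristics age gender",
    "adverse events reported headache nausea",
    "treatment emergent adverse events dizziness nausea",
    "inclusion and exclusion criteria",
    "eligibility criteria inclusion exclusion",
    "abbreviations used in this article",
    "search terms used",
]
classes = [
    "baseline characteristics", "baseline characteristics",
    "adverse events", "adverse events",
    "inclusion/exclusion", "inclusion/exclusion",
    "other", "other",
]
model = NaiveBayes().fit(captions, classes)
prediction = model.predict("adverse events headache dizziness")
```

In practice the features would likely also draw on header and stub text and on the semantic tags from the previous step, and the model would be evaluated with cross-validation rather than on a single caption.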
  • 14. Value identification and syntactic processing • Identifying the cell of interest: • Looks at the navigational cells for lexical cues or for semantic types in tags • Lexical cues in white and black lists • Syntactic processing • Uses a set of patterns to determine the semantics of the value • Extracts the selected value
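Both halves of this slide lend themselves to a short sketch: a white/black list check on the navigational text (header, stub, super-row) to decide whether a cell is of interest, and a set of regular expressions that determine how the value is presented. The cue words, the four patterns, and the function names below are hypothetical examples chosen for illustration, not the lists or patterns used in the authors' tools.

```python
import re

# Hypothetical lexical cues for "number of patients" cells.
WHITELIST = {"patients", "participants", "subjects"}
BLACKLIST = {"age", "weight", "%"}


def is_cell_of_interest(navigation_text: str) -> bool:
    """True if the navigational text has a whitelisted cue and no blacklisted one."""
    tokens = set(re.findall(r"[a-z%]+", navigation_text.lower()))
    return bool(tokens & WHITELIST) and not (tokens & BLACKLIST)


NUM = r"\d+(?:\.\d+)?"
# Ordered from most to least specific; the first full match wins.
PATTERNS = [
    ("mean_sd", re.compile(rf"^\s*(?P<mean>-?{NUM})\s*(?:±|\+/-)\s*(?P<sd>{NUM})\s*$")),
    ("count_percent", re.compile(rf"^\s*(?P<count>\d+)\s*\(\s*(?P<percent>{NUM})\s*%\s*\)\s*$")),
    ("range", re.compile(rf"^\s*(?P<low>{NUM})\s*(?:-|–|to)\s*(?P<high>{NUM})\s*$")),
    ("number", re.compile(rf"^\s*(?P<value>-?{NUM})\s*$")),
]


def parse_value(cell: str):
    """Return (pattern_name, parsed_fields) for the first matching pattern, else None."""
    for name, pattern in PATTERNS:
        match = pattern.match(cell)
        if match:
            return name, {key: float(val) for key, val in match.groupdict().items()}
    return None


interesting = is_cell_of_interest("Number of patients")
parsed = parse_value("54.2 ± 8.1")
```

Once the pattern is known, extracting the selected value is a lookup: for example, the `mean` field of a `mean_sd` match when the target is patient age, or the `count` field of a `count_percent` match when the target is the number of patients.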
  • 15. Pragmatic classification results • Pragmatic classification performs well with specific classes • 4 classes – baseline characteristics, adverse events, inclusion/exclusion, other • Best performance - SVM
  • 16. Information extraction results • Extracted the number of patients • New tests on extracting patient age, adverse events (using UMLS) Patients’ age Adverse reactions
  • 17. Lessons learned • Table mining requires multi-layered analysis • Functional and structural analysis are crucial • Semantics of value presentation patterns • Semantic tagging helps • Machine learning helps in certain steps (e.g. pragmatic analysis) • Combination of heuristic-based and machine-learning-based steps • Availability: • https://github.com/nikolamilosevic86/TableAnnotator • https://github.com/nikolamilosevic86/TableInformationExtractionScripts
  • 18. Future plans • Develop an easy-to-use methodology • Develop a UI tool (wizard) for information extraction from tables • Improve the methodology • Compare heuristic-based vs machine-learning-based IE • Examine methods for unbalanced datasets
  • 19. Acknowledgements Dr Michele Filannino Dr Azad Dehghan Nikola Milošević Ruth Stoney Maksim Belousov Dr Goran Nenadić Robert Hernandez Cassie Gregson Richard Boyce Jodi Schneider Steven DeMarco