Mapping the PubMed data under different sub-topics
Email: venkykasprov@gmail.com
Venkatasubramani Karthikeyan
PROBLEM STATEMENT
Analogy Implementation
PROBLEM SOLVING APPROACH
Traditional approach
Data cleaning
Bag of words
Classification and clustering
Pre-Trained Model approach
No data cleaning required
BERT, BART & DeBERTa
ORIGINAL CATEGORIES
CATEGORIES CONSIDERED
Traditional approach
• Bag of words
Traditional approach
• Bag of words
• After removing stop words and stemming
• Using a count vectorizer
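As a rough illustration of the preprocessing on this slide, the stdlib-only Python sketch below lowercases and tokenizes text, drops stop words, stems, and counts terms into one vector per document, which is the kind of matrix a count vectorizer (for example scikit-learn's `CountVectorizer`) produces. The stop-word list and the `naive_stem` suffix stripper are hypothetical stand-ins, not the Porter stemmer or the list the author may have used.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration; real pipelines use a
# fuller list (e.g. NLTK's or scikit-learn's built-in English list).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "are", "to", "for", "with"}


def naive_stem(token: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ies", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def count_vectorize(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary and one term-count vector per document
    (dense lists here; library vectorizers return sparse matrices)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, n in Counter(toks).items():
            row[index[term]] = n
        matrix.append(row)
    return vocab, matrix
```

For example, `count_vectorize(["Proteins are folding in the cell", "Cell signaling and protein folding"])` yields the vocabulary `["cell", "fold", "protein", "signal"]`, so "proteins"/"protein" and "folding" collapse onto shared columns after stemming.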
Traditional approach
• Classification
• Logistic regression
Traditional approach
• Classification
• Logistic regression (cont)
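A minimal sketch of binary logistic regression trained by batch gradient descent on count vectors like those from the bag-of-words step. It assumes 0/1 labels (one sub-topic versus the rest); the learning rate, epoch count, and toy data are illustrative assumptions, and a library implementation would add regularization and a better optimizer.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.
    X: list of feature vectors (e.g. term counts); y: list of 0/1 labels."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # p - y is the gradient of the log-loss with respect to the logit.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x):
    """Probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

For several sub-topics, training one such model per label (one-vs-rest) is a common extension; multinomial (softmax) regression is the usual alternative.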
Traditional approach
• Classification (cont)
• Decision Tree
Entropy
Information Gain
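Entropy, H(S) = -Σ_c p_c log2 p_c, measures how mixed the class labels in a node are; information gain, IG(S, A) = H(S) - Σ_v (|S_v| / |S|) H(S_v), is the drop in entropy after splitting on attribute A. An entropy-based (ID3-style) decision tree greedily picks the split with the highest gain at each node. The sketch below computes both on discrete features (for example, whether a stemmed term occurs in a document); the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """H(S) = -sum_c p_c * log2(p_c) over the class proportions in S."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(labels, feature_values) -> float:
    """IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v) for a split on attribute A."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_split(rows, labels):
    """Return the index of the feature with the highest information gain
    (the split an ID3-style tree would choose here) and all the gains."""
    n_features = len(rows[0])
    gains = [information_gain(labels, [r[j] for r in rows]) for j in range(n_features)]
    return max(range(n_features), key=gains.__getitem__), gains
```

With labels `["bio", "bio", "med", "med"]`, a feature that separates the two classes perfectly has a gain of 1 bit, while a feature that leaves both halves evenly mixed has a gain of 0.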
Traditional approach
• Classification (cont)
• Decision Tree
Traditional approach
• Classification (cont)
• Random Forest
Traditional approach
• Classification (cont)
• Random Forest
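A random forest trains many trees on bootstrap samples of the data, considers only a random subset of features at each split, and predicts by majority vote. To stay short, this sketch uses depth-1 trees (decision stumps) split by information gain on binary features, so choosing a feature subset per tree is the same as choosing it per split; real forests grow much deeper trees. `n_trees`, `max_features` (default √d), and the seed are illustrative choices.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: pick the binary feature (among feature_ids) with the
    highest information gain and store the majority label on each side."""
    best = None
    base = entropy(labels)
    n = len(labels)
    for j in feature_ids:
        left = [y for r, y in zip(rows, labels) if r[j] == 0]
        right = [y for r, y in zip(rows, labels) if r[j] != 0]
        if not left or not right:
            continue  # this feature does not split the sample
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, j, Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (None, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    j, left_label, right_label = stump
    if j is None:
        return left_label
    return left_label if row[j] == 0 else right_label


def fit_forest(rows, labels, n_trees=25, max_features=None, seed=0):
    """Each tree sees a bootstrap sample of the rows (drawn with replacement)
    and a random subset of max_features features."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    k = max_features or max(1, int(math.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), k)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    votes = Counter(predict_stump(t, row) for t in forest)
    return votes.most_common(1)[0][0]
```

Averaging many high-variance trees trained on resampled data is what makes the forest more stable than a single tree.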
Traditional approach
• Clustering
Traditional approach
• Clustering
Traditional approach
• Clustering (cont)
Hierarchical clustering
HDBSCAN
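Hierarchical (agglomerative) clustering starts from one cluster per document and repeatedly merges the closest pair; the merge history forms the dendrogram. The sketch below uses single linkage with cosine distance on count vectors, both assumed choices. HDBSCAN builds a related single-linkage hierarchy over a density-based "mutual reachability" distance and then extracts stable clusters from it, so this is not HDBSCAN itself; it only shows the kind of hierarchy HDBSCAN starts from.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat empty documents as maximally distant
    return 1.0 - dot / (nu * nv)


def single_linkage(vectors, n_clusters):
    """Merge the two clusters whose closest members are nearest, until
    n_clusters remain. Naive O(n^3) loop, fine for a handful of documents."""
    clusters = [[i] for i in range(len(vectors))]
    merges = []  # (cluster_a, cluster_b, distance): the dendrogram history
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(cosine_distance(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # new list; history stays intact
        del clusters[b]
    return clusters, merges
```

Cutting the merge history at a different level gives a coarser or finer grouping, which is the practical advantage of a hierarchy when the number of sub-topics is not known in advance.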
Traditional approach
• Clustering (cont)
Pre-trained model approach
Transformer
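The core operation of the Transformer is scaled dot-product attention, softmax(Q Kᵀ / √d_k) V. The sketch below computes it for a single head with plain lists; learned projections, multiple heads, masking, and positional encodings are omitted for brevity.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.
    Each output row is a weighted average of the rows of V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

When a query scores every key equally, the weights are uniform and the output is the plain average of the value vectors.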
Pre-trained model approach
HuggingFace Transformers
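The `labels` and `scores` columns in the result tables match the output keys of the Hugging Face `zero-shot-classification` pipeline, so it is an assumption here that the models were used through that pipeline with an NLI checkpoint. In that setup each candidate sub-topic becomes a hypothesis (by default "This example is {}."), and with `multi_label=True` each label's score is the softmax over its contradiction and entailment logits, which would explain why the scores in the tables do not sum to 1. The sketch reproduces only that scoring and ranking step from hypothetical logits; it loads no model. Reading the `level_3` values 0 to 3 as a rank is likewise an assumption.

```python
import math


def entailment_score(contradiction_logit, entailment_logit):
    """Per-label score in multi-label mode: softmax over [contradiction,
    entailment], keeping the entailment probability (neutral is ignored)."""
    m = max(contradiction_logit, entailment_logit)
    c = math.exp(contradiction_logit - m)
    e = math.exp(entailment_logit - m)
    return e / (c + e)


def rank_labels(nli_logits, top_k=4):
    """nli_logits maps label -> (contradiction_logit, entailment_logit).
    Returns (rank, label, score) rows sorted by descending score."""
    scored = {label: entailment_score(*logits) for label, logits in nli_logits.items()}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(rank, label, score) for rank, (label, score) in enumerate(ranked)]
```

For instance, hypothetical logits `{"Big Data": (-1.0, 1.0), "Bio-IT": (0.0, 0.0), "Proteomics": (1.0, -1.0)}` rank "Big Data" first (about 0.88) and give "Bio-IT" exactly 0.5.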
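Pre-trained model approach
• BERT (Bidirectional Encoder Representations from Transformers)
• Developed by Google in 2018.
• Known for its bidirectional pre-training: the masked-language-model objective lets each token's representation draw on context from both its left and its right.
• BERT is pre-trained on a large corpus of unlabeled text (BooksCorpus and English Wikipedia).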
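BERT's bidirectionality comes from its masked-language-model objective: about 15% of input tokens are selected, and of those 80% are replaced by `[MASK]`, 10% by a random token, and 10% left unchanged; the model then predicts the original token at every selected position using context on both sides. The sketch shows that corruption step on whole-word tokens (real BERT works on WordPiece sub-words and also used a next-sentence-prediction objective); the token lists are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style MLM corruption. Returns (corrupted, targets) where targets[i]
    is the original token at selected positions and None elsewhere."""
    corrupted = list(tokens)
    targets = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        targets[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK  # 80%: mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return corrupted, targets
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only `[MASK]` positions matter, which narrows the gap between pre-training and fine-tuning inputs.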
     id   parent_title  level_3  labels                 scores
126  293  Big Data      0        Bio-IT                 0.645831
127  293  Big Data      1        Big Data               0.612736
128  293  Big Data      2        Healthcare Technology  0.602229
129  293  Big Data      3        Disease Processes      0.521784
• 🎉 40th Anniversary Special: IBM unveils the
eServer zSeries 890 (z890) mainframe, celebrating four
decades of their System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking
tech aimed at simplifying IT environments, tailored especially
for medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in
flexibility, virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with
28 capacity settings, letting businesses align server capacity
with specific needs.
• 📦 Advanced Storage: Introduction of
IBM TotalStorage Enterprise Storage Server 750, bringing
enterprise-grade storage capabilities to mid-sized businesses.
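Pre-trained model approach
• BART (Bidirectional and Auto-Regressive Transformers)
• Developed by Facebook in 2019.
• BART is a denoising autoencoder for pre-training sequence-to-sequence models: a bidirectional (BERT-like) encoder reads the corrupted text and an autoregressive (GPT-like) decoder regenerates the original.
• It corrupts the input, for example by masking tokens or whole spans, and then learns to reconstruct the original text.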
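One of the corruptions explored in the BART paper, and the one used (together with sentence permutation) for the final model, is text infilling: spans with Poisson(λ = 3) lengths are each replaced by a single mask token, covering about 30% of the tokens, so the decoder must also infer how many tokens are missing. The sketch below is a simplified version under stated assumptions: spans are kept disjoint and have length at least 1 (the paper also allows zero-length spans that only insert a mask). It is not fairseq's implementation.

```python
import math
import random

MASK = "<mask>"


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) integer."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def text_infill(tokens, rng, mask_ratio=0.3, lam=3.0):
    """Collapse random, non-overlapping spans into one MASK each until roughly
    mask_ratio of the tokens are covered. Simplification: span lengths are
    Poisson(lam) but at least 1, so there are no zero-length spans."""
    n = len(tokens)
    budget = round(n * mask_ratio)
    covered = [False] * n
    starts = set()
    masked, attempts = 0, 0
    while masked < budget and attempts < 10 * n:
        attempts += 1
        length = max(1, sample_poisson(lam, rng))
        start = rng.randrange(n)
        end = min(n, start + length)
        if any(covered[start:end]):
            continue  # keep spans disjoint
        for i in range(start, end):
            covered[i] = True
        starts.add(start)
        masked += end - start
    out = []
    for i, tok in enumerate(tokens):
        if not covered[i]:
            out.append(tok)
        elif i in starts:
            out.append(MASK)  # the whole span becomes a single mask token
    return out
```

Because a multi-token span collapses to one `<mask>`, the output is usually shorter than the input, which is what forces BART's decoder to predict span lengths as well as content.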
• 🎉 40th Anniversary Special: IBM unveils the eServer zSeries
890 (z890) mainframe, celebrating four decades of their
System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking tech
aimed at simplifying IT environments, tailored especially for
medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in flexibility,
virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with 28
capacity settings, letting businesses align server capacity with
specific needs.
• 📦 Advanced Storage: Introduction of IBM TotalStorage
Enterprise Storage Server 750, bringing enterprise-grade
storage capabilities to mid-sized businesses.
     id   parent_title  level_3  labels                 scores
126  293  Big Data      0        Big Data               0.677244
127  293  Big Data      1        Proteomics             0.636867
128  293  Big Data      2        Disease Processes      0.511485
129  293  Big Data      3        Bio-IT                 0.480203
Pre-trained model approach
• DeBERTa (Decoding-enhanced BERT with disentangled attention)
• Developed by Microsoft in 2020.
• Improves on BERT by disentangling content and position information in self-attention: each token is represented by separate content and relative-position vectors, and attention weights are computed from both.
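In DeBERTa the attention score between tokens i and j is the sum of content-to-content, content-to-position, and position-to-content terms, scaled by 1/√(3d). The relative distance is bucketed as δ(i, j) = i - j + k, clipped to the range [0, 2k - 1]. The sketch computes those unnormalized scores from already-projected query/key matrices (`Qc`, `Kc` for content; `Qr`, `Kr` for the 2k relative positions); these names are illustrative, and the softmax, value mixing, and enhanced mask decoder are left out.

```python
import math


def rel_bucket(i, j, k):
    """delta(i, j): relative distance i - j shifted by k and clipped to [0, 2k - 1]."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def disentangled_scores(Qc, Kc, Qr, Kr, k):
    """Unnormalized scores A[i][j] =
         Qc_i . Kc_j                 (content-to-content)
       + Qc_i . Kr_{delta(i, j)}     (content-to-position)
       + Kc_j . Qr_{delta(j, i)}     (position-to-content)
    all scaled by 1 / sqrt(3d). Qr and Kr each hold 2k projected vectors."""
    n, d = len(Qc), len(Qc[0])
    scale = 1.0 / math.sqrt(3 * d)
    return [[scale * (dot(Qc[i], Kc[j])
                      + dot(Qc[i], Kr[rel_bucket(i, j, k)])
                      + dot(Kc[j], Qr[rel_bucket(j, i, k)]))
             for j in range(n)]
            for i in range(n)]
```

Using relative rather than absolute positions in these terms lets the same learned position vectors apply to any pair of tokens at a given distance.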
• 🎉 40th Anniversary Special: IBM unveils the
eServer zSeries 890 (z890) mainframe, celebrating four decades
of their System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking
tech aimed at simplifying IT environments, tailored especially
for medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in
flexibility, virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with
28 capacity settings, letting businesses align server capacity
with specific needs.
• 📦 Advanced Storage: Introduction of
IBM TotalStorage Enterprise Storage Server 750, bringing
enterprise-grade storage capabilities to mid-sized businesses.
     id   parent_title  level_3  labels                 scores
126  293  Big Data      0        Big Data               0.808621
127  293  Big Data      1        Cell Biology           0.764249
128  293  Big Data      2        Food Bioscience        0.754545
129  293  Big Data      3        Green Biology          0.700146
if questions:
    Ask()
else:
    Thank_you()
