ChemNLP
A Natural Language Processing based Library
for Materials Chemistry Text Data
Kamal Choudhary
https://jarvis.nist.gov/
NIST, Gaithersburg, MD, USA
Polymer group
7/13/2023
Joint Automated Repository for Various Integrated Simulations
Outline
• Introduction
• AI for Materials
• JARVIS
• NLP basics
• ChemNLP
• Datasets
• TextClassification
• TokenClassification
• WebApp
• TextSummarization
• TextGeneration
• Integrating DFT database
• JARVIS-Leaderboard/benchmarking
• Hands-on
• Summary
Electronic structure: DFT, DMFT, TB, QMC
Quantum Computation: AtomQC
Force-Field: JARVIS-FF, ALIGNN-FF
AI/ML: CFID, ALIGNN, AtomVision, ChemNLP
AI for Materials Science
Established: January 2017
Published: >40 articles
Users: >20,000 worldwide
Materials: >80,000, with millions of properties
Events:
• Quantum Matters in Materials Science (QMMS)
• Artificial Intelligence for Materials Science (AIMS)
• JARVIS-School
User-comments:
• “There are many different theoretical levels on which you can
approach the field. JARVIS is unusual in that it spans more levels
than other databases.”
• “A pure gold-mine for the data-quality effort…”
• “Thanks for your generous sharing. Your works inspire me a lot.”
• “You guys are doing something really beneficial…”
• “I find JARVIS-DFT very useful for my research…”
JARVIS: Databases, Tools, Events, Outreach
https://jarvis.nist.gov
Requires login credentials, free registration
Updates
• 80,000 materials
• QMC, tight binding, ALIGNN, ALIGNN-FF, AtomVision, ChemNLP, JARVIS-Leaderboard
• Quantum Computation algorithms
• Superconductors (bulk and 2D), magnetic topological materials
Recent Updates to JARVIS
Tools
Used for hands-on session!
ChemNLP
Text classification & Token classification
Text summarization & Text-generation
Conventional NLP: TFIDF
https://www.kaggle.com/code/ashoksrinivas/nlp-with-tfidf-neural-networks
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
https://i.stack.imgur.com/mtmP6.png
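As a rough illustration of the TF-IDF idea on this slide, here is a minimal pure-Python sketch. It assumes the textbook weighting tf × log(N / df) with whitespace tokenization; scikit-learn's TfidfVectorizer uses a smoothed and normalized variant, so its numbers will differ. The function name and the toy sentences are made up for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy "abstracts", invented for illustration only.
docs = [
    "superconductor with high critical temperature",
    "band gap of a semiconductor",
    "critical temperature of a superconductor",
]
weights = tfidf(docs)
```

Terms that occur in few documents (here "band") receive a larger weight than terms shared across documents (here "of"), which is what makes the vectors useful as features for a conventional classifier.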
Transformers & “Attention Is All You Need”
Generally much better than RNNs, LSTMs, etc. on NLP tasks
Attention: every token can attend to every other token, giving very long-range memory
https://towardsdatascience.com/illustrated-guide-to-transformers-step-by-step-explanation-f74876522bc0
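The attention mechanism named on this slide can be sketched in a few lines. The following is a minimal pure-Python version of scaled dot-product attention, softmax(Q Kᵀ / √d_k) V, from “Attention Is All You Need”. It leaves out the learned projections, multiple heads, and masking that a real transformer has, and the toy vectors are arbitrary.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Three toy token vectors used as Q, K and V at once (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attention(tokens, tokens, tokens)
```

Because every query is compared against every key in one step, the distance between two tokens in the sequence does not limit how strongly they can interact. This is one way to read the "long-term memory" remark above.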
ChemNLP Datasets
Exploratory data analysis (EDA)
Accuracy
Text classification
Named Entity Recognition/Token classification
87 % F1 score
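For readers unfamiliar with the F1 score quoted above, here is a minimal sketch of a token-level, micro-averaged F1. The slide does not say whether the 87 % figure is token-level or entity-level, or how it is averaged, so this counting scheme is an assumption. The tag names "MAT" and "PROP" are hypothetical.

```python
def token_f1(true_tags, pred_tags, outside="O"):
    """Micro-averaged precision, recall and F1 over tokens not tagged `outside`."""
    tp = fp = fn = 0
    for true, pred in zip(true_tags, pred_tags):
        if pred != outside and pred == true:
            tp += 1  # correct entity tag
        else:
            if pred != outside:
                fp += 1  # predicted an entity tag that is wrong
            if true != outside:
                fn += 1  # missed or mislabeled a true entity token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical tags: MAT = material, PROP = property, O = outside any entity.
true_tags = ["MAT", "O", "PROP", "PROP", "O", "MAT"]
pred_tags = ["MAT", "O", "PROP", "O", "MAT", "MAT"]
precision, recall, f1 = token_f1(true_tags, pred_tags)
```

In this toy case three entity tokens are tagged correctly, one is missed, and one is spurious, so precision, recall, and F1 all come out to 0.75.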
ChemNLP Webpage for Composition Search
https://jarvis.nist.gov/jarvischemnlp/
ChemNLP
Abstractive summarization (Abstract to Title)
Google’s T5 transformer model (225 million parameters)
ROUGE-1 score: 46.5 %
Text generation (Title to Abstract)
GPT2-medium LLM
ROUGE-1 score: 32 %
Without fine-tuning: 26 %
ROUGE:
Recall-Oriented Understudy for Gisting Evaluation
T5: Text-to-Text Transfer Transformer
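To make the ROUGE-1 numbers above concrete, here is a minimal sketch of the metric as clipped unigram overlap between a reference and a generated text. It assumes lowercased whitespace tokens without stemming, and it returns precision, recall, and F1 because the slide does not state which variant is reported. The two example titles are invented.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Unigram-overlap precision, recall and F1 between two strings."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Invented titles, for illustration only.
reference = "superconductivity in doped nickel oxides"
candidate = "superconductivity in nickel oxide thin films"
precision, recall, f1 = rouge1(reference, candidate)
```

Here three of the five reference words are recovered (recall 0.6) and three of the six generated words are correct (precision 0.5). "oxide" and "oxides" do not match because no stemming is applied.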
Other test cases
ChemNLP for superconductors
Confusion matrix for text classification (137,927 articles)
• arXiv cond-mat.supr-con and JARVIS-SuperconDB
• Venn diagram for chemical formulas
ChatGPT response
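The confusion matrix mentioned on this slide can be built as follows. This is a minimal sketch with made-up labels: only cond-mat.supr-con comes from the slide, and the other category names and all predictions are illustrative assumptions, not results from the talk.

```python
from collections import Counter


def confusion_matrix(true_labels, pred_labels):
    """Return (labels, matrix); matrix[i][j] counts true=labels[i], predicted=labels[j]."""
    labels = sorted(set(true_labels) | set(pred_labels))
    counts = Counter(zip(true_labels, pred_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Illustrative category labels and predictions (not data from the talk).
true_labels = ["supr-con", "supr-con", "mtrl-sci", "mes-hall", "mtrl-sci"]
pred_labels = ["supr-con", "mtrl-sci", "mtrl-sci", "mes-hall", "mtrl-sci"]
labels, matrix = confusion_matrix(true_labels, pred_labels)
```

Rows are true classes and columns are predicted classes, so off-diagonal entries show which categories the classifier confuses with each other.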
JARVIS-Leaderboard: Large Scale Benchmark
Challenges in materials science community:
• Reproducibility
• Transparency
• Validation
• Fidelity
• Data vs. metadata
• What is the ground truth/reference to compare our models to?
How does this change depending on the model?
• Synergy of computational and experimental databases
Community effort to tackle challenges:
https://pages.nist.gov/jarvis_leaderboard/
JARVIS-Leaderboard: Contributors
• Growing list of collaborators
• Multi-institutional effort
• Contributions are welcomed and encouraged from the community!
JARVIS-Leaderboard: Methods and Data
Types of Data:
• Atomic structure (Molecule, Crystal)
• Material Property (Bandgap, bulk modulus)
• Images (Microscopy: SEM, TEM, STM)
• Spectra (Diffraction: X-ray, Neutron, PL)
• Text (Research articles, notebooks, blogs)
• Eigensolver (Quantum Computation algorithms)
Methods:
1) Electronic Structure
2) Artificial Intelligence
3) Force Field
4) Quantum Computation
5) Experiment
JARVIS-Leaderboard: Benchmarks
Contributions
1) Electronic Structure
2) Artificial Intelligence
3) Force Field
4) Quantum Computation
5) Experiment
Benchmarks (reference point)
1) Experiment/s
2) Test dataset
3) Electronic Structure
4) Analytical results
5) Other Experiments
Error metrics
*Benchmarks must be well-defined with an associated DOI
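As one possible reading of "error metrics" on this slide, the sketch below ranks contributions against a reference by mean absolute error (MAE). The choice of MAE, the model names, and all numbers are assumptions for illustration. It does not reproduce the leaderboard's actual code or data.

```python
def mean_absolute_error(reference, predicted):
    """Average absolute deviation between reference and predicted values."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical band gaps in eV: one reference set and two submissions.
reference = [1.1, 3.4, 0.0, 5.5]
submissions = {
    "model_a": [1.0, 3.0, 0.2, 5.0],
    "model_b": [1.3, 3.5, 0.1, 5.4],
}

# Lower error ranks first.
ranking = sorted(
    submissions,
    key=lambda name: mean_absolute_error(reference, submissions[name]),
)
```

Fixing the reference data and the metric in advance is what makes the submissions directly comparable, which is the point of requiring well-defined benchmarks.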
JARVIS-Leaderboard: Snapshot
Hands-on session notebooks (later)
Natural Language Processing [44,45]
1. ChemNLP example (Part I)
2. ChemNLP example (Part II)
JARVIS-Leaderboard [5]
Analyzing benchmarks in the JARVIS-Leaderboard
Summary
• NIST-JARVIS infrastructure with multiple components
• ChemNLP for solids currently, expand to polymers…
• Several events to engage (sign-up today & Demo!)
• Continuously growing, contribute, collaborate…
https://jarvis.nist.gov
https://github.com/usnistgov/jarvis
https://github.com/usnistgov/alignn
https://github.com/usnistgov/atomvision
https://github.com/usnistgov/chemnlp
https://github.com/usnistgov/atomqc
https://github.com/usnistgov/jarvis_leaderboard
Email: kamal.choudhary@nist.gov
@dr_k_choudhary
@knc6
Slides: https://www.slideshare.net/KAMALCHOUDHARY4
