Topic Extraction for Domain Ontology
Guided By:
Prof. S.B. Karthick
Project By:
Keerti Bhogaraju TY – C - 13
Pratiksha Jadhav TY – C – 50
Rasika Khatke TY – C - 66
Prajakta Jawale TY – C - 71
BRACT’s VISHWAKARMA INSTITUTE of TECHNOLOGY
Pune
Department of Computer Engineering
Domain Ontology
A domain ontology is a collection of vocabularies together with the specification of the conceptualization of a given domain (Gruber, 1993).
Examples: [figure not reproduced]
Specific Domain Chosen
A knowledge-based search engine in the field of science for students in grades 3, 4 and 5.
Example: the human body systems
Purpose of Topic Extraction
• To identify relevant concepts hidden in the corpus of documents
• To obtain terms that may be considered linguistic realizations of domain-specific concepts
• To assign every term found in the corpus to a specific context
• To classify documents for information discovery
• To identify key concepts and the relationships among them in the ontology
Project Development Stages
i. Obtain domain knowledge
ii. PDF-to-document conversion
iii. “Cleansing” of the document
a. Tokenizing
b. Filtering (removal of stop words)
iv. Apply one of the following methods:
a. Clustering using the K-Means algorithm
b. Topic Modeling using Latent Dirichlet Allocation (LDA)
v. Extraction of topics
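The slides do not include the project's code. The following is a minimal sketch of stage iii only, assuming a simple regex tokenizer and a small hand-picked stop-word list (both hypothetical stand-ins for the nltk resources the project lists):

```python
import re

# Hypothetical, deliberately tiny stop-word list; nltk's English list is much larger.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "its", "by", "for", "on", "with", "that", "this"}

def tokenize(text):
    """Split text into lowercase alphabetic tokens using a regex."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Filtering step: drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]

def cleanse(text):
    """Stage iii: tokenize the document, then filter out stop words."""
    return remove_stop_words(tokenize(text))

sample = "The heart pumps blood through the arteries and the veins."
print(cleanse(sample))  # ['heart', 'pumps', 'blood', 'through', 'arteries', 'veins']
```

In an nltk-based version, `nltk.word_tokenize` and `nltk.corpus.stopwords.words("english")` could replace the two helpers, after downloading the required nltk data packages.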
Method 1: Clustering using K-Means
• Clustering is the process of partitioning a group of data points into a small number of clusters.
• K-Means clustering is a method of vector quantization that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean; that mean serves as the prototype of the cluster.
• Algorithm: [figure not reproduced]
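The "nearest mean" partition described above is commonly written as minimizing the within-cluster sum of squares. The symbols $S_i$ (the set of observations in cluster $i$) and $\boldsymbol{\mu}_i$ (its mean) are introduced here for illustration and do not appear in the original slides:

```latex
\underset{S_1,\dots,S_k}{\arg\min} \; \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert^2,
\qquad
\boldsymbol{\mu}_i = \frac{1}{\lvert S_i \rvert} \sum_{\mathbf{x} \in S_i} \mathbf{x}
```

For fixed means, assigning each observation to its nearest $\boldsymbol{\mu}_i$ minimizes this sum. For fixed assignments, setting each $\boldsymbol{\mu}_i$ to its cluster mean minimizes it. K-Means alternates these two steps.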
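About K-Means
1. Initial centroids are often chosen randomly.
2. The centroid is (typically) the mean of the points in the cluster.
3. ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc. The Euclidean distance between two n-dimensional points x and y is
   $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}$
4. The n-dimensional centroid of m n-dimensional points $\mathbf{x}_1, \dots, \mathbf{x}_m$ is their component-wise mean:
   $\mathbf{c} = \frac{1}{m} \sum_{i=1}^{m} \mathbf{x}_i$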
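The four points above can be combined into the usual assign-then-update loop (Lloyd's algorithm). This is a minimal sketch, assuming plain Python tuples as points, Euclidean distance, and random initial centroids drawn from the data; the slides do not say how the project vectorized its documents (TF-IDF vectors would be one common choice), and the 2-D data below is hypothetical:

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two n-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Component-wise mean of a non-empty list of n-dimensional points."""
    m = len(points)
    return tuple(sum(coords) / m for coords in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-Means with Euclidean distance; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct input points at random as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its member points;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(centroid(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Changing `seed` changes the initial centroids. On data that is less cleanly separated, this can change the final clusters. In practice, scikit-learn's `sklearn.cluster.KMeans` provides a tested implementation.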
Example of K-Means [figures not reproduced]
Advantages and Limitations of K-Means
Advantages
• With a large number of variables, K-Means is usually computationally faster than hierarchical clustering, provided k is kept small.
• K-Means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.
Limitations
• The value of k is difficult to predict.
• It does not work well with non-globular clusters.
• Different initial partitions can result in different final clusters.
• It does not work well with clusters (in the original data) of different sizes and different densities.
Method 2: Topic Modeling using LDA
• Topic modeling is useful for organizing large blocks of textual data, for information retrieval from unstructured text, and for feature selection.
• It is a process that automatically identifies the topics present in a text object and derives the hidden patterns exhibited by a text corpus, thereby assisting better decision making.
• LDA is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words.
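The generative story behind LDA can be made concrete with a short simulation. This is a sketch, assuming the topic-word distributions are already given (full LDA also draws them from a Dirichlet prior) and using a hypothetical two-topic "human body systems" vocabulary. A Dirichlet sample is obtained by normalizing independent Gamma draws:

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, length, rng):
    """Generate one document: a topic mixture, then one topic and one word per position."""
    theta = sample_dirichlet(alpha, rng)  # this document's mixture over topics
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # choose a topic
        w = rng.choices(vocab, weights=topics[z])[0]           # choose a word from it
        words.append(w)
    return theta, words

# Hypothetical vocabulary and topic-word distributions.
vocab = ["heart", "blood", "artery", "lung", "oxygen", "breath"]
topics = [
    [0.4, 0.4, 0.2, 0.0, 0.0, 0.0],  # a "circulatory" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "respiratory" topic
]
rng = random.Random(42)
theta, doc = generate_document(topics, vocab, alpha=[0.5, 0.5], length=10, rng=rng)
print([round(t, 2) for t in theta], doc)
```

Topic extraction runs this story in reverse. Given only the words, it infers the hidden topic mixtures and topic-word distributions.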
Latent Dirichlet Allocation (LDA)
Algorithm
1. A new topic k is assigned to word w with a probability P proportional to the product of two probabilities, p1 and p2.
2. p1 = p(topic t | document d): the proportion of words in document d that are currently assigned to topic t.
3. p2 = p(word w | topic t): the proportion of assignments to topic t, over all documents, that come from word w.
4. The word's current topic assignment is replaced by a new topic drawn with probability proportional to p1 × p2.
5. The algorithm iterates through each word w in each document d, repeatedly adjusting the current topic-word assignments.
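The steps above correspond to collapsed Gibbs sampling. Below is a minimal sketch under stated assumptions. The smoothing hyperparameters `alpha` and `beta` are added so that no probability is zero. The denominator of p1 is dropped because it is the same for every topic. The corpus is hypothetical. Note that gensim's `LdaModel`, listed among the project's libraries, uses online variational Bayes inference rather than this sampler, so its results will differ:

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (vocab, doc-topic counts, topic-word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]       # words in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    n_k = [0] * K                        # total words assigned to topic k
    z = []                               # current topic of every token

    # Start from a random topic for every word.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k_old = word_id[w], z[d][i]
                # Take the current assignment out of the counts.
                n_dk[d][k_old] -= 1
                n_kw[k_old][v] -= 1
                n_k[k_old] -= 1
                # p1 ~ p(topic | document), p2 ~ p(word | topic), both smoothed.
                weights = [(n_dk[d][k] + alpha) * (n_kw[k][v] + beta) / (n_k[k] + V * beta)
                           for k in range(K)]
                # Draw the new topic with probability proportional to p1 * p2.
                k_new = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k_new
                n_dk[d][k_new] += 1
                n_kw[k_new][v] += 1
                n_k[k_new] += 1
    return vocab, n_dk, n_kw

def top_words(vocab, n_kw, k, n=3):
    """Return the n words most often assigned to topic k."""
    ranked = sorted(range(len(vocab)), key=lambda v: n_kw[k][v], reverse=True)
    return [vocab[v] for v in ranked[:n]]

# Hypothetical, already-cleansed documents.
docs = [
    ["heart", "blood", "artery", "heart", "blood"],
    ["lung", "oxygen", "breath", "lung", "oxygen"],
    ["heart", "artery", "blood", "vein"],
    ["breath", "lung", "oxygen", "air"],
]
vocab, n_dk, n_kw = lda_gibbs(docs, num_topics=2)
for k in range(2):
    print(k, top_words(vocab, n_kw, k))
```

The extracted "topics" of step v are then read off as the highest-count words per topic in `n_kw`.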
Dirichlet Distribution [figure not reproduced]
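In LDA, each document's topic mixture θ over K topics is drawn from a Dirichlet distribution with parameter vector α. For reference, its standard density is:

```latex
p(\boldsymbol{\theta} \mid \boldsymbol{\alpha}) =
\frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
\prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad \theta_k \ge 0,\; \sum_{k=1}^{K} \theta_k = 1
```

Values of $\alpha_k$ below 1 favor sparse mixtures, in which each document is dominated by a few topics.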
Example of LDA [figures not reproduced]
Advantages and Limitations of LDA
• The main advantage of LDA is that it is a probabilistic model with interpretable topics.
• The main limitation is that it is hard to know when LDA is working well. Topics are soft clusters, so there is no objective metric to say "this is the best choice" of hyperparameters.
Natural Language Text Processing
• Natural Language Processing (NLP) refers to AI methods for communicating with intelligent systems using a natural language such as English.
NLP techniques used in the project:
i. Tokenizing
ii. Stop-word removal
iii. Named Entity Recognition
iv. POS tagging
v. Lemmatizing
Future Scope
• Information extraction
• Retrieval of relations and hierarchies among concepts
• Ontology building
Tools and Libraries Used
• Programming language: Python
• Libraries: nltk, gensim, scikit-learn, quandl, pandas
References
• https://www.analyticsvidhya.com/blog/2016/08/beginners-guide-to-topic-modeling-in-python/
• sentdex, machine learning algorithms tutorial
• sentdex, nltk tutorial
• J. I. Toledo-Alvarado, A. Guzmán-Arenas, G. L. Martínez-Luna, "Automatic Building of an Ontology from a Corpus of Text Documents Using Data Mining Tools," Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN)
• Zijun Zhang, "K-means Algorithm Cluster Analysis in Data Mining"
Thank You
