Keyphrase Extraction and Source Code Similarity Detection: A Survey
Paper ID 123
Presented By
Nakul Sharma, Dr. Prasanth Yalla
Department of Computer Science and Engineering
Koneru Lakshmaiah Education Foundation
Vaddeswaram, Guntur-522502, India.
Agenda
• Abstract
• Introduction
• Literature Survey on Keyphrase Extraction & Source Code Similarity Calculation
– LR on survey papers
– LR on Keyphrase Extraction
– LR on Source Code Similarity Calculation
• Evaluation of System Developed
• Recommendation System: Keyphrase-Based vs. Source Code-Based
• Conclusion
Abstract
• Keyphrase extraction is the starting phase of feature extraction and of many other NLP tasks.
• Keyphrase extraction separates the essential and relevant information from the rest of a document or corpus.
• A similarity score is used to detect similarity between two or more documents. This paper presents a summary of existing research in the fields of keyphrase extraction and source code similarity detection.
• The surveyed literature focuses more on concept-based research than on domain-level work. There is promising potential for building recommendation systems by combining techniques from these closely related research areas.
Introduction
• Keyphrase Extraction (KE) involves getting the
essential and relevant words from a given
document or a text.
• Used in text mining and its allied fields.
• Source Code Analysis (SCA) forms the basis for
program understanding and comprehension.
• Source Code Analysis can also aid in the development of recommendation systems for developers and all allied roles.
Source Code Based Similarity Measures
Keyphrase Extraction Techniques
LR on survey papers
• An Overview of Graph-Based Keyword Extraction Methods and Approaches
– The authors introduce the basics of Keyphrase Extraction and classify KE research methods, followed by a categorization of keyphrase extraction techniques. The paper provides a strong mathematical basis for graph-based keyword extraction research.
• "Automatic keyphrase extraction: a survey and trends"
– The paper by Zakariae Alami Merrouni et al. provides a major review of the current state-of-the-art research on Automatic Keyphrase Extraction (AKE).
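To make the graph-based family of methods more concrete, the following is a minimal TextRank-style sketch: words become nodes, edges link words that appear next to each other, and a PageRank-style power iteration scores the nodes. It is an illustration under stated assumptions (a tiny hand-written stopword list, a co-occurrence window of two tokens, unweighted edges), not the method of any particular surveyed paper; all function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list; real systems use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to",
             "is", "are", "from", "by", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that appear within `window` consecutive tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Score nodes with a PageRank-style power iteration; highest score first."""
    nodes = list(graph)
    if not nodes:
        return []
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # The comprehension reads the previous iteration's scores throughout.
        scores = {
            node: (1 - damping) / n
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in nodes
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def extract_keywords(text, top_k=5):
    """Return the top_k highest-ranked single words of `text`."""
    graph = build_cooccurrence_graph(tokenize(text))
    return [word for word, _ in rank_words(graph)[:top_k]]


if __name__ == "__main__":
    doc = ("Keyphrase extraction selects keyphrases from a document. "
           "Graph-based keyphrase extraction ranks words in a word graph.")
    print(extract_keywords(doc, top_k=3))
```

Full graph-based systems typically add part-of-speech filtering and merge adjacent top-ranked words into multi-word keyphrases; both steps are left out here to keep the ranking step visible.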
Literature Review of Existing Work on Keyphrase Extraction
• A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language
– Advantage: The authors propose a multilingual tool for extracting keyphrases from English and Italian text.
– Disadvantage: Keyphrase extraction for other languages is not considered; keyphrase generation could be extended to other languages as well.
• Knowledge-Based Techniques for Scholarly Data Access: Towards Automatic Curation
– Advantage: The authors propose a system to support researchers that combines data mining, NLP, clustering and graph-based approaches.
– Disadvantage: This is a major project and needs experts from multiple domains for its proper execution.
• A Distributed Framework for NLP-Based Keyword and Keyphrase Extraction From Web Pages and Documents
– Advantage: The authors apply keyphrase extraction in cloud-based systems: the GATE architecture is deployed on the cloud and a web crawler is integrated into the system.
– Disadvantage: GATE-based applications could be tested more extensively, and the Hadoop implementation could be extended.
• Building a Construction Project Key-Phrase Network from Unstructured Text Documents
– Advantage: For unstructured text, the authors propose a document management system, a project key-phrase network, support libraries and a system for civil engineering projects.
– Disadvantage: The work could be extended with contextual awareness, an interactive interface and a controlled vocabulary.
• Identifying Candidate Tasks for Robotic Process Automation in Textual Process Descriptions
– Advantage: The authors identify and classify tasks in textual process descriptions using BPMN.
– Disadvantage: In practice, textual descriptions differ from those provided in the paper.
• Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
– Advantage: The authors use EmbedRank, an unsupervised method based on sentence embeddings, to extract keyphrases from a single document.
– Disadvantage: The unsupervised approach may introduce bias due to noise inherent in the document.
• How Document Pre-processing affects Keyphrase Extraction Performance
– Advantage: The authors reassess the performance of five keyphrase extraction models and conclude that document pre-processing affects performance.
– Disadvantage: More keyphrase extraction models, such as ExpandRank and SingleRank, could be included in the assessment.
• An Unsupervised Approach for Keyphrase Extraction Using Within-Collection Resources
– Advantage: The authors use graph-based ranking to assess the relevance of words and topic-based clustering for semantic enrichment.
– Disadvantage: The work is restricted to scholarly documents.
• Keyphrase Extraction Based on Prior Knowledge
– Advantage: The authors use a controlled vocabulary, prior probabilities and supervised ML for keyphrase extraction.
– Disadvantage: None mentioned.
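The EmbedRank idea mentioned above (embed the document and each candidate phrase in the same vector space, then rank candidates by cosine similarity to the document) can be sketched as follows. Two assumptions are flagged loudly: candidates are stopword-delimited n-grams rather than the part-of-speech-based noun phrases EmbedRank uses, and `embed` is a hypothetical bag-of-words stand-in for a real sentence-embedding model such as Sent2Vec or Doc2Vec. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to",
             "is", "are", "from", "by", "with"}


def _ngrams(run, max_len):
    """All n-grams of length 1..max_len from a run of consecutive tokens."""
    return {" ".join(run[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(run) - n + 1)}


def candidate_phrases(text, max_len=3):
    """Stand-in for EmbedRank's noun-phrase candidates: n-grams of
    non-stopword tokens that never cross punctuation or a stopword."""
    candidates = set()
    for chunk in re.split(r"[.,;:!?()]", text.lower()):
        run = []
        for tok in re.findall(r"[a-z]+", chunk):
            if tok in STOPWORDS:
                candidates |= _ngrams(run, max_len)
                run = []
            else:
                run.append(tok)
        candidates |= _ngrams(run, max_len)
    return candidates


def embed(text):
    """HYPOTHETICAL embedding: bag-of-words counts standing in for a
    sentence-embedding model."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower())
                   if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(text, top_k=5):
    """Rank candidates by cosine similarity to the whole-document embedding."""
    doc_vec = embed(text)
    scored = [(cosine(embed(c), doc_vec), c) for c in candidate_phrases(text)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [phrase for _, phrase in scored[:top_k]]


if __name__ == "__main__":
    doc = ("Keyphrase extraction ranks candidate phrases. "
           "Keyphrase extraction uses sentence embeddings.")
    print(rank_candidates(doc, top_k=3))
```

With this crude bag-of-words stand-in, longer candidates built from frequent words tend to score highest, which real sentence embeddings do not necessarily reproduce. The EmbedRank++ variant additionally re-ranks with Maximal Marginal Relevance to reduce redundancy among the selected phrases; that step is not shown.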
• 21. A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction
– Advantage: The authors use an n-gram approach for candidate selection and for ranking candidate keyphrases.
– Disadvantage: As resources become multilingual, the authors' work could be extended to them.
• 22. Feature selection, optimization and clustering strategies of text documents
– Advantage: The authors review clustering and its allied research areas, including similarity measures, and conclude that text clustering research has focused on dynamic and heterogeneous clustering applications.
– Disadvantage: Community-driven clustering is not addressed in the reviewed literature.
• 23. Keyphrase extraction as sequence labeling using contextualized embeddings
– Advantage: The authors use a BiLSTM-CRF architecture for keyphrase extraction, incorporating contextualized embeddings.
– Disadvantage: The authors' work could be extended to keyphrase generation.
• 24. DAKE: Document-Level Attention for Keyphrase Extraction
– Advantage: The authors use a bidirectional LSTM to extract keyphrases from documents.
– Disadvantage: The source documents are not multilingual.
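Treating keyphrase extraction as sequence labeling, as in the BiLSTM-CRF work above, means a model assigns each token a tag, commonly B (begins a keyphrase), I (inside a keyphrase) or O (outside); keyphrases are then read off the tag sequence. The sketch below shows only that decoding step. The tags are hard-coded as a hypothetical stand-in for a trained tagger's output, and both the BIO scheme and the lenient handling of a stray I tag are assumptions, not details taken from the surveyed papers.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Convert per-token BIO tags into a list of keyphrases.

    B starts a keyphrase, I continues it, O is outside any keyphrase.
    An I that does not follow B or I is treated as starting a new phrase.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B" or (tag == "I" and not current):
            if current:                      # close the previous phrase
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I":
            current.append(token)            # extend the open phrase
        else:                                # "O": close any open phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:                              # phrase running to the end
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    tokens = ["we", "use", "contextualized", "embeddings", "for",
              "keyphrase", "extraction"]
    tags = ["O", "O", "B", "I", "O", "B", "I"]   # hypothetical tagger output
    print(decode_bio(tokens, tags))  # ['contextualized embeddings', 'keyphrase extraction']
```

The CRF layer in a BiLSTM-CRF exists largely to make invalid sequences such as O followed by I unlikely, so the lenient branch for a stray I matters mostly when tags come from a model without that constraint.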
Literature Review of Research Papers Related to Source Code Analysis Based Similarity Index
• Building Program Vector Representations for Deep Learning
– Advantage: The authors apply vector representations to code analysis.
– Disadvantage/Future scope: Deep learning could be applied to the complete domain of SCA.
• TF-IDF Inspired Detection for Cross Language Source Code Plagiarism and Collusion
– Advantage: The author applies TF-IDF-inspired vector representations to code analysis to detect cross-language plagiarism and collusion.
– Disadvantage/Future scope: Identifier-renaming disguises are to be handled based on the first occurrence of each identifier.
• A Scalable Code Similarity Detection with Online Architecture and Focused Comparison for Maintaining Academic Integrity in Programming
– Advantage: The cosine similarity introduced by the authors leads to short running times, especially with large token strings.
– Disadvantage/Future scope: The proposed techniques could be applied to other programming courses as well.
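The two token-based ideas above, TF-IDF weighting and cosine similarity, can be combined into a small source code similarity checker. The sketch below is not either paper's implementation: the regular-expression tokenizer, the smoothed IDF formula log((1 + N) / (1 + df)) + 1 (chosen so that tokens shared by every file are not zeroed out in a tiny corpus) and the function names are all assumptions, and it handles neither cross-language token mapping nor identifier renaming.

```python
import math
import re
from collections import Counter

# Assumption: an identifier/keyword, an integer literal, or any single
# non-space character (operator, bracket, quote) counts as one token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize_code(source):
    """Split source code into tokens; whitespace and layout are ignored."""
    return TOKEN_RE.findall(source)


def tfidf_vectors(sources):
    """Return one sparse TF-IDF vector (dict token -> weight) per source file."""
    token_lists = [tokenize_code(s) for s in sources]
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: each token once per file
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({tok: count * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
                        for tok, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    original = "int total = 0; for (int i = 0; i < n; i++) total += i;"
    reformatted = "int total=0;\nfor(int i=0;i<n;i++)\n    total+=i;"
    unrelated = "print('hello world')"
    vecs = tfidf_vectors([original, reformatted, unrelated])
    print(round(cosine(vecs[0], vecs[1]), 3))  # 1.0: identical token streams
    print(round(cosine(vecs[0], vecs[2]), 3))  # much lower: few shared tokens
```

Because only token counts are compared, reformatting whitespace does not change the score, while renaming identifiers or translating the program into another language would; those are exactly the cases the surveyed papers address with additional normalization.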
Evaluation of System Developed
• Keyphrase extraction-based evaluation measures
– Binary preference measure (bpref)
– Mean Reciprocal Rank (MRR)
– Precision
– Recall
– F-measure
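Four of the five measures listed above (precision, recall, F-measure and MRR) can be computed as in the following sketch; bpref is omitted. It assumes exact, case-insensitive string matching between predicted and gold-standard keyphrases, whereas published evaluations often add stemming or partial matching. The function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall_f1(predicted: Sequence[str],
                        gold: Set[str]) -> Tuple[float, float, float]:
    """Exact-match, case-insensitive precision, recall and F1 for one document."""
    pred = list(dict.fromkeys(p.lower() for p in predicted))  # dedupe, keep order
    gold_l = {g.lower() for g in gold}
    correct = sum(1 for p in pred if p in gold_l)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold_l) if gold_l else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mean_reciprocal_rank(ranked_lists: List[Sequence[str]],
                         gold_sets: List[Set[str]]) -> float:
    """Average over documents of 1/rank of the first correct keyphrase (0 if none)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        gold_l = {g.lower() for g in gold}
        for rank, phrase in enumerate(ranked, start=1):
            if phrase.lower() in gold_l:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


if __name__ == "__main__":
    predicted = ["keyphrase extraction", "source code", "similarity"]
    gold = {"keyphrase extraction", "similarity detection"}
    print(precision_recall_f1(predicted, gold))  # P = 1/3, R = 1/2, F1 = 0.4
    print(mean_reciprocal_rank([predicted, ["code clone", "plagiarism"]],
                               [gold, {"plagiarism"}]))  # (1 + 1/2) / 2 = 0.75
```

In the usage example, "similarity" earns no credit against "similarity detection" under exact matching, which illustrates why the matching policy should always be reported alongside the scores.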
Recommendation System: Keyphrase-Based vs. Source Code-Based
• Keyphrase-based RS
– Part of a larger system that includes a source code-based RS
– Addressed to a larger audience
– Domain-level concepts can be addressed
• Source code-based RS
– Specific to source code
– Addressed to a specific audience
– Domain-level concepts can be addressed
Conclusion
• In the current work, an effort is made to summarize the existing state-of-the-art work in the fields of Keyphrase Extraction and similarity-measure-based source code analysis.
• Keyphrase extraction is more closely related to the pre-processing stage of Source Code Analysis. Similarity measures are statistical measurements computed over file-based parameters.
References
• W. Shi, W. Zheng, J. Xu Yu, H. Cheng, L. Zou, "Keyphrase Extraction Using Knowledge Graphs", Data Sci. Eng. (2017) 2:275–288. https://doi.org/10.1007/s41019-017-0055-z.
• S. Beliga, A. Meštrović, S. Martinčić-Ipšić, "An Overview of Graph-Based Keyword Extraction Methods and Approaches", Journal of Information and Organizational Sciences, Vol. 39, No. 1 (2015), pp. 1-20.
• F. Bulgarov, C. Caragea, "A Comparison of Supervised Keyphrase Extraction Models", WWW 2015 Companion, May 18–22, 2015, Florence, Italy, ACM 978-1-4503-3473-0/15/05. http://dx.doi.org/10.1145/2740908.2742776.
• K. Saidul Hasan, V. Ng, "Automatic Keyphrase Extraction: A Survey of the State of the Art", Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, Maryland, USA, June 23-25, 2014, pp. 1262–1273. http://acl2014.org/acl2014/P14-1/pdf/P14-1119.pdf.
• T. Gupta, "Keyword Extraction: A Review", International Journal of Engineering Applied Sciences and Technology, Vol. 2, Issue 4, 2017, pp. 215-220, ISSN 2455-2143.
• S. Siddiqi, A. Sharan, "Keyword and Keyphrase Extraction Techniques: A Literature Review", International Journal of Computer Applications, Vol. 109, No. 2, January 2015, ISSN 0975-8887.
• G. Jose Mary, D. Haritha, "A survey on best keyword cover search", Journal of Advanced Research in Dynamical and Control Systems, 9 (Special Issue 14), 2017, pp. 2217-2231. Retrieved from www.scopus.com.
• D. Innocenti, D. Nart, C. Tasso, "A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language", In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, pp. 78-85, 2014, Rome, Italy, ISBN: 978-989-758-048-2.
• M. V. B. T. Santhi, S. Sagar Imambi, J. Yamini Devi, Tejaswani, "Design and development of information using keyword-element relationship graph - a critical study", International Journal of Engineering & Technology, Vol. 7, No. 2.7, pp. 1020-1024, Mar. 2018, ISSN 2227-524X. https://www.sciencepubco.com/index.php/ijet/article/view/12211. doi: 10.14419/ijet.v7i2.7.12211.
• D. Nart, C. Tasso, "Knowledge-Based Techniques for Scholarly Data Access: Towards Automatic Curation", Ph.D. Thesis, Università degli Studi di Udine, Italy, 2016.
• S. Gollapalli, X. Li, P. Yang, "Incorporating Expert Knowledge into Keyphrase Extraction", Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), pp. 3180-3187, 2017.
• P. Nesi, G. Pantaleo, G. Sanesi, "A Distributed Framework for NLP-Based Keyword and Keyphrase Extraction From Web Pages and Documents", DOI: 10.18293/DMS2015-024. http://www.disit.org/axmedis/791/00000-7915eb81-51cc-4285-bf37-5a483323ba91/2/~saved-on-db-7915eb81-51cc-4285-bf37-5a483323ba91.pdf.
• N. jkovic, Ð., M. Kovacevic, "Building a Construction Project Key-Phrase Network from Unstructured Text Documents", Journal of Computing in Civil Engineering 31, no. 6 (2017): 04017058.
• H. Leopold, H. van der Aa, H. A. Reijers, "Identifying Candidate Tasks for Robotic Process Automation in Textual Process Descriptions", In Enterprise, Business-Process and Information Systems Modeling, pp. 67-81, Springer, Cham, 2018.
Cite This Article As
• Nakul Sharma and Prasanth Yalla 2021 IOP Conf. Ser.: Mater. Sci. Eng. 1074 012027, https://iopscience.iop.org/article/10.1088/1757-899X/1074/1/012027
• Complete article available at:
– https://iopscience.iop.org/article/10.1088/1757-899X/1074/1/012027
Any Questions?
Thank You

Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
BrazilAccount1
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 

Recently uploaded (20)

Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 

Keyphrase Extraction And Source Code Similarity Detection- A Survey

  • 1. Keyphrase Extraction And Source Code Similarity Detection- A Survey
    Paper ID 123
    Presented by Nakul Sharma, Dr. Prasanth Yalla
    Department of Computer Science and Engineering
    Koneru Lakshmaiah Education Foundation
    Vaddeswaram, Guntur-522502, India.
  • 2. Agenda
    • Abstract
    • Introduction
    • Literature Survey on Keyphrase Extraction & Source Code Similarity Calculation
      – LR (literature review) on survey papers
      – LR on keyphrase extraction
      – LR on source code similarity calculation
    • Evaluation of Developed Systems
    • Recommendation Systems: Keyphrase-Based vs. Source Code-Based
    • Conclusion
  • 3. Abstract
    • Keyphrase extraction is the first phase of feature extraction and of many other NLP tasks.
    • It separates the essential, relevant information in a document or corpus from the rest.
    • A similarity score quantifies how similar two or more documents are. This paper summarizes existing research on keyphrase extraction and source code similarity detection.
    • The literature focuses more on concept-level research than on domain-level applications. There is promising potential for building recommendation systems by combining these related research techniques.
  • 4. Introduction
    • Keyphrase Extraction (KE) identifies the essential and relevant words in a given document or text.
    • KE is used in text mining and related fields.
    • Source Code Analysis (SCA) forms the basis for program comprehension.
    • SCA can also support the development of recommendation systems for developers and related roles.
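One classic way to pick out "essential and relevant words" is TF-IDF: a word scores high when it is frequent in one document but rare across the corpus. The following is a minimal sketch under stated assumptions: the stopword list, the function names `tokenize` and `tfidf_keywords`, and the toy corpus are hypothetical illustrations, not a method from the surveyed papers, and real systems usually rank multi-word candidate phrases rather than single words.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use much larger lists).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is", "of", "on", "or", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_keywords(doc_index, corpus, top_k=3):
    """Return the top_k words of corpus[doc_index] ranked by TF-IDF."""
    docs = [tokenize(d) for d in corpus]
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    tf = Counter(docs[doc_index])
    total = sum(tf.values())
    scores = {w: (c / total) * math.log(len(docs) / df[w]) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


corpus = [
    "Keyphrase extraction selects relevant phrases from a document.",
    "Source code similarity detection compares programs.",
    "Keyphrase extraction and source code analysis support recommendation.",
]
print(tfidf_keywords(0, corpus))  # ['document', 'phrases', 'relevant']
```

Here "keyphrase" and "extraction" rank below the other words of the first document because they also occur in the third document, which lowers their IDF.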
  • 5. Source Code Based Similarity Measures
  • 7. LR on Survey Papers
    • "An Overview of Graph-Based Keyword Extraction Methods and Approaches"
      – The authors introduce the basics of keyphrase extraction, classify KE research methods and then categorize keyphrase extraction techniques. The paper provides a strong mathematical foundation for graph-based keyword extraction research.
    • "Automatic keyphrase extraction: a survey and trends"
      – The paper by Zakariae et al. provides a major review of current state-of-the-art research on Automatic Keyphrase Extraction (APKE).
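Graph-based keyword extraction methods build a graph whose nodes are words and rank the nodes by a centrality measure. A minimal sketch of TextRank, one well-known member of this family (not a method proposed by the survey itself), follows: words are linked when they co-occur within a small window, and scores come from an iterated PageRank update with damping factor 0.85. The stopword list, window size, iteration count and the function name `textrank_keywords` are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by PageRank over an undirected word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Link every pair of distinct words that co-occur within `window` consecutive tokens.
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != word:
                graph[word].add(words[j])
                graph[words[j]].add(word)
    # Iterative PageRank update; every node in `graph` has at least one neighbour.
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


print(textrank_keywords("graph nodes graph edges graph ranking"))
# ['graph', 'edges', 'nodes']
```

In the toy input, "graph" is adjacent to every other word, so it becomes the hub of the graph and ranks first.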
  • 8. Literature Review of Existing Work on Keyphrase Extraction (title of paper; advantage of proposed work; disadvantages of proposed work)
    • A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language
      – Advantage: The authors propose a multilingual tool for extracting keyphrases from English and Italian text.
      – Disadvantage: Languages other than English and Italian are not considered; keyphrase generation could be extended to them.
    • Knowledge-Based Techniques for Scholarly Data Access: Towards Automatic Curation
      – Advantage: The authors propose a system that supports researchers by combining data mining, NLP, clustering and graph-based approaches.
      – Disadvantage: This is a major project and needs experts from multiple domains for its proper execution.
    • A Distributed Framework for NLP-Based Keyword and Keyphrase Extraction From Web Pages and Documents
      – Advantage: The authors run keyphrase extraction on cloud-based systems, deploying the GATE architecture in the cloud with an integrated web crawler.
      – Disadvantage: GATE-based applications could be tested more extensively, and the Hadoop implementation could be extended.
  • 9. Literature Review of Existing Work on Keyphrase Extraction (continued)
    • Building a Construction Project Key-Phrase Network from Unstructured Text Documents
      – Advantage: For unstructured text, the authors propose a document management system, a project keyphrase network, and supporting libraries and systems for civil engineering projects.
      – Disadvantage: The work could be extended with contextual awareness, an interactive interface and a controlled vocabulary.
    • Identifying Candidate Tasks for Robotic Process Automation in Textual Process Descriptions
      – Advantage: The authors identify and classify tasks in textual process descriptions using BPMN.
      – Disadvantage: In practice, textual descriptions differ from those used in the paper.
  • 10. Literature Review of Existing Work on Keyphrase Extraction (continued)
    • Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
      – Advantage: The authors use EmbedRank, an unsupervised method based on sentence embeddings, to extract keyphrases from a single document.
      – Disadvantage: An unsupervised approach may be biased by noise inherent in the document.
    • How Document Pre-processing affects Keyphrase Extraction Performance
      – Advantage: The authors reassess the performance of five keyphrase extraction models and conclude that document pre-processing affects performance.
      – Disadvantage: More models, such as ExpandRank and SingleRank, could be included in the assessment.
    • An Unsupervised Approach for Keyphrase Extraction Using Within-Collection Resources
      – Advantage: The authors use graph-based ranking to assess the relevance of words, and topic-based clusters for semantic enrichment.
      – Disadvantage: The work is restricted to scholarly documents.
    • Keyphrase Extraction Based on Prior Knowledge
      – Advantage: The authors use a controlled vocabulary, prior probabilities and supervised ML for keyphrase extraction.
      – Disadvantage: None mentioned.
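EmbedRank ranks candidate phrases by the cosine similarity between each phrase embedding and the embedding of the whole document; its EmbedRank++ variant re-ranks with Maximal Marginal Relevance (MMR) so near-duplicate phrases are not all selected. The following is a minimal sketch of that ranking step only, under stated assumptions: the embeddings are passed in precomputed (the toy 3-dimensional vectors below are invented; in practice they would come from a sentence-embedding model), candidate selection is not shown, and the names `embedrank` and `diversity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedrank(doc_vec, candidates, top_k=2, diversity=None):
    """Rank candidate phrases (phrase -> embedding) against a document embedding.

    diversity=None: rank purely by cosine similarity to the document.
    diversity=lam in [0, 1]: greedy MMR selection scoring
    lam * sim(phrase, doc) - (1 - lam) * max sim(phrase, already selected).
    """
    if diversity is None:
        return sorted(candidates, key=lambda p: (-cosine(candidates[p], doc_vec), p))[:top_k]

    selected, remaining = [], sorted(candidates)

    def mmr(p):
        relevance = cosine(candidates[p], doc_vec)
        redundancy = max((cosine(candidates[p], candidates[s]) for s in selected), default=0.0)
        return diversity * relevance - (1 - diversity) * redundancy

    while remaining and len(selected) < top_k:
        best = max(remaining, key=mmr)  # ties go to the alphabetically first phrase
        selected.append(best)
        remaining.remove(best)
    return selected


doc = [1.0, 1.0, 0.0]
cands = {
    "keyphrase extraction": [1.0, 0.9, 0.0],
    "keyphrase extractor": [1.0, 0.95, 0.0],
    "source code": [0.2, 1.0, 0.5],
    "weather": [0.0, 0.0, 1.0],
}
print(embedrank(doc, cands))                 # ['keyphrase extractor', 'keyphrase extraction']
print(embedrank(doc, cands, diversity=0.5))  # ['keyphrase extractor', 'source code']
```

With MMR, the second pick moves from the near-duplicate "keyphrase extraction" to "source code", which is less relevant but much less redundant.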
  • 11. Literature Review of Existing Work on Keyphrase Extraction (continued)
    • [21] A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction
      – Advantage: The authors use an n-gram approach to select and rank candidate keyphrases.
      – Disadvantage: As resources become multilingual, the authors' work could be extended.
    • [22] Feature selection, optimization and clustering strategies of text documents
      – Advantage: The authors review clustering and related research areas, including similarity measures, and conclude that text clustering research has focused on dynamic and heterogeneous clustering applications.
      – Disadvantage: Community-driven clustering is not addressed in the reviewed literature.
    • [23] Keyphrase extraction as sequence labeling using contextualized embeddings
      – Advantage: The authors use a BiLSTM-CRF architecture with contextualized embeddings for keyphrase extraction.
      – Disadvantage: The work could be extended to keyphrase generation.
    • [24] DAKE: Document-Level Attention for Keyphrase Extraction
      – Advantage: The authors use a bidirectional LSTM to extract keyphrases from documents.
      – Disadvantage: The source documents are not multilingual.
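In the sequence-labeling formulation, a model such as a BiLSTM-CRF assigns every token a tag, and keyphrases are then read off the tag sequence. The sketch below shows only that final decoding step. It assumes the common BIO scheme (B = beginning of a keyphrase, I = inside, O = outside), and the tags are hard-coded rather than predicted by a trained model. The function name `bio_to_keyphrases` is hypothetical.

```python
def bio_to_keyphrases(tokens, tags):
    """Collect keyphrases from tokens tagged B (begin), I (inside) or O (outside)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    phrases, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new keyphrase starts; close any phrase that is still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O" closes any open phrase; a stray "I" with no open phrase is ignored.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = ["we", "use", "conditional", "random", "fields", "for", "keyphrase", "extraction"]
tags = ["O", "O", "B", "I", "I", "O", "B", "I"]
print(bio_to_keyphrases(tokens, tags))  # ['conditional random fields', 'keyphrase extraction']
```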
  • 12. Literature Review of Research Papers Related to Source Code Analysis-Based Similarity Index (title of paper; advantage; disadvantage or future scope)
    • Building Program Vector Representations for Deep Learning
      – Advantage: The authors apply vector representations to code analysis.
      – Future scope: Deep learning (DL) could be applied to the complete domain of SCA.
    • TF-IDF Inspired Detection for Cross Language Source Code Plagiarism and Collusion
      – Advantage: The authors apply TF-IDF-inspired vector representations of code to detect cross-language source code plagiarism and collusion.
      – Future scope: Identifier-renaming disguises could be handled based on each identifier's first occurrence.
    • A Scalable Code Similarity Detection with Online Architecture and Focused Comparison for Maintaining Academic Integrity in Programming
      – Advantage: The cosine similarity used by the authors shortens running time, especially for long token strings.
      – Future scope: The proposed techniques could be applied to other programming courses as well.
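The rows above mention identifier-renaming disguises, first-occurrence handling and cosine similarity over token strings. The following is a minimal sketch that combines these ideas under stated assumptions. It tokenizes Python-like code with a rough regular expression and renames each identifier to a placeholder numbered by its first occurrence. It then compares raw token-count vectors with cosine similarity, which differs from the TF-IDF weighting used in the cited paper. Python keywords are kept via the standard `keyword` module; built-in names such as `print` are renamed like any other identifier. The names `normalize_tokens` and `cosine_similarity` are hypothetical.

```python
import keyword
import math
import re
from collections import Counter

# Rough tokenizer (an assumption): identifiers, integer literals, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_tokens(source):
    """Tokenize Python-like code, renaming identifiers to ID0, ID1, ... by first occurrence."""
    names, tokens = {}, []
    for tok in TOKEN_RE.findall(source):
        if (tok[0].isalpha() or tok[0] == "_") and not keyword.iskeyword(tok):
            names.setdefault(tok, f"ID{len(names)}")
            tokens.append(names[tok])
        else:
            tokens.append(tok)
    return tokens


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the token-count vectors of two token lists."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count * cb[tok] for tok, count in ca.items())
    norm = math.sqrt(sum(c * c for c in ca.values())) * math.sqrt(sum(c * c for c in cb.values()))
    return dot / norm if norm else 0.0


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
disguised = "def add_all(items):\n    acc = 0\n    for it in items:\n        acc += it\n    return acc\n"
a, b = normalize_tokens(original), normalize_tokens(disguised)
print(round(cosine_similarity(a, b), 6))  # 1.0
```

Because renamed identifiers map to the same placeholders, the disguised copy becomes token-for-token identical to the original. The trade-off is that unrelated programs also share placeholder tokens, which inflates their similarity somewhat.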
  • 13. Evaluation of Developed Systems
    • Keyphrase extraction-based systems
      – Binary preference measure (bpref)
      – Mean Reciprocal Rank (MRR)
      – Precision
      – Recall
      – F-measure
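Precision, recall, F-measure and MRR can be computed as in the following minimal sketch. It makes several assumptions. Phrases are compared as exact strings, whereas real evaluations often stem or otherwise normalize phrases first. F-measure is taken as F1, the harmonic mean of precision and recall. MRR is the mean, over documents, of the reciprocal rank of the first correct keyphrase, counting 0 when none is found. bpref is not sketched, because its value depends on which non-relevant candidates are treated as judged. The function names are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 for one document's keyphrases."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mean_reciprocal_rank(ranked_lists, gold_sets):
    """Average of 1/rank of the first correct keyphrase per document (0 if none)."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for rank, phrase in enumerate(ranked, start=1):
            if phrase in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists) if ranked_lists else 0.0


predicted = ["keyphrase extraction", "graph ranking", "source code", "tf idf"]
gold = ["keyphrase extraction", "source code", "cosine similarity"]
p, r, f = precision_recall_f1(predicted, gold)  # p = 1/2, r = 2/3, f = 4/7
print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y"]], [{"b"}, {"z"}]))  # 0.25
```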
  • 14. Recommendation Systems: Keyphrase-Based vs. Source Code-Based
    • Scope
      – Keyphrase-based RS: part of a larger system that includes a source code-based RS
      – Source code-based RS: specific to source code
    • Audience
      – Keyphrase-based RS: addressed to a larger audience
      – Source code-based RS: addressed to a specific audience
    • Domain-level concepts
      – Both can address domain-level concepts
  • 15. Conclusion
    • This work summarizes existing state-of-the-art research on keyphrase extraction and on similarity-measure-based source code analysis.
    • Keyphrase extraction relates mainly to the pre-processing stage of source code analysis. A similarity measure is a statistical measurement that can be computed over any file-based parameter.
  • 16. References
    • W. Shi, W. Zheng, J. Xu Yu, H. Cheng, L. Zou, "Keyphrase Extraction Using Knowledge Graphs", Data Sci. Eng. (2017) 2:275–288. https://doi.org/10.1007/s41019-017-0055-z
    • S. Beliga, A. Meštrović, S. Martinčić-Ipšić, "An Overview of Graph-Based Keyword Extraction Methods and Approaches", Journal of Information and Organizational Sciences, Vol. 39, No. 1 (2015), pp. 1-20.
    • F. Bulgarov, C. Caragea, "A Comparison of Supervised Keyphrase Extraction Models", WWW 2015 Companion, May 18–22, 2015, Florence, Italy, ACM 978-1-4503-3473-0/15/05. http://dx.doi.org/10.1145/2740908.2742776
    • K. Saidul Hasan, V. Ng, "Automatic Keyphrase Extraction: A Survey of the State of the Art", Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, Maryland, USA, June 23-25, 2014, pp. 1262–1273. http://acl2014.org/acl2014/P14-1/pdf/P14-1119.pdf
    • T. Gupta, "Keyword Extraction: A Review", International Journal of Engineering Applied Sciences and Technology, Vol. 2, Issue 4, 2017, ISSN 2455-2143, pp. 215-220.
    • S. Siddiqi, A. Sharan, "Keyword and Keyphrase Extraction Techniques: A Literature Review", International Journal of Computer Applications, Vol. 109, No. 2, January 2015, ISSN 0975-8887.
    • G. Jose Mary, D. Haritha, "A survey on best keyword cover search", Journal of Advanced Research in Dynamical and Control Systems, Vol. 9, Special Issue 14, 2017, pp. 2217-2231.
    • D. Innocenti, D. Nart, C. Tasso, "A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language", Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, Volume 1: KDIR, pp. 78-85, 2014, Rome, Italy, ISBN 978-989-758-048-2.
    • M. V. B. T. Santhi, S. Sagar Imambi, J. Yamini Devi, Tejaswani, "Design and development of information using keyword-element relationship graph - a critical study", International Journal of Engineering & Technology, Vol. 7, No. 2.7, pp. 1020-1024, March 2018, ISSN 2227-524X. https://www.sciencepubco.com/index.php/ijet/article/view/12211, doi:10.14419/ijet.v7i2.7.12211
    • D. Nart, C. Tasso, "Knowledge-Based Techniques for Scholarly Data Access: Towards Automatic Curation", Ph.D. Thesis, Università degli Studi di Udine, Italy, 2016.
    • S. Gollapalli, X. Li, P. Yang, "Incorporating Expert Knowledge into Keyphrase Extraction", Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), pp. 3180-3187, 2017.
    • P. Nesi, G. Pantaleo, G. Sanesi, "A Distributed Framework for NLP-Based Keyword and Keyphrase Extraction From Web Pages and Documents", DOI 10.18293/DMS2015-024. http://www.disit.org/axmedis/791/00000-7915eb81-51cc-4285-bf37-5a483323ba91/2/~saved-on-db-7915eb81-51cc-4285-bf37-5a483323ba91.pdf
    • N. jkovic, Ð., M. Kovacevic, "Building a Construction Project Key-Phrase Network from Unstructured Text Documents", Journal of Computing in Civil Engineering, Vol. 31, No. 6 (2017): 04017058.
    • H. Leopold, H. van der Aa, H. A. Reijers, "Identifying Candidate Tasks for Robotic Process Automation in Textual Process Descriptions", In Enterprise, Business-Process and Information Systems Modeling, pp. 67-81, Springer, Cham, 2018.
  • 17. Cite This Article As
    • Nakul Sharma and Prasanth Yalla, 2021, IOP Conf. Ser.: Mater. Sci. Eng. 1074 012027.
    • Complete article available at: https://iopscience.iop.org/article/10.1088/1757-899X/1074/1/012027