TELKOMNIKA, Vol.15, No.1, March 2017, pp. 486~493
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v15i1.4631
Received August 5, 2016; Revised October 12, 2016; Accepted November 18, 2016
Entity Annotation WordPress Plugin using TAGME Technology
William Aprilius¹, Seng Hansun², Dennis Gunawan³
¹,²,³ Universitas Multimedia Nusantara, Jl. Boulevard Gading Serpong, Scientia Garden, Tangerang
*Corresponding author, e-mail: william.aprilius@yahoo.com¹, hansun@umn.ac.id², dennis.gunawan@umn.ac.id³
Abstract
The development of internet technology makes ever more information accessible, and this
information needs to be organized so that it can be managed easily. One solution is the entity
annotation approach, which generates tags that represent a document. In this study, TAGME
technology is implemented in a plugin for WordPress, a system used to manage blogs. In addition,
information from Wikipedia Bahasa Indonesia is processed to generate the anchor dictionary
required by the implemented technology. The plugin performs entity annotation by giving tag
suggestions for the posts in a blog. Testing is carried out by measuring the precision, recall, and
F1 of the tag suggestions given by the plugin. The results show that the plugin can give tag
suggestions with a precision of 0.7638, a recall of 0.5508, and an F1 of 0.59.
Keywords: entity annotation, TAGME, wikipedia, wordpress
Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
The development of internet technology makes ever more information accessible. The
most important part of that data remains unstructured documents [1], so internet users
need to manage large amounts of information [2]. Therefore, a method for organizing that
information is needed so that it can be managed easily.
One solution to the problem of organizing the information in documents is indexing, or
"tagging". Here, each document in a collection is indexed with a set of key phrases ("tags")
that reflect its principal topics. However, assigning tags or key phrases manually is
time-consuming and impractical [3]. Therefore, a technique that can generate tags for a
document automatically is required.
Key phrase indexing is an approach that maps a word or phrase in a document to a
related term in a controlled vocabulary. It is an intermediate approach between key phrase
extraction and term assignment that combines the advantages of both and avoids their
shortcomings [4]. Wikipedia can be used effectively to build a controlled vocabulary, which is
then used to perform key phrase indexing. This approach is also called topic indexing [5].
Thus, the resulting tag or key phrase is a term in a controlled vocabulary built from Wikipedia.
Mihalcea and Csomai [6] define "text wikification" ("wikification" for short) as the task of
automatically extracting the most important words and phrases in a document and identifying,
for each such word or phrase, the appropriate link to a Wikipedia article. Milne and Witten
[7] also studied "wikification", addressing it with a machine learning algorithm.
Thus, the problem of topic indexing is closely related to "wikification" when Wikipedia is used to
build the controlled vocabulary.
TAGME [8, 9] is the first software system that performs entity annotation [10] on short text
on-the-fly and with high precision/recall. A blog post is an example of a short text. Ferragina
and Scaiella [8] also explain that annotating short text poses new challenges: (1) the annotation
may have to occur on-the-fly, so the input cannot be pre-processed, and (2) the annotator must
be designed carefully, because the input texts are so short that it is difficult to mine the
significant statistics that are available when texts are long. Thus, the posts of a blog can be
organized with the entity annotation approach, which uses tags to represent those posts, and
the challenges of the entity annotation process can be addressed with TAGME's algorithmic
technology (TAGME technology for short). Therefore, in this study, the technology (algorithm)
used in TAGME is implemented to perform entity annotation on posts in the WordPress CMS.
WordPress is the most used CMS in the world (47.09%) [11], and it is also the most used CMS
for blog management (97.78%) [11]. Linawati et al. [12] even proposed synchronization
interfaces to migrate teachers' or lecturers' learning materials from WordPress into Moodle in
order to improve the utilization of Moodle as an e-learning system.
2. TAGME Technology
TAGME is a "topic annotator" that is able to identify meaningful sequences of words in
a short text and link them to a pertinent Wikipedia page [13]. TAGME uses the sequences of terms
that compose the anchor texts occurring in Wikipedia pages to identify meaningful term
sequences (spots) in the input text, then uses the pages (possibly many) pointed to in Wikipedia
by that spot or anchor as the possible senses of each spot. Sense selection for a spot that has
more than one sense is done with scoring functions that are fast to compute and accurate in
finding a collective agreement among all spot-to-sense (Wikipedia page) mappings [8].
TAGME performs annotation in three steps, which are described in the following
subsections [8].
2.1. Anchor Parsing
TAGME receives a short text as input, tokenizes it, and then detects the anchors by
querying the anchor dictionary. If there are two anchors ($a_1$ and $a_2$) and $a_1$ is a
(word-based) substring of $a_2$, TAGME drops $a_1$ only if $lp(a_1) < lp(a_2)$, where $lp$
denotes the link probability of an anchor.
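The parsing step can be illustrated with a minimal Python sketch. It is not the plugin's code (a WordPress plugin would be written in PHP), and it assumes that the drop rule compares link probabilities as in the TAGME paper [8]. Whitespace tokenization, the `max_len` window, and the link-probability values are hypothetical choices made only for illustration.

```python
def find_anchors(tokens, link_prob, max_len=6):
    """Return (start, end, anchor) for every word sequence found in the dictionary."""
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            anchor = " ".join(tokens[i:j])
            if anchor in link_prob:
                spans.append((i, j, anchor))
    return spans


def drop_substrings(spans, link_prob):
    """Drop a1 when it lies inside a longer anchor a2 and lp(a1) < lp(a2)."""
    kept = []
    for s1, e1, a1 in spans:
        dominated = any(
            s2 <= s1 and e1 <= e2 and (s2, e2) != (s1, e1)
            and link_prob[a1] < link_prob[a2]
            for s2, e2, a2 in spans
        )
        if not dominated:
            kept.append((s1, e1, a1))
    return kept


# Hypothetical dictionary: anchor text -> link probability (made-up values).
link_prob = {"perangkat": 0.05, "perangkat keras": 0.40, "komputer": 0.30}
tokens = "perangkat keras komputer".split()
spans = find_anchors(tokens, link_prob)
anchors = [a for _, _, a in drop_substrings(spans, link_prob)]
print(anchors)  # ['perangkat keras', 'komputer']
```

Here "perangkat" is dropped because it is contained in "perangkat keras" and has the lower link probability, while "komputer" is kept because no longer anchor contains it.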
2.2. Anchor Disambiguation
In this step, TAGME tries to disambiguate each anchor, i.e. to choose one sense for each
anchor. It does so by calculating a score for each possible sense of the anchor being
disambiguated. This score is obtained with a voting scheme, which computes the vote that
each of the other anchors gives to a candidate annotation of that anchor.
Ferragina and Scaiella [8] explain that the vote is based on Equation (1), which was
proposed by Milne and Witten [14] to measure the relatedness between two Wikipedia pages
($p_a$ and $p_b$).

$$rel(p_a, p_b) = \frac{\log(\max(|in(p_a)|, |in(p_b)|)) - \log(|in(p_a) \cap in(p_b)|)}{\log(W) - \log(\min(|in(p_a)|, |in(p_b)|))} \qquad (1)$$

In (1), $W$ is the number of pages in Wikipedia, and $in(p_a)$ and $in(p_b)$ are the sets of
Wikipedia pages pointing to pages $p_a$ and $p_b$.
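As a rough illustration, the sketch below evaluates the measure in Python. It assumes that Equation (1) is the link-based measure of Milne and Witten [14] computed over sets of in-linking pages. The function and parameter names are hypothetical, and returning infinity when the two pages share no in-link is an assumption of this sketch, not something stated in the paper.

```python
import math


def relatedness(in_a, in_b, total_pages):
    """Evaluate Equation (1) for two sets of in-linking page titles.

    in_a, in_b  : sets of titles of the pages that link to p_a and p_b
    total_pages : W, the number of pages in the Wikipedia subset
    Assumes both in-link sets are smaller than total_pages.
    """
    larger = max(len(in_a), len(in_b))
    smaller = min(len(in_a), len(in_b))
    common = len(in_a & in_b)
    if common == 0:
        # log(0) is undefined; treating this case as infinity is an assumption.
        return math.inf
    return (math.log(larger) - math.log(common)) / (
        math.log(total_pages) - math.log(smaller)
    )


# Made-up in-link sets, only to exercise the function.
value = relatedness({"A", "B", "C", "D"}, {"C", "D"}, total_pages=1000)
same = relatedness({"A", "B"}, {"A", "B"}, total_pages=1000)
print(value, same)
```

In this form the expression is 0 when the two in-link sets are identical and grows as their overlap shrinks.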
After obtaining the total vote for each sense of an anchor, TAGME uses an approach called
Disambiguation by Threshold (DT for short): it takes the top 30% of the senses by total vote,
then annotates the anchor with the sense that has the highest commonness among them.
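A minimal sketch of Disambiguation by Threshold follows. It assumes that the total vote and the commonness of every candidate sense have already been computed. Rounding the 30% shortlist up, and keeping at least one sense, is an assumption of this sketch. The sense names and numbers are made up.

```python
import math


def disambiguate_by_threshold(total_votes, commonness, top_fraction=0.30):
    """Pick one sense: shortlist the top senses by vote, then take the most common."""
    ranked = sorted(total_votes, key=total_votes.get, reverse=True)
    # Assumption: round the shortlist size up and never let it be empty.
    keep = max(1, math.ceil(top_fraction * len(ranked)))
    shortlist = ranked[:keep]
    return max(shortlist, key=lambda sense: commonness[sense])


# Hypothetical candidate senses for one anchor, with made-up scores.
votes = {"Sense A": 0.9, "Sense B": 0.8, "Sense C": 0.2, "Sense D": 0.1}
common = {"Sense A": 0.3, "Sense B": 0.1, "Sense C": 0.6, "Sense D": 0.0}
chosen = disambiguate_by_threshold(votes, common)
print(chosen)  # Sense A
```

"Sense C" has the highest commonness overall but is not chosen, because it does not reach the shortlist of the two best-voted senses.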
2.3. Anchor Pruning
The disambiguation phase produces a set of candidate annotations, one per anchor
detected in the input text. This set has to be pruned in order to discard any meaningless
annotations. These "bad annotations" are detected with a simple scoring function that takes
into account only two features: the link probability and the coherence of an annotation.
The pruning score is the average of these two values. If the pruning score of an annotation
is smaller than a threshold, ρNA, that annotation is discarded from the final set of
annotations.
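The pruning rule can be sketched in a few lines of Python, assuming that the link probability and coherence of each candidate annotation are already available. The tuple layout and all the values below are hypothetical.

```python
def prune(annotations, threshold):
    """Keep annotations whose average of link probability and coherence reaches the threshold."""
    kept = []
    for anchor, sense, link_probability, coherence in annotations:
        score = (link_probability + coherence) / 2
        if score >= threshold:  # discarded only when the score is smaller
            kept.append((anchor, sense, score))
    return kept


# Made-up candidates: (anchor, sense, link probability, coherence).
candidates = [
    ("perangkat keras", "Perangkat keras", 0.4, 0.6),
    ("data", "Data", 0.05, 0.15),
]
final = prune(candidates, threshold=0.2)
print([anchor for anchor, _, _ in final])  # ['perangkat keras']
```

Raising the threshold removes more annotations, which is the usual way to trade recall for precision.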
3. Entity Annotation
Entity annotation is an approach that overcomes the limitations of classic approaches
based on the "bag-of-words" paradigm in providing a semantic representation of a text
document. The key idea of this approach is to identify, in the input text, short and meaningful
sequences of terms (also called mentions) and annotate them with unambiguous identifiers
(also called entities) drawn from a catalog, such as Wikipedia [10].
The process of entity annotation involves three main steps [10]:
1. parsing of the input text, which is the task to detect candidate entity mentions and link each
of them to all possible entities they could mention;
2. disambiguation of mentions, which is the task of selecting the most pertinent Wikipedia
page (i.e. entity) that best describes each mention;
3. pruning of mentions, which is the task of discarding a detected mention and its annotated
entity if they are considered not interesting or pertinent to the semantic interpretation of
the input text.
The problem of entity annotation can be cast into two main classes: (1) the
identification of (possibly scored) annotations, and thus of mention-entity pairs;
and (2) the finding of (possibly scored or ranked) tags, which accounts only for the entities [10].
4. Methodology and Design
In this study, two applications were developed: a preprocessing application and a
plugin application. The preprocessing application processes information from Wikipedia (i.e.
Wikipedia articles exported with the 'Special:Export' facility) to create the anchor
dictionary required to implement TAGME technology. The plugin application is a plugin for
the WordPress CMS that performs entity annotation, using the anchor dictionary created by
the preprocessing application. The relationship between these two applications is shown in
Figure 1.
In this study, the Wikipedia Bahasa Indonesia articles under the Computer Science (Ilmu
Komputer in Bahasa Indonesia) category structure were downloaded. We retrieved only articles
that lie within two levels of subcategories from the root category. These Wikipedia articles
were downloaded on 22 February 2015. Figure 1 also shows the steps taken to obtain the
Wikipedia Bahasa Indonesia articles in the Computer Science category.
Figure 1. Relationship between Preprocessing Application and Plugin Application
4.1. Preprocessing Application
The preprocessing application processes information from Wikipedia. This information
takes the form of an XML file of the exported Wikipedia pages (henceforth referred to as the
Wikipedia XML dump). The application creates an anchor dictionary, also in the form of an
XML file.
Figure 2 shows a flowchart of the main process in the preprocessing application. Parsing
of, and iteration over, the Wikipedia XML dump is done with WikiXMLJ. In Figure 2, in the
process of getting the set of mapping data, a mapping data item is a wikilink, or internal link.
A 'wikilink' links a page to another page within Wikipedia. This process collects the set of
mapping data in the Wikipedia page being processed in the current iteration. Each mapping
data item consists of an anchor text (i.e. a text that appears in Wikipedia pages as a
link) and a sense (i.e. the Wikipedia page targeted by that link). In addition, this process
ensures that the anchor text in the mapping data is longer than one character and does not
consist only of numbers.
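The extraction of mapping data from the wikitext of one page can be sketched as follows. The study itself parses the dump with WikiXMLJ (a Java library). The regular expression and function below are a hypothetical Python illustration that handles only plain `[[target]]` and `[[target|anchor]]` links and ignores namespaces, section links, and other wikitext features.

```python
import re

# [[target]] or [[target|anchor text]]; nested brackets are not supported.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def extract_mappings(wikitext):
    """Return (anchor text, sense) pairs for the wikilinks found in a page."""
    mappings = []
    for match in WIKILINK.finditer(wikitext):
        sense = match.group(1).strip()
        # Without an explicit label, the link text is the target title itself.
        anchor = (match.group(2) or sense).strip()
        # Keep anchors longer than one character that are not only digits.
        if len(anchor) > 1 and not anchor.isdigit():
            mappings.append((anchor, sense))
    return mappings


sample = (
    "[[Perangkat keras|perangkat keras]] dan [[Komputer]] sejak [[2015]], "
    "ditulis dalam [[C (bahasa pemrograman)|C]]."
)
pairs = extract_mappings(sample)
print(pairs)  # [('perangkat keras', 'Perangkat keras'), ('Komputer', 'Komputer')]
```

The links to "2015" and "C" are skipped by the two filters described above.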
Figure 2. Flowchart of the Main Process in Preprocessing Application
The next process inserts the set of mapping data that was found into the anchor
dictionary. In addition, ( ), the number of occurrences of the mapping data (counter), and
the "in page" set that corresponds to a mapping data item are recorded in the anchor
dictionary. The "in page" set is the set of titles of the Wikipedia pages that contain that
mapping data.
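One way to picture the insertion step is the sketch below, which keeps a counter and an "in page" set for every (anchor text, sense) pair. The in-memory layout is a hypothetical stand-in for the XML records of the actual anchor dictionary, and the example pages are made up.

```python
def add_mappings(dictionary, page_title, mappings):
    """Record the mapping data found in one Wikipedia page."""
    for anchor, sense in mappings:
        record = dictionary.setdefault(
            (anchor, sense), {"counter": 0, "in_pages": set()}
        )
        record["counter"] += 1               # number of times this mapping occurs
        record["in_pages"].add(page_title)   # titles of pages containing it
    return dictionary


anchor_dictionary = {}
add_mappings(anchor_dictionary, "Komputer", [
    ("perangkat keras", "Perangkat keras"),
    ("perangkat keras", "Perangkat keras"),
])
add_mappings(anchor_dictionary, "CPU", [("perangkat keras", "Perangkat keras")])

record = anchor_dictionary[("perangkat keras", "Perangkat keras")]
print(record["counter"], sorted(record["in_pages"]))  # 3 ['CPU', 'Komputer']
```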
In Figure 2, the first phase of eliminating anchor dictionary records discards the
records with ( ) . This process also counts how often each anchor text occurs in the
Wikipedia page being processed in the current iteration, then adds that frequency to ( )
in the anchor dictionary record that corresponds to that anchor text. The second phase of
elimination discards the anchor dictionary records with ( ). Figure 3 shows the XML
element structure of an anchor dictionary record.
Figure 3. XML Element Structure of an Anchor Dictionary Record
4.2. Plugin Application
The plugin application performs entity annotation using the anchor dictionary created by
the preprocessing application. Entity annotation consists of giving tag suggestions to the
user based on the input text written, and of creating hyperlinks from words or phrases in
the input text to Wikipedia pages, based on the tag suggestions that the user selects as the
tags of the post. This plugin is named Wiki CS Annotation.
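The hyperlink creation described above can be sketched as a simple substitution. This is a hypothetical Python illustration rather than the plugin's PHP code. Case-insensitive matching and the Wikipedia Bahasa Indonesia base URL are assumptions of the sketch, and it does not guard against text that is already inside an HTML tag.

```python
import re
from urllib.parse import quote


def link_anchor(text, anchor, sense, base_url="https://id.wikipedia.org/wiki/"):
    """Wrap every occurrence of an anchor text in a link to its sense's page."""
    url = base_url + quote(sense.replace(" ", "_"))
    pattern = re.compile(r"\b" + re.escape(anchor) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: f'<a href="{url}">{m.group(0)}</a>', text)


post = "Perangkat keras komputer. Contoh perangkat keras adalah CPU."
linked = link_anchor(post, "perangkat keras", "Perangkat keras")
print(linked)
```

Both occurrences of the phrase are linked, and the original capitalization of each occurrence is preserved in the link text.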
The plugin's user interface is displayed on the page for editing a post or creating a new
post, under the text editor of the WordPress CMS. The plugin allows the user to request tag
suggestions, choose which of the suggestions become tags of a post, delete the chosen
tags, and change the threshold value.
The TAGME technology implemented in this plugin runs when the user requests tag
suggestions. Figure 4 shows the process that follows such a request. It consists of three
main processes, which correspond to the main steps of TAGME, i.e. anchor parsing, anchor
disambiguation, and anchor pruning.
Figure 4. Flowchart of the Main Process when User Requests Tag Suggestion
The anchor parsing process queries the anchor dictionary to obtain anchor text records and
searches for occurrences of those anchor texts in the post. If more than one anchor text is
found and one of them is a (word-based) substring of another, this process also performs
the same selection as TAGME.
The next process is anchor disambiguation. If an anchor text has more than one sense,
disambiguation chooses one of those senses. Thus, this process generates a set of anchor
texts, each mapped to exactly one sense.
The next process is anchor pruning, which aims to discard meaningless annotations.
This is done by calculating a pruning score and comparing it with a threshold, as done
by TAGME. This process results in the final set of annotations for a post.
5. Implementation
The plugin application was implemented on WordPress 4.1.5. The plugin can be used
after the user installs it through the Add Plugins page of WordPress and then activates it
through the Plugins page. The installation can be done by a user who has administrator
privileges.
The anchor dictionary created by the preprocessing application contains 1,259
records. This anchor dictionary needs to be imported into WordPress so that the plugin can
use it. The import is done with the anchor dictionary importing facility provided by the
plugin.
Tag suggestions can be requested from the page for editing a post or creating a new
post. Figure 5 shows the tag suggestions given by the plugin for an input text about
computer hardware.
Figure 5. Tag Suggestion Given by the Plugin for Input Text about Computer Hardware
A tag suggestion chosen by the user becomes a tag of the post. When the user saves
(as a draft) or publishes a post, and before WordPress executes that action, the plugin
creates hyperlinks from words or phrases in the input text to Wikipedia pages, based on the
selected tag suggestions. For example, in Figure 5, suppose that the "Perangkat keras" tag
("perangkat keras" means "hardware" in English) is chosen and the user then saves or
publishes the post (an input text about computer hardware). Before WordPress saves or
publishes the post, the plugin creates a hyperlink from every occurrence of the phrase
"perangkat keras" in the input text to the Wikipedia page titled "Perangkat keras". This is
because the phrase "perangkat keras" is the anchor text of the "Perangkat keras" sense.
6. Testing and Results
Testing is performed to measure the precision, recall, and F1 of the tag suggestions given
by the plugin.
6.1. Testing Data
The data set for testing is built by randomly selecting 100 Wikipedia articles under the
Computer Science category structure. The texts of those articles are then edited by
discarding citation marks (if any), i.e. numbers between square brackets. The result of this
process is 100 plain texts, which form the data set for testing.
In this study, we limit the length of each text in the testing data set to around 200 words.
This is done as follows: if a Wikipedia article is shorter than 200 words, we take all the
text of that article as testing data. Otherwise, we take only the first 200 words or so,
using the sentence as the unit of selection.
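The preparation of a test text can be sketched as below. The sentence splitter is a naive regular expression, and the sketch assumes that the sentence crossing the word limit is kept whole, which is one possible reading of "around 200 words". Both function names are hypothetical.

```python
import re


def strip_citations(text):
    """Remove citation marks such as [1] or [12]."""
    return re.sub(r"\[\d+\]", "", text)


def first_sentences(text, word_limit=200):
    """Take whole sentences from the start until about word_limit words are collected."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    selected, count = [], 0
    for sentence in sentences:
        if count >= word_limit:
            break
        selected.append(sentence)
        count += len(sentence.split())
    return " ".join(selected)


cleaned = strip_citations("Komputer[1] adalah alat[12].")
excerpt = first_sentences(
    "Satu dua tiga. Empat lima enam tujuh. Delapan sembilan.", word_limit=5
)
print(cleaned)  # Komputer adalah alat.
print(excerpt)  # Satu dua tiga. Empat lima enam tujuh.
```

A text shorter than the limit is returned unchanged, which matches the rule for short articles.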
6.2. Results
In this study, precision and recall are measured with a focus on which senses got linked,
i.e. on the tag suggestions given by the plugin. For a text in the testing data set, the
ground truth is the set of senses in that text, taken from the Wikipedia article used to
create it, and the plugin output is the set of senses that the plugin gives for that text in
the form of tag suggestions. The ground-truth set is then extended with the title of the
Wikipedia article used to create the text. This addition is made because the article title
can be a tag of the text even though it is not contained in the ground-truth set of senses.
Precision and recall are calculated from these two sets. Moreover, a sense in the
ground-truth set and a sense in the plugin output may differ and yet still be relevant to
each other, in which case they are considered equal. This can occur when, in Wikipedia, one
of them is a redirect to the other.
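The measurement described above can be sketched with plain set operations. The redirect table, the single-hop redirect resolution, and the choice to return 0 for an empty set are assumptions of this sketch. The titles in the example are illustrative only.

```python
def canonical(title, redirects):
    """Map a redirect title to its target so that both count as the same sense."""
    return redirects.get(title, title)


def precision_recall(suggested, ground_truth, article_title, redirects):
    """Compare the plugin's tag suggestions with the ground truth for one text."""
    # The article title is added to the ground truth, as described in the text.
    truth = {canonical(s, redirects) for s in ground_truth | {article_title}}
    found = {canonical(s, redirects) for s in suggested}
    hits = len(found & truth)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


redirects = {"Hardware": "Perangkat keras"}  # hypothetical redirect entry
p, r = precision_recall(
    suggested={"Hardware", "Komputer", "Internet"},
    ground_truth={"Perangkat keras", "Perangkat lunak", "CPU"},
    article_title="Komputer",
    redirects=redirects,
)
print(round(p, 4), r)  # 0.6667 0.5
```

"Hardware" counts as a hit because it resolves to "Perangkat keras", and "Komputer" counts because the article title was added to the ground truth.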
For each text in the testing data set, precision and recall are calculated for five values
of ρNA, i.e. 0, 0.1, 0.2, 0.3, and 0.4. Table 1 shows the average precision and recall over
the 100 testing texts, and the F1 computed from those averages.
Table 1. Precision, Recall, and F1 for each Variation of ρNA Values
ρNA    Precision    Recall    F1
0      0.5185       0.5508    0.5342
0.1    0.6674       0.5287    0.5900
0.2    0.7638       0.3923    0.5184
0.3    0.7043       0.2430    0.3613
0.4    0.5968       0.1339    0.2187
Table 1 shows that the highest average precision (0.7638) is obtained when ρNA = 0.2,
and the highest average recall (0.5508) is obtained when ρNA = 0. Moreover, the highest F1
(0.59) is obtained when ρNA = 0.1.
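As a quick check, the F1 column of Table 1 can be recomputed from the averaged precision and recall with the usual harmonic-mean formula, F1 = 2PR / (P + R). The short script below uses only the values reported in Table 1. The variable names are arbitrary.

```python
# Threshold value -> (average precision, average recall), copied from Table 1.
rows = {
    0.0: (0.5185, 0.5508),
    0.1: (0.6674, 0.5287),
    0.2: (0.7638, 0.3923),
    0.3: (0.7043, 0.2430),
    0.4: (0.5968, 0.1339),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


scores = {rho: round(f1(p, r), 4) for rho, (p, r) in rows.items()}
best = max(scores, key=scores.get)
print(scores)  # {0.0: 0.5342, 0.1: 0.59, 0.2: 0.5184, 0.3: 0.3613, 0.4: 0.2187}
print(best)    # 0.1
```

The recomputed values agree with the F1 column, which is consistent with F1 having been derived from the averaged precision and recall rather than averaged per text.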
The testing process revealed that Wikipedia Bahasa Indonesia is not good enough to be
used as ground truth for testing or evaluation, particularly the articles under the
Computer Science category structure, because some anchor texts are still linked to a sense
that is not appropriate. For example, the sense of the anchor text "eksekusi" (which means
"execute" in English) in the article Ada (programming language) is "Hukuman mati" (which
means "death penalty" in English), which is not correct. The anchor text "eksekusi" in that
article should be interpreted as the execution of program code. Thus, a better set of
testing data is needed, i.e. one in which each anchor text has an appropriate sense.
7. Conclusion
Information from Wikipedia (i.e. the exported Wikipedia articles, specifically those
under the Computer Science category structure) is processed to create an anchor dictionary
in the form of an XML file. This anchor dictionary contains 1,259 records. Moreover, TAGME
technology is implemented as a WordPress plugin to perform entity annotation. The plugin
gives tag suggestions for a post and creates hyperlinks from words or phrases in the post to
Wikipedia pages, based on the tag suggestions selected as the tags of the post.
Based on the testing results, the highest precision, recall, and F1 achieved by the
plugin are 0.7638, 0.5508, and 0.59 respectively, each at a different threshold value
(Table 1). In a subsequent study, testing should be carried out with a better data set.
Moreover, to increase precision, recall, and F1, a study can be conducted to find a novel or
modified relatedness function that gives better results with a smaller anchor dictionary
(a subset of Wikipedia), such as the one used in this study.
In the future, we intend to develop the plugin to perform text categorization using
the entity annotation approach. We hypothesize that this can be done by having the
preprocessing application include the category structure and information of Wikipedia
pages in the anchor dictionary. We would also need to modify the plugin so that it supports
the text categorization functionality.
References
[1] Abdoulahi Boubacar, Zhendong Niu. Conceptual Search Based on Semantic Relatedness.
Indonesian Journal of Electrical Engineering and Computer Science. 2014; 12(8): 6380-6385.
[2] Dan Roth, Heng Ji, Ming-Wei Chang, Taylor Cassidy. Wikification and Beyond: The Challenges of
Entity and Concept Grounding. Proc. 52nd Annual Meeting of the Association for Computational
Linguistics: Tutorials (ACL2014), USA. 2014: 7.
[3] Olena Medelyan, Ian H Witten. Domain Independent Automatic Keyphrase Indexing with Small
Training Sets. Journal of the American Society for Information Science and Technology. 2008; 59(7):
1026-1040.
[4] Olena Medelyan, Ian H. Witten. Thesaurus Based Automatic Keyphrase Indexing. Proc. Joint
Conference on Digital Libraries (JCDL), USA. 2006.
[5] Olena Medelyan, Ian H. Witten, David Milne. Topic Indexing with Wikipedia. Proc. Wikipedia and AI
Workshop at the AAAI-2008 Conference, Chicago. 2008: 19-24.
[6] Rada Mihalcea, Andras Csomai. Wikify! Linking Documents to Encyclopedic Knowledge. Proc. 16th
ACM Conference on Information and Knowledge Management (CIKM '07), New York. 2007: 233-242.
[7] David Milne, Ian H. Witten. Learning to Link with Wikipedia. Proc. Conference on Information and
Knowledge Management (CIKM '08), New York. 2008: 509-518.
[8] Paolo Ferragina, Ugo Scaiella. Fast and Accurate Annotation of Short Texts with Wikipedia Pages,
IEEE Software. 2011; 29(1): 70-75.
[9] Paolo Ferragina, Ugo Scaiella. TAGME: On-the-fly Annotation of Short Text Fragments (by
Wikipedia Entities). Proc. Conference on Information and Knowledge Management (CIKM '10),
Canada, October 2010.
[10] Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita. A Framework for Benchmarking Entity-
Annotation System. Proc. 22nd International Conference on World Wide Web (WWW '13),
Switzerland. 2013: 249-260.
[11] BuiltWith, CMS Usage Statistics: Statistics for Websites using CMS Technologies, [online]. Available
on http://trends.builtwith.com/cms. accessed on 6 June 2015.
[12] Linawati, Gede Sukadarmika, GM Arya Sasmita. Synchronization Interfaces for Improving Moodle
Utilization. TELKOMNIKA. 2012; 10(1): 179-188.
[13] Paolo Ferragina. TAGME Technology. Available on http://acube.di.unipi.it/tagme/. accessed on 23
February 2015.
[14] David Milne, Ian H Witten. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from
Wikipedia Links. Proc. AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving Synergy,
Chicago. 2008: 25-30.

Summarization using ntc approach based on keyword extraction for discussion f...Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...eSAT Publishing House
 
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMIRJET Journal
 
A web based approach: Acronym Definition Extraction
A web based approach: Acronym Definition ExtractionA web based approach: Acronym Definition Extraction
A web based approach: Acronym Definition ExtractionIRJET Journal
 
An automatic text summarization using lexical cohesion and correlation of sen...
An automatic text summarization using lexical cohesion and correlation of sen...An automatic text summarization using lexical cohesion and correlation of sen...
An automatic text summarization using lexical cohesion and correlation of sen...eSAT Publishing House
 
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...CSEIJJournal
 
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...cseij
 
Re-enactment of Newspaper Articles
Re-enactment of Newspaper ArticlesRe-enactment of Newspaper Articles
Re-enactment of Newspaper ArticlesEditor IJCATR
 

Similar to Entity Annotation WordPress Plugin Using TAGME Technology (20)

Ijarcet vol-2-issue-4-1357-1362
Ijarcet vol-2-issue-4-1357-1362Ijarcet vol-2-issue-4-1357-1362
Ijarcet vol-2-issue-4-1357-1362
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique of
 
A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network
 
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documents
 
Dr31564567
Dr31564567Dr31564567
Dr31564567
 
G0361034038
G0361034038G0361034038
G0361034038
 
Syntactic Indexes for Text Retrieval
Syntactic Indexes for Text RetrievalSyntactic Indexes for Text Retrieval
Syntactic Indexes for Text Retrieval
 
Group4 doc
Group4 docGroup4 doc
Group4 doc
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clustering
 
IRJET- Automatic Text Summarization using Text Rank
IRJET- Automatic Text Summarization using Text RankIRJET- Automatic Text Summarization using Text Rank
IRJET- Automatic Text Summarization using Text Rank
 
Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...
 
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
 
A web based approach: Acronym Definition Extraction
A web based approach: Acronym Definition ExtractionA web based approach: Acronym Definition Extraction
A web based approach: Acronym Definition Extraction
 
An automatic text summarization using lexical cohesion and correlation of sen...
An automatic text summarization using lexical cohesion and correlation of sen...An automatic text summarization using lexical cohesion and correlation of sen...
An automatic text summarization using lexical cohesion and correlation of sen...
 
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
 
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
 
Re-enactment of Newspaper Articles
Re-enactment of Newspaper ArticlesRe-enactment of Newspaper Articles
Re-enactment of Newspaper Articles
 
I6 mala3 sowmya
I6 mala3 sowmyaI6 mala3 sowmya
I6 mala3 sowmya
 

Entity Annotation WordPress Plugin Using TAGME Technology

  • 1. TELKOMNIKA, Vol.15, No.1, March 2017, pp. 486~493. ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013. DOI: 10.12928/TELKOMNIKA.v15i1.4631. Received August 5, 2016; Revised October 12, 2016; Accepted November 18, 2016. Entity Annotation WordPress Plugin using TAGME Technology. William Aprilius (1), Seng Hansun (2), Dennis Gunawan (3). (1,2,3) Universitas Multimedia Nusantara, Jl. Boulevard Gading Serpong, Scientia Garden, Tangerang. *Corresponding author, e-mail: william.aprilius@yahoo.com (1), hansun@umn.ac.id (2), dennis.gunawan@umn.ac.id (3). Abstract. The development of internet technology has made more information accessible, so that information needs to be organized in order to be managed easily. One solution is the entity annotation approach, which generates tags that represent a document. In this study, TAGME technology is implemented in a WordPress plugin, which is used to manage a blog. In addition, information from the 'Bahasa Indonesia' (Indonesian-language) Wikipedia is processed to generate the anchor dictionary required by the implemented technology. The plugin performs entity annotation by suggesting tags for the posts in a blog. Testing is carried out by measuring the precision, recall, and F-measure of the tag suggestions given by the plugin. The results show that the plugin gives tag suggestions with precision 0.7638, recall 0.5508, and F-measure 0.59. Keywords: entity annotation, TAGME, wikipedia, wordpress. Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved. 1. Introduction. The development of internet technology has made more information accessible. The most important part of that data still consists of unstructured documents [1]. As a result, internet users need to manage large amounts of information [2], and a method for organizing that information is needed so that it can be managed easily. One solution to the problem of organizing the information in documents is indexing, or "tagging".
Here, each document in a collection is indexed with a set of key phrases ("tags") that reflect its principal topics. However, assigning tags or key phrases manually is time-consuming and impractical [3]. Therefore, a technique that can generate tags for a document automatically is required.

Key phrase indexing is an approach that maps a word or phrase in a document to a related term in a controlled vocabulary. It is an intermediate approach between key phrase extraction and term assignment that combines the advantages of both and avoids their shortcomings [4]. Wikipedia can be used effectively to build a controlled vocabulary, which is then used to perform key phrase indexing. This approach is also called topic indexing [5]. Thus, the resulting tag or key phrase is a term in a controlled vocabulary built from Wikipedia.

Mihalcea and Csomai [6] define "text wikification" ("wikification" for short) as the task of automatically extracting the most important words and phrases in a document and identifying, for each such word or phrase, the appropriate link to a Wikipedia article. Milne and Witten [7] also addressed "wikification", using a machine learning algorithm. Thus, the problem of topic indexing is closely related to "wikification" when Wikipedia is used to build the controlled vocabulary.

TAGME [8, 9] is the first software system that performs entity annotation [10] on short text on the fly and with high precision and recall. An example of a short text is a blog post. Ferragina and Scaiella [8] also explain that entity annotation on short text poses new challenges: (1) the annotation may be requested on the fly, so the text cannot be pre-processed, and (2) the annotator must be designed carefully, because the input texts are so short that it is difficult to mine the significant statistics that are available when texts are long.
Thus, the posts of a blog can be organized with the entity annotation approach, which uses tags to represent those posts. The challenges in the entity annotation process can be addressed with TAGME's algorithmic technology (TAGME technology for short). Therefore, in this study, the technology (algorithm), which
  • 2. is used in TAGME, is implemented to perform entity annotation on a post in WordPress CMS. WordPress is the most used CMS in the world (47.09%) [11]. Furthermore, it is also the most used CMS for blog management (97.78%) [11]. Linawati et al. [12] even proposed synchronization interfaces to migrate teachers' or lecturers' learning materials from WordPress into Moodle, in order to improve the utilization of Moodle as an e-learning system.

2. TAGME Technology

TAGME is a "topic annotator" that is able to identify meaningful sequences of words in a short text and link them to a pertinent Wikipedia page [13]. TAGME uses the sequences of terms composing the anchor texts that occur in Wikipedia pages to identify meaningful term sequences (spots) in the input text, and then uses the pages (possibly many) pointed to in Wikipedia by that spot or anchor as the possible senses of each spot. For a spot that has more than one sense, the sense is selected by functions that are fast to compute and accurate in finding a collective agreement among all spot-to-page (sense) mappings [8]. TAGME performs annotation in three steps, described in the following subsections [8].

2.1. Anchor Parsing

TAGME receives a short text as input, tokenizes it, and then detects the anchors by querying the anchor dictionary. If there are two anchors ($a_1$ and $a_2$) and $a_1$ is a (word-based) substring of $a_2$, TAGME drops $a_1$ only if $\mathrm{lp}(a_1) < \mathrm{lp}(a_2)$, where $\mathrm{lp}$ denotes the link probability of an anchor.

2.2. Anchor Disambiguation

In this step, TAGME tries to disambiguate each anchor, i.e. to choose one sense for each anchor. This is done by calculating a value for each possible sense of the anchor being disambiguated. This value is obtained with a voting scheme, which calculates the vote that every other anchor gives to an annotation of that anchor.
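As an illustration only, the anchor-parsing step of Section 2.1 can be sketched in Python. The function names (`find_anchors`, `drop_substrings`), the dictionary layout (anchor text mapped to its link probability), the link-probability values, and the limit of four words per candidate are assumptions made for this sketch, not details taken from the paper. "Word-based substring" is interpreted here as containment of word spans in the input text, and a shorter anchor is dropped only when its link probability is lower than that of a longer anchor containing it.

```python
def find_anchors(text, link_prob, max_words=4):
    """Return (start, end, anchor) word spans of `text` that occur in the dictionary.

    `link_prob` is a hypothetical anchor dictionary: anchor text -> link probability.
    """
    words = text.lower().split()
    found = []
    for i in range(len(words)):
        # Try every candidate of 1..max_words words starting at word i.
        for j in range(i + 1, min(i + max_words, len(words)) + 1):
            candidate = " ".join(words[i:j])
            if candidate in link_prob:
                found.append((i, j, candidate))
    return found


def drop_substrings(spans, link_prob):
    """Drop a span contained in a longer span only if its link probability is lower."""
    kept = []
    for (i, j, a) in spans:
        dominated = any(
            (k <= i and j <= l) and (k, l) != (i, j) and link_prob[a] < link_prob[b]
            for (k, l, b) in spans
        )
        if not dominated:
            kept.append((i, j, a))
    return kept


# Illustrative values only; they are not taken from the paper's anchor dictionary.
example_lp = {"perangkat": 0.05, "perangkat keras": 0.60, "komputer": 0.30}
example_spans = find_anchors("Perangkat keras komputer", example_lp)
example_kept = drop_substrings(example_spans, example_lp)
```

With these made-up values, "perangkat" is contained in "perangkat keras" and has the lower link probability, so only "perangkat keras" and "komputer" remain as anchors.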
Ferragina and Scaiella [8] explain that the basic calculation of the vote is described by Equation (1), which was proposed by Milne and Witten [14] to measure the relatedness between two Wikipedia pages ($p_a$ and $p_b$).

$$\mathrm{rel}(p_a, p_b) = \frac{\log\left(\max\left(|\mathrm{in}(p_a)|, |\mathrm{in}(p_b)|\right)\right) - \log\left(|\mathrm{in}(p_a) \cap \mathrm{in}(p_b)|\right)}{\log(W) - \log\left(\min\left(|\mathrm{in}(p_a)|, |\mathrm{in}(p_b)|\right)\right)} \qquad (1)$$

In (1), $W$ is the number of pages in Wikipedia, and $\mathrm{in}(p_a)$ and $\mathrm{in}(p_b)$ are the sets of Wikipedia pages pointing to pages $p_a$ and $p_b$. After obtaining the total vote for each sense of an anchor, TAGME uses an approach called Disambiguation by Threshold (DT for short): it takes the 30% of senses with the highest total vote and then annotates the anchor with the sense among them that has the highest commonness.

2.3. Anchor Pruning

The disambiguation phase produces a set of candidate annotations, one per anchor detected in the input text. This set has to be pruned in order to discard possibly meaningless annotations. These "bad annotations" are detected with a simple scoring function that takes into account only two features: the link probability and the coherence. The pruning score is the average of these two values. If the pruning score obtained for an annotation is smaller than a threshold, $\rho_{NA}$, that annotation is discarded from the final set of annotations.

3. Entity Annotation

Entity annotation is an approach to overcome the limitation of classic approaches which are based on the "bag-of-words" paradigm in providing a semantic representation for a text
  • 3. document. The key idea of this approach is to identify, in the input text, short and meaningful sequences of terms (also called mentions) and annotate them with unambiguous identifiers (also called entities) drawn from a catalog, such as Wikipedia [10]. The process of entity annotation involves three main steps [10]:
1. parsing of the input text, which is the task of detecting candidate entity mentions and linking each of them to all the entities it could refer to;
2. disambiguation of mentions, which is the task of selecting the most pertinent Wikipedia page (i.e. entity) that best describes each mention;
3. pruning of a mention, which discards a detected mention and its annotated entity if they are considered not interesting or pertinent to the semantic interpretation of the input text.
The problem of entity annotation can be cast into two main classes: (1) the identification of (possibly scored) annotations, and thus of mention-entity pairs; and (2) finding (possibly scored or ranked) tags, and thus accounting only for the entities [10].

4. Methodology and Design

In this study, two applications were developed: a preprocessing application and a plugin application. The preprocessing application processes information from Wikipedia (i.e. Wikipedia articles exported with the 'Special:Export' facility) to create the anchor dictionary required to implement TAGME technology. The plugin application is a plugin for WordPress CMS that performs entity annotation. The entity annotation process uses the anchor dictionary created by the preprocessing application. The relationship between these two applications is shown in Figure 1. In this study, Wikipedia Bahasa Indonesia articles under the Computer Science (Ilmu Komputer in Bahasa Indonesia) category structure were downloaded.
We retrieved only articles within two levels of subcategories from the root category. These Wikipedia articles were downloaded on 22 February 2015. The steps taken to obtain the Wikipedia Bahasa Indonesia articles in the Computer Science category are also shown in Figure 1.

Figure 1. Relationship between Preprocessing Application and Plugin Application

4.1. Preprocessing Application

The preprocessing application processes information from Wikipedia. This information takes the form of an XML file of the exported Wikipedia pages (henceforth referred to as the Wikipedia XML dump). The application creates an anchor dictionary, also in the form of an XML file. Figure 2 shows a flowchart of the main process in the preprocessing application. Parsing of and iteration over the Wikipedia XML dump is done with WikiXMLJ. In Figure 2, in the process of getting the set of mapping data, a mapping data item is a wikilink or internal link. A 'wikilink' links a page to another page within Wikipedia. This process aims to get the set of mapping data in the Wikipedia page being processed in the current iteration. A mapping data item therefore consists of an anchor text (i.e. a text that appears in Wikipedia pages as a link) and a sense (i.e. the Wikipedia page targeted by that link). In addition, this process ensures that the anchor text in the mapping data consists of more than one character and is not made up of numbers only.
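The extraction of mapping data described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation (which relies on WikiXMLJ): the regular expression covers only the basic `[[Target]]` and `[[Target|anchor]]` wikilink forms, targets containing a colon (namespace links such as files or categories) are skipped, anchor texts are lower-cased, and the names `extract_mappings`, `new_dictionary`, and `add_page` as well as the record layout are hypothetical.

```python
import re
from collections import defaultdict

# Matches [[Target]], [[Target#Section]] and [[Target|anchor text]] (simplified).
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def extract_mappings(wikitext):
    """Return (anchor text, sense) pairs found in the wikitext of one page."""
    pairs = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if ":" in target:
            continue  # assumption: skip namespace links (files, categories, ...)
        anchor = (match.group(2) or match.group(1)).strip().lower()
        # Keep anchors with more than one character that are not digits only.
        if len(anchor) > 1 and not anchor.isdigit():
            pairs.append((anchor, target))
    return pairs


def new_dictionary():
    """Hypothetical record layout: per-sense counters plus the "in page" set."""
    return defaultdict(lambda: {"senses": defaultdict(int), "in_pages": set()})


def add_page(dictionary, page_title, wikitext):
    """Insert the mapping data of one Wikipedia page into the anchor dictionary."""
    for anchor, target in extract_mappings(wikitext):
        record = dictionary[anchor]
        record["senses"][target] += 1
        record["in_pages"].add(page_title)


sample = "[[Perangkat keras|perangkat keras]] dan [[Komputer]] pada [[2015]], [[C (bahasa pemrograman)|C]]"
anchor_dictionary = new_dictionary()
add_page(anchor_dictionary, "Contoh", sample)
```

In this sample, the links whose anchor texts are "2015" (numbers only) and "C" (a single character) are filtered out, so the dictionary ends up with the two anchors "perangkat keras" and "komputer".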
  • 4. Figure 2. Flowchart of the Main Process in Preprocessing Application

The next process inserts the set of mapping data that was found into the anchor dictionary. Moreover, ( ), the number of mapping data (counter), and the "in page" set that corresponds to a mapping data item are also recorded in the anchor dictionary. The "in page" set is the set of titles of the Wikipedia pages that contain that mapping data. In Figure 2, the first phase of eliminating anchor dictionary records aims to discard the records with ( ) . Moreover, this process counts the frequency with which an anchor text occurs in the Wikipedia page being processed in the current iteration, then adds that frequency to ( ) in the anchor dictionary record that corresponds to that anchor text. The second phase of eliminating anchor dictionary records aims to discard the records with ( ) . Figure 3 shows the XML element structure of an anchor dictionary record.

Figure 3. XML Element Structure of an Anchor Dictionary Record

4.2. Plugin Application

The plugin application performs entity annotation by using the anchor dictionary created by the preprocessing application. The entity annotation process is performed by giving tag suggestion(s) to the user based on the written input text and creating hyperlinks on words or phrases
  • 5. in the input text to Wikipedia pages, based on the tag suggestion(s) that the user selects as tags of the post. This plugin is named Wiki CS Annotation. The plugin's user interface is displayed on the page for editing a post or creating a new post, under the text editor in WordPress CMS. The plugin allows the user to request tag suggestion(s), choose which of the suggestions become tag(s) of a post, delete the chosen tag(s), and change the threshold value. The TAGME technology implemented in this plugin runs when the user requests tag suggestions. Figure 4 shows the process that takes place when the user requests tag suggestions. It shows that three main processes are performed to handle a tag suggestion request. These processes are the main steps of TAGME, i.e. anchor parsing, anchor disambiguation, and anchor pruning.

Figure 4. Flowchart of the Main Process when User Requests Tag Suggestion

The anchor parsing process queries the anchor dictionary to obtain anchor text records and searches for occurrences of those anchor texts in the post. Following TAGME, this process also performs a selection when more than one anchor text is found and one of them is a (word-based) substring of another. The next process is anchor disambiguation. If an anchor text has more than one sense, disambiguation chooses one of those senses. Thus, this process generates a set of anchor texts, each mapped to exactly one sense. The next process is anchor pruning, which aims to discard meaningless annotations. As in TAGME, this is done by calculating a pruning score and comparing it with a threshold. This process results in the set of final annotations for a post.

5. Implementation

The plugin application that was developed is implemented on WordPress 4.1.5.
The plugin can be used after the user installs it through the Add Plugins page in WordPress and then activates it through the Plugins page. The installation can be done by a user with administrator privileges. The anchor dictionary created by the preprocessing application contains 1,259 records. This anchor dictionary needs to be imported into WordPress so that it can be used by the plugin. The import can be done with the anchor dictionary importing facility provided by the plugin.
  • 6. Tag suggestions can be requested from the page for editing a post or creating a new post. Figure 5 shows the tag suggestions given by the plugin for an input text about computer hardware.

Figure 5. Tag Suggestion Given by the Plugin for Input Text about Computer Hardware

A tag suggestion chosen by the user becomes a tag of the post. When the user saves a post (as a draft) or publishes it, the plugin, before WordPress executes that process, creates hyperlinks on words or phrases in the input text to Wikipedia pages based on the selected tag suggestions. For example, in Figure 5, suppose the "Perangkat keras" tag ("perangkat keras" means "hardware" in English) is chosen and the user then decides to save or publish the post (an input text about computer hardware). Before WordPress saves or publishes the post, the plugin creates a hyperlink on every occurrence of the phrase "perangkat keras" in the input text to the Wikipedia page titled "Perangkat keras". This is because the phrase "perangkat keras" is the anchor text of the "Perangkat keras" sense.

6. Testing and Results

Testing is performed to measure the precision, recall, and F1 of the tag suggestion(s) given by the plugin.

6.1. Testing Data

The data set for testing is built by randomly selecting 100 Wikipedia articles under the Computer Science category structure. The texts of those articles are then edited by discarding citations (if any), i.e. numbers between square brackets. The result of this process is 100 plain texts, which become the data set for testing. In this study, we limit the length of each text in the testing data set to around 200 words. This is done as follows: if a Wikipedia article is shorter than 200 words, we take all the text in that article as testing data. Otherwise, we take only the first 200 words or so, with the sentence as the window of selection.

6.2.
Results

In this study, precision and recall are measured with a focus on which sense got linked, i.e. on the tag suggestions given by the plugin. For each text in the testing data set, consider the set of senses in that text in the ground truth, i.e. in the Wikipedia article used to create the text, and the set of senses in that text
  • 7. which is given by the plugin in the form of a tag suggestion set. The set of ground-truth senses is then extended with the title of the Wikipedia article used to create the text. This addition is made because that title can serve as a tag of the text but is not contained in the set of senses of the text in the ground truth. Precision and recall are calculated from the set of ground-truth senses and the set of suggested senses. Moreover, a suggested sense may differ from a ground-truth sense and still be relevant to it, in which case the two are considered equal. This can occur if, in Wikipedia, one is a redirect of the other. For each text in the testing data set, precision and recall are calculated with five values of $\rho_{NA}$: 0, 0.1, 0.2, 0.3, and 0.4. Table 1 shows the average precision and recall over the 100 testing texts, and the F1 computed from those averages.

Table 1. Precision, Recall, and F1 for each Variation of $\rho_{NA}$ Values

$\rho_{NA}$   Precision   Recall   F1
0             0.5185      0.5508   0.5342
0.1           0.6674      0.5287   0.5900
0.2           0.7638      0.3923   0.5184
0.3           0.7043      0.2430   0.3613
0.4           0.5968      0.1339   0.2187

Table 1 shows that the highest average precision (0.7638) is obtained when $\rho_{NA} = 0.2$ and the highest average recall (0.5508) is obtained when $\rho_{NA} = 0$. Moreover, the highest F1 (0.59) is obtained when $\rho_{NA} = 0.1$.

The testing process revealed that Wikipedia Bahasa Indonesia is not good enough to be used as ground truth for testing or evaluation, particularly the articles under the Computer Science category structure. This is because we still found senses that are not appropriate for a particular anchor text. For example, the sense of the anchor text "eksekusi" ("execute" in English) in the article Ada (programming language) is "Hukuman mati" ("death penalty" in English), which is not correct. The anchor text "eksekusi" in that article should be interpreted as the execution of program code. Thus, it is necessary to use a better set of testing data, i.e.
each anchor text in that data set has an appropriate sense.

7. Conclusion

Information from Wikipedia (i.e. the exported Wikipedia articles, in particular those under the Computer Science category structure) is processed to create an anchor dictionary in the form of an XML file. This anchor dictionary contains 1,259 records. Moreover, TAGME technology is implemented as a WordPress plugin to perform entity annotation. The entity annotation is performed by giving tag suggestions for a post and creating hyperlinks to Wikipedia pages on words or phrases in the post, based on the tag suggestions that the user selects as tags of the post. Based on the testing results, the highest precision, recall, and F1 achieved by the plugin are 0.7638, 0.5508, and 0.59 respectively.

In a subsequent study, testing should be carried out with a better data set. Moreover, to increase precision, recall, and F1, a study can be conducted to find a novel or modified relatedness function that gives better results when a smaller anchor dictionary (a subset of Wikipedia) is used, as in this study. In the future, we intend to develop the plugin to perform text categorization using the entity annotation approach. Our hypothesis is that this can be done by having the preprocessing application include the category structure and information of each Wikipedia page in the anchor dictionary. We would also need to modify the plugin so that it supports the text categorization functionality.

References

[1] Abdoulahi Boubacar, Zhendong Niu. Conceptual Search Based on Semantic Relatedness. Indonesian Journal of Electrical Engineering and Computer Science. 2014; 12(8): 6380-6385.
  • 8. [2] Dan Roth, Heng Ji, Ming-Wei Chang, Taylor Cassidy. Wikification and Beyond: The Challenges of Entity and Concept Grounding. Proc. 52nd Annual Meeting of the Association for Computational Linguistics: Tutorials (ACL2014), USA. 2014: 7.
[3] Olena Medelyan, Ian H. Witten. Domain Independent Automatic Keyphrase Indexing with Small Training Sets. Journal of the American Society for Information Science and Technology. 2008; 59(7): 1026-1040.
[4] Olena Medelyan, Ian H. Witten. Thesaurus Based Automatic Keyphrase Indexing. Proc. Joint Conference on Digital Libraries (JCDL), USA. 2006.
[5] Olena Medelyan, Ian H. Witten, David Milne. Topic Indexing with Wikipedia. Proc. Wikipedia and AI Workshop at the AAAI-2008 Conference, Chicago. 2008: 19-24.
[6] Rada Mihalcea, Andras Csomai. Wikify! Linking Documents to Encyclopedic Knowledge. Proc. 16th ACM Conference on Information and Knowledge Management (CIKM '07), New York. 2007: 233-242.
[7] David Milne, Ian H. Witten. Learning to Link with Wikipedia. Proc. Conference on Information and Knowledge Management (CIKM '08), New York. 2008: 509-518.
[8] Paolo Ferragina, Ugo Scaiella. Fast and Accurate Annotation of Short Texts with Wikipedia Pages. IEEE Software. 2011; 29(1): 70-75.
[9] Paolo Ferragina, Ugo Scaiella. TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities). Proc. Conference on Information and Knowledge Management (CIKM '10), Canada, October 2010.
[10] Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita. A Framework for Benchmarking Entity-Annotation System. Proc. 22nd International Conference on World Wide Web (WWW '13), Switzerland. 2013: 249-260.
[11] BuiltWith. CMS Usage Statistics: Statistics for Websites using CMS Technologies [online]. Available at http://trends.builtwith.com/cms. Accessed on 6 June 2015.
[12] Linawati, Gede Sukadarmika, GM Arya Sasmita.
Synchronization Interfaces for Improving Moodle Utilization. TELKOMNIKA. 2012; 10(1): 179-188.
[13] Paolo Ferragina. TAGME Technology. Available at http://acube.di.unipi.it/tagme/. Accessed on 23 February 2015.
[14] David Milne, Ian H. Witten. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. Proc. AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving Synergy, Chicago. 2008: 25-30.