1. Integrated Intelligent Research (IIR) International Journal of Business Intelligents
Volume: 06 Issue: 02, December 2017, Page No.45-47
ISSN: 2278-2400
A Survey of Ontology-based Information Extraction
for Social Media Content Analysis
Redjie T. Villar
Information Technology Department, Shinas College of Technology, Shinas, Sultanate of Oman
Email: redjietan@yahoo.com
Abstract— The amount of information generated in the Web
has grown enormously over the years. This information is
significant to individuals, businesses and organizations. If
analyzed, understood and utilized, it can provide valuable
insight to its stakeholders. However, much of this information
is semi-structured or unstructured, which makes it difficult to
draw an in-depth understanding of its implications. This is
where Ontology-based Information Extraction (OBIE) and social
media content analysis come into play. OBIE has become a
popular way to extract information from machine-readable
sources. This paper presents a survey of OBIE, ontology
languages and tools, and the process of building an ontology
model and framework. The author compares two ontology building
frameworks and identifies which of them is complete.
Keywords— Ontology, OBIE, Social Media
1. INTRODUCTION
1.1 Ontology-Based Information Extraction
Information on the Web has grown exponentially over the years,
especially with Web 2.0 and the spread of social media.
Many researchers have seen the potential of the information
behind social media content and seek to understand and interpret
it. The most popular way to do so is Ontology-based information
extraction (OBIE), a subfield of information extraction (IE).
According to Shah and Jain [1], ontologies are specific to a
particular domain and dependent on their application, so an
ontology should be designed to fit its domain and purpose.
Wimalasuriya and Dou [2] describe OBIE as a subfield of
information extraction that uses ontologies, which consist of
classes, properties, individuals and values, as the focal point
of extraction.
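The idea of using an ontology as the focal point of extraction can be illustrated with a minimal sketch. The toy ontology, its class and property names (Service, Problem, hasProblem) and the sample post are all hypothetical and chosen only for illustration; real OBIE systems use far richer ontologies and linguistic processing.

```python
import re

# Hypothetical toy ontology for a postal-service domain (illustrative only).
ONTOLOGY = {
    "classes": {"Service", "Problem"},
    # individual -> class it belongs to
    "individuals": {
        "parcel": "Service",
        "delivery": "Service",
        "delay": "Problem",
        "damage": "Problem",
    },
    # property name -> (domain class, range class)
    "properties": {"hasProblem": ("Service", "Problem")},
}


def extract(text, ontology):
    """Return (individual, class) pairs for ontology individuals found in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = []
    for token in tokens:
        cls = ontology["individuals"].get(token)
        if cls is not None and (token, cls) not in found:
            found.append((token, cls))
    return found


def relate(found, ontology):
    """Pair extracted individuals whose classes fit a property's domain and range."""
    triples = []
    for prop, (domain, rng) in ontology["properties"].items():
        for subj, subj_cls in found:
            for obj, obj_cls in found:
                if subj_cls == domain and obj_cls == rng:
                    triples.append((subj, prop, obj))
    return triples


post = "My parcel arrived late, another delay and visible damage!"
entities = extract(post, ONTOLOGY)
print(entities)
print(relate(entities, ONTOLOGY))
```

Only terms that the ontology knows about are extracted, which is what distinguishes this from generic keyword extraction.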
1.2 Sentiment Analysis
With the growing inclination of people to express themselves
on social media, whether to share excitement or happiness or to
vent anger or disappointment, there is growing interest in
understanding the causes of these diverse emotions.
Sentiment analysis, also known as opinion mining, is the process
of determining whether a piece of writing is positive, negative
or neutral [3]. Much sentiment analysis work focuses on
determining the satisfaction or dissatisfaction of consumers
with products; understanding the causes gives companies
invaluable insight.
Many studies have focused on sentiment analysis. Thakor and
Sasi [4] found that OBIE can be used to conduct sentiment
analysis of customers' dissatisfaction with a postal service by
analyzing their social media posts, and that the results are a
vital input for the company to improve its service. Saif, He
and Alani [5] propose using semantic features in Twitter
sentiment classification, and describe three approaches for
incorporating these features into the analysis: replacement,
augmentation and interpolation. Several studies [1][4][11]
focused on building an ontology model for different domains.
However, since OBIE and its use in social media are still
relatively new, there is no standard model yet in any of the
domains.
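Two of the three approaches named above, replacement and augmentation, can be sketched at the token level. This is a simplified illustration, not the implementation from [5]: the entity-to-concept mapping CONCEPTS and the sample tweet are hypothetical, and interpolation, which works at the level of the classifier's language model, is not shown.

```python
# Hypothetical entity-to-concept mapping; a real system would obtain it
# from a semantic annotation service or an ontology.
CONCEPTS = {"iphone": "PRODUCT", "apple": "COMPANY", "london": "CITY"}


def replacement(tokens, concepts):
    """Replace each known entity with its semantic concept."""
    return [concepts.get(token, token) for token in tokens]


def augmentation(tokens, concepts):
    """Keep the original tokens and append the concepts of known entities."""
    return tokens + [concepts[token] for token in tokens if token in concepts]


tweet = "love my new iphone".split()
print(replacement(tweet, CONCEPTS))
print(augmentation(tweet, CONCEPTS))
```

Replacement shrinks the vocabulary by collapsing entities into concepts, while augmentation enlarges the feature space by keeping both.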
This paper presents an overview of ontology for readers who
may not be familiar with the concept. It also contains a brief
survey of popular ontology languages and open-source tools, and
discusses the approach used to build an ontology model for
social media analysis. The remaining sections of the paper are
organized as follows: Section 2 provides an overview of
ontology. Section 3 covers ontology languages and tools.
Section 4 discusses the steps in building an ontology model
and framework. Section 5 presents the conclusion of the study.
2. OVERVIEW OF ONTOLOGY
An ontology is popularly defined as a “formal, explicit
specification of a shared conceptualization” [6]. In this
definition, formal means that the ontology is encoded in a
language whose properties are well understood. Formal
specification is important because it eliminates the ambiguity
found in informal languages and notations. Explicit means
that the concepts and relationships in the abstract model
are named and defined explicitly. Shared means that the
ontology is developed to be reused across different domains,
applications and communities. Finally, conceptualization
refers to an abstract model of how people think about a
specific area of the world [7].
An ontology is developed to share a common understanding of the
structure of information among people and software agents.
For example, if all the relevant terms in a particular domain,
say social media, are collected, documented and built
into an ontology, this ontology can be shared and used to
answer queries related to the domain. It can also be used as
input to other applications [8].
3. ONTOLOGY LANGUAGES AND TOOLS
Before web content can be analyzed, an ontology model has
to be built. This ontology model can be stored in one of
the ontology languages. An ontology language is a formal
language used to encode an ontology [9]. There are a number
of such languages. Some are proprietary and
others are standards-based. Examples include the Web Ontology
Language (OWL) and the Resource Description Framework (RDF),
both of which can be serialized in Extensible Markup Language
(XML) format [10]. Other languages include the Common Algebraic
Specification Language (CASL), Common Logic, Developing
Ontology-Grounded Methods and Applications (DOGMA) and the
Rule Interchange Format (RIF) [11]. Many open-source tools are
available for building a domain-specific model. They go by
different names, such as ontological engineering tool, ontology
editor or knowledge management tool. Popular ones are Protégé,
created at Stanford University [12], and General Architecture
for Text Engineering (GATE), which is a collaboration of
different people and their industry partners [13].
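To make the OWL and RDF/XML encoding mentioned above concrete, the following minimal sketch writes a two-class ontology as RDF/XML using only the Python standard library. The namespace http://example.org/social# and the class names Post and Tweet are hypothetical; in practice an editor such as Protégé would generate and validate this file.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/social#"  # hypothetical namespace for this sketch

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("owl", OWL)


def build_ontology(classes, subclass_of):
    """Build an rdf:RDF element declaring OWL classes and subclass links."""
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": BASE.rstrip("#")})
    for name in classes:
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
        parent = subclass_of.get(name)
        if parent:
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    return root


root = build_ontology(["Post", "Tweet"], {"Tweet": "Post"})
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The printed document declares Tweet as a subclass of Post, which is the kind of class hierarchy an ontology editor stores in the same format.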
4. BUILDING AN ONTOLOGY MODEL AND FRAMEWORK
Thakor and Sasi [4] describe a five-step process for building an
ontology model:
1. Data extraction
2. Data cleaning to remove special characters and foreign-language text
3. Text parsing using GATE software
4. Data cleaning of the text parsing result to remove duplicated and non-qualified nouns and verbs
5. Building an ontology model
Figure 1: Ontology Model Building Process According to
Thakor and Sasi
During data extraction, a script can be written to extract the
social media content. Data cleaning can then be performed
using Excel macros. Next, text parsing is performed to analyze
strings of text and identify the important keywords. The result
of text parsing is cleaned again to make sure that there are
no duplicated or non-qualified nouns and verbs. Finally, an
ontology model is built using software, with the class objects
and object properties of the specific domain as inputs. The
result of the process is an ontology model in OWL/RDF/XML format.
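The two cleaning steps of this process can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure from [4]: Python regular expressions stand in for the Excel macros, keeping only ASCII letters and digits is a rough proxy for removing foreign-language text, the part-of-speech pairs are hand-written stand-ins for GATE output, and the stopword set represents "non-qualified" terms.

```python
import re


def clean_post(text):
    """Step 2: drop URLs and special characters, keep ASCII letters, digits, spaces."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_terms(tagged_terms, stopwords):
    """Step 4: keep unique nouns and verbs that are not in the stopword set."""
    seen, kept = set(), []
    for term, pos in tagged_terms:
        if pos in {"NOUN", "VERB"} and term not in stopwords and term not in seen:
            seen.add(term)
            kept.append(term)
    return kept


raw = "Parcel STILL delayed!!! #fail http://example.org/track?id=1"
cleaned = clean_post(raw)
print(cleaned)

# Step 3 stand-in: hypothetical (term, part-of-speech) pairs that a parser
# such as GATE would produce from the cleaned text.
tagged = [("parcel", "NOUN"), ("still", "ADV"), ("delayed", "VERB"),
          ("fail", "NOUN"), ("parcel", "NOUN")]
terms = clean_terms(tagged, stopwords={"fail"})
print(terms)
```

The surviving terms are the candidates from which classes and object properties of the ontology model would be chosen.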
To perform sentiment analysis, the ontology model is queried
for specific information, for example to identify the polarity
(a positive or negative view) of a sentiment. The ontology
framework by Kaur et al. [11] presents a user interface for
interacting with the system, together with SPARQL queries over
the OWL ontology, which can query the database directly.
Below that is a middleware layer focused on
management services, followed by the core applications.
Figure 2: An Ontology Framework Proposed by Kaur et al.
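Querying an ontology model for polarity, as described above, can be sketched with plain subject-predicate-object triples. This is a stand-in for a SPARQL query, not a SPARQL engine: the triples, the hasPolarity property and the match helper are hypothetical, and None plays the role of a query variable.

```python
# Hypothetical triples; a real model would be loaded from an OWL/RDF file.
TRIPLES = [
    ("delay", "hasPolarity", "negative"),
    ("damage", "hasPolarity", "negative"),
    ("fast", "hasPolarity", "positive"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def post_polarity(tokens, triples):
    """Count polarity hits per token and return the dominant polarity."""
    counts = {"positive": 0, "negative": 0}
    for token in tokens:
        # Roughly: SELECT ?polarity WHERE { :token :hasPolarity ?polarity }
        for _, _, polarity in match(triples, s=token, p="hasPolarity"):
            counts[polarity] += 1
    if counts["positive"] == counts["negative"]:
        return "neutral"
    return max(counts, key=counts.get)


print(post_polarity("parcel delay and damage".split(), TRIPLES))
print(post_polarity("fast delivery".split(), TRIPLES))
```

Counting hits is the simplest possible aggregation; a real system would weight terms or use the ontology's class structure.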
5. RESULT AND CONCLUSION
In this paper, the author compared two (2) ontology
frameworks.
Feature            1     2
Data Extraction    Yes   Yes
Data Cleaning      Yes   No
Ontology Building  Yes   Yes
Table 1: Comparison of two (2) Frameworks
Table 1 shows a comparison of two (2) frameworks: one
proposed by Thakor and Sasi [4] and another proposed by
Kaur et al. [11]. In the table, 1 refers to the first framework
and 2 refers to the second. The author compared these frameworks
based on the identified important features of an ontology model.
The two (2) frameworks have the data extraction and ontology
building features. However, the framework
proposed by Kaur et al. [11] has no data cleaning feature.
Based on this comparison, the author
concludes that data extraction, data cleaning and
ontology building are important features of any ontology
framework, and that the framework proposed by Thakor and Sasi [4]
is a complete framework.
6. REFERENCES
[1] Shah, R., Jain, S. (2014). Ontology-based Information
Extraction: An Overview and a study of different
approaches. International Journal of Computer
Applications. Volume 87(No. 4). Retrieved from
https://pdfs.semanticscholar.org/f533/73c8eba5a75f7f5eb
5ba61f986accef6bee9.pdf.
[2] Wimalasuriya, D. C., Dou, D. (2010). Components for
Information Extraction: Ontology-Based Information
Extractors and Generic Platforms. Retrieved from
http://aimlab.cs.uoregon.edu/obie/papers/cikm255m-
wimalasuriya.pdf.
[3] Sentiment Analysis. (2017, February 26). Retrieved from
https://www.lexalytics.com/technology/sentiment.
[4] Thakor, P., Sasi, S. (2015). Ontology-based Sentiment
Analysis Process for Social Media Content. Retrieved
from
http://www.sciencedirect.com/science/article/pii/S187705
0915017986.
[5] Saif, H., He, Y., Alani, H. (n.d.). Semantic Sentiment
Analysis of Twitter. Retrieved from
https://pdfs.semanticscholar.org/ec4a/94637ecd11521986
9e9df8902cb7282481e0.pdf.
[6] Hassim, M. (2015, August 8). Retrieved from
https://www.linkedin.com/pulse/ontologyan-explicit-
specification-muhammad-hassim.
[7] Uschold, M., Gruninger, M. (2004). Ontologies and
Semantics for Seamless Connectivity. Retrieved from
https://pdfs.semanticscholar.org/a610/22f5745c23ee742e
a838bff905b60c8cc138.pdf.
[8] Ling, T. C., Jusoh, Y. Y., Abdullah, R., Alwi, N. H.
(2013). An Ontology for Software Engineering
Education. Retrieved from
http://files.eric.ed.gov/fulltext/ED557194.pdf.
[9] Ontology language. (n.d.). Retrieved 2017, February 27,
from https://en.wikipedia.org/wiki/Ontology_language.
[10] Cardoso, J. The Web Ontology (OWL) and its
Applications. Retrieved from https://jorge-
cardoso.github.io/publications/Papers/BC-2015-031-ISR-
OWL-and-Its-Applications.pdf.
[11] Kaur, P., Sharma, P., Vohra, N. (2015). An Ontology-based
E-learning System. International Journal of Grid and
Distributed Computing. Volume 8(No. 5). Retrieved from
http://www.sersc.org/journals/IJGDC/vol8_no5/27.pdf.
[12] Protégé. Retrieved from http://protege.stanford.edu/.
[13] GATE: a full-lifecycle open source solution for text
processing. (n.d.). Retrieved from
https://gate.ac.uk/overview.html.