Self-Adaptive Based 
Natural Language Interface 
for Disambiguation of 
Semantic Search 
NURFADHLINA MOHD SHAREF NURFADHLINA@UPM.EDU.MY
MOHAMMAD YASSER SHAFAZAND 79.ZAND@GMAIL.COM
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY,
UNIVERSITI PUTRA MALAYSIA
SERDANG, SELANGOR, MALAYSIA
"Big Data" refers to data sets whose size is beyond the ability of 
typical database software tools to capture, store, manage and analyze 
(McKinsey). 
“Linked Data” stands for semantically well structured, 
interconnected, syntactically interoperable datasets that are 
distributed among several repositories either inside or outside 
organisations http://www.semantic-web.at/big-data-linked-data
Utilizing Linked Data and Big Data for organisational and 
enterprise purposes will be one of the next big challenges in 
the evolution of the web. 
Big Data takes account of the fact that new techniques and 
technologies are needed for the sustainable and socially 
balanced exploitation of huge data pools. The Linked Data 
paradigm is one approach to cope with Big Data, as it advances 
the hypertext principle from a web of documents to a web of 
rich data.
Semantic Web: a webby way to link data 
Open Data meets the Semantic Web: Linked Open Data 
http://www.semantic-web-journal.net/system/files/swj488.pdf
One of the key challenges in making use of Big Data lies in 
finding ways of dealing with heterogeneity, diversity, and 
complexity of the data, while its volume and velocity forbid 
solutions available for smaller datasets as based, e.g., on 
manual curation or manual integration of data. Semantic 
Web Technologies are meant to deal with these issues, 
and indeed since the advent of Linked Data a few years ago, 
they have become central to mainstream Semantic Web 
research and development. 
We can easily understand Linked Data as being a part of the 
greater Big Data landscape, as many of the challenges are 
the same. The linking component of Linked Data, however, 
puts an additional focus on the integration and conflation 
of data across multiple sources.
[Diagram: BIG DATA and its dimensions (Volume, Velocity, Variety, Value and Veracity), shown with the related fields Supercomputing, Internet of Things, Semantic Web and Social Science]
Smart Data 
Smart data makes sense out of Big Data 
http://amitsheth.blogspot.com/2013/06/transforming-big-data-into-smart-data.html 
It provides value by harnessing the challenges posed by the 
volume, velocity, variety and veracity of Big Data, in turn providing actionable 
information and improving decision making. 
It uses background knowledge, experiences, and advanced and contextualized 
reasoning, and is often highly personalized. 
It is focused on the actionable value in the data creation, processing and consumption 
phases for improving the human experience.
5 steps to Turn Big Data into Smart Data 
http://tdwi.org/Articles/2014/07/15/Turning-Big-Data-into-Smart-Data-2.aspx?Page=1 
1. Add meaning 
2. Add context 
3. Embrace Graphs 
4. Iterate 
5. Adopt standards
Natural Language Query and Generated SPARQL 

Natural Language Query: What is the lowest point in kansas? 
Generated SPARQL: 
    SELECT ?c0 
    WHERE { 
      ?c0 ?p0 ?i0 . ?c0 a geo:LoPoint . 
      filter (?i0 = geo:kansas) . 
      filter ( ?p0 = geo:isLowestPointOf ) . 
    } 

Natural Language Query: What is the area of idaho? 
Generated SPARQL: 
    SELECT ?i0 
    WHERE { 
      ?c0 ?p0 ?i0 . 
      filter (?c0 = geo:idaho) . 
      filter ( ?p0 = geo:stateArea ) . 
    } 

Natural Language Query: what states border oklahoma? 
Generated SPARQL: 
    SELECT ?i0 
    WHERE { 
      ?c0 ?p0 ?i0 . ?i0 a geo:State . 
      filter (?c0 = geo:oklahoma) . 
      filter ( ?p0 = geo:borders ) . 
    } 

Natural Language Query: what is the population of oregon? 
Generated SPARQL: 
    SELECT ?i0 
    WHERE { 
      ?c0 ?p0 ?i0 . 
      filter (?c0 = geo:oregon) . 
      filter ( ?p0 = geo:statePopulation ) . 
    }
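
The four queries above share one shape: a single generic triple pattern `?c0 ?p0 ?i0`, an optional class constraint, and filters that pin the known terms. As a minimal sketch of that shape (the function name `build_select` and its parameters are assumptions for illustration, not part of SANLI), the query text could be assembled like this:

```python
from typing import Optional, Tuple


def build_select(select_var: str,
                 subject: Optional[str] = None,
                 predicate: Optional[str] = None,
                 obj: Optional[str] = None,
                 type_of: Optional[Tuple[str, str]] = None) -> str:
    """Assemble a SELECT query in the same shape as the slide examples.

    The triple pattern is always ?c0 ?p0 ?i0; every known term is pinned
    with a filter. `type_of` is an optional (variable, class) pair.
    """
    first = "  ?c0 ?p0 ?i0 ."
    if type_of is not None:
        var, cls = type_of
        first += f" ?{var} a {cls} ."
    lines = [f"SELECT ?{select_var}", "WHERE {", first]
    # Same filter order as the slides: subject/object first, predicate last.
    for var, term in (("c0", subject), ("i0", obj), ("p0", predicate)):
        if term is not None:
            lines.append(f"  filter (?{var} = {term}) .")
    lines.append("}")
    return "\n".join(lines)


area_query = build_select("i0", subject="geo:idaho", predicate="geo:stateArea")
low_point_query = build_select("c0", obj="geo:kansas",
                               predicate="geo:isLowestPointOf",
                               type_of=("c0", "geo:LoPoint"))
print(area_query)
print(low_point_query)
```

The sketch only produces query text; the `geo:` prefix declaration and the endpoint that would run the query are left out, as they are on the slide.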
Ambiguities in Querying Big Data 
Ambiguity arises: 
1. when there is more than one possible concept annotation for a word in the 
NL input 
2. when a word in the NL input cannot be matched with any KB concept 
3. when there is more than one possible SPARQL pattern while constructing 
the SPARQL query
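
To make the first two kinds of ambiguity concrete, here is a small sketch that labels each word of a query by how many KB concepts it maps to. The lexicon contents (including `geo:HiPoint`) and the function name `classify_tokens` are hypothetical; the third kind, competing SPARQL patterns, is not covered here.

```python
from typing import Dict, List

# Hypothetical lexicon: surface word -> candidate KB concepts.
LEXICON: Dict[str, List[str]] = {
    "kansas": ["geo:kansas"],
    "point": ["geo:LoPoint", "geo:HiPoint"],  # two candidates: ambiguous
    "border": ["geo:borders"],
}


def classify_tokens(tokens: List[str],
                    lexicon: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each token as 'resolved', 'ambiguous' or 'unmatched'."""
    status: Dict[str, str] = {}
    for token in tokens:
        candidates = lexicon.get(token, [])
        if not candidates:
            status[token] = "unmatched"   # no KB concept for this word
        elif len(candidates) > 1:
            status[token] = "ambiguous"   # more than one concept annotation
        else:
            status[token] = "resolved"
    return status


labels = classify_tokens(["point", "kansas", "elevation"], LEXICON)
print(labels)
```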
Self Adaptive Model for Semantic Data Search in Big Data
Input: NL query 
Output: Answer 
Process: 
1. Load the ontology and build a matrix of its object properties, classes and instances and their 
connections 
2. Let T be the tokenized and stemmed NL query 
3. For each t ∈ T, let A be the set of annotations based on relevant concepts 
4. For each a ∈ A 
a. Create and add possible triplets, filters and options statements using the dictionary 
and reasoner (using bottom-up reasoning rules) 
b. Create a new SPARQL query using the statements from 4(a) 
c. Run the SPARQL query and send the statements and results to the reasoner 
5. Return the last created SPARQL query that has results
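
The control flow of steps 2 to 5 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SANLI implementation: the toy knowledge base, the dictionary, and all function names are invented, triple patterns are matched directly in memory instead of being sent as SPARQL, and the reasoner is omitted entirely.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy knowledge base (assumed data, for illustration only).
KB: List[Triple] = [
    ("geo:oklahoma", "geo:borders", "geo:texas"),
    ("geo:oklahoma", "geo:borders", "geo:kansas"),
    ("geo:texas", "geo:borders", "geo:oklahoma"),
    ("geo:kansas", "geo:borders", "geo:oklahoma"),
]

# Hypothetical dictionary: word -> candidate KB concepts.
DICTIONARY: Dict[str, List[str]] = {
    "border": ["geo:borders"],
    "oklahoma": ["geo:oklahoma"],
}


def tokenize(question: str) -> List[str]:
    """Step 2 (simplified): lowercase and split; no real stemmer is used."""
    return question.lower().replace("?", " ").split()


def annotate(tokens: List[str]) -> List[str]:
    """Step 3: collect KB concepts, also trying a crude de-pluralized form."""
    concepts: List[str] = []
    for token in tokens:
        for form in (token, token[:-1] if token.endswith("s") else token):
            if form in DICTIONARY:
                concepts.extend(DICTIONARY[form])
                break
    return concepts


def run(pattern: Triple) -> List[str]:
    """Match one triple pattern containing a single '?x' against the KB."""
    slot = pattern.index("?x")
    return [t[slot] for t in KB
            if all(p == "?x" or p == v for p, v in zip(pattern, t))]


def answer(question: str) -> Optional[Tuple[Triple, List[str]]]:
    """Steps 4 and 5: try candidate patterns, keep the last one with results."""
    predicates = {p for _, p, _ in KB}
    concepts = annotate(tokenize(question))
    props = [c for c in concepts if c in predicates]
    entities = [c for c in concepts if c not in predicates]
    best = None
    for entity, prop in product(entities, props):
        for pattern in ((entity, prop, "?x"), ("?x", prop, entity)):
            results = run(pattern)
            if results:
                best = (pattern, results)
    return best


found = answer("what states border oklahoma?")
print(found)
```

Trying both `(entity, property, ?x)` and `(?x, property, entity)` is one simple way to cover the case where more than one pattern is possible; in the described model, the reasoner would decide among such candidates.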
Results 
SANLI was tested on two datasets: Mooney’s Geography 
ontology and a Quran structure ontology. 
SANLI correctly answers all questions in the geography ontology 
whose identified patterns are <s, p, o>, <o, p, s>, <p, o>, <o, p> and <o>. 
Rules for other patterns have not yet been implemented. For example, <o, p, o> 
patterns, as in “Does Texas border Oklahoma?”, mostly yield a true/false result.
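
For reference, SPARQL expresses true/false questions with the ASK query form. The sketch below shows how a question such as "Does Texas border Oklahoma?" might be rendered once both entities and the property are resolved; it is an assumption about one possible approach, not a description of SANLI, and `build_ask` is a hypothetical helper.

```python
def build_ask(subject: str, predicate: str, obj: str) -> str:
    """Render a fully resolved triple as a SPARQL ASK query string."""
    return f"ASK {{ {subject} {predicate} {obj} . }}"


ask_query = build_ask("geo:texas", "geo:borders", "geo:oklahoma")
print(ask_query)
```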
Conclusion 
The Semantic Web can leverage sophisticated analytics with Big 
Data. 
Big Data and Linked Data will be an integral part of the future web 
infrastructure, where massive amounts of data are available, 
connected and identifiable via Uniform Resource Identifiers. 
More personalized applications are needed to exploit smart data to its 
maximum potential.
