This thesis aims to explore implementing heuristics-based query optimisation solutions in RDF stream processing engines like CQELS and C-SPARQL. It proposes developing an adaptive execution framework and linked data stream processing model with algorithms and data structures for efficient window operator evaluation and multiway join optimisation. Extensive experiments will evaluate the performance of the extended RSP engines and compare them to the original versions of CQELS and C-SPARQL.
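The window-operator evaluation the proposal targets can be illustrated with a minimal sketch. The `TimeWindow` class and the triple layout below are illustrative assumptions, not code from CQELS or C-SPARQL:

```python
from collections import deque

class TimeWindow:
    """Minimal time-based sliding window over a stream of (timestamp, triple) pairs."""
    def __init__(self, width):
        self.width = width          # window width, in the stream's time units
        self.buffer = deque()       # triples currently inside the window

    def insert(self, timestamp, triple):
        self.buffer.append((timestamp, triple))
        # evict triples that have fallen out of (timestamp - width, timestamp]
        while self.buffer and self.buffer[0][0] <= timestamp - self.width:
            self.buffer.popleft()

    def contents(self):
        return [t for _, t in self.buffer]

w = TimeWindow(width=10)
w.insert(1, (":sensor1", ":hasReading", "20"))
w.insert(5, (":sensor2", ":hasReading", "21"))
w.insert(14, (":sensor1", ":hasReading", "22"))
print(w.contents())   # the triple from t=1 has been evicted
```

A real engine would additionally support step/slide parameters and feed the window contents into continuous query evaluation; the eviction-on-insert loop above is the core of the operator.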
Performance evaluation of Map-reduce jar pig hive and spark with machine lear... (IJECEIAES)
Big data is one of the biggest challenges of our time: making decisions over it requires huge processing power and good algorithms. We need a Hadoop environment with Pig, Hive, machine learning, and other Hadoop ecosystem components. The data comes from industries, from the many devices and sensors around us, and from social media sites. According to McKinsey, there will be a shortage of 15,000,000 big data professionals by the end of 2020. Many technologies exist to address the problem of big data storage and processing, among them Apache Hadoop, Apache Spark, and Apache Kafka. Here we analyse the processing speed for 4 GB of data on CloudxLab using Hadoop MapReduce with varying numbers of mappers and reducers, Pig scripts, Hive queries, and a Spark environment combined with machine learning. From the results we can say that machine learning with Hadoop, together with Spark, enhances processing performance; that Spark is better than Hadoop MapReduce, Pig, and Hive; and that Spark with Hive and machine learning gives the best performance compared with Pig, Hive, and the Hadoop MapReduce jar.
A Survey on Data Mapping Strategy for data stored in the storage cloud (NavNeet KuMar)
This document describes a method for processing large amounts of data stored in cloud storage using Hadoop clusters. Users upload data to cloud storage, and MapReduce algorithms then run on Hadoop clusters to analyze the data in parallel. The results are stored back in the cloud for users to download. The proposed architecture involves a controller that directs requests to Hadoop masters, which coordinate nodes to perform the mapping and reducing of data according to the implemented algorithm.
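The map/shuffle/reduce pipeline the method runs on Hadoop clusters can be sketched in plain Python. This is a single-machine illustration of the programming model, not Hadoop's own API:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) for every word in every input split."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

splits = ["big data big cluster", "cloud data"]
counts = reduce_phase(shuffle(map_phase(splits)))
print(counts["big"])   # 2
```

On a real cluster the map and reduce functions run on different nodes and the shuffle moves data over the network; the data flow, however, is exactly this.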
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey (IJECEIAES)
In the modern era, workflows have been adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, including scientific, data-intensive, and big data applications such as MapReduce and Hadoop. These complex applications are described using high-level representations in workflow methods. With the emerging model of cloud computing technology, scheduling in the cloud has become an important research topic. Consequently, the workflow scheduling problem has been studied extensively over the past few years, from homogeneous clusters and grids to the most recent paradigm, cloud computing. The challenges that need to be addressed lie in task-resource mapping, QoS requirements, resource provisioning, performance fluctuation, failure handling, resource scheduling, and data storage. This work presents a complete study of resource provisioning and scheduling algorithms in the cloud environment, focusing on Infrastructure as a Service (IaaS). We provide a comprehensive understanding of existing scheduling techniques and an insight into the research challenges that suggest possible future directions for researchers.
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ... (iosrjce)
The document describes a proposed system for a semantic web information retrieval service using domain ontology, WCF services, and .NET technologies. It discusses implementing concept relevancy ranking of link and page content as web services. The system architecture includes an admin module to create domain ontology and semantic annotations, a search interface for users, and a testing module. Experimental results show the proposed approach provides more relevant results than traditional search engines for the sample query "company cts chennai taramani".
Exploring Neo4j Graph Database as a Fast Data Access Layer (Sambit Banerjee)
This article describes the findings of an extensive investigative work conducted to explore the feasibility of using a Neo4j Graph Database to build a Fast Data Access Layer with near-real time data ingestion from the underlying source systems.
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo... (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publication of high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
This summary provides an overview of the SparkR package, an R frontend for the Apache Spark distributed computing framework:
- SparkR enables large-scale data analysis from the R shell by using Spark's distributed computation engine to parallelize and optimize R programs. It allows R users to leverage Spark's libraries, data sources, and optimizations while programming in R.
- The central component of SparkR is the distributed DataFrame, which provides a familiar data frame interface to R users but can handle large datasets using Spark. DataFrame operations are optimized using Spark's query optimizer.
- SparkR's architecture includes an R-to-JVM binding that allows R programs to submit jobs to Spark, and support for running R execut…
This document summarizes a research paper on analyzing and visualizing Twitter data using the R programming language with Hadoop. The goal was to leverage Hadoop's distributed processing capabilities to support analytical functions in R. Twitter data was analyzed and visualized in a distributed manner using R packages that connect to Hadoop. This allowed large-scale Twitter data analysis and visualizations to be built as an R Shiny application on top of results from Hadoop.
European Pharmaceutical Contractor: SAS and R Team in Clinical Research (KCR)
Statistical analysis constitutes an essential part of all serious scientific research. Without data and a formal process of searching for evidence supporting or disproving stated hypotheses, there is nothing but mere opinion. Evidence-based medicine is no exception.
Workshop on Real-time & Stream Analytics IEEE BigData 2016 (Sabri Skhiri)
Introduction presentation of the Workshop on Real-time & Stream Analytics co-located with the IEEE Big Data Conference.
We have seen new business models emerging that require real-time features. However, this real-time nature impacts IT systems in terms of (1) data architecture, (2) stream mining, and (3) stream processor technologies. All three impacts remain very interesting research areas. The papers presented at the workshop cover these three areas and provide interesting viewpoints.
Scientific Application Development and Early results on Summit (Ganesan Narayanasamy)
The document summarizes Oak Ridge National Laboratory's (ORNL) new supercomputer Summit and its capabilities for scientific applications and early results. Summit is the most powerful and smartest supercomputer in the world, with 200 petaflops of performance and capabilities well-suited for machine learning and artificial intelligence applications. ORNL is preparing scientific applications for Summit through its Center for Accelerated Application Readiness program to enable early science results and ensure applications are optimized for Summit's architecture.
The document proposes a Rapid Prototyping Capability (RPC) system to efficiently evaluate integrating Earth observation data from NASA satellites and models. The RPC would:
1) Integrate tools to access, process, and analyze data and model outputs to support experiments.
2) Reduce the time typically required to evaluate new data streams in models through a simulated operational environment.
3) Be accessible to various user groups, including specialists responsible for data/models and domain experts performing analyses.
Data Partitioning in MongoDB with Cloud (IJAAS Team)
Cloud computing offers various useful services, such as IaaS, PaaS, and SaaS, for deploying applications at low cost, making them available anytime and anywhere with the expectation that they be scalable and consistent. One technique to improve scalability is data partitioning. Existing techniques are not capable of tracking the data access pattern. This paper implements a scalable workload-driven technique for improving the scalability of web applications. The experiments are carried out over the cloud using the NoSQL data store MongoDB to scale out. This approach offers low response time, high throughput, and fewer distributed transactions. The partitioning technique is evaluated using the TPC-C benchmark.
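The data-partitioning idea can be illustrated with a hash-based sharding sketch. The `shard_for` helper and customer keys below are hypothetical; a real sharded MongoDB deployment performs this routing internally via a hashed shard key:

```python
import hashlib

def shard_for(key, n_shards):
    """Route a document to a shard by hashing its shard key (hash partitioning)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_shards

# Distribute customer documents across 4 shards.
placement = {}
for customer_id in ["c1001", "c1002", "c1003", "c1004"]:
    placement[customer_id] = shard_for(customer_id, 4)

# The same key always routes to the same shard, so reads go straight
# to one node instead of fanning out across the cluster.
assert shard_for("c1001", 4) == placement["c1001"]
```

Hash partitioning spreads load evenly but sacrifices range queries; workload-driven schemes like the one in the paper instead place data according to observed access patterns.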
Data performance characterization of frequent pattern mining algorithms (IJDKP)
Big data has quickly come under the spotlight in recent years. As big data systems are supposed to handle extremely huge amounts of data, it is quite natural that demand for computational environments that accelerate and scale out big data applications is increasing. The behavior of big data applications, however, is not yet clearly defined. Among big data applications, this paper focuses specifically on stream mining applications, whose behavior varies according to the characteristics of the input data. The parameters for data characterization, however, are not yet clearly defined, and no study has investigated explicit relationships between the input data and stream mining applications either. Therefore, this paper picks up frequent pattern mining as a representative stream mining application and interprets the relationships between the characteristics of the input data and the behavior of signature algorithms for frequent pattern mining.
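A classic one-pass frequent-items summary makes the input-sensitivity of such algorithms concrete. The sketch below uses the Misra-Gries algorithm purely as an illustration; the paper does not name its specific algorithms:

```python
def misra_gries(stream, k):
    """One-pass frequent-item summary using at most k counters (Misra-Gries).
    Any item occurring more than n/(k+1) times in a stream of length n
    is guaranteed to survive in the summary."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # decrement every counter; drop the ones that reach zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = list("ababcabdabe")   # 'a' and 'b' dominate this toy stream
summary = misra_gries(stream, k=2)
print(sorted(summary))   # ['a', 'b']
```

Note how the algorithm's memory and accuracy depend directly on the skew of the input, which is exactly the kind of data-dependent behavior the paper sets out to characterize.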
PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE... (Journal For Research)
The document discusses performance evaluation of social network analysis algorithms using Apache Spark. It analyzes the performance of algorithms like PageRank, connected components, triangle counting and K-means clustering on different social network datasets. The results show that GraphX PageRank performs faster than the naive implementation in Spark. Connected components execution time grows super linearly initially and then fluctuates. Triangle counting time grows linearly with size. K-means clustering is tested using both naive and MLlib implementations in Spark.
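The PageRank algorithm evaluated above can be sketched as a plain power iteration, mirroring the naive (non-GraphX) formulation on a toy graph:

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbours]}.
    Assumes every node has at least one outgoing edge (no dangling-node fixup)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # each node keeps the teleport share, then receives mass from in-links
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
        rank = new
    return rank

# Tiny directed graph: everyone links to 'a', so 'a' should rank highest.
graph = {"a": ["b"], "b": ["a"], "c": ["a"], "d": ["a"]}
rank = pagerank(graph)
assert rank["a"] == max(rank.values())
```

GraphX's implementation distributes exactly this message-passing step across partitions, which is why it outperforms a naive single-pass Spark version on large graphs.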
A Query Model for Ad Hoc Queries using a Scanning Architecture (Flurry, Inc.)
Systems like Hadoop, HBase, and Hive allowed the world to take huge strides in managing and analyzing large amounts of data. Products like Flurry Analytics make efficient use of large amounts of hardware, using these tools to build statistics for hundreds of thousands of applications. However, these tools require the end user to first set up the relevant analytics queries and then wait days for the results. If the results prompt new questions, or the original query is not quite right, the user must rerun the query and wait again for the results.
We present the Burst system, developed at Flurry to support low-latency single-pass queries over very large and complex mobile application streams. We have created a data schema and query model that can answer very complex ad hoc queries over the data and is highly parallelizable while maintaining low latency. We implement these scans so that they are time- and space-efficient, using the advanced disk scanning techniques provided by the underlying operating system.
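The single-pass scanning idea can be illustrated with a toy group-by over an event stream. The event fields and function names are hypothetical, not Burst's actual schema or API:

```python
def single_pass_scan(events, predicate, dimensions):
    """Answer an ad hoc filter/group-by/count query in one sequential scan,
    the way a scanning architecture avoids precomputed aggregates."""
    result = {}
    for event in events:
        if predicate(event):
            key = tuple(event[d] for d in dimensions)
            result[key] = result.get(key, 0) + 1
    return result

# Hypothetical mobile-app session events.
events = [
    {"app": "game", "os": "ios", "duration": 30},
    {"app": "game", "os": "android", "duration": 45},
    {"app": "news", "os": "ios", "duration": 10},
    {"app": "game", "os": "ios", "duration": 60},
]
counts = single_pass_scan(events, lambda e: e["duration"] >= 30, ["app", "os"])
print(counts)   # {('game', 'ios'): 2, ('game', 'android'): 1}
```

Because the predicate and dimensions are supplied at query time, no aggregate has to be defined in advance, which is the contrast with the pre-registered-query workflow described above.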
Hybrid Job-Driven Meta Data Scheduling for BigData with MapReduce Clusters an... (dbpublications)
It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant’s perspective. JoSS provides not only job-level scheduling, but also map-task-level and reduce-task-level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce workload scenarios and provide the best job performance among all tested algorithms.
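The map-data-locality goal can be made concrete with a greedy assignment sketch. This illustrates the general idea of locality-aware scheduling, not JoSS's actual policy:

```python
def assign_map_tasks(tasks, nodes):
    """Greedy locality-aware assignment: prefer a node that already holds
    the task's input block (node-local); otherwise fall back to a remote read."""
    assignment, local_hits = {}, 0
    for task, block_locations in tasks.items():
        local = [n for n in nodes if n in block_locations]
        if local:
            assignment[task] = local[0]
            local_hits += 1
        else:
            assignment[task] = nodes[0]   # remote read: input crosses the network
    return assignment, local_hits

# Hypothetical tasks mapped to the nodes holding replicas of their input blocks.
tasks = {"t1": {"node1", "node2"}, "t2": {"node3"}, "t3": {"node9"}}
nodes = ["node1", "node2", "node3"]
assignment, local_hits = assign_map_tasks(tasks, nodes)
print(local_hits)   # 2 of 3 tasks run node-local
```

Every local hit avoids shipping an input block across the network, which is why map-data locality translates directly into the reduced network overhead the paper measures.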
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics (Flurry, Inc.)
We present Burst, an analytic query system with a scalable and flexible approach to performing low-latency ad hoc analysis over large complex datasets. The architecture consists of hardware-efficient scan techniques and a language facility to transform an extensible set of ad hoc declarative queries into imperative physical scan plans. These plans are multicast across all nodes/cores of a two-level sharded/distributed ingestion, storage, and execution topology, and executed. The first release of this system is the query engine behind the Flurry Explorer product. Here we explore the design details of that system, as well as the incremental ingestion pipeline enhancement currently being implemented for the next major release.
The document describes TrafficDB, a shared-memory data store designed by HERE to provide high throughput access to traffic data. TrafficDB was created to handle the high volumes of read operations required by HERE's traffic-aware services, with minimal latency. It uses shared memory to allow direct memory access for applications. Evaluation showed TrafficDB can handle millions of read operations per second and provides near-linear scalability by allowing additional processes to increase throughput without impacting latency. TrafficDB is now used in production by HERE to power routing, rendering, and other traffic-aware services.
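The shared-memory access pattern TrafficDB relies on can be sketched with Python's `multiprocessing.shared_memory`. The block name and data layout are invented for the example, and a real deployment would separate the writer and reader into different processes:

```python
from multiprocessing import shared_memory
import struct

# Writer: publish per-road-segment speeds into a named shared-memory block.
speeds = [55.0, 23.5, 80.0]
shm = shared_memory.SharedMemory(create=True, size=8 * len(speeds),
                                 name="trafficdb_demo")
struct.pack_into(f"{len(speeds)}d", shm.buf, 0, *speeds)

# Reader: any process can attach by name and read directly, with no copy
# and no IPC round trip -- the property that gives near-linear read scaling.
reader = shared_memory.SharedMemory(name="trafficdb_demo")
segment_speed = struct.unpack_from("d", reader.buf, 8)[0]   # second segment
print(segment_speed)   # 23.5

reader.close()
shm.close()
shm.unlink()
```

Because readers attach to the same physical pages, adding reader processes increases aggregate throughput without adding per-read latency, matching the evaluation result described above.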
High Performance Processing of Streaming Data (Geoffrey Fox)
Describes two parallel robot planning algorithms implemented with Apache Storm on OpenStack -- SLAM (Simultaneous Localization & Mapping) and collision avoidance. Performance (response time) studied and improved as example of HPC-ABDS (High Performance Computing enhanced Apache Big Data Software Stack) concept.
This document lists several Java/J2EE/J2ME projects related to utility computing environments, schema matching, fuzzy ontology generation, wireless sensor networks, wireless MAC protocols, distributed cache updating, selfish routing, collaborative key agreement, TCP congestion control, global roaming in mobile networks, GPS-based emergency response systems, network intrusion detection, honey pots, voice over IP, vehicle tracking, SIP-based teleconferencing, online security systems, and location-aided routing in ad hoc networks. The projects cover a wide range of topics related to distributed systems, wireless networks, and Internet applications.
This document proposes using the R statistical analysis and visualization environment as an interface for analyzing network flow data from SiLK tools. It details how R provides powerful and flexible analysis capabilities while preserving command line control. A prototype wrapper function called rwcount.analyze() is presented that takes SiLK command line queries as input, runs the rwcount tool to generate time series data, and returns an output object in R containing the data, visualization, and other metadata. This integrated environment allows for rapid prototyping and visualization of network security analyses.
This document presents a framework that migrates data from MySQL to NoSQL databases like MongoDB and HBase, and maps MySQL queries to queries in the NoSQL databases. The framework consists of a front-end GUI and modules for migrating data between the databases and mapping queries. It migrates data from MySQL tables to collections in MongoDB and HBase. When a user enters a MySQL query, a decision maker selects the target database and the query is mapped to that database's format to retrieve the data. The mapping time for various query types is measured to be very small, making query execution on NoSQL databases efficient using this framework.
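The query-mapping step can be sketched for one simple query shape. The regex-based `map_select_to_mongo` helper below is a toy illustration, not the framework's actual mapper:

```python
import re

def map_select_to_mongo(sql):
    """Map a simple "SELECT * FROM <table> WHERE <col> = '<value>'" statement
    to a (collection, filter) pair for a document store."""
    pattern = r"SELECT \* FROM (\w+) WHERE (\w+) = '([^']*)'"
    match = re.match(pattern, sql, re.IGNORECASE)
    if not match:
        raise ValueError("unsupported query shape")
    table, column, value = match.groups()
    # the table becomes the collection; the WHERE clause becomes a filter document
    return table, {column: value}

collection, mongo_filter = map_select_to_mongo(
    "SELECT * FROM customers WHERE city = 'Pune'")
print(collection, mongo_filter)   # customers {'city': 'Pune'}
```

A production mapper would parse a full SQL grammar and handle joins, projections, and type conversions, but the table-to-collection and predicate-to-filter translation shown here is the essence of why the measured mapping time stays small.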
This document discusses several key differences between traditional databases and Hive. Hive uses a schema-on-read model where the schema is not enforced during data loading, making the initial load much faster. However, this impacts query performance since indexing and compression cannot be applied during loading. Pig Latin is a data flow language where each step transforms the input relation, unlike SQL which is declarative. While Hive originally lacked features like updates, transactions and indexing, the developers are working to integrate HBase and improve support for these features.
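Schema-on-read, as contrasted with a traditional database's schema-on-write, can be sketched in a few lines. The CSV layout and `read_with_schema` helper are illustrative assumptions:

```python
def read_with_schema(raw_lines, schema):
    """Schema-on-read: raw text is loaded untouched, and the schema is only
    applied (with type coercion) when a query actually scans the data."""
    rows = []
    for line in raw_lines:
        fields = line.rstrip("\n").split(",")
        row = {}
        for (name, cast), field in zip(schema, fields):
            try:
                row[name] = cast(field)
            except ValueError:
                row[name] = None    # malformed cells surface as NULLs at query time
        rows.append(row)
    return rows

raw = ["alice,34", "bob,not-a-number"]
rows = read_with_schema(raw, [("name", str), ("age", int)])
print(rows[1]["age"])   # None -- the bad cell was only rejected at read time
```

Loading is just a file copy, which is why the initial load is fast; the cost of validation and the loss of load-time indexing are paid on every query instead.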
Scaling Application on High Performance Computing Clusters and Analysis of th... (Rusif Eyvazli)
The document discusses techniques for scaling applications across computing nodes in high performance computing (HPC) clusters. It analyzes the performance of different computing nodes on various applications like BLASTX, HPL, and JAGS. Array job facilities are used to parallelize applications by dividing iterations into independent tasks assigned across nodes. Python programs are created to analyze system performance based on log files and produce plots showing differences in node performance on different applications. The plots help with preventative maintenance and capacity management of the HPC system.
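The array-job parallelization of independent iterations can be sketched as a chunking function. `array_job_slices` is a hypothetical helper illustrating how a scheduler's array facility splits the work:

```python
def array_job_slices(n_iterations, n_tasks):
    """Split a loop of independent iterations into contiguous slices, one per
    array-job task, so each task can run on a separate node."""
    base, extra = divmod(n_iterations, n_tasks)
    slices, start = [], 0
    for task in range(n_tasks):
        # the first `extra` tasks take one additional iteration each
        size = base + (1 if task < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices

# 10 iterations fanned out across 3 array tasks
for task_id, chunk in enumerate(array_job_slices(10, 3)):
    print(task_id, list(chunk))
# 0 [0, 1, 2, 3]
# 1 [4, 5, 6]
# 2 [7, 8, 9]
```

In a real batch system each task would read its own index from an environment variable (for example, SLURM's array task ID) and compute its slice independently; the partitioning logic is the same.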
Svm Classifier Algorithm for Data Stream Mining Using Hive and R (IRJET Journal)
This document proposes using Hive and R to perform data stream mining on big data. Hive is used to query and analyze large datasets stored in Hadoop. Test and training datasets are extracted from the data using Hive queries. The Support Vector Machine (SVM) classifier algorithm analyzes the data to produce a statistical report in R, comparing the accuracy of linear and nonlinear models. The proposed method aims to improve data processing speed and the ability to analyze large volumes of data compared to other tools.
Performance Analysis and Parallelization of Cosine Similarity of Documents (IRJET Journal)
This document discusses performance analysis and parallelization of the cosine similarity algorithm for calculating document similarity. It proposes an optimized algorithm that utilizes parallel computing to calculate cosine similarity for large sets of retrieved documents more efficiently. The conventional cosine similarity algorithm becomes inefficient for large document sets. The parallelized approach aims to enhance efficiency and reduce latency by processing more documents in less time. The document reviews related work applying techniques like parallelization, cosine similarity, and dimensionality reduction to problems involving document clustering, text summarization, and information retrieval.
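The parallelized scoring step can be sketched with a thread pool. This is an illustrative Python version, not the paper's implementation; for CPU-bound scoring at scale, a process pool or vectorized library would be used instead of threads:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def cosine(a, b):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_documents(query_vec, doc_vecs, workers=4):
    """Score every retrieved document against the query concurrently:
    each document's score is independent, so the work parallelizes trivially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: cosine(query_vec, d), doc_vecs))

query = [1.0, 0.0, 1.0]
docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
scores = rank_documents(query, docs)
print(round(scores[0], 3))   # 1.0 -- identical vectors
```

Because each document's score depends only on the query vector and that document, the result set can be partitioned across workers with no coordination, which is what makes the latency reduction claimed above attainable.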
IRJET- A Workflow Management System for Scalable Data Mining on Clouds (IRJET Journal)
1. The document discusses a workflow management system for scalable data mining on clouds. It proposes using MapReduce and Hadoop frameworks to parallelize k-means clustering of large datasets on cloud infrastructure.
2. The system aims to improve efficiency, security, and transmission speed over existing cloud systems by generating hash codes for files before classification and storage on cloud. It uses deduplication to avoid redundant uploads.
3. The document outlines the system implementation, including user modules for registration, login, profile editing, training data upload, redundancy-free file upload and download, changing passwords, and logging out. It also discusses testing the system functionality using unit testing libraries.
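The hash-before-store deduplication described in point 2 can be sketched as follows; the `DedupStore` class and file names are hypothetical:

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 fingerprint of a file's bytes, used to detect duplicate content."""
    return hashlib.sha256(data).hexdigest()

class DedupStore:
    """Store a file only if its content hash has not been seen before."""
    def __init__(self):
        self.by_hash = {}

    def upload(self, name, data):
        digest = content_hash(data)
        if digest in self.by_hash:
            return False            # duplicate content: skip the redundant upload
        self.by_hash[digest] = (name, data)
        return True

store = DedupStore()
print(store.upload("report_v1.csv", b"a,b\n1,2\n"))   # True  -- new content
print(store.upload("report_copy.csv", b"a,b\n1,2\n")) # False -- same bytes, rejected
```

Hashing the content rather than the file name is what lets the system catch renamed copies, saving both storage and transmission time on the cloud side.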
IRJET- Big Data Processes and Analysis using Hadoop Framework (IRJET Journal)
This document discusses issues with analyzing sub-datasets in a distributed manner using Hadoop, such as imbalanced computational loads and inefficient data scanning. It proposes a new approach called Data-Net that uses metadata about sub-dataset distributions stored in an Elastic-Map structure to optimize storage placement and queries. Experimental results on a 128-node cluster show that Data-Net provides better load balancing and performance for various sub-dataset analysis applications compared to the default Hadoop implementation.
European Pharmaceutical Contractor: SAS and R Team in Clinical ResearchKCR
Statistical analysis constitutes an essential part of every serious scientific research. Without data and a formal process of searching for evidences supporting or disproving stated hypotheses, there is nothing but mere opinion. Evidence-based medicine is no exception
Workshop on Real-time & Stream Analytics IEEE BigData 2016Sabri Skhiri
Introduction presentation of the Workshop on Real-time & Stream Analytics co-located with the IEEE Big Data Conference.
We have seen new business models emerging that require real-time features. However, the real-time nature impacts the IT systems. It impacts the IT in term of (1) Data architecture, (2) Stream Mining and (3) Stream Processor technologies. Those three impacts are still very interesting research areas. The papers presented at the workshop cover those three areas and provide interesting view points.
Scientific Application Development and Early results on SummitGanesan Narayanasamy
The document summarizes Oak Ridge National Laboratory's (ORNL) new supercomputer Summit and its capabilities for scientific applications and early results. Summit is the most powerful and smartest supercomputer in the world, with 200 petaflops of performance and capabilities well-suited for machine learning and artificial intelligence applications. ORNL is preparing scientific applications for Summit through its Center for Accelerated Application Readiness program to enable early science results and ensure applications are optimized for Summit's architecture.
The document proposes a Rapid Prototyping Capability (RPC) system to efficiently evaluate integrating Earth observation data from NASA satellites and models. The RPC would:
1) Integrate tools to access, process, and analyze data and model outputs to support experiments.
2) Reduce the time typically required to evaluate new data streams in models through a simulated operational environment.
3) Be accessible to various user groups, including specialists responsible for data/models and domain experts performing analyses.
Data Partitioning in Mongo DB with CloudIJAAS Team
Cloud computing offers various and useful services like IAAS, PAAS SAAS for deploying the applications at low cost. Making it available anytime anywhere with the expectation to be it scalable and consistent. One of the technique to improve the scalability is Data partitioning. The alive techniques which are used are not that capable to track the data access pattern. This paper implements the scalable workload-driven technique for polishing the scalability of web applications. The experiments are carried out over cloud using NoSQL data store MongoDB to scale out. This approach offers low response time, high throughput and less number of distributed transaction. The result of partitioning technique is conducted and evaluated using TPC-C benchmark.
Data performance characterization of frequent pattern mining algorithmsIJDKP
Big data quickly comes under the spotlight in recent years. As big data is supposed to handle extremely
huge amount of data, it is quite natural that the demand for the computational environment to accelerates,
and scales out big data applications increases. The important thing is, however, the behavior of big data
applications is not clearly defined yet. Among big data applications, this paper specifically focuses on stream mining applications. The behavior of stream mining applications varies according to the characteristics of the input data. The parameters for data characterization are, however, not clearly defined yet, and there is no study investigating explicit relationships between the input data, and streammining applications, either. Therefore, this paper picks up frequent pattern mining as one of the
representative stream mining applications, and interprets the relationships between the characteristics of the input data, and behaviors of signature algorithms for frequent pattern mining.
PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...Journal For Research
The document discusses performance evaluation of social network analysis algorithms using Apache Spark. It analyzes the performance of algorithms like PageRank, connected components, triangle counting and K-means clustering on different social network datasets. The results show that GraphX PageRank performs faster than the naive implementation in Spark. Connected components execution time grows super linearly initially and then fluctuates. Triangle counting time grows linearly with size. K-means clustering is tested using both naive and MLlib implementations in Spark.
A Query Model for Ad Hoc Queries using a Scanning ArchitectureFlurry, Inc.
Systems like Hadoop, HBase and Hive allowed the world to take huge strides in managing and analyzing large amounts of data. Products like Flurry Analytics make efficient use of large amounts of hardware using these tools to build statistics for hundreds of thousands of applications. However, these tools require the end user to first set up relevant analytics queries and then wait days for the results. If the results prompt new questions or the original query is not quite right, the user must rerun the query and wait again for the results.
We present the Burst system developed at Flurry to support low-latency single pass queries over very large and complex mobile application streams. We have created a data schema and query model that can answer very complex ad-hoc queries over data, and is highly parallelizable while maintaining low-latency. We implement these scans so that they are time and space efficient using the advanced disk scanning techniques provided by the underlying operating system.
Hybrid Job-Driven Meta Data Scheduling for BigData with MapReduce Clusters an...dbpublications
It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant’s perspective. JoSS provides not only job-level scheduling, but also map-task level scheduling and reduce-task level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce workload scenarios and provide the best job performance among all tested algorithms.
A General Purpose Extensible Scanning Query Architecture for Ad Hoc AnalyticsFlurry, Inc.
We present Burst, an analytic query system with a scalable and flexible approach to performing low-latency ad hoc analysis over large complex datasets. The architecture consists of hardware-efficient scan techniques and a language facility to transform an extensible set of ad hoc declarative queries into imperative physical scan plans. These plans are multicast across all nodes/cores of a two-level sharded/distributed ingestion, storage, and execution topology and executed. The first release of this system is the query engine behind the Flurry Explorer product. Here we explore the design details of that system as well as the incremental ingestion pipeline enhancement currently being implemented for the next major release.
The document describes TrafficDB, a shared-memory data store designed by HERE to provide high throughput access to traffic data. TrafficDB was created to handle the high volumes of read operations required by HERE's traffic-aware services, with minimal latency. It uses shared memory to allow direct memory access for applications. Evaluation showed TrafficDB can handle millions of read operations per second and provides near-linear scalability by allowing additional processes to increase throughput without impacting latency. TrafficDB is now used in production by HERE to power routing, rendering, and other traffic-aware services.
High Performance Processing of Streaming DataGeoffrey Fox
Describes two parallel robot planning algorithms implemented with Apache Storm on OpenStack -- SLAM (Simultaneous Localization & Mapping) and collision avoidance. Performance (response time) is studied and improved as an example of the HPC-ABDS (High Performance Computing enhanced Apache Big Data Software Stack) concept.
This document lists several Java/J2EE/J2ME projects related to utility computing environments, schema matching, fuzzy ontology generation, wireless sensor networks, wireless MAC protocols, distributed cache updating, selfish routing, collaborative key agreement, TCP congestion control, global roaming in mobile networks, GPS-based emergency response systems, network intrusion detection, honey pots, voice over IP, vehicle tracking, SIP-based teleconferencing, online security systems, and location-aided routing in ad hoc networks. The projects cover a wide range of topics related to distributed systems, wireless networks, and Internet applications.
This document proposes using the R statistical analysis and visualization environment as an interface for analyzing network flow data from SiLK tools. It details how R provides powerful and flexible analysis capabilities while preserving command line control. A prototype wrapper function called rwcount.analyze() is presented that takes SiLK command line queries as input, runs the rwcount tool to generate time series data, and returns an output object in R containing the data, visualization, and other metadata. This integrated environment allows for rapid prototyping and visualization of network security analyses.
This document presents a framework that migrates data from MySQL to NoSQL databases like MongoDB and HBase, and maps MySQL queries to queries in the NoSQL databases. The framework consists of a front-end GUI and modules for migrating data between the databases and mapping queries. It migrates data from MySQL tables to collections in MongoDB and HBase. When a user enters a MySQL query, a decision maker selects the target database and the query is mapped to that database's format to retrieve the data. The mapping time for various query types is measured to be very small, making query execution on NoSQL databases efficient using this framework.
This document discusses several key differences between traditional databases and Hive. Hive uses a schema-on-read model where the schema is not enforced during data loading, making the initial load much faster. However, this impacts query performance since indexing and compression cannot be applied during loading. Pig Latin is a data flow language where each step transforms the input relation, unlike SQL which is declarative. While Hive originally lacked features like updates, transactions and indexing, the developers are working to integrate HBase and improve support for these features.
Scaling Application on High Performance Computing Clusters and Analysis of th...Rusif Eyvazli
The document discusses techniques for scaling applications across computing nodes in high performance computing (HPC) clusters. It analyzes the performance of different computing nodes on various applications like BLASTX, HPL, and JAGS. Array job facilities are used to parallelize applications by dividing iterations into independent tasks assigned across nodes. Python programs are created to analyze system performance based on log files and produce plots showing differences in node performance on different applications. The plots help with preventative maintenance and capacity management of the HPC system.
Svm Classifier Algorithm for Data Stream Mining Using Hive and RIRJET Journal
This document proposes using Hive and R to perform data stream mining on big data. Hive is used to query and analyze large datasets stored in Hadoop, and training and test datasets are extracted from the data using Hive queries. The Support Vector Machine (SVM) classifier algorithm analyzes the data to produce a statistical report in R, comparing the accuracy of linear and nonlinear models. The proposed method aims to improve data processing speed and the ability to analyze large volumes of data compared to other tools.
Performance Analysis and Parallelization of CosineSimilarity of DocumentsIRJET Journal
This document discusses performance analysis and parallelization of the cosine similarity algorithm for calculating document similarity. It proposes an optimized algorithm that utilizes parallel computing to calculate cosine similarity for large sets of retrieved documents more efficiently. The conventional cosine similarity algorithm becomes inefficient for large document sets. The parallelized approach aims to enhance efficiency and reduce latency by processing more documents in less time. The document reviews related work applying techniques like parallelization, cosine similarity, and dimensionality reduction to problems involving document clustering, text summarization, and information retrieval.
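A minimal version of the parallelized approach can be sketched with a pool scoring documents against the query concurrently (an illustration of the idea, not the paper's implementation; for CPU-bound work in CPython a process pool would be the realistic choice over threads):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def cosine(a, b):
    """Cosine similarity of two equal-length term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarities(query_vec, doc_vecs, workers=4):
    """Score every retrieved document against the query in parallel;
    each document's score is independent, so the work partitions cleanly."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: cosine(query_vec, d), doc_vecs))

docs = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
scores = similarities([1, 0, 1], docs)
```

The independence of per-document scores is exactly what makes the algorithm a good parallelization target: there is no shared state between workers, so speedup is limited mainly by vectorization and data movement costs.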
IRJET- A Workflow Management System for Scalable Data Mining on CloudsIRJET Journal
1. The document discusses a workflow management system for scalable data mining on clouds. It proposes using MapReduce and Hadoop frameworks to parallelize k-means clustering of large datasets on cloud infrastructure.
2. The system aims to improve efficiency, security, and transmission speed over existing cloud systems by generating hash codes for files before classification and storage on cloud. It uses deduplication to avoid redundant uploads.
3. The document outlines the system implementation, including user modules for registration, login, profile editing, training data upload, and file upload and download while avoiding redundancy, as well as changing passwords and logging out. It also discusses testing the system functionality using unit testing libraries.
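The k-means-on-MapReduce step the document describes can be sketched as one map/reduce round (a toy single-process illustration of the data flow, not the system's actual Hadoop code):

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to a point."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def kmeans_iteration(points, centroids):
    """One MapReduce-style k-means step: the 'map' phase emits
    (centroid index, point) pairs, the 'reduce' phase averages each group
    to produce the updated centroids."""
    groups = defaultdict(list)
    for p in points:                        # map phase
        groups[nearest(p, centroids)].append(p)
    new_centroids = list(centroids)
    for idx, members in groups.items():     # reduce phase
        new_centroids[idx] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return new_centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids = kmeans_iteration(points, [(0, 0), (10, 10)])
```

On Hadoop, the map phase is sharded across the dataset and the reduce phase aggregates partial sums per centroid; the driver repeats rounds until the centroids stop moving.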
IRJET- Big Data Processes and Analysis using Hadoop FrameworkIRJET Journal
This document discusses issues with analyzing sub-datasets in a distributed manner using Hadoop, such as imbalanced computational loads and inefficient data scanning. It proposes a new approach called Data-Net that uses metadata about sub-dataset distributions stored in an Elastic-Map structure to optimize storage placement and queries. Experimental results on a 128-node cluster show that Data-Net provides better load balancing and performance for various sub-dataset analysis applications compared to the default Hadoop implementation.
Towards efficient processing of RDF data streamsAlejandro Llaves
Presentation of short paper submitted to OrdRing workshop, held at ISWC 2014 - http://streamreasoning.org/events/ordring2014.
In the last years, there has been an increase in the amount of real-time data generated. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these streams of data is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.
Towards efficient processing of RDF data streamsAlejandro Llaves
This document discusses efficient processing of RDF data streams. It proposes using the Storm distributed stream processing system and Lambda Architecture to address challenges of scalability, latency, and integrating historical and real-time data. Key components include Storm-based operators to parallelize SPARQL queries over streams, adaptive query processing to adjust to changing conditions, and an ERI compression format to reduce transmission costs for structured RDF streams. Open questions remain around parallelization and handling of out-of-order tuples.
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...IRJET Journal
The document proposes a new framework for efficient semantic search in large datasets. It aims to improve understanding of short texts by enriching them with concepts and related terms from a probabilistic knowledge base. A deep learning model using stacked autoencoders is designed to learn features from the enriched short texts and encode them into binary codes, allowing similarity searches. Experiments show the new approach captures semantics better than existing methods and enables applications like short text retrieval and classification.
Smart E-Logistics for SCM Spend AnalysisIRJET Journal
This document discusses applying predictive analytics and machine learning techniques like LSTM models to supply chain management problems. It focuses on spend analysis and extracting fields from invoices and proofs of delivery using optical character recognition. The key points are:
1. LSTM models are applied to time series spend analysis data and shown to provide more accurate predictions than ARIMA models.
2. A technique is proposed to extract fields from printed and handwritten documents using models trained on Form Recognizer and then cleaning the extracted data.
3. The technique aims to reconcile invoices and proofs of delivery by comparing extracted data fields and calculating a match confidence score.
The document describes CloudTPS, a middleware system that implements support for join queries and transactions in NoSQL cloud data stores. CloudTPS sits between web applications and their underlying data store (e.g. Bigtable, SimpleDB) to provide consistent join queries and strongly consistent multi-item transactions while retaining the scalability of the cloud data store. CloudTPS focuses on supporting foreign-key equi-join queries, which start with records identified by their primary keys and follow references to other records, allowing it to efficiently process queries that access a small number of data items.
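The foreign-key equi-join pattern CloudTPS targets can be illustrated with a primary-key lookup join (a sketch of the access pattern only, not CloudTPS's implementation; the table and field names are invented):

```python
def fk_equijoin(orders, customers_by_pk):
    """Follow each order's foreign key to the customer record it
    references. Because every lookup is by primary key, the join only
    touches the referenced rows rather than scanning a whole table."""
    joined = []
    for order in orders:
        customer = customers_by_pk.get(order["customer_id"])
        if customer is not None:
            joined.append({**order, "customer_name": customer["name"]})
    return joined

customers = {1: {"name": "Ada"}, 2: {"name": "Lin"}}
orders = [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2}]
result = fk_equijoin(orders, customers)
```

This is why the restriction to foreign-key equi-joins matters: starting from known primary keys keeps the set of accessed data items small, which is what lets the middleware stay scalable on top of a key-value cloud data store.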
Previous research has focused on the quick and efficient generation of wrappers; the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent wrappers from extracting data correctly. We present an efficient algorithm that extracts unstructured Web data into structured form. The wrapper verification system detects when a wrapper is no longer extracting correct data, usually because the Web source has changed its format. The verification framework automatically recovers from changes in the Web source by identifying data on Web pages using dimension reduction techniques. The wrapped data is then passed to a one-class classifier over numerical features to avoid classification problems. Finally, the resulting data is fed to a top-k query to produce the best ranking based on probability scores. The wrapper verification system relies on one-class classification techniques to overcome the weaknesses of previous approaches, identifying problems by analysing both the signature and the classifier output. If there are sufficient mislabelled slots, a technique to find a pattern could be explored.
LoadAwareDistributor: An Algorithmic Approach for Cloud Resource AllocationIRJET Journal
This document summarizes research on load balancing algorithms for cloud resource allocation. It proposes a new LoadAwareDistributor algorithm that prioritizes virtual machines with lower CPU utilization to improve efficiency. A literature review covers existing load balancing techniques and their goals. The proposed algorithm is evaluated through simulation and shown to improve metrics like VM utilization and task completion time over round-robin methods. The study advocates for future algorithm advances incorporating machine learning to better address dynamic load balancing challenges in cloud computing environments.
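The priority rule the abstract describes — prefer virtual machines with lower CPU utilisation — might be sketched as follows (the tie-breaker, field names, and per-task utilisation bump are assumptions for illustration, not details from the paper):

```python
def pick_vm(vms):
    """Select the VM with the lowest current CPU utilisation;
    ties fall back to the fewest queued tasks."""
    return min(vms, key=lambda vm: (vm["cpu"], vm["queued"]))

def assign(tasks, vms):
    """Greedily place each task on the least-loaded VM, updating that
    VM's bookkeeping so later tasks see the new load."""
    placement = {}
    for task in tasks:
        vm = pick_vm(vms)
        placement[task] = vm["name"]
        vm["queued"] += 1
        vm["cpu"] += 5  # assumed per-task utilisation increase
    return placement

vms = [{"name": "vm1", "cpu": 70, "queued": 3},
       {"name": "vm2", "cpu": 20, "queued": 1}]
placement = assign(["t1", "t2"], vms)
```

Compared with round-robin, this kind of load-aware rule avoids piling work onto an already-busy VM, which is the mechanism behind the improved utilisation and completion-time metrics the study reports.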
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...IRJET Journal
The document proposes a system for efficient data transmission and aggregation in wireless sensor networks (WSNs) using MapReduce processing. Sensors are grouped into three clusters, with a cluster head elected in each based on distance, memory, and battery to reduce energy consumption. Sensor data is encrypted and sent to cluster heads, which aggregate the data and append a signature before sending to the base station. The signature is verified and data is stored in Hadoop and processed using MapReduce. The system aims to provide data integrity and privacy during concealed data aggregation to reduce overhead in heterogeneous WSNs.
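The cluster-head election based on distance, memory, and battery could be sketched as a weighted score over candidate nodes (the weights here are invented for illustration; the paper does not specify them):

```python
def elect_cluster_head(nodes):
    """Pick the node best suited to be cluster head: more battery and
    memory score higher, greater distance to the base station scores
    lower. Weights are assumptions, not from the source."""
    def score(n):
        return 0.5 * n["battery"] + 0.3 * n["memory"] - 0.2 * n["distance"]
    return max(nodes, key=score)

nodes = [
    {"id": "s1", "battery": 90, "memory": 60, "distance": 40},
    {"id": "s2", "battery": 50, "memory": 80, "distance": 10},
    {"id": "s3", "battery": 95, "memory": 70, "distance": 80},
]
head = elect_cluster_head(nodes)
```

Electing the head this way concentrates the expensive aggregation and transmission work on the node most able to bear it, which is how the scheme reduces overall energy consumption in the cluster.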
This document discusses developing cyberinfrastructure to support computational chemistry workflows. It describes the OREChem project which aims to develop infrastructure for scholarly materials in chemistry. It outlines IU's objectives to build pipelines to fetch OREChem data, perform computations on resources like TeraGrid, and store results. It also discusses the GridChem science gateway which supports various chemistry applications and the ParamChem project which automates parameterization of molecular mechanics methods through workflows. Finally, it covers the Open Gateway Computing Environments project and efforts to sustain software through the Apache Software Foundation.
THE DEVELOPMENT AND STUDY OF THE METHODS AND ALGORITHMS FOR THE CLASSIFICATIO...IJCNCJournal
This document summarizes a study on developing methods and algorithms for classifying data flows of cloud applications in the network of a virtual data center. The researchers developed a hybrid approach using data mining and machine learning methods to classify traffic flows in real-time. They created an algorithm for classifying and adaptively routing cloud application traffic flows, which was implemented as a module in the software-defined network controller. This solution aims to improve the efficiency of handling user requests to cloud applications and reduce response times.
A Novel Data Extraction and Alignment Method for Web DatabasesIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, Assessment, and many more.
M phil-computer-science-data-mining-projectsVijay Karan
This document provides summaries for several M.Phil Computer Science Data Mining Projects written in C#. The projects cover topics such as bridging virtual communities, mood recognition during online tests, surveying the size of the World Wide Web, knowledge sharing in virtual organizations, adaptive provisioning of human expertise in service-oriented systems, cost-aware rank joins with random and sorted access, improving data quality with dynamic forms, targeted data delivery algorithms, and sentiment classification using feature relation networks.
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...IRJET Journal
The document empirically analyzes the radix sort algorithm using curve fitting techniques on data collected from running radix sort on different data sizes on a personal computer. It implements radix sort in C and runs it 100 times for data sizes ranging from 10,000 to 27,000, recording the average run times. It then uses curve fitting to identify the model that best fits the run time versus data size data points, using R-squared, adjusted R-squared, and root mean square error. The analysis finds that the power model provides the best fit for the data.
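Fitting the power model y = a * x**b reduces to ordinary least squares on log-transformed data, since log y = log a + b log x; a sketch of the fit (with synthetic run times, not the paper's measurements):

```python
import math

def fit_power_model(xs, ys):
    """Fit y = a * x**b by linear least squares in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    # slope of the log-log regression line is the exponent b
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) \
        / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

# synthetic run times generated from y = 2e-6 * n**1.1 (illustrative only)
sizes = [10000, 15000, 20000, 27000]
times = [2e-6 * n ** 1.1 for n in sizes]
a, b = fit_power_model(sizes, times)
```

Goodness-of-fit measures such as R-squared and RMSE, as used in the paper, would then be computed on the residuals between the fitted curve and the measured averages.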
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...IRJET Journal
This document discusses optimizing task completion time in cloud computing through efficient resource allocation using genetic and differential evolutionary algorithms. It aims to reduce makespan (completion time) by combining a genetic algorithm with differential evolutionary algorithms. The genetic algorithm uses selection, crossover and mutation to allocate tasks to resources. The outputs are then input to the differential evolutionary algorithm, which has the same operations in reverse order. This double process refines the allocation to provide the best allocation minimizing completion time. The document outlines the related work in genetic algorithms for resource allocation and task scheduling in cloud computing.
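The genetic half of the scheme — selection, crossover, and mutation over task-to-resource assignments — can be sketched as follows (a toy GA minimizing makespan; the population size, rates, and the omitted differential-evolution refinement stage are simplifications, not the paper's configuration):

```python
import random

def makespan(assignment, task_times, n_machines):
    """Completion time = load of the busiest machine."""
    loads = [0.0] * n_machines
    for task, machine in enumerate(assignment):
        loads[machine] += task_times[task]
    return max(loads)

def genetic_allocate(task_times, n_machines, pop=30, gens=60, seed=1):
    """Tiny GA: tournament selection, one-point crossover, and point
    mutation over task-to-machine assignment vectors."""
    rng = random.Random(seed)
    n = len(task_times)
    population = [[rng.randrange(n_machines) for _ in range(n)]
                  for _ in range(pop)]
    fitness = lambda ind: makespan(ind, task_times, n_machines)
    for _ in range(gens):
        nxt = []
        for _ in range(pop):
            p1 = min(rng.sample(population, 3), key=fitness)  # selection
            p2 = min(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n)                          # crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.2:                             # mutation
                child[rng.randrange(n)] = rng.randrange(n_machines)
            nxt.append(child)
        population = nxt
    return min(population, key=fitness)

times = [4, 7, 2, 5, 3, 6]
best = genetic_allocate(times, n_machines=2)
```

In the hybrid scheme the document describes, the GA's output population would then be passed to the differential-evolution stage for further refinement of the allocation.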
Marketing strategy. This is a paper I wrote for my assignment during the summer classes. The paper was marked and I scored 90 percent, even though the tutor informed me that I had to make some improvements to it.
The document discusses health and safety legislation in the UK. It emphasizes the importance of enforcing health and safety rules to protect workers. The legislation focuses on providing welfare services, training for safe equipment handling, and regular maintenance. Employers must provide a safe work environment and inform staff of hazards. Employees should report any incidents. The Health and Safety Act of 1974 gives regulations for employers, including risk assessments, and failure to comply can result in liability. Stress management is also an important area covered by the legislation. Case studies are presented to illustrate the duties of employers and employees.
Heuristic based query optimisation for rsp(rdf stream processing) enginesWilliam Aruga
This thesis addresses improving query optimization in RDF stream processing engines like CQELS and C-SPARQL. The author proposes implementing a heuristics-based approach to identify errors early and optimize queries. Key contributions include deploying the approach on existing engines, extending the engines to allow sharing processing and resources among concurrent queries, and evaluating performance which shows improvements over original engines. The thesis explores optimization techniques like adaptive execution, dynamic query planning, multi-way join optimization, and shared window joins.
The document discusses consistency between Jonathan's comment and Shelley's ideas about a club fundraiser. Jonathan's comment that the club had invested in a rodeo was consistent with Shelley's idea that the fundraiser should continue getting better each year and give the club a presence in the community. However, Jonathan's comment aimed to increase the club's wealth rather than give back to the community. The document also provides steps to make the rodeo profitable and lists references.
The document discusses marketing strategies for special events, using Glastonbury Festivals as a case study. It defines special events and how they are classified. It then performs a SWOT analysis of Glastonbury Festivals and discusses how the marketing mix or 7P's are applied, focusing on product, price, and place. Finally, it outlines marketing strategies like segmentation, targeting, positioning, and monitoring and control used by Glastonbury Festivals.
This document provides a financial analysis of Saudi Telecom Corporation (STC) and a comparison with its competitor Mobily. It includes a SWOT analysis, industry analysis using Porter's Five Forces, and an analysis of key financial ratios for STC. It also discusses sources of internal and external finance available to STC, budgeting, and concludes with recommendations for performance enhancement. Financial data for STC such as net revenue, net income, cash flow, market capitalization, and dividend yield are presented alongside the same metrics for Mobily to facilitate comparison between the two companies.
The document describes a flow chart for restocking items within a company. The process begins with identifying required items and submitting a list to the manager for approval. If approved, the list goes to the finance office for funding approval before seeking suppliers. Items are collected, accounted for, and distributed. The document analyzes each step and identifies areas for improvement, such as eliminating duplicative roles to streamline the process and reduce time and inefficiency.
Heuristic based query optimisation for rsp(rdf stream processing) enginesWilliam Aruga
This is the original report of the dissertation, written some days back, on heuristic-based query optimisation for RSP (RDF stream processing) engines. The report was prepared by Wilfred Govern on my behalf.
Fraud examination bre x minerals case studyWilliam Aruga
The Bre-X Minerals case involved a mining company that convinced investors it had discovered one of the largest gold deposits ever. However, it was later revealed that gold samples had been tampered with to mislead investors and inflate the company's value. When an investigation found the samples were fraudulent, the company's stock price collapsed. The fraud triangle model explains there was pressure on perpetrators to meet financial targets, an opportunity due to lack of controls, and rationalizations to save the company. Examining management compensation and company relationships could have provided clues about the true value of the gold deposit.
The impact of digital platform on the sharing economyWilliam Aruga
The digital platform of Airbnb has significantly impacted the sharing economy in three key ways:
1) It allows individuals to list, discover, and book unique accommodations from over 34,000 cities and 190 countries.
2) It facilitates peer-to-peer transactions and builds trust between hosts and guests through a user review system.
3) While allowing property owners control over their listings, Airbnb maintains control over its brand by vetting hosts and creating incentives to provide a quality customer experience.
Takotsubo cardiomyopathy potential differential diagnosis in acute coronary s...William Aruga
1. Takotsubo cardiomyopathy (TCM) and acute coronary syndrome (ACS) can present with similar symptoms but have distinct causes. TCM is often triggered by emotional or physical stress and causes temporary left ventricular dysfunction, while ACS is caused by coronary artery blockages.
2. It is important to differentiate between TCM and ACS to determine the appropriate treatment approach. Electrocardiograms may show different abnormalities in TCM compared to ACS. Imaging tests like coronary angiography can also help establish a diagnosis.
3. While diagnostic criteria have been proposed for TCM, it can still be challenging to distinguish from ACS. Careful assessment of symptoms, risk factors, and test results is needed
Evaluation of doctoral study foundation of studyWilliam Aruga
This document provides background information and establishes the foundation for a study on strategies small business owners use to achieve profitability beyond five years. It identifies that small businesses face challenges like lack of resources and management skills that contribute to high failure rates. The purpose of the study is to explore industry strategies successful small retail business owners employ to remain profitable for over five years. The central research question asks what strategies small business owners use to achieve long-term profitability. The conceptual framework draws from theories of disruptive innovation and susceptibility.
This document discusses a research study that analyzed the structural behavior of laminated glass beams in comparison to monolithic and layered glass beams. The study aimed to better understand the mechanical behavior of laminated glass and evaluate its suitability for structural design. The research examined laminated glass under both dynamic and static loading using experimental and numerical data. It also developed mathematical models to understand the salient features of laminated glass and compared current design codes to its actual behavior. Various case studies on laminated glass applications were also conducted to analyze its use in building structures.
The document discusses the impact of digital platforms on the sharing economy, using Airbnb as a case study. It makes three key points:
1) Airbnb has grown rapidly due to technological innovations that allow individuals to share unused resources through an online platform. This platform model reduces transaction costs and builds trust between strangers.
2) As a digital platform, Airbnb utilizes a modular system that facilitates product innovation and achieves economies of scale. It also executes control over hosts and customers through mechanisms like reviews, commissions, and branding.
3) The emergence of sharing platforms like Airbnb is driven by consumers' desire for new economic and experiential options beyond traditional hotels. This competition has led
This document appears to be a plagiarism report for a work submitted by John Duran on March 25th, 2017. The report details that the work is 8,297 words and 48,254 characters long. It also indicates that the similarity index found just 1% similarity to other sources and that quotes and the bibliography were excluded from the analysis. The report is then followed by 34 pages of additional analysis or documentation.
This document provides an analysis of the Marks & Spencer organization. It analyzes the company's internal and external environments through a PESTEL analysis, SWOT analysis, and evaluation of the company's value chain and resources. The company's strengths include its strong brand recognition and large customer base in the UK. Weaknesses include some customer perceptions of high prices and lack of interest in some product lines. Opportunities exist in expanding into new markets and adapting to customer preferences for incentives. Threats include increased competition and potential economic downturns. The document concludes the company has a strong strategic position but must effectively utilize resources to maintain its competitive advantage.
The document summarizes artifacts from different cultures represented in the San Antonio Museum of Art, including ancient Rome, Judaism, Christianity, Islam, and the Middle Ages. For ancient Rome, a chest with writing depicts a ritual offering. Judaism is represented by a mezuzah placed on doors. Christianity uses a crucifix to signify Jesus' suffering. Islam's artifact is a compass used to face Mecca in prayer. A golden drinking cup from the Middle Ages was used in ceremonies and some communities still use cups today.
This document reviews literature on childhood obesity. It discusses how the rate of childhood obesity has significantly increased over the past 30 years. Obesity in children can lead to both short-term and long-term health impacts. Common causes of childhood obesity include increased calorie intake, lack of exercise, and consumption of sugary drinks. Addressing childhood obesity is important as obese children are more likely to be obese adults and develop related health conditions like diabetes if obesity continues into adulthood.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon reserves and the ancient Silk Road trade route, along with China's diplomatic endeavours in the area, has been referred to as the "New Great Game." This research centres on that power struggle, considering geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil politics, and traditional and non-traditional security are explored and explained. Using Mackinder's Heartland theory, Spykman's Rimland theory, and Hegemonic Stability theory, the study examines China's role in Central Asia. It adheres to an empirical epistemological method and takes care to remain objective, critically analysing primary and secondary research documents to elaborate the role of China's geo-economic outreach in Central Asian countries and its future prospects. The study finds that China is seeing significant success in trade, pipeline politics, and gaining influence over other governments, a success attributable to the effective use of key instruments such as the Shanghai Cooperation Organisation and the Belt and Road Economic Initiative.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
HEURISTICS-BASED QUERY OPTIMISATION SOLUTION IMPLEMENTATION IN RSP
ENGINES: THE CQELS AND C-SPARQL
Submitted in fulfilment of the requirements for the degree of Master of Science
Supervisor:
Co-supervisor:
The Insight Centre for Data Analytics, National University of Ireland, Galway
September, 2016
Abstract
This thesis examines the case for building the query optimisation process of RDF stream processing engines around an efficient heuristics engine. The Resource Description Framework (RDF) has taken the world by storm and has become the gold standard for processing and communicating real-time data streams collected from medical institutions, industrial plants, financial entities, and telecommunication service providers. For instance, DBpedia and YAGO help reinforce structured querying in Wikipedia searches by retrieving metadata and encoding it in RDF format. Likewise, biological information such as experiments and their distinctive results is stored as RDF data collections to enable efficient communication between chemists and biological specialists.
The data streaming framework has been brought to prominence by Tim Berners-Lee's invention of the Semantic Web, which streams linked data from source documents and applications and thus serves users with precise web pages. However, the query optimisation performed in both of these query languages is still somewhat deficient with regard to the time expended before search results are delivered. The execution of flawed queries is another worrying factor in the query optimisation function of RSP engines. All of these elements (lengthy run times, extravagant computational costs such as join operations, and the execution of inaccurate queries) contribute to the degradation of RDF stream processing.
Heuristics help identify early error signs in user queries and resolve them using built-in configurations and algorithms. The novel heuristics optimisation model can be used as a benchmark for querying Semantic Web metadata in departments such as military logistics, data warehousing, engineering analysis, and health care. Some of the main
contributions of this research work include: (i) deploying a reference implementation on the existing CQELS and C-SPARQL execution frameworks; (ii) extending the two RSP engines (CQELS and C-SPARQL) so that processing and resource space can be shared among multiple concurrent queries; and (iii) evaluating the performance of the extended RSP engines against the first released CQELS and C-SPARQL engines. The evaluation results show a remarkable improvement in performance and demonstrate the practicality of the approach used.
Table of Contents
Table of Contents........................................................................................................................................4
Chapter 1: Introduction...............................................................................................................................9
1.1 Motivation.........................................................................................................................................9
1.2 Problem Statement and Hypotheses...............................................................................................10
1.3 The Outcome of the Thesis..............................................................................................................14
1.3.1 Adaptive execution framework.................................................................................................14
1.3.2 The linked data stream adaptive processing model..................................................................14
1.3.3 Algorithms and data structures for triple-based windowing operator incremental evaluation15
1.3.4 The techniques for optimization for multiway joins.................................................................16
1.4 The Outline of This Thesis................................................................................................16
Chapter 2: The General Background..........................................................................................................17
2.1 Introduction.....................................................................................................................17
2.2 Comparative and Survey Evaluations...............................................................................................24
2.3 Query Optimization.........................................................................................................27
2.4 RDF Stream Processing and Semantic Web.....................................................................29
Chapter 3: Background to RSP Engines......................................................................................................32
3.1 C-SPARQL.........................................................................................................................................32
3.2 CQELS...............................................................................................................................32
3.2.1 Introduction..............................................................................................................................34
3.2.2 Proposed heuristics approach...................................................................................................37
3.2.3 Results simulation.....................................................................................................................43
3.2.4 The performance comparison graph between new improved model and the previous version
of CQELS and C-SPARQL.....................................................................................................................46
Chapter 4: State of The Art in LSDP or the Linked Stream Data Processing...............................................53
4.1 Query Semantics and Data Models..................................................................................................53
4.2 Data Model......................................................................................................................................53
4.3 Query Semantics..............................................................................................................................55
4.4 Query Languages.............................................................................................................................55
Chapter 5: The Optimization Solutions for the CQELS...............................................................................59
5.1 The Adaptive Optimizer...................................................................................................................65
5.2 The Dynamic Executor.....................................................................................................................67
Chapter 6: Exploration of the RDF Engine – Continuous C-SPARQL...........................................................69
Chapter 7: Adaptive Query Optimiser in RDF Engines...............................................................................74
7.1 Adaptive Query Optimiser...............................................................................................................74
7.2 Multiway Joins Adaptive Cost-based Optimisation..........................................................................74
7.3 Shared Window Joins Optimisation.................................................................................................76
7.4 Multiple Join Operator.....................................................................................................................76
7.5 Features of Adaptive Query Optimization.......................................................................................78
7.6 Adaptive Plans Concepts..................................................................................................................79
Chapter 8: Conclusion and Future Work....................................................................................................81
8.1 Conclusion.......................................................................................................................................81
8.2 Future Work.....................................................................................................................................84
References.................................................................................................................................................87
List of Figures
Figure 1: Semantic Web processing...........................................................................................................29
Figure 2: Query flow through a DBMS.......................................................................................................37
Figure 3: Binary tree..................................................................................................................................38
Figure 4: Magic tree...................................................................................................................................39
Figure 5: Cost versus time graph...............................................................................................................45
Figure 6: Performance versus complexity..................................................................................................46
Figure 7: Graphical performance comparison...........................................................................................48
Figure 8: An architecture of the C-SPARQL engine....................................................................................72
List of Tables
Table 1: Algorithm 1..................................................................................................................................40
Table 2: Algorithm 2..................................................................................................................................42
Table 3: Query 1........................................................................................................................................44
Table 4: The Performance Comparison by Features..................................................................................47
Table 5: Performance Comparison by the Mechanism of Execution.........................................................47
Summary
This work explores query optimisation solution implementation in two RSP engines, namely CQELS and C-SPARQL. The framework presents one of the continuous query languages compatible with SPARQL, defined over both linked data and linked stream data. In practice, the framework is very flexible, enabling performance gains of various orders of magnitude over related systems. An efficient hybrid physical data organisation, using a novel data structure with supporting algorithms, helps to deal with high-update-throughput RDF streams and large RDF datasets. Additionally, the framework provides for various adaptive optimisation algorithms. This thesis also presents extensive experimental evaluations demonstrating the performance advantages of the CQELS and C-SPARQL processing engines and framework. Furthermore, these assessments cover a comprehensive set of parameters that play a significant role in dictating the performance of continuous queries over both linked data and linked stream data.
Chapter 1: Introduction
The primary purpose of this research study is to explore the case for building the query optimisation process of RDF stream processing engines around an efficient heuristics engine. Accordingly, this introduction starts with the motivation. Afterward, it discusses the problem statement and hypotheses. Next, the chapter touches on the thesis outcome, and lastly presents the thesis outline.
1.1 Motivation
It is crucial to note that the world is currently witnessing a paradigm shift (Abdulla and Matzke 2006, p.29). Real-time data, and data that depends on such time, continues to become ubiquitous (MacLennan and Tang 2009, p.61). Until a few years ago, little was known about sensor devices (Mueller 2009) such as compasses, cameras, mobile phones, GPS receivers, and accelerometers. Weather-observation stations measuring humidity, temperature, and so forth produce an ever-growing quantity of information in the form of data streams (Cheung, Hong, and Fong 2006, p.55). Furthermore, patient-monitoring systems that track blood pressure, heart rate, and the like, together with location-tracking systems such as RFID and GPS, play a vital role in this process. Building management systems that record environmental conditions and energy consumption, and cars that monitor both driver and engine (Abdulla and Matzke 2006), show a similarly tremendous increase in the production of such information (Cole and Conley 2009, p.53). In addition, the web offers several services, including Facebook, Twitter, and
some blogs, that deliver streams of typically unstructured real-time data on various topics.
1.2 Problem Statement and Hypotheses
In practice, the motivation for this thesis leads to the larger research problems that arise when building an efficient Linked Stream Data query processing engine. One major problem is how to design a new declarative query language. According to research (e.g. Abdulla and Matzke 2006, p.145; Buchanan and Shortliffe 1984, p.99), this problem arises because neither SPARQL nor the state-of-the-art continuous query languages can be used to query Linked Stream Data. In practice, a query language requires sound semantics and a formal data model of continuous query operators (MacLennan and Tang 2009, p.187). The data model must be able to represent both Linked Data and Linked Stream Data in a unified view. The new data model must therefore extend the Resource Description Framework model to allow a transparent integration of conventional Resource Description Framework databases (Zhang and Kollios 2007, p.85). Continuous query processing also requires a temporal aspect of the data that no Resource Description Framework extension has covered before (MacLennan and Tang 2009, p.22). Alongside the data model, graph-based query operators with continuous semantics must be defined to specify the meaning of the declarative query patterns (Buchanan and Shortliffe 1984, p.163). Notably, to reduce the learning effort, it is important that the query patterns resemble SPARQL, which requires aligning the query operators with the semantics of SPARQL. Additionally, this kind of alignment must be
compatible with window operations as defined in traditional continuous query languages, for example CQL.
Given the disadvantages of using unmodified triple stores and data stream management systems (DSMSs) for Linked Stream Data, Resource Description Framework based stream data raises new issues for the physical organisation of both Linked Data and Linked Stream Data (MacLennan and Tang 2009, p.149). The standard storage model is a triple table holding identifiers that represent literals and URIs (Abdulla and Matzke 2006, p.109), combined with dictionary-like mapping tables that translate those identifiers back into lexical form (Cole and Conley 2009, p.203). Linked Stream Data necessitates a high write throughput; such storage, on the other hand, is designed for read-intensive contexts (Zhang and Kollios 2007, p.148). DSMSs remedy the write-intensive requirement with in-memory storage, but Linked Stream Data entails Linked Data that cannot always be hosted in main memory (Cole and Conley 2009, p.209). Furthermore, Resource Description Framework based data elements such as RDF triples and temporal RDF triples are very small; in effect, they present an enormous number of individual data points in comparison to the quantity of encoded information.
In practice, the row-based data structures used in relational DSMSs are not efficient enough, since their tuple headers can dominate the total storage size (Cole and Conley 2009, p.211). Row-based structures designed for shorter, wider tables can also raise the cost of stream processing significantly. In effect, a new physical organisation is needed for processing both Linked Data and Linked Stream Data (Buchanan and Shortliffe 1984, p.92). Resource Description Framework based continuous query operators typically operate on one or a few very large tables (MacLennan and Tang 2009), so indexes for random access to data items play a vital role. Most modern Resource Description Framework stores provide a massive indexing strategy to overcome this handicap (Cole and Conley 2009, p.173); since the indexes cover all access patterns, the tables themselves can always be bypassed. Notably, however, a comprehensive indexing scheme has a very high maintenance cost, making it impractical for stream processing. In addition, some stream data indexing solutions might appear helpful, but their designs make them applicable only to relational streams (Abdulla and Matzke 2006). In effect, investigating hybrid solutions that combine the indexing strategies of stream data processing and triple stores forms an interesting problem (Cole and Conley 2009, p.239). Additionally, another issue associated with the physical representation of Resource Description Framework based stream data is how to evaluate window operators efficiently over the unbounded nature of streams.
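To make the interplay between unbounded streams and window operators concrete, the sketch below (our own illustration, not code from any engine) implements a time-based sliding window that retains only the triples whose timestamps fall within the last `range_seconds`:

```python
from collections import deque

class SlidingWindow:
    """Time-based sliding window over a stream of timestamped RDF triples."""
    def __init__(self, range_seconds):
        self.range = range_seconds
        self.buffer = deque()  # (timestamp, triple), ordered by arrival

    def insert(self, timestamp, triple):
        self.buffer.append((timestamp, triple))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop triples that have fallen out of the window [now - range, now].
        while self.buffer and self.buffer[0][0] < now - self.range:
            self.buffer.popleft()

    def contents(self):
        return [t for _, t in self.buffer]

w = SlidingWindow(range_seconds=10)
w.insert(0, ("s1", "p", "o1"))
w.insert(5, ("s2", "p", "o2"))
w.insert(12, ("s3", "p", "o3"))   # evicts the triple that arrived at t=0
print(w.contents())  # [('s2', 'p', 'o2'), ('s3', 'p', 'o3')]
```

Because the buffer is ordered by arrival time, eviction only ever inspects the head of the queue, which is what keeps window maintenance cheap even on a fast stream.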
Several attempts have been made in DSMSs to support sliding-window queries (Cole and Conley 2009, p.243). One such effort re-evaluates each window independently of all other windows; this process is referred to as re-evaluation computation (Abdulla and Matzke 2006, p.199) and is used in both Aurora and Borealis. Another method, incremental evaluation computation, processes only the changes, namely the tuples that expire from and are inserted into the windows in the query pipeline (MacLennan and Tang 2009, p.272); this approach is used in Nile and STREAM. Incremental evaluation methods nevertheless have some shortcomings (Cole and Conley 2009, p.287). The two main techniques are negative tuples and direct timestamps: the negative-tuple method doubles the number of tuples flowing through the query pipeline, while the direct-timestamp method requires extra timestamps. In practice, with the new data structures introduced in this thesis, the associated algorithms for computing windowing operators must always address these unusual characteristics of the data.
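The negative-tuple technique mentioned above can be sketched as follows (a simplified illustration with our own naming, not the Nile or STREAM implementation). Each expiration is propagated downstream as an explicit signed tuple, so a downstream operator, here a simple count, can update its state incrementally; the extra negative tuples are also what doubles the traffic through the pipeline:

```python
from collections import deque

def window_with_negative_tuples(events, range_seconds):
    """Yield ('+', triple) on insertion and ('-', triple) on expiration,
    so downstream operators can update their state incrementally."""
    buffer = deque()
    for ts, triple in events:
        while buffer and buffer[0][0] < ts - range_seconds:
            _, expired = buffer.popleft()
            yield ('-', expired)       # negative tuple: retract from state
        buffer.append((ts, triple))
        yield ('+', triple)            # positive tuple: add to state

# A downstream incremental count consumes the signed stream:
count = 0
stream = [(0, "a"), (5, "b"), (12, "c")]
for sign, _ in window_with_negative_tuples(stream, range_seconds=10):
    count += 1 if sign == '+' else -1
print(count)  # 2 triples remain in the 10-second window
```

Note that the count is never recomputed from scratch; each signed tuple adjusts it by one, which is precisely the saving incremental evaluation buys at the cost of the extra retraction traffic.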
A Resource Description Framework triple store has exceptionally long, thin tables for which standard optimisations do not apply (Cole and Conley 2009, p.368). It is therefore quite challenging for traditional DSMSs to provide statistics relevant to a query optimiser, and the same challenge applies to processing Linked Data and Linked Stream Data. Maintaining statistics over highly dynamic datasets is even harder in a stream processing setting (Cole and Conley 2009, p.394). Most importantly, adaptive query optimisation for this type of continuous query processing becomes harder still, owing to the unpredictability of Resource Description Framework data and the dynamic nature of stream data distributions (MacLennan and Tang 2009, p.400). Moreover, SPARQL-like queries often share query patterns, which raises multi-query optimisation requirements (Cole and Conley 2009, p.386). Although there have been several efforts in multi-query optimisation, some of the approaches proposed for relational streams fail to work on Resource Description Framework based streams (Abdulla and Matzke 2006, p.397), largely because this setting differs in nature from the relational one (Zhang and Kollios 2007, p.391). In effect, enabling multi-query optimisation for Linked Data Streams is very challenging.
1.3 The Outcome of the Thesis
In light of the issues stated above, the outcomes of this thesis include:
1.3.1 Adaptive execution framework
This framework enables adaptivity in the RSP engines CQELS and C-SPARQL (Abdulla and Matzke 2006, p.402). It allows full control of the execution process, with the flexibility to add new algorithms and new data structures to the query engine component (MacLennan and Tang 2009, p.433). Essentially, the framework uses encoding mechanisms so that operators have a small footprint and a lighter workload, operating only on fixed, small-size integers (Buchanan and Shortliffe 1984, p.266). A Linked Data caching solution for subqueries improves the performance and scalability of query processing over collections of Linked Data (Zhang and Kollios 2007). In practice, with the proposed caching mechanism, the framework can address the scalability problem of integrating large static datasets.
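The encoding mechanism mentioned above, which lets operators work only on fixed, small-size integers, can be sketched with a simple bidirectional dictionary (a hypothetical illustration; the actual CQELS and C-SPARQL encoders are more elaborate):

```python
class Dictionary:
    """Bidirectional mapping between RDF terms (URIs/literals) and integer ids."""
    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        # Assign the next id on first sight; reuse it afterwards.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, ident):
        return self.id_to_term[ident]

d = Dictionary()
triple = ("http://example.org/Berlin", "http://example.org/isCapitalOf",
          "http://example.org/Germany")
encoded = tuple(d.encode(t) for t in triple)   # operators see only small ints
print(encoded)                                  # (0, 1, 2)
print(d.decode(encoded[0]))                     # http://example.org/Berlin
```

Joins and comparisons over integer ids are far cheaper than over long URI strings; the lexical form is only reconstructed when results are delivered to the user.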
1.3.2 The linked data stream adaptive processing model
This thesis proposes an adaptive processing model comprising a formal definition of the query semantics, the data model, and the execution model (Cole and Conley 2009, p.437). The data model covers the temporal aspects of both Linked Data sets and Linked Stream Data, which had not previously been addressed (Zhang and Kollios 2007, p.434). The query semantics is formalised with both mathematical and operational meanings. The mathematical meaning shows how a declarative query fragment maps to mathematical expressions (Cole and Conley 2009, p.441), and abstract syntaxes accompany the query fragments to define a declarative query language extending SPARQL (Buchanan and Shortliffe 1984, p.280; Zhang and Kollios 2007, p.404). The operational meaning, on the other hand, defines how the operators in these expressions are executed in physical execution plans (MacLennan and Tang 2009, p.432). The operational semantics thus provides a performance model for the continuous execution of equivalent execution plans for a query expressed in the CQELS and C-SPARQL languages (Cole and Conley 2009, p.470). This operational feature facilitates the adaptivity of execution engines based on the processing model (Zhang and Kollios 2007, p.355), because the execution engine can dynamically switch from the current execution plan to an equivalent one in order to adapt to run-time variations (MacLennan and Tang 2009, p.446). In short, CQELS is both one of the first query languages for Linked Stream Data and the only one accompanied by sound mathematical and operational semantics.
1.3.3 Algorithms and data structures for triple-based windowing operator incremental
evaluation
This thesis introduces novel operator-aware data structures, together with efficient incremental evaluation algorithms, to deal with the unusual properties of RDF streams and their query patterns (Cole and Conley 2009, p.422). These data structures are designed to handle the intermediate mappings and small data items contained in the processing state. They provide indexes with low maintenance cost that support high-throughput probing operations, which are useful in various operator implementations (Abdulla and Matzke 2006). For this kind of data, several algorithms are proposed to enable incremental evaluation of basic operators, including duplicate elimination, join, and aggregation (MacLennan and Tang 2009, p.453). In short, these algorithms aim to overcome the typical issues involved in the incremental evaluation of windowing operators.
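As a rough illustration of a low-maintenance index that supports high-throughput probing (our own sketch with hypothetical names, not the thesis's data structures), consider hash indexes kept per triple position: insertion costs three hash appends, and a join operator can probe by subject in constant time.

```python
from collections import defaultdict

class TripleIndex:
    """Hash indexes over subject, predicate, and object for O(1) probing."""
    def __init__(self):
        self.by_s = defaultdict(list)
        self.by_p = defaultdict(list)
        self.by_o = defaultdict(list)

    def insert(self, triple):
        # Maintenance cost: three hash-table appends per arriving triple.
        s, p, o = triple
        self.by_s[s].append(triple)
        self.by_p[p].append(triple)
        self.by_o[o].append(triple)

    def probe_subject(self, s):
        # Probe used by a join operator to find partners sharing subject s.
        return self.by_s.get(s, [])

idx = TripleIndex()
idx.insert(("s1", "knows", "s2"))
idx.insert(("s1", "likes", "s3"))
print(len(idx.probe_subject("s1")))  # 2
```

A full engine would also need deletion on window expiry; keeping the per-key lists small (as window semantics naturally does) is what keeps that maintenance cheap.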
1.3.4 The techniques for optimization for multiway joins
In essence, this thesis explores the use of adaptive optimisation techniques to improve the performance of multiway joins (Abdulla and Matzke 2006, p.456). It is important to note that this is one of the most expensive query operators in the query pipeline (Cole and Conley 2009, p.472). Practically, an adaptive cost model is useful in designing two adaptive algorithms for the dynamic optimisation of a multiway join query.
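A minimal sketch, with invented names, of what "adaptive" means here: re-deriving the probe order of a multiway join from the current window cardinalities. The greedy rule below (probe the smallest windows first, so intermediate results shrink early) is one common heuristic, not the thesis's specific algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a greedy heuristic for ordering the probes of a
// multiway join. The engine can re-run this whenever window cardinalities
// drift, so the join plan adapts to the stream at runtime.
public class AdaptiveJoinOrder {
    // Given current window sizes, probe the smallest windows first: each
    // probe then filters the intermediate result as cheaply as possible.
    public static List<String> probeOrder(Map<String, Integer> windowSizes, String driver) {
        List<String> order = new ArrayList<>(windowSizes.keySet());
        order.remove(driver);                                   // the driver stream triggers the join
        order.sort(Comparator.comparingInt(windowSizes::get));  // smallest cardinality first
        return order;
    }
}
```

For example, if the triggering stream is joined against a 10-triple window and a 200-triple window, probing the 10-triple window first bounds the intermediate result at a fraction of the cost of the reverse order.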
1.4 The Outline of This Thesis
The organisation of the remaining part of this thesis is as follows. Chapter 2 explores the general background on Linked Data processing and stream processing. Chapter 3 presents the background to the RSP engines (CQELS and C-SPARQL). Chapter 4 touches on the state of the art in Linked Stream Data Processing (LSDP). Chapter 5 explores the optimisation solutions for CQELS. Chapter 6 mainly explores the continuous RDF query engine C-SPARQL. Chapter 7 evaluates the RSP engines framework, and finally, Chapter 8 concludes the thesis and points to future work.
Chapter 2: The General Background
This chapter explores the background techniques and concepts for Linked Data processing and stream processing. Additionally, it covers the fundamentals of stream processing that are applicable to Linked Stream Data. In short, the chapter discusses the representation of continuous semantics, basic techniques and models, the operators and methods of optimisation, and the handling of issues such as memory overflow and time management (MacLennan and Tang 2009). In addition, the chapter presents the definition of the semantics of the Resource Description Framework data model and SPARQL queries, together with the relevant notation. This general background also gives an overview of how Resource Description Framework data are stored and queried using SPARQL.
2.1 Introduction
The term ‘heuristic’ derives from the Greek for ‘discover’ or ‘find’ (Calhoun and Riemer 2001). Heuristics is a common practice applied across industry fields for observing, learning, and spotting malware, errors, and other problems on the basis of experience. For example, a well-modelled heuristic technique is used in antimalware programs to learn and spot computer threats such as Trojan horses, viruses, and worms. The learning and observation aspect of a heuristic framework operates by scanning computer documents and capturing the signatures that differentiate them (Chen 2009). After reading the unique signatures of computer files, such as tiny macros, find commands, or even subroutines, the heuristic uses its memory and experience to identify previously seen threats.
According to CIKM 2006 Workshops (2006), heuristics entail a suite of rules geared towards enhancing the probability of identifying and ironing out problems in a given structure.
When applied in computer science, a heuristic is considered an algorithm engineered to present viable solutions to glitches arising in a given scenario. The heuristics discipline generally examines how information is studied, captured, and discovered. When engaged in artificial intelligence, computer science, or mathematical optimisation, heuristic engines work to solve problems quickly and efficiently when the conventional methods break down, are not fast enough, or fail to calculate accurate solutions (Cheung et al. 2006, p. 49). If the heuristic path is chosen after the failure of conventional methods, it acts as a shortcut that speeds up the process. As Cohen (1985) says, heuristics can either work in isolation, generating solutions by themselves, or in combination with optimisation algorithms, all geared towards increasing the RSP engine's effectiveness (Gedik 2006). The more advanced heuristics thoroughly inspect and then trace the guidelines put in a program's code prior to passing it to the computer's processing unit for execution. This helps the heuristics engine assess and learn the behaviour of that program while it runs in a virtual setting.
The current querying strategies in CQELS and C-SPARQL waste a lot of valuable time executing incorrect and inept queries that may be keyed in by end users who are not familiar with the intricate querying syntax, as Gore (1964) observes. Much as the database servers within the CQELS and C-SPARQL systems may recognise these inefficient queries, the end users and internet browsers are not aware of the incorrectly stated queries and hence may continue running them. As this happens, the overall performance and speed of the language engines is incrementally impaired, so fewer data retrievals are executed per unit of time. In a bid to resolve the system slowdown, users opt to refer the issue to the DBA to help them code efficient queries; this DBA consultation results in further time wastage. This is where the incorporation of a heuristics engine comes into play. By assimilating a heuristic function into the querying of the CQELS and C-SPARQL languages, a substantial amount of time and querying effort will be saved (Cheung et al. 2006, p. 57). The heuristic function will serve as a query optimiser that skims through the input user query, inspecting it thoroughly to highlight and remove any detected errors. According to McIlroy (1998), unlike the DBA, which recognises the lapses in the queries yet does nothing about them, the heuristic will automatically produce an equivalent but highly optimised query. By spotting and rectifying the inaccuracies inherent in queries input by end users, the heuristic function discards both the time-consuming execution of inaccurate queries and the time expended consulting the DBA for viable solutions. In this way, system productivity and throughput will always be on an upward curve. Access frequency will drop, as the heuristic will reduce and in some cases eliminate the number of tuples and columns browsed; hence data stream processing and querying accuracy will improve.
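Two representative rewrite rules of this kind can be sketched in Java. This is a hypothetical illustration only: the triple patterns are plain strings, whereas a real optimiser for CQELS or C-SPARQL would operate on the parsed query algebra, and the selectivity proxy used here (fewer variables means fewer matches) is an invented simplification.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch of the heuristic query-rewriting step described above.
public class HeuristicRewriter {
    // Rule 1: drop duplicate triple patterns an inexperienced user may repeat;
    // evaluating the same pattern twice only multiplies intermediate results.
    public static List<String> dedupe(List<String> patterns) {
        return new ArrayList<>(new LinkedHashSet<>(patterns)); // keeps first-seen order
    }

    // Rule 2: evaluate the most selective-looking patterns first. As a crude
    // proxy for selectivity, patterns with fewer variables (tokens starting
    // with '?') are assumed to match fewer triples.
    public static List<String> sortBySelectivity(List<String> patterns) {
        List<String> out = new ArrayList<>(patterns);
        out.sort((a, b) -> Integer.compare(countVars(a), countVars(b)));
        return out;
    }

    private static int countVars(String pattern) {
        int n = 0;
        for (String tok : pattern.split("\\s+")) if (tok.startsWith("?")) n++;
        return n;
    }
}
```

Applied before execution, such rules cost microseconds but can save the engine from evaluating redundant or badly ordered patterns over every window of the stream.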
An effort to integrate these kinds of information sources would enable a broad range of near-real-time applications in areas such as green information technology, smart cities, and e-health (Cole and Conley 2009, p.19). However, harvesting such data remains a labour-intensive and difficult task due to the heterogeneous nature of the vast streams; in essence, the process requires many hand-crafted methods. Worth noting, the remedy for this scenario is the application of the Resource Description Framework (RDF) data model (Schreiber 1977, p.38). In practice, this data model lets one express knowledge in a generic way, and it does not require adherence to a particular schema (MacLennan and Tang 2009, p.67). Efforts are under way, by the semantic sensor/stream community and the W3C semantic web incubator group, to lift stream data to the semantic level (Maringer 2005). Essentially, the primary goal of this process is to make stream data available according to the principles of Linked Data; this concept is referred to as Linked Stream Data (Schreiber 1977, p.103). Ordinarily, Linked Data facilitates data integration among heterogeneous collections (Buchanan and Shortliffe 1984). Linked Stream Data has similar goals with respect to data streams (Schreiber 1977, p.89), and it also assists in bridging the gap between stream sources and the more static data sources.
Besides a unified model of data representation, there is also a requirement for a processing engine that can support continuous queries over both Linked Data and Linked Stream Data (Cole and Conley 2009, p.107). In classical Linked Data processing, there is always an assumption that data are stored in a centralised repository and change infrequently before additional processing (MacLennan and Tang 2009, p.102). According to research (e.g. Zhang and Kollios 2007, p.51), updates on a dataset are usually limited to just a small fraction of that dataset; additionally, updates happen infrequently, and in some cases the database is simply replaced by a new version.
Traditional relational databases follow a ‘one-time’, ‘pull’ model (Schreiber 1977, p.139): a query is executed after reading the data from disk, and the output is a set of results valid at that single point in time (Cole and Conley 2009, p.137). Linked Stream Data, on the other hand, produces new items continuously; the data are only valid within a window and are consistently pushed to the query processor (Buchanan and Shortliffe 1984, p.99). In practice, queries are registered only once and then evaluated continuously over time against the changing dataset; in short, queries are continuous (MacLennan and Tang 2009, p.139). In effect, the appearance of new data triggers updates of the continuous query results (Abdulla and Matzke 2006, p.97). It is important to note that this continuity of continuous queries and the temporal aspect of Linked Stream Data are not both considered in the current processing engines for Linked Data queries (Cole and Conley 2009, p.148). Worth noting, Data Stream Management Systems (DSMSs) seem to be better candidates for processing continuous queries (Zhang and Kollios 2007, p.167). Ordinarily, a DSMS could be used as a sub-component that deals with the stream data; in practice, the only problem is that no traditional DSMS supports the Resource Description Framework, which makes a data transformation step necessary (Schreiber 1977, p.108). However, in most cases, the overhead of such data transformation can be very costly in the low-latency context of stream data processing (Sims and Yocom 2008, p.109). Furthermore, delegating processing to a sub-system such as a DSMS means losing full control over query execution (Cole and Conley 2009, p.145); moreover, optimisation can only be done locally within each subsystem (Schreiber 1977, p.143). In this case, since the subsystem is used as a black box, it cannot be optimised for the query patterns, the data model, or the data distribution.
According to research (e.g. Buchanan and Shortliffe 1984, p.152), the difficulty of predicting the structure of Resource Description Framework graphs poses challenges for traditional DSMSs; moreover, they cannot effectively scale to large quantities of RDF data (Schreiber 1977, p.154). Worth noting, this difficulty of prediction also applies to RDF-based data streams (Sims and Yocom 2008, p.151), which makes them tough for DSMS optimisers to handle. It is also necessary to note that these DSMS optimisation problems have only been solved in some ad-hoc and restricted scenarios (Cole and Conley 2009, p.162); a good number of areas still present open problems and challenges (MacLennan and Tang 2009, p.173). In addition, most of the optimisation algorithms are heuristics, and they prove to work only for certain kinds of data and queries.
In essence, these facts played a significant role in motivating me to develop a heuristic-based optimisation solution for two RSP engines (C-SPARQL and CQELS), implemented in Java and starting from a naïve optimisation idea (Sims and Yocom 2008, p.182). In practice, my approach aims to build high-performance processing engines for Linked Stream Data by combining algorithms, re-engineered efficient data structures, and techniques from both traditional DSMSs and Linked Data processing. According to several studies (such as Abbass and Newton 2002, p.135; Sims and Yocom 2008, p.127), it is not good practice to store RDF data elements in relational tables; rather, careful design of the indexing schema and physical storage plays a vital role in the performance of triple stores (Schreiber 1977, p.94). It is now important to note that this approach aims to design native data structures that treat both RDF and RDF stream data elements as first-class citizens (Cole and Conley 2009, p.142). Most importantly, because the data change continuously during the lifetime of a query, the processing must be adaptive.
Such adaptivity requires the introduction of an adaptive execution framework known as Continuous Query Evaluation over Linked Streams (CQELS) (Cole and Conley 2009, p.177). It is important to note that this framework is designed to apply adaptive processing techniques to meet the performance requirements of stream processing (Buchanan and Shortliffe 1984, p.103; Zhang and Kollios 2007, p.171). Moreover, the framework allows full control over the continuous execution process, where both optimisation and scheduling can take place at runtime (Schreiber 1977, p.67). In the process, I had to create a new continuous query language as one of the first works in Linked Stream Data processing (Cole and Conley 2009, p.191). Worth noting, the evaluation of Linked Stream Data processing engines and the first survey conducted during this thesis help provide insight into how to build an efficient Linked Stream Data engine.
In this thesis, we advance the integration of a heuristics engine into the query optimisation of CQELS as well as C-SPARQL to augment query execution and data stream processing. The query optimisation operations will be performed by Java code that serves as the optimiser in both RSP engines; implementing the optimiser in Java will speed up operations and the query optimisation function in general. As MacLennan and Tang (2009) claim, the code will allow end users to express their queries unambiguously and will reduce imprecise query input. This will also help cut the incremental costs of the computation process with regard to the projection, selection, and join functionalities, as well as other cost factors such as processor and communication time. As the data and ontology constituents of Web 3.0 have stabilised through the assimilation of gold standards such as OWL and RDF, the optimisation and implementation of heuristics-based querying is next on the to-do list.
The assimilation and implementation of the heuristic utility is outlined in this thesis as follows. Section 1 discusses how heuristics can be employed in query optimisation to minimise the pertinent costs. In the proposed heuristic algorithm, a query is scanned and executed using the magic trees in the storage files, which demonstrates significant progress over previous optimisation approaches. The cost-based algorithm shows that the system's enhancement continues to improve as the query becomes more interlaced and dense, as the user performs more intricate searches. Section 2 discusses how heuristics can be enlisted in the Java code to significantly reduce erroneous query executions by automatically recognising and amending inefficiencies in CQELS and C-SPARQL queries. The detection and rectification of flaws within the queries will consequently save the huge amounts of time and effort expended by the RSP engines in retrieving information, thus enhancing the overall throughput and productivity of the engines. Section 3 demonstrates the competency of heuristics to execute queries without involving join operations. The exclusion of join operations in query optimisation will help to shrink operational costs in addition to making the RDF data volume less bulky. The empirical results confirm that the proposed heuristic model outperforms conventional querying techniques, for example Jena, by 79% with regard to the reduction of pointless intermediate results and faster query processing time.
2.2 Comparative and Survey Evaluations
Essentially, the first experiments and survey are helpful in giving comparisons of, and insight into, the techniques of data stream processing and the Linked Stream Data processing engines (Abdulla and Matzke 2006, p.487; Zhang and Kollios 2007, p.378). Additionally, the first cross-system evaluation of Linked Stream Data processing engines is presented.
A scenario that integrates human-centric streaming data from the digital and physical worlds, similar to Live Social Semantics, is a direct inspiration (MacLennan and Tang 2009, p.474). Worth noting, data from the physical world are captured and streamed through tracking systems and sensors such as wireless triangulation, RFID, and GPS, and the integration can be done with virtual streams such as city traffic data, Twitter feeds, and airport information to deliver up-to-date views or location-based services for any particular situation (Cole and Conley 2009, p.479). Furthermore, the conference scenario mainly focuses on the problem of data integration between the streams of data from a tracking system and a static data set (Abdulla and Matzke 2006). Worth noting, the tracking system, similar to the various real deployments in Live Social Semantics, is used to gather the relationship between physical spaces and the real-world identifiers of the conference attendees. Moreover, the non-stream datasets, for example online information about the attendees such as online profiles, social networks, and publication records, are used to correlate with the tracking data (Cole and Conley 2009, p.482). In essence, there exist several benefits of correlating the two sources of information (MacLennan and Tang 2009, p.453). Most importantly, conference rooms could be automatically assigned to talks based on the total number of people who might show interest in attending, inferred from the topic of the talk and the attendees' profiles (Cole and Conley 2009, p.491). In addition, conference attendees could be notified about co-authors found within the same location (Abdulla and Matzke 2006, p.423; Buchanan and Shortliffe 1984, p.403; Zhang and Kollios 2007, p.348). It is also easy to imagine a service that recommends which talk to attend based on citation records, profiles, and the distance between the locations of the talks.
In practice, the social stream data of interest for a user are spread among various social application platforms such as Twitter, Facebook, Foursquare, and so on (MacLennan and Tang 2009, p.496). Additionally, social network analysis and aggregation platforms such as Bottlenose require an integration of heterogeneous streams from various feeds and social networks (Abdulla and Matzke 2006, p.437; Buchanan and Shortliffe 1984, p.428). Most importantly, these platforms could easily use Linked Stream Data processing engines to deal with the issues of data integration (Cole and Conley 2009, p.504). In the same context, this scenario focuses on the different social stream sources that social network users create (MacLennan and Tang 2009, p.511). Another important thing to note is that social networks provide rich resources of interesting stream data, including photo uploads and sequences of social discussions (Cole and Conley 2009, p.521). Additionally, social networks are considered the best test area for Resource Description Framework engines; furthermore, RDF can also exhibit its merits in representing graph data (MacLennan and Tang 2009, p.527). Ordinarily, skewed data distributions, which mostly occur in social network data, correlate with real-life data. Moreover, the efficient handling of such correlations is recognised as a very difficult problem by database engines (Abdulla and Matzke 2006, p.484; Buchanan and Shortliffe 1984, p.503; Zhang and Kollios 2007, p.509); on the other hand, it also opens up many opportunities for query optimisation (MacLennan and Tang 2009, p.539). In the context of this scenario, it becomes possible to build a data simulator that exploits different skewed data distributions and the correlations available in a social network (Abdulla and Matzke 2006, p.437). As a consequence, the data simulator is useful for generating realistic test cases to evaluate the Linked Stream Data processing engines.
It is important to note that various parts of this thesis have earlier been published as workshop, conference, and journal articles (MacLennan and Tang 2009, p.544). Furthermore, the first attempt at building a heuristic-based query optimisation solution for RSP engines was introduced in several studies (such as Abbass and Newton 2002), as was related work on stream processing engines and Data Stream Management Systems (MacLennan and Tang 2009, p.587). On the other hand, the RSP engines such as CQELS and C-SPARQL are readily described in studies (such as Abdulla and Matzke 2006, p.463; Buchanan and Shortliffe 1984, p.401; Zhang and Kollios 2007, p.409).
2.3 Query Optimisation
Maringer (2005) describes query optimisation as a pervasive querying function in a multitude of information systems and database frameworks. All query languages, be they structured (SQL) or unstructured (C-SPARQL and CQELS), enlist query optimisation functionalities to establish the shrewdest and most adept channel for executing a query that has been keyed in by a user. Such functionalities encompass query optimisers, such as PostgreSQL's or the Java code (Java Runtime Environment), that analyse and carefully assess SQL, C-SPARQL, or CQELS queries to establish the most effectual mechanism for query execution. The querying of database systems happens almost every other minute of the day, and thus query optimisation is just as frequent (Cheung et al. 2006, p. 64). Anyone browsing the internet doing either simple or complex research engages query optimisation in the Database Management Systems (DBMS) when requesting a piece of information from the respective databases. For example, if you are searching for a Social Security number, the financial statements of a company, or a country's demographics, or even trying to compute the average pay of all the civil workers in the Department of Agriculture in your regional state, you are querying the distinctive databases.
If, for instance, you are interested in investing in Ernst and Young LLP (a multinational audit firm), you will obviously want to find out how it is performing in the market and its overall productivity compared against other industry benchmarks. To locate such information, you will log in to the company's database system and request its financial statements, ratios, and key market/performance indicators. A query soliciting the financial ratios of Ernst and Young LLP will look like this: “find the consolidated balance sheet of Ernst and Young.” Before the balance sheet appears on your computer screen, a number of procedures occur, featuring a query plan. After you submit this query, the parser within the database parses it and then hands it over to the query optimiser, which hatches several query plans in accordance with their resource costs (Moustakas 1990). The most efficient plan, in terms of cost and time consumption, is chosen, after which the database server accesses the pertinent database data and produces the desired results.
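The plan-selection step just described can be reduced to its core in a short sketch. This is a hypothetical illustration: the plan names and cost figures are invented, whereas a real optimiser derives costs from statistics such as cardinalities, index availability, and I/O estimates.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: the optimiser prices every candidate plan for a query
// and keeps the cheapest one before handing it to the execution engine.
public class PlanChooser {
    public record Plan(String description, double estimatedCost) {}

    public static Plan cheapest(List<Plan> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(Plan::estimatedCost))
                .orElseThrow(() -> new IllegalArgumentException("no candidate plans"));
    }
}
```

For the balance-sheet query above, the candidates might be a full scan of the financial-statements table followed by a filter, versus a direct index lookup on the company name; the optimiser simply picks whichever it prices lower.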
The prime focus of the query optimisation function of databases is centred on expeditious and prompt query execution, so as to deliver the desired results in a flash (Mueller 2009, p. 34). Time consumption tops the list in determining the best query plan to solve a given query: any marginal time variance between alternative query plans will prompt the query optimiser to select the option that is fastest and consumes the least amount of time. However, the optimisation function is still lacking in time efficiency and conservation, as most querying processes involve redundant executions of intermediate results within the join operations. These join operations, together with other accompanying costs such as the projection and selection functionalities as well as processor time, downscale the communication time of the data results in addition to increasing the computational costs. As the selected query plan works, it makes use of various algorithms with which it collaborates to manipulate and combine tables of
Figure 1: Semantic Web processing
data from the database structure so as to produce the requested knowledge material (Nirmal 1990, p.388). These manipulations and combinations of data tables are called join operations and, in the retrieval of real-time streaming data such as financial statistics, they slow down the data streaming process. Additionally, the processing of the intermediate results needed in the join operations contributes to making the RDF data volume bulky, thus impeding operations and the engine's overall speed (Cheung et al. 2006, p.69). All these issues call for programmers to construct the query optimisation function of the RSP engines around a heuristic solution and to implement this solution to improve RDF stream processing.
2.4 RDF Stream Processing and the Semantic Web
The recent deployment of the semantic web in divergent industry sectors, such as logistic planning in military fields, engineering analysis, health care, and the life sciences, has proved its worth in data search automation and information technology upscaling. According to Zhang and Kollios (2007), the semantic web contributes to an instinctual and spontaneous web application that retrieves precise information from linked data sources. The application works by collecting, filtering, and sampling data items captured from different sensor plants and stored as ontologies in RDF formats (see Figure 1).
Proposed by Tim Berners-Lee in 2001, the Semantic Web (Web 3.0) has so far showcased some data processing differences between its data management and that of the original World Wide Web (Web 1.0). While Web 1.0 operates by abstracting away the physical storage and networking layers, Web 3.0 upgrades this tedious and seemingly slower process by further dismissing the document and application layers. Much as the search engines on the World Wide Web index a majority of the content stored on the Web, they still lack the instinctive capacity to select the articles and web pages that an end user really desires. Rather than connecting documents and data structures like Web 1.0, Web 3.0 capitalises on its metadata base and ever-evolving compilation of knowledge to connect facts and meaning. This is what enables the Semantic Web to build in the intuitiveness and self-description that help context-understanding programs find the exact pages that a user is looking for. As Sims and Yocom (2008, p.411) convey, Web 3.0 has gained its technological leverage over Web 1.0 through its cutting-edge means of data storage, querying, and information display. The data storage means incorporated in this new technique involves matching data sources to ontologies that are stored in a structured form in the Resource Description Framework (RDF). Unlike the natural text formats that Web 1.0 utilises for data storage and retrieval, the Semantic Web models the data items sourced from diverse sensor plants in a comprehensive descriptive language to make the query processes and information display easy and friendly for all Internet users.
As Abbass and Newton (2002) illustrate in their journal article, RDF comprises a descriptive structuring of data used for information exchange on the net. As the semantic metadata layer reads information from sensor plants, it filters and stores this information in a format that is easily readable by both the machine and the computer user. Engineered by the World Wide Web Consortium (W3C), RDF integrates the use of query languages and descriptive statements and predicates (e.g. has, is) to provide relevant information about web resources that a user may search for. For example, if you want to find out about the current U.S. president (a web resource), the underlying statement is “The U.S. has a current president in office.” As seen from this statement, there is an entity-relationship data model in the form of a subject-predicate-object expression. This model is the strategy used by RDF when representing information. Thus, RDF is a language that exhibits web data through minimally constraining, meaningful, and constructive expressions. To incrementally expand RDF's efficiency, we have to further advance the aspect of heuristics in the querying of RDF data stream processing engines.
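The subject-predicate-object model described above can be made concrete with a small example. The snippet below is written in Turtle, one of the W3C serialisations of RDF; the prefix and resource names are invented purely for illustration:

```turtle
@prefix ex: <http://example.org/> .

# Subject              Predicate                Object
ex:UnitedStates        ex:hasCurrentPresident   ex:PresidentInOffice .
ex:PresidentInOffice   ex:title                 "President of the United States" .
```

Each line is one triple; a query engine answers questions by matching graph patterns, with variables in any of the three positions, against such statements.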
Chapter 3: Background to RSP Engines
3.1 C-SPARQL
Barbieri et al. (2010, p. 20) define C-SPARQL as an advanced language, an extension of the SPARQL query language, that observes windows of recent triples of RDF data streams while simultaneously allowing the streams to flow. The continuous querying of streams by Continuous SPARQL (C-SPARQL) facilitates the interoperability of RDF formats and implements crucial applications that allow researchers to access the ever-evolving information of web resources. Wei (2011, p. 101) refers to C-SPARQL as an orthogonal extension of the conventional SPARQL grammar, making SPARQL a congruent component of C-SPARQL. C-SPARQL builds on SPARQL through its capability of combining static RDF with real-time streaming data for purposes of stream reasoning. Much as SPARQL has cemented its viability in querying RDF repositories, Barbieri et al. observe that it is still lacking in handling continuous, flowing data streams (Abbass and Newton 2002, p. 21). Stream-based data emitters encompassing stock quotations, click streams, and feeds emit real-time continuous information. However, SPARQL is limited in that it cannot store entire streams; therefore, Data Stream Management Systems (DSMSs) register consecutive queries in static forms. The invention of C-SPARQL is thus based on its capacity to merge static data with streaming data, a procedure that mobilises logical reasoning in real time over those large and noisy data streams.
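To make the window behaviour concrete, the following is a small C-SPARQL-style registered query. The stream IRI, prefix, and property names are invented for illustration; the overall shape follows the form described by Barbieri et al.:

```sparql
REGISTER QUERY RecentRoomOccupancy AS
PREFIX ex: <http://example.org/>
SELECT ?person ?room
FROM STREAM <http://example.org/streams/tracking> [RANGE 10m STEP 1m]
WHERE {
  ?person ex:detectedAt ?room .
}
```

The RANGE clause keeps only the last ten minutes of triples in scope, and STEP re-evaluates the query every minute, so results are pushed to the consumer continuously rather than pulled once, which is precisely what distinguishes C-SPARQL from one-time SPARQL evaluation.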
3.2 CQELS
According to Abbass and Newton (2002), the Continuous Query Evaluation over Linked
Streams (CQELS) constitutes an adaptive and instinctive schema for supporting Linked Stream
33. 33
Data, whose grammar derives from SPARQL 1.1, making the two compatible. This congruence gives CQELS a performance edge over other continuous query languages. CQELS was engineered around a white-box approach: it implements the required query operators natively, avoiding the overhead and restrictions of closed, black-box systems (Schreiber 1977). CQELS offers flexible, updatable execution structures, as its query processors continuously readjust to changes in the incoming data. Examples of such continuous queries appear in papers such as CF02, HFAE03, CDTW00, and ABB+02; these queries, however, are quite simple and applicable only to general-purpose event processing. This thesis proposes assimilating heuristics into the query execution of CQELS so that its operators can be continuously reordered, improving query applicability in complex situations, not just general-purpose ones.
Integrating a heuristics engine into the querying of RDF data streams is therefore fundamental to scaling up RDF stream processing, as it greatly shortens join operations. Besides reducing time consumption, the heuristics will also help spot and rectify flaws in the queries that users input while searching databases for useful information. In general, the heuristics functionality will play a double role in the query optimisation of RDF stream processing: first, to shrink the time spent processing intermediate results for join operations, and second, to discard errors contained in queries, thereby curbing flawed query execution and saving further time during query optimisation.
Section 1: Cost-Based Heuristics Optimisation Approach
3.2.1 Introduction
Consolidating heuristics into the query optimisation of RSP engines is a novel step. The heuristics implementations are geared towards cutting computational costs during query optimisation and the join operations executed within the C-SPARQL and CQELS languages. This section outlines in depth how the heuristics function helps minimise costs, estimated as the overall time spent by the optimiser selecting the most effectual query plan (tree) that will execute a given query in the least time possible, thus lessening CPU and input/output costs.
The CQELS and C-SPARQL DBMS optimisers endeavour to settle on a single, most feasible query plan for a given query statement. In query optimisation, pinning down a suitable plan depends on which alternative has the shortest duration and the lowest costs in terms of execution factors such as communication, processor, and input/output expenses. These costs are critical and receive utmost consideration during the selection of the ideal query plan tree (Abbass and Newton 2002). When a query is input into an RDF database, the Database Management System (DBMS) initiates a selection process to determine the most potent path that delivers results by the shortest route possible. The optimiser devises several path plans and chooses the most suitable one. All these candidate plans, when followed, output equivalent data; however, they differ in cost, specifically in how much time each plan consumes to finalise data retrieval and generate the data desired by the user or researcher, claims Abbass
and Newton (2002). The selection criterion hinges upon a critical question: Which path plan will
take the least time to reach and deliver the user information? The optimisation course revolves
around a myriad of circumstances such as how a query is stated, the access methods, the
information layout, and the data set size (Oracle Help Center 2016). The access frameworks are
quite influential in this stage of optimisation as they are the ones which dictate whether the data
should be accessed via index scans or full table scans. Suppose Path A requires an index scan estimated to take 2 minutes while Path B requires a full table scan estimated at 2.5 minutes; Path A will be chosen.
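The plan-selection step just described can be sketched as a simple cost comparison. This is a minimal illustration in Java (the language of the later simulation), where the plan names and minute estimates are the hypothetical figures from the text, not engine code:

```java
import java.util.*;

// Minimal sketch of cost-based plan selection: each candidate plan carries
// an estimated cost (here, minutes) and the optimiser keeps the cheapest.
public class PlanChooser {
    record Plan(String name, double estimatedMinutes) {}

    // Return the plan with the lowest estimated cost.
    static Plan cheapest(List<Plan> candidates) {
        return Collections.min(candidates,
                Comparator.comparingDouble(Plan::estimatedMinutes));
    }

    public static void main(String[] args) {
        List<Plan> plans = List.of(
                new Plan("Path A (index scan)", 2.0),
                new Plan("Path B (full table scan)", 2.5));
        System.out.println(cheapest(plans).name()); // Path A (index scan)
    }
}
```

In a real optimiser the estimate would combine processor, communication, and input/output costs rather than a single duration, but the selection principle is the same.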
In as much as the conventional optimisers in CQELS and C-SPARQL strive to produce the most feasible execution plan, gaps remain: processor time, communication time, and input/output costs are still considerably high. This section outlines the trends in query optimisation observed before and after the assimilation of heuristics, confirming the cost-saving impact achieved after integration. When a query is submitted to the database server, it traverses a fixed sequence of DBMS modules until the final results are generated (see Figure 2). These modules are a scanner, parser, query optimiser, code generator, and query processor. As Abbass and Newton (2002) explain, the scanner scrutinises the inherent language
tokens, for example the relation names and CQELS/C-SPARQL keywords, in the context of the query statement. The parser then certifies the query syntax and checks that the attribute names are semantically valid. After this, it transforms the query expression into a machine-readable internal representation using a query tree or, sometimes, a query graph; the tree's data structure is sketched by means of a calculus expression (Abbass and Newton 2002). The query optimiser comes into play by reading the machine-readable instruction
and then forming a multitude of execution plan strategies. The optimizer finally chooses the most
amenable path by assessing all pertinent algebraic expressions relating to the input query,
favoring the cheapest and shortest one. The code generator then works to create a viable code
that requests the query processor to execute that plan projected by the optimizer (MacLennan and
Tang 2009, p.242).
Figure 2: Query flow through a DBMS (scanner → parser → optimizer → code generator → query processor)
As mentioned above, the query optimizer explores relevant algebraic expressions
contained within various algorithms generated by the DBMS in query searches. The traditional
algorithms have always zeroed in on exhaustively enumerating all alternatives available to
empower query searches. However, as explained by Abbass and Newton (2002), this exhaustive
technique is defective when it comes to solving for complex queries as the algorithms cannot
make it to enumerate all possible (millions of) options in a short, convenient timing. Rather, the
timing is quite long and tiring even for the user waiting for the results. This occurrence is evident
when an algorithm has to enumerate join orders for a query whose resulting data is contained in
50 tables. The process of enumerating all these 50 tables and joining the data items can take up
several minutes before results are delivered, thus failing in fastness and cost efficiency. To solve
this drawback, a heuristics solution has been implemented in both the CQELS and C-SPARQL
optimisation processes. The heuristics solution activates an algorithm that checks a storage file in the DBMS for a ready-to-use query plan matching the new input query. If such a plan exists, the algorithm uses it to execute the new query expression, eliminating the need to create a plan from scratch. This saves the processing time otherwise spent developing a new query plan as well as the input/output costs (MacLennan and Tang 2009, p.42), and the communication time between query input and data output is shortened too. These savings in processor and communication time grow as time proceeds and as queries become more intricate.
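The check for a ready-to-use plan can be sketched as a simple lookup structure. This is an illustrative model of the storage file, with made-up class and method names rather than actual CQELS or C-SPARQL internals:

```java
import java.util.*;

// Sketch of the proposed heuristics plan cache: before building a new
// execution plan, the optimiser checks a storage structure for a
// ready-to-use plan matching the query.
public class PlanCache {
    private final Map<String, String> storageFile = new HashMap<>();
    int plansBuilt = 0; // counts how often a plan had to be built from scratch

    // Return a cached plan if one matches the query; otherwise build,
    // store, and return a new one.
    String planFor(String query) {
        String cached = storageFile.get(query);
        if (cached != null) {
            return cached;                        // ready-to-use plan: no rebuild cost
        }
        plansBuilt++;
        String plan = "plan(" + query + ")";      // stand-in for real plan construction
        storageFile.put(query, plan);
        return plan;
    }

    public static void main(String[] args) {
        PlanCache cache = new PlanCache();
        cache.planFor("Q1");
        cache.planFor("Q1");  // second submission reuses the stored plan
        System.out.println(cache.plansBuilt); // prints 1
    }
}
```

The design choice mirrors the text: the cost of the first submission is unchanged, and every repeat submission skips plan construction entirely, which is why the savings compound over time.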
3.2.2 Proposed heuristics approach
Figure 3: Binary tree
The heuristics solution proposed in this thesis advocates for a change in the sequence of
query execution from a normal binary tree to a magic tree that is stored in the given storage file.
The move to change the sequence of execution steps allows for the DBMS to save computational
costs and time as well (MacLennan and Tang 2009, p.221). In the absence of heuristics, the
query optimiser normally formulates a binary query tree (see Figure 3) which it uses to derive
numerous path plans before choosing the optimal alternative. Formulating the binary tree repeats operations such as join, filter, and projection every time a query search is initiated within the DBMS. This redundancy is a major contributor to the operational expenses of these functions, the time spent performing them, and the processor and communication time. Frequent join executions in particular make the volume of RDF data being accessed extremely bulky, which in turn makes manipulating the data repositories more complicated.
However, the addition of heuristics ensures that these binary trees are replaced with a
much more efficient methodology, the magic tree. The magic tree differs from the conventional
Figure 4: Magic tree
binary tree in its innovative way of placing all the constituent variables (join, filter, and projection) on only one wing of the tree (see Figure 4). The algorithm then allocates each of these variables a specific weight, and the total weight is used to calculate the cost of the variables in the tree. The weight assigned to each variable depends on the amount of time that variable consumes during query processing, so computational time correlates with the attached weights (MacLennan and Tang 2009, p.232). The magic tree reorders marked variables such as the projection stem of the binary query tree and eliminates the redundancy of binary projection mechanisms. For example, suppose the cost of the projection stem is x units. If a projection is administered fifteen times on a nested query, the aggregate cost in the customary binary tree will be 15x units. The proposed magic tree, however, shifts the projection facet to a single state, so that the projection on the same nested query need only be administered once and the total processing cost is just x units. Table 1 below depicts the algorithm proposed by the heuristics solution.
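The projection-cost argument can be restated numerically. This is a small sketch under the text's own assumption that each projection costs x units; the concrete value of x and the method names are illustrative:

```java
// Numeric sketch of the projection-cost argument: a binary tree pays the
// projection cost x once per application, while the magic tree pays it
// once in total.
public class ProjectionCost {
    // Cost of n projections, each costing x units, in a binary tree.
    static int binaryTreeCost(int n, int x) {
        return n * x;
    }

    // The magic tree collapses the n projections into a single one,
    // so the count n no longer matters.
    static int magicTreeCost(int n, int x) {
        return x;
    }

    public static void main(String[] args) {
        int x = 4; // assumed per-projection cost in units
        System.out.println(binaryTreeCost(15, x)); // 60
        System.out.println(magicTreeCost(15, x));  // 4
    }
}
```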
Table 1: Algorithm 1
Function: Compose a Magic Tree.
a) Parse the query.
b) Transform the query expression into a machine-readable statement.
c) Form a query tree or graph, depending on the calculus expression used.
d) Shift the selection entity to the head node of the query tree.
e) Eliminate all available candidate selection entities.
f) Form all the dependent groupings and shift them to one wing of the tree.
g) All leaf nodes are relations; the process therefore halts once it reaches a leaf.
h) The query processor begins the search query course of action.
i) Once the query processor discovers the data target, it moves to the projection stem, where all the other pertinent functionalities are conducted.
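The steps above can be sketched as follows. The list-based representation of the tree is a deliberate simplification for illustration (the thesis prototype used linked-list trees), and the operator names are assumed:

```java
import java.util.*;

// Hedged sketch of Algorithm 1: starting from a parsed query, collect the
// dependent operators (join, filter/select, projection) and shift them to a
// single wing of the tree, leaving the relations at the leaves.
public class MagicTreeBuilder {
    record MagicTree(List<String> operatorWing, List<String> leafRelations) {}

    static final Set<String> OPERATORS = Set.of("join", "filter", "project", "select");

    // Steps (d)-(f): dependent groupings are shifted to one wing;
    // step (g): leaves are relations, so they stay put.
    static MagicTree compose(List<String> parsedNodes) {
        List<String> wing = new ArrayList<>();
        List<String> leaves = new ArrayList<>();
        for (String node : parsedNodes) {
            if (OPERATORS.contains(node)) wing.add(node);
            else leaves.add(node);  // relation: becomes a leaf
        }
        return new MagicTree(wing, leaves);
    }

    public static void main(String[] args) {
        MagicTree t = compose(List.of("select", "Orders", "join", "Customers", "project"));
        System.out.println(t.operatorWing());   // [select, join, project]
        System.out.println(t.leafRelations());  // [Orders, Customers]
    }
}
```

Because every operator sits on one wing, a later search can visit all of them in a single pass instead of re-deriving them per branch, which is the source of the cost saving claimed above.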
As MacLennan and Tang (2009, p.144) claim, heuristics have long been a viable solution for modern computational problems, especially those dealing with voluminous data sets such as telecommunication and industrial-plant streaming data. The algorithms embedded in heuristics functions help solve optimisation and complex real-world problems by improving the time, cost, and space required to answer computational inquiries. In our case, the effect of heuristics may not be felt immediately, but after a while the cost-saving impacts will become visible. This follows from the way heuristics work: during the early implementation stages, the heuristics entity first monitors how applications work. It performs meticulous appraisals and
evaluations of how program applications, in this case the query optimisation process, are run and
traces all these moves and formulas onto its memory. By this, it has created a virtual image of the
functioning of all the steps involved during a query search, from when the query is input to when
data results are displayed on the screen. The more advanced version of heuristics thoroughly
inspects then traces the guidelines put in the codes of programs prior to passing them over to the
computer’s processing unit for execution. This will help the heuristics engine to assess and learn
the behaviour and mannerisms of that program while it runs in a virtual setting.
As soon as its memory is packed with the application performance information, it starts
using this information to revamp activities and even cultivate better channels for enhanced task
execution. In the case of the RDF stream processing, a user can input the same query over and
over again over a given period of time, say for example, when retrieving information about a
certain tweet or when researching about the manufacture status of a phone from its manufacturer.
For every single time that a query search is initiated for such a research function, the parser must
form a query tree for each search before handing it over to the query optimiser and code
generator to formulate a code needed in the actual processing of the query statement. Building a
query tree for each and every query search of the same research question consumes an awful lot
of communication time and processing expenses as well, in the absence of a heuristics engine
(MacLennan and Tang 2009, p.39). This time, physical storage space, and processing cost are what we aim to eradicate in our RDF stream processing. In a heuristics environment,
however, the redundant formations of the same query tree, their optimisations, and final query
processing, is noted in the heuristics’ memory. Hence, if the same research question is entered
yet again, the parser will just proceed to the heuristics’ memory and retrieve the query tree that
was noted before, instead of building a new one all over again. Therefore, the time that could
have otherwise been expended on query tree formation is saved, and the communication time is minimised in turn. The query search proposed by this heuristic is shown in Table 2.
Table 2: Algorithm 2
Function: The Projected Heuristics Query Search.
a) A query tree is crafted for each query expression that is submitted into the database
system.
b) Then, the heuristics function reads and stores this binary tree in a dedicated storage folder
for that particular query tree.
c) The storage folder is then assigned a unique company usage factor for easy identification by the parser, such that the maximum number of storage folders generated equals the company usage factor (c.u.f.).
d) Following this, the heuristics devises a unique magic tree that shifts all the dependent
variables (join, select, and projection) in the binary tree to one side of the tree.
e) When a similar query is submitted by a user, the parser first checks the storage folder for an equivalent query tree that can be utilised for that input inquiry.
f) If an equivalent stored tree exists, it proceeds to the precise branch node required for processing the inquiry at hand and performs all the relevant courses of action.
g) If there is no such tree, it consults the magic tree stored there; if successful, it halts further searches and performs all the relevant courses of action.
h) If all these searches fail, such that there is no equivalent branch node even in the magic tree, the parser resorts to generating a new magic tree as depicted in the first algorithm, thus incrementing the storage folder counter.
Lastly, the database server will refresh the folder whenever the counter is less than the company usage factor. This is desirable because the number of folders should equal the company usage factor (MacLennan and Tang 2009, p.19).
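Algorithm 2's lookup-then-build flow can be sketched as follows. The cache keyed by query text, the string stand-in for a tree, and the counter cap are illustrative assumptions, not the engine's actual data structures:

```java
import java.util.*;

// Sketch of Algorithm 2: on each query, first look for an equivalent stored
// tree; failing that, build a new magic tree and increment the folder
// counter, capped by the company usage factor (c.u.f.).
public class HeuristicSearch {
    private final Map<String, String> storedTrees = new HashMap<>();
    private final int companyUsageFactor;
    int folderCounter = 0;

    HeuristicSearch(int companyUsageFactor) {
        this.companyUsageFactor = companyUsageFactor;
    }

    // Returns the tree used to answer the query.
    String search(String query) {
        String tree = storedTrees.get(query);
        if (tree != null) return tree;             // steps (e)-(g): reuse a stored tree
        tree = "magicTree(" + query + ")";         // step (h): build a new magic tree
        if (folderCounter < companyUsageFactor) {  // folders are capped at the c.u.f.
            storedTrees.put(query, tree);
            folderCounter++;
        }
        return tree;
    }

    public static void main(String[] args) {
        HeuristicSearch s = new HeuristicSearch(2);
        s.search("Q1");
        s.search("Q1");          // reuse: counter stays at 1
        s.search("Q2");
        System.out.println(s.folderCounter); // 2
    }
}
```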
3.2.3 Results simulation
This section puts this theoretical novel approach of heuristics assimilation into actual practice, through simulation, on an RDF stream processing engine to confirm whether the prototype makes good on its promise. The RDF engines tested here are the CQELS and C-SPARQL languages. Simulation here refers to the manner in which the heuristics replication was
conducted over a specified period of time (6 months). A model of the heuristics query
optimisation engine was replicated in a Java Runtime Environment (JRE) running on a computer powered by the Windows operating system. Using the JRE, we wrote core Java code, which was later compiled and run in the Eclipse environment to execute the given RDF data streams. The implementation was written in Java and employed class handling; the data structures of the query tree went hand in hand with dynamic memory allocation based primarily on linked lists. The outcome of the analysis was as
expected: integrating heuristics across the board of RSP engines improved cost-saving by shrinking processor operational costs. A heuristics approach was implemented in the CQELS and C-SPARQL query languages to form magic trees and perform selections earlier.
As MacLennan and Tang (2009, p.66) explain, the heuristics database engine is exploited to perform selections early. This considerably reduces the size of the RDF graph databases, speeding up the overall query search process. For example, considering the following CQELS and C-SPARQL query expressions (see Table 3), applying heuristics is beneficial in that it executes the selection entities very early in the process, minimising the communication time.
Table 3: Query 1
The customary query processing of these CQELS and C-SPARQL query expressions would have initiated the formation of a binary query tree as depicted in Figure 3. With heuristics, however, the database engine forms a magic tree (see Figure 4) that shifts the selection variable to one side of the tree. As MacLennan and Tang (2009, p.41) note, the initial query processing stages of the heuristics approach do absorb some costs in constructing and searching the magic tree. Nonetheless, these costs are significantly lower than those expended in the formation and execution of binary trees. The magic tree likewise reduces all other computational costs involved, since the frequency of the selection variables also decreases. This saving is evident when the estimated costs of the two methods are compared: the traditional binary tree's aggregate running costs are 100 units, while the magic tree incurs only 50 units. Supposing a new query is input for the first time by a user, the database server will incur seemingly high expenditures in both forming the binary tree and converting it into a magic tree. However, in the next
round there will be no conversion costs, as the magic tree will be readily available in the heuristics' storage folder. Additionally, communication and processor costs will fall to the same degree as the conversion costs, since the parser will reach automatically for the magic tree branch nodes. Figure 5 shows the cost-versus-time chart comparing conventional query processing with our proposed heuristics-based CQELS and C-SPARQL query optimisation strategies.
Figure 5: Cost versus time graph
As shown in Figure 5, the preliminary costs are somewhat high, but as the heuristics functionality continues to track, learn, and store magic trees in its folders, the overall computational expenditure decreases with time (Cheung et al. 2006, p. 43). To elucidate this phenomenon: when a new query is fed into an RDF database, all the constituent stages of a tree-match search are carried out, namely parsing, query tree building, syntax checking, attribute name confirmation, optimisation, and code generation. These activities account for the evidently high costs and time consumption (MacLennan and Tang 2009, p.71). As time goes by, the heuristics entity monitors the query search procedure, identifies the redundant parsing and optimisation sequences, and creates a way out: it traces a particular binary tree in its storage folder and, from
this, derives an equivalent magic tree that matches it. Therefore, in subsequent standard query searches there is no need to create yet another binary tree for a similar inquiry (MacLennan and Tang 2009, p.83). Instead, the magic tree is retrieved from the storage file for duplicate tree matching, saving the computational conversion time and cost. The heuristics application performs even better on nested queries, where the data results are delivered much faster and more efficiently (see Figure 6). Further simulations of the heuristics algorithm can also extend join properties such as right and left joins.
Figure 6: Performance versus complexity
3.2.4 Performance comparison between the new improved model and the previous versions of CQELS and C-SPARQL
Most of the systems considered are works in progress and scientific prototypes. Unsurprisingly, they are not able to support all query patterns and features. The outputs of the new improved model and the previous versions of CQELS and C-SPARQL differ significantly because of their differences in implementation. These performance differences result mainly from intrinsic technical issues concerning how streaming data is handled, such as a potentially fluctuating execution environment and time management.
Table 4: Performance Comparison by Features

                   Special support for   Input                 Extras
C-SPARQL           TF                    RDF and RDF streams   -
CQELS              NEST, VoS             RDF and RDF streams   Disk spilling
Streaming SPARQL   -                     RDF streams           -
SPARQL stream      NEST                  Relational stream     Ontology-based mapping
EP-SPARQL          EVENT, TF             RDF and RDF streams   Event operators

EVENT: event pattern, VoS: variables on stream, TF: built-in time function, NEST: nested patterns.
Table 5: Performance Comparison by the Mechanism of Execution

                   Re-execution   Optimisation           Architecture   Scheduling
C-SPARQL           Periodical     Static and algebraic   Black box      Logic plan
CQELS              Eager          Adaptive and physical  White box      Adaptive physical plans
Streaming SPARQL   Periodical     Static and algebraic   White box      Logic plans
SPARQL stream      Periodical     Externalised           Black box      External call
EP-SPARQL          Eager          Externalised           Black box      Logic program
Figure 7: Graphical performance comparison
As the graph shows, the throughput of C-SPARQL in the scalability and performance tests is considerably lower than that of CQELS and JTALIS. It is thus clear that recurrent execution is likely to waste significant computing resources. A sliding window extracts the recurrences, and the outputs can be computed incrementally as a stream. Notably, the outputs of JTALIS and CQELS are useful in answering recurrent queries.
Query 1 involves counting the number of items over a tumbling window of one second. Note, however, that this query uses a physical time window. For statistically robust results, the computation is averaged over twenty executions, mainly because the execution time varies with the condition of the machine.
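Tumbling-window counting of the kind used by Query 1 can be sketched as follows. The millisecond timestamps and the window-assignment scheme are illustrative assumptions, not the benchmark's actual data or engine code:

```java
import java.util.*;

// Sketch of Query 1's semantics: counting items over a one-second tumbling
// window. Timestamps are in milliseconds; window k covers
// [k*widthMs, (k+1)*widthMs), with no overlap between windows.
public class TumblingWindowCount {
    // Map each item timestamp to its window index and count items per window.
    static Map<Long, Integer> countPerWindow(List<Long> timestampsMs, long widthMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            long window = ts / widthMs;           // tumbling: each item falls in one window
            counts.merge(window, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Long> arrivals = List.of(100L, 250L, 990L, 1001L, 1999L, 2500L);
        System.out.println(countPerWindow(arrivals, 1000)); // {0=3, 1=2, 2=1}
    }
}
```

A sliding window would instead let one item contribute to several overlapping windows, which is why incremental computation matters there.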
Notice that CQELS performs better than JTALIS because it uses both an adaptive and a native approach. The performance of JTALIS and C-SPARQL depends heavily on their underlying systems, a Prolog engine and a relational stream processing engine respectively. Similarly, CQELS would likely benefit from a more sophisticated, better-optimised algorithm than the current one. CQELS is the only system that indexes and precomputes intermediate results over static data from sub-queries. However, neither C-SPARQL nor CQELS scales well as the number of queries sharing data windows and similar patterns increases. Additionally, neither system uses multiple-query optimisation techniques to avoid redundant computation among queries that share computing memory and blocks. In this case, optimisation occurs only at the static, algebraic level, since both Streaming SPARQL and C-SPARQL schedule execution at a logical level (MacLennan and Tang 2009, p.102). On the contrary, CQELS can choose alternative execution plans composed from the available operators' physical implementations; in effect, its optimiser adaptively optimises execution at the physical level.
Both SPARQL stream and EP-SPARQL schedule execution through a logic program or a declarative query; in this case, they fully delegate optimisation to other systems (Seshadri and Leung 1998). The technique used to improve the results involves defining mappings, triple patterns, RDF triples, and other operations on mappings, and reusing notations.
Under the instantaneous RDF dataset and RDF stream, the temporal nature of data is essential and must be captured in the data representation for continuous processing of dynamic data. This applies to both kinds of data source, because updates to linked data collections are also possible. An instantaneous RDF dataset G(t) denotes the dataset's state at time t; if G(t + 1) = G(t) for all t ≥ 0, i.e. G(t) = G for all t ∈ N, the dataset is static. Pattern matching is the main primitive operation on both the instantaneous RDF dataset and the RDF stream (MacLennan and Tang 2009, p.88). Notice that the triple pattern of SPARQL semantics extends this pattern matching. Consequently, the notation of denotational semantics becomes helpful for the formal definition of the query patterns of the processing model. The denotations are the meaning functions for the semantic composition of the abstract syntax. These compositions comprise three classes of operators, namely relational, pattern matching, and stream operators. Pattern matching operators extract valid triples from a dataset or an RDF stream that match a given triple pattern at a certain time t, as shown below.
Pattern matching operator’s abstract syntax
The meaning of the triple matching pattern operator PG is defined in the same way as SPARQL on an RDF dataset at a given timestamp t, as follows.
Next is the definition of the window-based triple matching operator on an RDF stream.
The composability of the denotational semantics yields the definition of the abstract syntax for compound query patterns constructed from both logical and matching operators. Additionally, the aggregation operator is defined before its syntax and semantics (MacLennan and Tang 2009, p.99). Notice that a uniform mapping contains only mappings that have similar domains; in this case, a consistent mapping is defined over an aggregate operator set Ω. The relational operators' abstract syntax is therefore defined recursively as shown below.
The mapping of the operators therefore becomes
Under the streaming operators' abstract syntax, a streaming operator produces either an RDF stream or a relational stream from the above relational operators.
Next is the definition of the declarative query language CQELS-QL (the CQELS query language) for the CQELS execution framework. The SPARQL 1.1 grammar in EBNF notation helps define CQELS-QL. The first step is adding a query pattern for representing window operators on RDF streams.
Chapter 4: State of the Art in Linked Stream Data Processing (LSDP)
According to Gedik (2006), Linked Stream Data derives its usefulness from bridging the gap between Linked Data and data streams and from facilitating data integration. Resource Description Framework data streams enable the query processor to treat stream nodes as RDF elements and allow access to RDF streams in the form of materialised data (Abdulla and Matzke 2006, p.907; Buchanan and Shortliffe 1984, p.777; Cole and Conley 2009, p.809; Zhang and Kollios 2007, p.733). Notably, the whole process makes it possible to apply other SPARQL query patterns (Cheung et al. 2006, p.444). In short, this chapter explores the techniques and concepts of stream processing and introduces Linked Stream Data Processing engines (Calhoun and Riemer 2001, p.447). Additionally, including the CQELS engine in this chapter helps clarify the contribution of this field.
4.1 Query Semantics and Data Models
This section mainly explores possible ways of formalising the data model for Resource Description Framework datasets and Resource Description Framework streams in a continuous context (Cole and Conley 2009, p.931). Additionally, it touches on continuous query semantics.
4.2 Data Model
It is important to note that Linked Stream Data is modelled by extending the meaning of RDF triples and RDF nodes (Cohen 1985, p.303). An RDF stream is a bag of elements, each an RDF triple carrying a temporal annotation such as a time interval or a timestamp. An interval-based label consists of a pair of timestamps; commonly, natural numbers represent logical time (Eastwood 2008, p.278). The pair of timestamps, 'start' and 'end', specifies the interval in which the Resource Description Framework triple is valid (Dean 2009, p.264). A point-based label, on the other hand, is a single natural number representing the point in time at which the triple was received or recorded (Buchanan and Shortliffe 1984, p.708). Point-based labels may look redundant and less expressive than interval-based labels; however, they are less expensive, since a point-based label can be considered an important special case of an interval-based label in which start = end. According to research (e.g. Abbass and Newton 2002, p.946), Streaming SPARQL and EP-SPARQL find such labels useful for representing the items of a physical data stream as triple-based events.
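The two labelling schemes can be sketched as a small data structure. The class and field names are illustrative, and the point-based case is modelled, as in the text, as the special case start = end:

```java
// Sketch of the two temporal annotations: an interval-based label is a
// pair (start, end), and a point-based label is the special case
// start = end.
public class TimestampedTriple {
    record Triple(String s, String p, String o) {}

    record Labelled(Triple triple, long start, long end) {
        // Point-based label: a single timestamp, i.e. start = end.
        static Labelled atPoint(Triple t, long timestamp) {
            return new Labelled(t, timestamp, timestamp);
        }

        boolean isPointBased() {
            return start == end;
        }
    }

    public static void main(String[] args) {
        Triple t = new Triple(":alice", ":detectedAt", ":office");
        Labelled point = Labelled.atPoint(t, 42);
        Labelled interval = new Labelled(t, 42, 50);
        System.out.println(point.isPointBased());    // true
        System.out.println(interval.isPointBased()); // false
    }
}
```

Because the point-based label is a degenerate interval, code written against the interval representation handles both schemes, which is the "special case" argument made above.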
For streaming data sources, a point-based label is often more practical because it allows a
triple to be generated instantaneously and unexpectedly. A good example is a tracking system
that detects people in an office (Buchanan and Shortliffe 1984, p.707). Notably, such a system
generates a timestamped triple every time it receives a reading from a sensor. To produce an
interval-based label instead, the system would have to buffer the readings and perform further
processing in order to derive the valid interval of each triple (Bolton 1996, p.407). Furthermore,
instantaneous point-based labels play a vital role for applications that require data to be
processed immediately as it arrives in the system. Additionally, the concept of the Resource
Description Framework dataset must be included in the data model to enable the integration of
stream data with non-stream data.
Primarily, the current state of the art treats the Resource Description Framework dataset as a
static data source. In light of the findings (e.g. by Abbass and Newton 2002, p.944), it is
important to note that data stream applications can run for arbitrary periods, ranging from days
to years. Consequently, changes to the Resource Description Framework dataset during the
lifetime of a query must be reflected in the continuous query's outputs.
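The buffering step mentioned above, by which point-timestamped readings are turned into valid intervals, can be sketched as follows. The triple shape and the coalescing rule (merging consecutive identical readings) are assumptions chosen for illustration.

```python
# Sketch: derive interval-based labels from point-based sensor readings.
# Consecutive readings reporting the same (subject, predicate, object)
# are coalesced into one triple whose interval spans their timestamps.
def coalesce(readings):
    """readings: list of (subject, predicate, object, timestamp) tuples,
    sorted by timestamp. Returns (subject, predicate, object, start, end)."""
    intervals = []
    for s, p, o, t in readings:
        if intervals and intervals[-1][:3] == (s, p, o):
            s0, p0, o0, start, _ = intervals[-1]
            intervals[-1] = (s0, p0, o0, start, t)  # extend the open interval
        else:
            intervals.append((s, p, o, t, t))  # start as a point label
    return intervals

stream = [
    (":alice", ":in", ":office1", 1),
    (":alice", ":in", ":office1", 2),
    (":alice", ":in", ":office1", 3),
    (":alice", ":in", ":lobby", 4),
]
print(coalesce(stream))
# Two intervals: office1 over [1, 3], then lobby at the point [4, 4]
```

Note the trade-off the text describes: the interval for `:office1` only becomes known after the reading at time 4 arrives, which is exactly the buffering delay that point-based labels avoid.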
4.3 Query Semantics
The semantics cover the current state-of-the-art SPARQL-like query operators such as union,
join, and filter. In practice, these operators consume and output mappings (Abbass and Newton
2002, p.556). On top of them, operators on Resource Description Framework streams are
introduced that likewise produce output mappings. Worth noting, C-SPARQL defines its stream
operator to access a Resource Description Framework stream identified by its IRI (Cohen 1985,
p.301). A window operator is additionally defined to access a Resource Description Framework
stream through windows; essentially, it adapts the window operator of CQL to Resource
Description Framework streams (Cole and Conley 2009, p.954). It is also important to note that
the semantics of a continuous query over Resource Description Framework streams are defined
as a composition of query operators. Practically, in both streaming SPARQL and C-SPARQL a
query is composed as an operator graph (Dean 2009, p.237), and the definition of this query
graph is based on the query operators.
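The CQL-style time-based window just described can be sketched as a function that selects from a stream the triples whose timestamps fall inside the window. The tuple representation and parameter names are illustrative assumptions, not the CQELS or C-SPARQL implementation.

```python
# Sketch of a time-based sliding window over a timestamped RDF stream.
# Each stream element is a (subject, predicate, object, timestamp) tuple.
def time_window(stream, now, range_size):
    """Return the bag of triples whose timestamps lie in (now - range_size, now]."""
    return [(s, p, o) for (s, p, o, t) in stream
            if now - range_size < t <= now]

stream = [
    (":s1", ":p", ":o1", 1),
    (":s2", ":p", ":o2", 4),
    (":s3", ":p", ":o3", 6),
]
# A window of range 3 evaluated at time 6 keeps the triples at times 4 and 6;
# downstream SPARQL operators then consume this finite bag of triples.
print(time_window(stream, now=6, range_size=3))
```

This also illustrates why the window operator matters semantically: it converts an unbounded stream into a finite bag on which the ordinary mapping-based operators can be evaluated.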
4.4 Query Languages
To fully define a declarative query language for Linked Stream Data, query patterns must be
introduced to express the primitive operators (Abdulla and Matzke 2006, p.956; Buchanan and
Shortliffe 1984, p.561; Zhang and Kollios 2007, p.654). In practice, these primitives are the
window matching, triple matching, and sequential operators (Eastwood 2008, p.509).
Compositions of these basic query patterns can then be expressed with the AND, OPT, UNION,
and FILTER patterns of SPARQL. Another important thing to note is that these patterns
correspond to the operators in the earlier definitions.
To support the aggregation operators, several types of research (e.g. Abdulla and Matzke 2006,
p.966; Buchanan and Shortliffe 1984, p.906; Zhang and Kollios 2007, p.749) define their
semantics with the AGG query pattern, which is compatible with the other types of SPARQL
patterns. The evaluation of the query pattern AGG is defined as [[P AGG A]] = A([[P]]),
whereby A refers to an aggregate function that consumes the output of a SPARQL query
pattern P and returns a set of mappings. Letting P, P1, and P2 be basic or composite query
patterns, the declarative query is composed recursively by rules of the following kind:
[[P1 UNION P2]] = [[P1]] ∪ [[P2]],
[[P1 AND P2]] = [[P1]] ⋈ [[P2]],
[[P1 OPT P2]] = [[P1]] ⟕ [[P2]],
[[P AGG A]] = A([[P]]),
[[P FILTER R]] = {µ ∈ [[P]] | µ satisfies R}.
In practice, these patterns extend the grammar of SPARQL to continuous queries.
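The recursive composition rules above can be illustrated over sets of mappings (variable bindings). The mapping representation and the compatibility test follow the standard SPARQL algebra; the function names are illustrative.

```python
# Sketch of the recursive evaluation rules over sets of mappings.
# A mapping is a dict from variable names to RDF terms or values.
def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def union(p1, p2):   # [[P1 UNION P2]] = [[P1]] ∪ [[P2]]
    return p1 + [m for m in p2 if m not in p1]

def join(p1, p2):    # [[P1 AND P2]] = [[P1]] ⋈ [[P2]]
    return [{**m1, **m2} for m1 in p1 for m2 in p2 if compatible(m1, m2)]

def filter_(p, r):   # [[P FILTER R]] = {µ ∈ [[P]] | µ satisfies R}
    return [m for m in p if r(m)]

p1 = [{"x": ":alice"}, {"x": ":bob"}]
p2 = [{"x": ":alice", "y": 30}, {"x": ":carol", "y": 25}]
print(join(p1, p2))                        # only the :alice mappings are compatible
print(filter_(p2, lambda m: m["y"] > 26))  # keeps mappings satisfying R
```

Because each operator consumes and produces sets of mappings, arbitrary query patterns compose recursively, which is exactly what makes the operator-graph formulation of the previous section work.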
It is important to note that C-SPARQL extends SPARQL with a CONSTRUCT form whose
triple patterns define the Resource Description Framework stream output. In essence, the
grammars of streaming SPARQL and C-SPARQL are otherwise the same. In practice, the uses
of databases are manifold (Jeuring 2012, p.417). They provide a means of retrieving either parts
of records or entire records and of performing different kinds of calculations before displaying
the outcomes (Abdulla and Matzke 2006, p.504; Buchanan and Shortliffe 1984, p.703; Cole and
Conley 2009, p.968; Zhang and Kollios 2007, p.974). The query language is the interface that
specifies such manipulations (Lucas 2010, p.608). Early query languages, on the other hand,
were very complex, so interaction with electronic databases was carried out only by individuals
with special knowledge (MacLennan and Tang 2009, p.673). Modern interfaces are more
user-friendly and also allow casual users to access the information in the database.
The main types of query modes are the menu, the fill-in-the-blank, and the structured query
(Gedik 2006, p.422). The menu requires an individual to choose from various alternatives
displayed on a monitor, which makes it particularly suitable for novices (Maringer 2005, p.342).
The fill-in-the-blank technique, on the other hand, prompts the user to enter key words as search
statements (Moustakas 1990, p.623). Worth noting, the structured query approach is very
effective with relational databases. It has a powerful, formal syntax and is, in practice, a
programming language, and it can accommodate logical operators (Mueller 2009, p.506). The
Structured Query Language, or SQL, takes various forms when implementing this approach,
along the lines of: SELECT field Fa, Fb, Fc..., Fn FROM database Da, Db, Dc… Dn WHERE
field Fa = abc AND field Fb = def. Several studies (e.g. Abdulla and Matzke 2006, p.678;
Buchanan and Shortliffe 1984, p.985; Zhang and Kollios 2007, p.992) show that the structured
query language supports searching the database, as well as other activities, through commands
such as 'sum', 'print', 'find', 'delete' and so on (Nirmal 1990, p.496). Ordinarily, a natural-language
query resembles the sentence structure of an SQL query, except that ordinary sentences take
the place of Structured Query Language statements. Additionally, it is also possible to represent
queries in the form of tables.
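The structured-query form sketched above can be made concrete with the SQLite engine shipped in Python's standard library. The table and column names are invented for illustration and mirror the generic fields Fa, Fb used in the text.

```python
import sqlite3

# A minimal structured query in the SELECT ... FROM ... WHERE form
# discussed above; the table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (fa TEXT, fb TEXT, fc INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)",
                 [("abc", "def", 10), ("abc", "xyz", 20), ("zzz", "def", 30)])

# SELECT fields FROM database WHERE field fa = 'abc' AND field fb = 'def'
rows = conn.execute(
    "SELECT fa, fb, fc FROM records WHERE fa = ? AND fb = ?",
    ("abc", "def")).fetchall()
print(rows)  # only the row matching both WHERE conditions

# Commands such as 'sum' appear in SQL as aggregate functions.
total = conn.execute("SELECT SUM(fc) FROM records").fetchone()[0]
print(total)
```

The logical operator AND in the WHERE clause corresponds directly to the "field Fa = abc and field Fb = def" form quoted in the text, and SUM illustrates one of the commands listed there.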
The technique known as QBE (query by example) displays an empty form. According to
Mcllroy (1998), the searcher is then expected to enter the appropriate search specification into
the appropriate columns. The program then constructs an SQL query from the table and executes
it (Zhang and Kollios 2007, p.997). In practice, natural language is the most flexible query
language (Abdulla and Matzke 2006, p.911; Buchanan and Shortliffe 1984, p.703; Zhang and
Kollios 2007, p.707). Most importantly, some commercial database management software allows
natural-language sentences to be used as constraints for searching the databases (Schreiber 1977,
p.781). In essence, these programs parse the syntax and then recognize synonyms and action
words (Abdulla and Matzke 2006, p.1002; Buchanan and Shortliffe 1984, p.734; Zhang and
Kollios 2007, p.836). In addition, the programs identify file, record, and field names and perform
the required logical operations (Seshadri and Leung 1998, p.699). Furthermore, there has been
some development of natural-language queries in spoken form, as such experimental systems
have gained acceptance (Sims and Yocom 2008, p.1003). However, the ability to employ
unrestricted natural language to query unstructured information requires further advances in
machine understanding of natural language (Wei 2011, p.354), particularly in representing the
semantic and pragmatic context of ideas.
Chapter 5: The Optimization Solutions for the CQELS
In essence, this execution framework supports adaptive and native query execution over RDF
streams and RDF datasets (Bolton 1996, p.404). Worth noting, the framework's white-box
architecture accepts both RDF streams and RDF datasets as inputs and returns its outputs either
as RDF streams or as relational streams in the SPARQL result format (Abdulla and Matzke
2006, p.702; Buchanan and Shortliffe 1984, p.497). In practice, the output RDF streams can be
fed into any CQELS engine (Wei 2011, p.4078), while the relational streams can be useful to
other relational stream processing systems (Cheung et al. 2006, p.497). Notably, the processing
works as follows: the stream data is pushed to the input manager, and the encoder encodes it
into a normalised input stream representation (Cole and Conley 2009, p.1007). The dynamic
executor then consumes this encoded stream. Another important aspect to note is that the
decoder decodes the outputs of the dynamic executor and streams them to the receiver (Abdulla
and Matzke 2006, p.749). The decoder and the encoder share a dictionary for their decoding
and encoding operations. Additionally, the dynamic executor accesses the static RDF datasets
via the cache fetcher; these datasets can be hosted in either local or remote RDF stores exposed
through SPARQL endpoints (Cole and Conley 2009, p.1011). The cache fetcher plays a vital
role in retrieving the required data and encoding it for the cache manager by use of the encoder
(Wei 2011, p.507). Worth noting, the intermediate results are kept in the same normalised,
encoded representation, sharing the dictionary with the input streams.
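The dictionary shared by the encoder and decoder can be sketched as a bidirectional mapping from RDF terms to compact integer identifiers. Dictionary encoding of this kind is a common design in RDF engines; the class below is an illustrative assumption, not the actual CQELS data structure.

```python
# Sketch of the dictionary shared by the encoder and decoder: RDF terms
# (IRIs, literals) map to compact integer ids, so the dynamic executor
# can operate on integers instead of full strings.
class Dictionary:
    def __init__(self):
        self._to_id = {}
        self._to_term = []

    def encode(self, term: str) -> int:
        """Return the id for term, assigning a fresh id on first sight."""
        if term not in self._to_id:
            self._to_id[term] = len(self._to_term)
            self._to_term.append(term)
        return self._to_id[term]

    def decode(self, ident: int) -> str:
        """Recover the original term for an id."""
        return self._to_term[ident]


d = Dictionary()
# Encoder side: normalise an incoming triple into integer ids.
encoded = tuple(d.encode(t) for t in
                ("http://ex.org/alice", "http://ex.org/in", "http://ex.org/office1"))
# Decoder side: recover the original terms with the same dictionary.
decoded = tuple(d.decode(i) for i in encoded)
print(encoded, decoded)
```

Because intermediate results reuse the same dictionary as the input streams, no re-encoding is needed between operators, which is the sharing property the paragraph above describes.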