SURE Research Report
Searching a Native XML Website
Using XML Technologies
Alex Sumner
Sophomore, Computer Science Major
Winston-Salem State University
Dr. Mustafa Atay
Associate Professor, Department of Computer Science
Winston-Salem State University
Abstract
XML (Extensible Markup Language) is a standard to represent and exchange data. XML
separates content from its style. The separation of content and formatting should allow
web programmers to come up with efficient document search modules for XML-based
websites. In this research project, we aim to develop a client-side search module for a
native XML website using technologies and standards such as Extensible Stylesheet
Language Transformations (XSLT), XML Path Language (XPath), Document Object
Model (DOM), Regular Expressions and JavaScript.
We have developed JavaScript code for client-side searching which is placed and invoked
within XSLT code files. The JavaScript code navigates the DOM hierarchy of the
underlying document. We used string matching along with regular expressions
for effective and targeted search operations. We used our XML-based website, previously
developed for the WSSU SURE program, as the test bed. Our application and
observations showed that a native XML website can be effectively searched using XML
technologies combined with a scripting language.
The method of this research project includes the following steps: (i) Review XML and
other technologies and tools available such as XSLT, XPath and DOM to support
searching a native XML website. (ii) Implement a client-side search utility using the
selected technologies and tools and a scripting language on a test bed XML website. (iii)
Observe and report the applicability, effectiveness and challenges of using XML to
incorporate a client-side search utility over XML websites.
Keywords: XML, XSLT, JavaScript, DOM, client-side search, Regular Expressions
Table of Contents:
Abstract
Introduction
    Background Information
    Problem Statement
Materials and Methods
Research Project
    Basic Search
    Advanced Search
Conclusions
Future Work
Experiences
Acknowledgements
References
1. Introduction
XML is used to store and transport data and is a recommendation of the World Wide Web
Consortium. The focus of this research is to successfully introduce a client-side search
utility on a native XML website. To accomplish this we used XML technologies, such as
XSLT, XPath, DOM, and Regular Expressions, and a scripting language, JavaScript.
1.1 Background Information
1.1.1 XML
XML is a markup language like HTML; however, it is not used the same way as HTML.
XML is used for the storage and transportation of data, whereas HTML is used to display it.
XML is a W3C (World Wide Web Consortium) recommendation, which means that it is an
openly published web standard. XML does not have predefined tags, which allows the user
to create their own self-descriptive tags. [1]
1.1.2 XSLT
XSLT is a stylesheet language for XML. It transforms XML documents into another form
such as HTML or XHTML. This transformation is necessary for the document to
be read and displayed properly by the browser. XSLT became a W3C recommendation
in 1999. It is supported by all of the major browsers and can incorporate
other XML technologies, such as XPath, into its code. [2]
1.1.3 JavaScript
JavaScript is a popular programming language for the web and is used on most modern
HTML web pages. Along with HTML and CSS, JavaScript is one of the languages that
web developers are expected to know. It has the ability to change, create, delete, and
copy HTML elements. JavaScript code embedded in a page must be placed between script
tags, and the script element can be placed within either the head or the body of the page. [3]
1.1.4 XPath
XPath is used to navigate through an XML document. It is an extremely useful tool
in an XSLT document: by using XPath expressions to select parts of the XML document, an
XSLT document can easily access and transform the data. XPath has been a W3C
recommendation since 1999 and is supported by all major browsers. [4]
1.1.5 DOM
The DOM standard has three parts: Core, XML, and HTML. In this research we used the DOM
to navigate the document in our JavaScript code and to access specific data in the XSLT
code. This was possible by using navigation properties and DOM methods such as
firstChild, nextSibling, getElementsByTagName(), and getElementById(). These
properties and methods were used together in the JavaScript code to locate where the
search would take place in the XSLT code. [5]
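To illustrate how these navigation properties are typically combined, the following minimal sketch walks the children of a table row with firstChild and nextSibling and collects the text of each cell. The function name cellTexts and the sample row are hypothetical and are not taken from our project code. Plain objects stand in for DOM nodes so that the sketch also runs outside a browser.

```javascript
// Hypothetical helper: collect the text of every element child of a row.
// It relies only on firstChild, nextSibling, nodeType and textContent,
// which DOM nodes expose in the browser.
function cellTexts(row) {
  var texts = [];
  for (var node = row.firstChild; node; node = node.nextSibling) {
    if (node.nodeType === 1) { // 1 = element node; whitespace text nodes are skipped
      texts.push(node.textContent.trim());
    }
  }
  return texts;
}

// Made-up stand-ins that mimic a table row with two cells
// separated by a whitespace text node (nodeType 3).
var cell2 = { nodeType: 1, textContent: " Computer Science ", nextSibling: null };
var gap = { nodeType: 3, textContent: "\n", nextSibling: cell2 };
var cell1 = { nodeType: 1, textContent: "Participant A", nextSibling: gap };
var row = { firstChild: cell1 };

console.log(cellTexts(row)); // logs the two trimmed cell texts
```

In a browser, the same function could presumably be given a real row, for example one of the elements returned by document.getElementsByTagName("tr").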
1.1.6 Regular Expression
Regular Expressions are search patterns formed by a sequence of characters. They are
primarily used for searching text, but can also be used for replacing text. Patterns can
contain single characters, part of a word, or a whole word. Patterns work in conjunction
with modifiers. These modifiers are single characters that make the
search case insensitive (i), global (g), or multiline (m). [6]
Examples:
var patt = /pattern/modifiers;
var search = new RegExp(key, 'i');
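When the pattern comes from user input, as in the second example, characters such as "." or "(" keep their special meaning inside the regular expression. The sketch below shows one common way to handle this while choosing the modifier from a case-sensitivity option. The function name buildPattern and the caseSensitive parameter are illustrative assumptions, not part of our code.

```javascript
// Hypothetical helper: turn a user keyword into a RegExp.
function buildPattern(key, caseSensitive) {
  // Escape characters that are special inside a regular expression,
  // so the keyword is matched as literal text.
  var escaped = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The "i" modifier makes the search case insensitive.
  return new RegExp(escaped, caseSensitive ? "" : "i");
}

var loose = buildPattern("atay", false);
var strict = buildPattern("atay", true);

console.log(loose.test("Dr. Mustafa Atay"));  // true
console.log(strict.test("Dr. Mustafa Atay")); // false
```

Without the escaping step, a keyword such as "a.c" would also match "abc", because "." matches any character.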
1.1.7 HTML Forms
HTML Forms were the only part of HTML that we used directly in this research. Form
inputs were used to display the radio buttons, text boxes, checkboxes, submit buttons,
and reset buttons. Each input created one of the listed items for our search module, and
each module is one form. Forms are normally used to pass data to a server; in our
client-side modules the form data was used by the JavaScript code to display the results
of the search. [7]
1.2 Problem Statement
Can a native XML (Extensible Markup Language) website be effectively augmented with
a client-side search utility using XML technologies and a scripting language?
2. Materials and Methods
2.1 Materials & Software
We used the XML website developed for the WSSU SURE program as our test bed
website. We used Notepad++ as our JavaScript program development editor.
2.2 Methods
In our process we first had to review and learn some of the XML technologies and
JavaScript. Next we applied what we learned to create the client-side search utilities
on our test bed. After creating the search utilities we debugged the code to make
sure everything worked properly. Finally, we introduced the code to the other pages on
our XML test bed, and resolved issues as time allowed.
3. Research Project
3.1 Basic Search
The first contribution to the website is the Basic Search. This search is exactly as the
name implies. The user is able to search using a keyword from the participants page.
Figure 1: 2014 Participants Page
This keyword could be a series of letters or numbers, the participant’s name, their
discipline, the advisor’s name, or the advisor’s phone number. This search also offers
the option of being case sensitive or insensitive. The primary concept in the
development of this search utility was to allow the user to search through all of the text
fields using one text box. The code for the basic search checks
all four columns of the table against the user’s input.
Figure 2: Basic Search Code
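Because the figure itself is not reproduced in this text, the following is a minimal sketch of the idea described above, not the code from Figure 2. It assumes that each table row has already been read into an array of four cell strings. The function name basicSearch, the column order, and the sample rows are all hypothetical.

```javascript
// Hypothetical sketch: keep the rows in which any column matches the keyword.
function basicSearch(rows, keyword, caseSensitive) {
  // Treat the user's input as literal text rather than as a pattern.
  var escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  var pattern = new RegExp(escaped, caseSensitive ? "" : "i");
  return rows.filter(function (cells) {
    // A row matches if at least one of its cells contains the keyword.
    return cells.some(function (cell) {
      return cell.search(pattern) !== -1;
    });
  });
}

// Made-up sample data: participant, discipline, advisor, advisor phone.
var rows = [
  ["Participant A", "Computer Science", "Advisor X", "555-0100"],
  ["Participant B", "Biology", "Advisor Y", "555-0101"]
];

console.log(basicSearch(rows, "biology", false).length); // 1
console.log(basicSearch(rows, "biology", true).length);  // 0
console.log(basicSearch(rows, "555", false).length);     // 2
```

On the actual page, rows that do not match would presumably be hidden in the table rather than filtered out of an array.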
Figure 3: Basic Search Results
3.2 Advanced Search
The other search utility that we introduced is the Advanced Search. This search
offers more options than the Basic Search. With this search the user is able to
choose which column they want to search through. There is the option of searching
through the advisor’s name, the participant’s name, or the research discipline. The
user can also search through the three columns at the same time to refine their search.
Once again there is also the option for the search to be case sensitive or insensitive. In
our research the advanced search was our ultimate goal. This search utilizes several
HTML input types: radio buttons, two text boxes, a checkbox, and two
buttons. The radio buttons let the user select the discipline in
which the research takes place. The first text box is used to search through the names of
the advisors. The second text box is used to search through the names of the
participants. Like the basic search, the advanced search also contains a checkbox
that makes the search case sensitive or insensitive. The first button is a
submit button used to call the doSearch() function for searching through the table. The
second button is a reset button that calls the Reset() function for resetting the HTML form.
Figure 4: Advanced Search Code
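Because the figure itself is not reproduced in this text, the following is a minimal sketch of how a multi-column search of this kind can be structured, not the code from Figure 4. It assumes the table rows are available as objects and that the form values have already been collected into a criteria object. The function name advancedSearch, the property names, and the sample data are hypothetical.

```javascript
// Hypothetical sketch: every non-empty criterion must match its own column.
function advancedSearch(rows, criteria, caseSensitive) {
  var flags = caseSensitive ? "" : "i";

  function matches(value, key) {
    if (!key) { return true; } // an empty field places no restriction
    var escaped = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(escaped, flags).test(value);
  }

  return rows.filter(function (row) {
    return matches(row.discipline, criteria.discipline) &&
           matches(row.advisor, criteria.advisor) &&
           matches(row.participant, criteria.participant);
  });
}

// Made-up sample data.
var participants = [
  { participant: "Participant A", discipline: "Computer Science", advisor: "Advisor X" },
  { participant: "Participant B", discipline: "Biology", advisor: "Advisor Y" },
  { participant: "Participant C", discipline: "Biology", advisor: "Advisor X" }
];

var hits = advancedSearch(participants, { discipline: "Biology", advisor: "advisor x" }, false);
console.log(hits.length);         // 1
console.log(hits[0].participant); // Participant C
```

Combining the criteria with a logical AND is what lets the user refine a search across several columns at once. A function such as doSearch() would presumably read the radio buttons, text boxes, and checkbox before performing a filter of this kind.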
4. Conclusions
Our Basic and Advanced Search modules showed that an XML website can
be successfully augmented with a client-side search utility using JavaScript for effective
searching. We made use of the HTML DOM and simple Regular Expressions in our search
modules. In the future, we plan to extend our research with more complex Regular
Expressions for finer filtering, the XML DOM, and a server-side search utility.
5. Future Work
In the future we would like to enhance the basic search in order to operate it on the home,
application, and personnel pages. We also want to enhance the advanced search to
operate on the personnel pages. Finally, we want to introduce a global search to our test
bed.
Figure 5: Advanced Search Results
6. Experiences
While working on this project we came across a multitude of major and minor problems.
The first problem we encountered was trying to link the input from our first text box to the
column containing the advisors’ names. Once we were able to solve this problem it made
other parts of the project significantly easier. Another problem we faced was creating the
basic search. It seemed like a simple task because its code is similar to that of the
advanced search; however, it was more difficult than we anticipated. Although we
ran into quite a few problems, I personally gained many positive experiences.
As a result of this research I have gotten a better understanding of XML, XSLT,
JavaScript, XPath, and HTML. I have also learned some aspects of DOM and Regular
Expressions. In addition, I’ve learned how to search through a document using JavaScript
and other XML technologies which benefits me as a Computer Science Major.
7. Acknowledgements
This project is funded and supported by the NSF HBCU-UP Implementation Grant:
Raising Achievement in Mathematics and Science (RAMS) with Award #0927905. This research
experience has also encouraged me to pursue a graduate degree in the future.
8. References
1. “Introduction to XML.” XML Introduction. World Wide Web Consortium, Web. 21
May 2014.
2. “XSLT Tutorial.” XSLT Tutorial. World Wide Web Consortium, Web. 21 May 2014.
3. “JavaScript Tutorial.” JavaScript Tutorial. World Wide Web Consortium, Web. 21
May 2014.
4. “XPath Introduction.” XPath Introduction. World Wide Web Consortium, Web. 22
May 2014.
5. “JavaScript HTML DOM Navigation.” JavaScript HTML DOM Navigation. World
Wide Web Consortium, Web. 22 May 2014.
6. “JavaScript RegExp Object.” JavaScript RegExp Object. World Wide Web
Consortium, Web. 22 May 2014.
7. “HTML Forms and Input.” HTML Forms and Input. World Wide Web Consortium,
Web. 22 May 2014.