BrainSpa Paper

BrainSpa is a web application for exploring knowledge using SPARQL, the query language for RDF.

Published in: Technology

BrainSpa – A Web Application for Exploring Knowledge using SPARQL

Eugen Ignat1, Sabin Pochiscan1, Radu Simionescu1, Simona-Adina Toderas1

1 Artificial Intelligence and Computational Linguistics Department, Faculty of Computer Science - Iasi, Romania
eugen.ignat@info.uaic.ro
sabin.pochiscan@info.uaic.ro
radu.simionescu@info.uaic.ro
simona.toderas@info.uaic.ro

Abstract. The World Wide Web is a dynamic environment that excites everyone, from casual users to experts. Now entering its third stage of life, it faces a new challenge: finding good ways to model knowledge about things by attaching metadata to the data itself. This paper looks at how the WWW is becoming a less ambiguous space and at the ways to take advantage of new features that cost nothing except one's interest. A case study is presented on BrainSpa, our own web application for exploring various SPARQL endpoints and sharing queries with a community of semantic web enthusiasts.

Keywords: RDF, SPARQL, web application, PHP, concepts, modeling, RAP, OAuth, query, tag, endpoint, prefix, semantic, WWW.

1 Introduction

Some well-known web jokes flirt with the idea of Googling your car keys or mismatched socks. Fortunately for the on-line user, that era may not be so far away, given the recent effort put into moving the WWW into a semantics-driven zone for modeling concepts and linking information.

In order to satisfy the need for organized, unambiguous information, organizations such as the W3C have taken the initiative to develop specifications, languages and technologies that are free to use and well suited to the task of annotating knowledge from any domain. Making use of such technologies allows developers to build all kinds of interesting and creative applications that prove useful in a wide range of domains (given today's demand for automating as many processes as possible).
BrainSpa belongs to this group of applications: it is a web application that allows its users to explore knowledge available on the World Wide Web (in the form of RDF files) using SPARQL without having to know the query language explicitly. Querying is done by completing an on-line form with the requested information, either anonymously or after logging in to one's account. Having an account provides more benefits than using BrainSpa anonymously: registered users can save or share their queries on the server side, and even store queries and results on their local computers. Registering for this service is easy and does not require memorizing another pair of user-name / password credentials; anyone can log in using an existing Twitter, Gmail, Yahoo! or YouTube account, thanks to OAuth [1], an open protocol that allows secure API authorization in a simple manner. This way, BrainSpa is not just a client-server solution for browsing the annotated data available on-line, but the foundation for a community-driven environment.

The remainder of this paper is organized as follows: section 2 surveys the current state of semantic resources and applications; section 3 lists and overviews everything involved in the actual development of BrainSpa; section 4 presents a few use-cases for the application, together with the relevant diagrams; section 5 concludes the paper and is followed by a list of references.

2 Overview

The World Wide Web is slowly moving into a semantics-driven zone, as shown by the new technologies and services freely available for modeling knowledge, annotating data and exploring concepts. Specifications for semantic markup / modeling such as RDF or the lightweight microformats are gaining popularity as more and more tools that try to improve the “internet surfing” experience make their way onto the market. For example, Firefox extensions (Tails, Operator) have been developed for exploring or operating with microformats.
Similarly to its HTML Validator, the W3C offers validation and visualization services for RDF documents. There is also a large number of recently developed semantic frameworks available for many different programming languages:
  • D2R Server, Joseki, Sesame and Mulgara for Java,
  • RAP and ARC for PHP,
  • 4store, OpenLink Virtuoso and Oracle Spatial 11g for C/C++,
  • RDFStore for C and Perl,
and many more, all having specific methods for reading / parsing data from semantic formats, storing RDF triples in a database, creating queries or accessing endpoints.

Just as traditional databases come with a query language, RDF knowledge comes with a querying solution: the SPARQL query language. SPARQL (a recursive acronym that stands for SPARQL Protocol and RDF Query Language) can be tried out at the various endpoints that offer a user interface for this purpose. The biggest project offering a query solution is DBpedia, a project aimed at extracting structured information from Wikipedia. The W3C offers an up-to-date list of SPARQL endpoints for exploring content from a wide range of domains.
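To make the notion of an endpoint more concrete, the sketch below shows one way a client might prepare a request for a SPARQL endpoint: the query text is URL-encoded and passed in the `query` parameter of an HTTP GET request, as the SPARQL protocol allows. It is written in Python purely for illustration (BrainSpa itself is a PHP application), the function name is hypothetical, and the DBpedia endpoint address is an assumption that is not taken from this paper. No request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the public DBpedia endpoint address; any SPARQL endpoint URL
# can be substituted here.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the `query` parameter.

    Hypothetical helper for illustration; it only builds the URL and does
    not contact the endpoint.
    """
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


EXAMPLE_QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?info WHERE {
  <http://dbpedia.org/resource/Romania> rdfs:comment ?info .
}"""

url = build_sparql_get_url(DBPEDIA_ENDPOINT, EXAMPLE_QUERY)
print(url)
```

The resulting URL could then be fetched with any HTTP client; the format of the response depends on the endpoint and on the content type the client asks for.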
All these are just a few basic examples of the technological advancement achieved in the semantic web area. Of course, more tools are available and can be created by developers willing to contribute to the progress of the WWW, and BrainSpa tries to be such a tool.

3 Architecture

BrainSpa was created by following the general guidelines of software engineering. An in-depth analysis of the requirements was followed by the architectural and detailed design of the software product. Coding proceeded in a modular style, followed by an incremental integration of the system. In the validation step, two questions were asked and answered, “Are we building the right product?” and “Are we building the product right?”, in order to test whether the initial requirements were fully respected and implemented in a functional manner. The last step, maintenance, has the longest lifespan: it starts after the product is deployed and ends with the authors' loss of interest in it.

The following sub-chapters provide more information regarding everything involved in the actual development of BrainSpa.

3.1 Technologies

This section provides an overview of all technologies involved in the creation of BrainSpa. They were chosen based on accessibility, price (or lack of it), interoperability and position towards freeware / open-source.

3.1.1 Dropbox

In the planning stage, one of the first issues we had to deal with was how to share files (source code, scripts, images, documentation, etc.) between the developers in a versatile yet time-saving manner. With the thought in mind that BrainSpa is a relatively small project compared to the mammoths of the IT industry, we considered that adopting SVN-style version control would more likely separate the members of the team than make them work together. So we found a better solution (at least in our opinion) in Dropbox [2], a free service for sharing files among users.
Since we were already using Dropbox for personal projects, the existing accounts needed only some new shared folders to store files for specific tasks (images, actual projects with libraries or documentation). Because Dropbox keeps file history and versions and supports operations like restore, the shared data is not at risk of vital information being accidentally deleted or changed.

3.1.2 Creately

Another issue encountered in the planning stage was finding the best tool for tasks like visually modeling the database schema or creating use-case diagrams. We had experience with ArgoUML and Creately [3], and chose the latter because (unlike ArgoUML) it is available as a web application, it provides visual elements for creating a wide range of diagrams (which can be exported as images or PDFs) and it makes it possible to share either files or entire projects among Creately users.

Taking advantage of this service enables developers to save time by focusing on how to design / model information visually, without having to worry about versioning, sharing or letting someone know that the content of a diagram has been or needs to be changed somewhere on the trunk of the project. Last but not least, Creately is appealing due to the modern, eye-catching design of its layout and components (an advantage illustrated by the diagrams in the following sections of this paper).

3.1.3 280 Slides

Another extremely useful tool for on-line, collaborative work is 280 Slides [4], a free web application for creating, saving, editing and sharing presentations.

280 Slides proved useful in creating a quality presentation for the BrainSpa project, one that every member of the team could easily contribute to, since it was always backed up and available on-line.

3.1.4 OpenOffice

A considerable rival to Microsoft Office, OpenOffice [5] is the “free and open productivity suite” available for just a download and the execution of a clean installer. Because it is free of charge and open-source, this office solution is not only very popular, but also available for the major operating systems and suitable for any bureaucratic task.

Among the applications in OpenOffice, Writer was used to create the present document, which conforms to LNCS standards.

3.1.5 CodeIgniter

Because the server side of BrainSpa, developed in PHP, must deal with complex tasks such as database operations, session / cookie management and keeping the views and the data separated through controllers, the project cannot do without a powerful PHP framework for web applications.

We considered CodeIgniter [6] the best candidate, as it provides a simple yet powerful environment with minimal configuration and a large number of libraries. The framework not only performs well, but also provides all the resources needed for tasks like database administration and session management. It is also open-source and based on the Model-View-Controller design pattern, which is very popular for building web applications.

3.1.6 Zend Framework

The Zend Framework [7] is another powerful solution for building PHP web applications. It also provides significant resources for common database or session management tasks in the OOP and MVC fashion, but it is most popular for its “use-it-all” framework status.

Because Zend Framework is friendly towards modern Web 2.0 applications and web services (it provides ways of consuming widely available APIs from leading vendors like Google, Amazon, Yahoo! or Flickr), our project uses it, together with OAuth, to enable users to log in to BrainSpa using an existing account from Yahoo!, Google, Twitter or YouTube.
3.1.7 OAuth

OAuth is an open protocol for secure API authorization, in a simple and standard way, from desktop and web applications. It allows the user to grant one site (the Consumer) access to his private resources located on another site (the Service Provider) without sharing the user's identity.

The work-flow of OAuth implementations is consistent for most service providers and adheres to the following steps:
  • the developer signs up with the service provider in order to get a consumer key and a shared secret;
  • the provider gives the application a request token;
  • the application redirects the user to the service provider web site in order to obtain user authorization;
  • given the user authorization, the service provider redirects back to the application;
  • upon receiving the request token and the OAuth verifier, the service provider grants an access token and a token secret that can be used until they expire.

3.1.8 RAP

Because BrainSpa is a semantic-oriented web application, some extra operations are involved in the overall functionality of the system: sending a SPARQL query to an endpoint and receiving an RDF result that is then transformed into a visually appealing format. This is where RAP [8] plays its role as a powerful RDF API for PHP with features such as:
  • methods for manipulating RDF models as a set of RDF triples or resources or through vocabulary specific methods,
  • integrated RDF/XML, N3, N-TRIPLE, TriX parsers and serializers,
  • in-memory / database storage,
  • SPARQL query engine and client library,
  • integrated RDF server (similar to the Joseki RDF server),
these being just a few. We found RAP to be the most suitable software package for parsing, querying, manipulating, serializing and saving RDF models.
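The OAuth work-flow listed in section 3.1.7 can be followed step by step in the toy simulation below. It is only a sketch in Python (not the PHP / Zend Framework code used by BrainSpa): the `ToyServiceProvider` class and all of its methods are hypothetical, tokens are random strings kept in memory, and request signing, redirects and token expiry are deliberately left out.

```python
import secrets


class ToyServiceProvider:
    """In-memory stand-in for an OAuth service provider (hypothetical)."""

    def __init__(self):
        self.consumers = {}       # consumer key -> shared secret
        self.request_tokens = {}  # request token -> verifier (None until authorized)

    def register_consumer(self):
        """Step 1: the developer signs up and gets a key and a shared secret."""
        key, secret = secrets.token_hex(8), secrets.token_hex(16)
        self.consumers[key] = secret
        return key, secret

    def issue_request_token(self, consumer_key):
        """Step 2: the provider hands out a request token."""
        if consumer_key not in self.consumers:
            raise PermissionError("unknown consumer")
        token = secrets.token_hex(8)
        self.request_tokens[token] = None
        return token

    def authorize(self, request_token):
        """Steps 3-4: the user approves access on the provider's site,
        which then returns to the application with a verifier."""
        if request_token not in self.request_tokens:
            raise PermissionError("unknown request token")
        verifier = secrets.token_hex(4)
        self.request_tokens[request_token] = verifier
        return verifier

    def exchange(self, request_token, verifier):
        """Step 5: request token + verifier are traded for an access token."""
        expected = self.request_tokens.get(request_token)
        if expected is None or expected != verifier:
            raise PermissionError("request token not authorized")
        del self.request_tokens[request_token]  # a request token is single-use here
        return secrets.token_hex(8), secrets.token_hex(16)


provider = ToyServiceProvider()
consumer_key, consumer_secret = provider.register_consumer()
request_token = provider.issue_request_token(consumer_key)
verifier = provider.authorize(request_token)
access_token, token_secret = provider.exchange(request_token, verifier)
print("access granted:", access_token)
```

In a real deployment every one of these calls is an HTTP request to the service provider, and the application never sees the user's password at any point of the exchange.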
3.2 Development

Regarding the model used in developing BrainSpa, the team adopted a predominantly XP (Extreme Programming) approach. There was no hierarchy among the members, a collaborative working style was encouraged, and each of us was able to contribute to the project by putting personal skills to good use. The initial task was divided into a number of issues that we could work on alone or in pairs, and we met regularly (both on-line and in person) to discuss progress so far and future directions.

3.2.1 Responsibilities

In order to adhere to IT standards and survive on the market, the project, initially called WebSpa, needed a strong identity. All marketing aspects (name change, logo, diagrams, documentation, presentation, speeches), together with some architectural responsibilities, database design, testing and research, were handled by Adina Toderas.

The User Interface (developed using HTML + CSS, JavaScript and jQuery) and client-side aspects were Sabin Pochiscan's responsibility.

Last but not least, server-side aspects (querying, saving, OAuth, RAP, etc.) were handled by the remaining two members of the team, Eugen Ignat and Radu Simionescu, with occasional help from Adina Toderas (for testing).

Each of the four authors had the opportunity to make use of their personal skills and work on what they enjoyed most or were good at. This is the biggest immaterial reward one can ask for when it comes to school or career.

3.2.2 Coding

In order to benefit from modularity, re-usability, polymorphism, inheritance and abstraction, the well-known object-oriented coding style was adopted. Also, because BrainSpa is a web application that relies on data persistence (saving user information, queries, tags and descriptions to the database on the server side) and has a rather complex user interface, it is implemented following the Model-View-Controller design pattern.
MVC states that data and view should be separated within a software entity, and should only communicate with each other using a special controller developed specifically for that data and that view. In other words, while the Model and the View are quite often reusable, the Controller is not.

In our project, information regarding users and their queries is saved in a database storage system (the Model of MVC). Figure 1 depicts the schema of this database.

The View of the MVC is the user interface itself, a visual interactive space that the user utilizes in order to communicate with BrainSpa and take advantage of its features and capabilities. The UI (Fig. 2, 3 and 4) is composed of a web page which handles different tasks like logging in, registering, querying an endpoint or browsing through existing public queries. The project interface is developed as a RIA (Rich Internet Application): similarly to a desktop application, it provides as much functionality as possible within the same window of interaction. Also, the main module of BrainSpa, which handles the construction of SPARQL queries, is inspired by the “Views” module in Drupal (Fig. 5), which has similar functionality: generating a MySQL query without the user having to know the query language.

Development for BrainSpa was done mostly using NetBeans, the Java integrated development environment that can be used for coding in many other languages besides Java, such as JavaScript, PHP, Python, Ruby, C, C++, Scala or Clojure. Because the IDE works anywhere a Java Virtual Machine is installed, it is a platform-independent working environment. A screen-shot of the project in NetBeans is shown in Figure 6.
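The Model-View-Controller separation described earlier in this section can be illustrated with a minimal sketch of the "browse public queries" feature. It is written in Python for illustration only (BrainSpa is built on PHP frameworks), and every class, field and sample value below is hypothetical: a list stands in for the database of Figure 1, whose real schema is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class SavedQuery:
    """One stored query; the field names are assumptions, not the real schema."""
    owner: str
    text: str
    tags: set = field(default_factory=set)
    description: str = ""
    public: bool = False


class QueryModel:
    """Model: stores queries and answers searches (a list replaces the database)."""

    def __init__(self):
        self._queries = []

    def save(self, query):
        self._queries.append(query)

    def find_public(self, keyword):
        keyword = keyword.lower()
        return [
            q for q in self._queries
            if q.public and (
                keyword in {t.lower() for t in q.tags}
                or keyword in q.description.lower()
            )
        ]


class QueryListView:
    """View: turns a list of queries into text rows, knows nothing about storage."""

    def render(self, queries):
        return "\n".join(
            f"{q.owner} | {', '.join(sorted(q.tags))} | {q.description}"
            for q in queries
        )


class SearchController:
    """Controller: the only component that talks to both Model and View."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def search(self, keyword):
        return self.view.render(self.model.find_public(keyword))


model = QueryModel()
model.save(SavedQuery("alice", "SELECT ?info WHERE { ?s ?p ?info }",
                      {"romania", "dbpedia"}, "Comments about Romania", public=True))
model.save(SavedQuery("bob", "SELECT ?s WHERE { ?s ?p ?o }",
                      {"romania"}, "Private draft", public=False))

controller = SearchController(model, QueryListView())
output = controller.search("Romania")
print(output)
```

Only the public query is listed; replacing the view (for example with an HTML table) or the model (with a real database) would leave the other two components untouched, which is the point of the pattern.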
Fig. 1. Diagram of the database schema of the BrainSpa project (created with the Creately service).

Fig. 2. BrainSpa user interface: query builder.

Fig. 3. BrainSpa user interface: query builder in action.

Fig. 4. BrainSpa user interface: results.

Fig. 5. Drupal “Views” module for generating MySQL queries.
Fig. 6. BrainSpa source files as seen in NetBeans.

As mentioned in section 3.1.1, file sharing and versioning for the BrainSpa source files were handled by Dropbox, a free, lightweight service for on-line backup and file sync. Because of this, the team adopted a collaborative working style: frequent meetings (both on-line and in person), working in pairs, etc.

The last diagram of this sub-chapter, Figure 7, represents a detailed deconstruction of a regular SPARQL query; all aspects involved in the query are shown in a tree-like structure in order to reflect and justify the layout of the user interface elements involved in generating a query.
Fig. 7. Detailed deconstruction of a SPARQL query.

4 Use-cases

No matter how efficient and functional a software package is, it must prove to be of meaningful use to its target audience: it must answer a problem that is of interest to a certain group of people. The presented project aims to offer solutions for exploring knowledge modeled with the RDF specifications, knowledge available at endpoints that the query will reach and interrogate, thus obtaining a result to give back to the user. The target audience is composed of users who have a small amount of technical knowledge in IT and are fond of web and semantic technologies, but it can be extended to a wider class of users with no IT background who want to come across valid knowledge.

BrainSpa can be used either anonymously or with an account, the latter being preferred since it provides more community-oriented features.

4.1 Anonymous use

Users can access the BrainSpa web application anonymously (without logging in with an existing account), but the functionality is limited, as the project aims to be community oriented. The only available option is filling in the query form in order to compose and send a SPARQL request to an endpoint and receive the results (displayed to the user as a table). A relevant use-case diagram is presented in Figure 8, showing how the actors involved in the scenario (the User, the System, i.e. BrainSpa, and a SPARQL query endpoint) interact with each other.

Fig. 8. Use-case diagram for an anonymous connection to the BrainSpa web application.

4.2 Registered use

One can make full use of the BrainSpa services by using an account. One of the most interesting parts of the project is the fact that a user does not need to actually register with BrainSpa and memorize another pair of user-name / password credentials. The project takes advantage of the OAuth protocol, which means that anyone having a Yahoo!, Google, YouTube or Twitter account can log in to the web application using that account. Once logged in, the number of available options increases.

A possible use-case scenario is the following: somebody wants to find accurate information regarding a certain subject (for example, comments about Romania) and, after obtaining the results, store them together with the query on the local computer. All one needs to do is complete the form available on-line in order to generate a query like the following:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?info
    WHERE {
      <http://dbpedia.org/resource/Romania> rdfs:comment ?info .
    }

After the endpoint processes the received query, it sends a result back to the web application, which displays the received information as a table. Both the results (as an RDF file) and the query can be saved to the local file system with only a few clicks.

Another use-case scenario is the following: a registered user wants to save his queries in order to use them in the future. (Since the information available on-line and modeled as RDF will continue to change and, hopefully, grow, today's result for a query will likely look different from the result of the same query executed a month from now.) Also, the user wants to attach both tags and a description to his query in order to distinguish his saved information more easily. This is also possible at the cost of just a few clicks in the user interface. Moreover, the user is presented with the option of saving his query as either private or public, which brings us to the next use-case scenario.

A user wants to browse through the existing shared queries.
This is made possible by a search form that takes as input the tags and / or description keywords one wants to filter by. Upon searching, the application looks up the public queries stored in the database and displays the matches in the user interface. After reviewing them, the user can choose one to execute and see its results.

But this is not all that BrainSpa has to offer. Another important feature is the possibility of marking specific query information, such as endpoints and prefixes, as favorites: each user is provided with lists of his favorite endpoints and prefixes, and can add or remove entries from those lists.

Figure 9 presents the complete use-case diagram. Each major operation possible within BrainSpa is represented by an oval use-case element, while the arrows indicate the direction followed by each operation.
Fig. 9. Use-case diagram for a user logging in to the BrainSpa web application with an account.
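The step from a completed form to a SPARQL query, as in the Romania scenario of section 4.2, can be sketched as follows. This is an illustrative Python function and not BrainSpa's actual PHP query builder: the function name and its three parameters are assumptions, it only covers simple SELECT queries, and it performs no validation or escaping of its inputs.

```python
def build_select_query(prefixes, variables, patterns):
    """Assemble a SPARQL SELECT query from form-like inputs (hypothetical helper).

    prefixes:  dict mapping a prefix label to its namespace IRI
    variables: list of variable names without the leading '?'
    patterns:  list of (subject, predicate, object) strings in SPARQL syntax
    """
    lines = [f"PREFIX {label}: <{iri}>" for label, iri in prefixes.items()]
    lines.append("SELECT " + " ".join("?" + name for name in variables))
    lines.append("WHERE {")
    lines.extend(f"  {s} {p} {o} ." for s, p, o in patterns)
    lines.append("}")
    return "\n".join(lines)


query = build_select_query(
    prefixes={"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
    variables=["info"],
    patterns=[("<http://dbpedia.org/resource/Romania>", "rdfs:comment", "?info")],
)
print(query)
```

Keeping prefixes, variables and triple patterns as separate inputs mirrors the tree-like deconstruction of a query shown in Figure 7, where each part of the query corresponds to its own group of form elements.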
5 Conclusions

The idea of browsing the World Wide Web in an intelligent, concept-driven manner is extremely appealing but seems unlikely to become reality in the next few years. However, important progress has been made in closely related domains, and we are not that far from using a search engine that knows how to distinguish between the Java programming language, the Java island and Java coffee.

Organizations such as the W3C have taken the initiative to develop specifications, languages and technologies that are free to use and well suited to the task of annotating knowledge from any domain. Making use of such technologies allows developers to build all kinds of interesting and creative applications that prove useful in a wide range of domains (given today's demand for automating as many processes as possible).

BrainSpa belongs to this group of applications: it is a web application that allows its users to explore knowledge available on the World Wide Web (in the form of RDF files) using SPARQL without having to know the query language explicitly. The authors meant to develop a tool that improves the on-line experience of a user fond of web and semantic technologies. The development process of the project was complex and helped every team member enrich their knowledge and technical experience. Research was done on a large number of technologies (besides the ones mentioned in the present paper), so that the most suitable of them could be chosen to obtain good functionality and performance within the software application.

As the use-cases demonstrated, BrainSpa represents the first step in building a community of users who are interested in innovation and technological advancement. Hopefully, with time, it will gain more features and a large number of members.
Even at the current stage, the authors believe it to be an interesting and useful tool that can later be integrated into solving more complex problems encountered in the semantic web area of research.

References

1. OAuth, http://oauth.net/
2. Dropbox, https://www.dropbox.com
3. Creately, http://creately.com/
4. 280 Slides, http://280slides.com/
5. OpenOffice, http://www.openoffice.org/
6. CodeIgniter, http://codeigniter.com/
7. Zend Framework, http://zendframework.com/
8. RAP, an RDF API for PHP, http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/
9. Twitter API Wiki, http://apiwiki.twitter.com
10. Authentication and Authorization for Google APIs, http://code.google.com/apis/accounts/docs/OAuth.html
11. Programmer's Reference Guide to Zend Framework and OAuth, http://framework.zend.com/manual/en/zend.oauth.introduction.html
12. Yahoo! OAuth authorization model, http://developer.yahoo.com/oauth/
13. Developer's Guide to YouTube Data API, http://code.google.com/apis/youtube/2.0/developers_guide_protocol.html
14. Code Recipes, http://code.activestate.com/recipes/
15. JSON in JavaScript, http://www.json.org/js.html
