Semantic User Profile

1 Marius Barat – firstname.lastname@example.org
1 Adrian Stefan Popescu – Adrian.email@example.com
1 Alin Alexandru – firstname.lastname@example.org

1 Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iasi
16, Berthelot Street – 700483 Iasi, Romania

Abstract. Because of the growing number of social networks and digital identity management services, a single individual can hold a large number of accounts, each with its own authentication information. The risk of forgetting that information, the time lost checking the news on every social network, and the lack of connection between the well-known social networks led us to create a web service with a single authentication method, a single web link and a single web page from which a large number of social networks and other identity management services can be reached. Having a large amount of identity information available led us to compute scores for a user (as Klout does) and for the people the user knows. Access to a SPARQL store was also a requirement, given the growth of the Semantic Web and of SPARQL endpoints.

Keywords. Semantic Web, Windows Azure, Protégé, WCF, Cloud Service, WebService, Virtuoso, EndPoint, RDF, SPARQL.

1 Introduction

This application started from the need for a social web platform able to merge all the information that belongs to a user from the multitude of social websites available nowadays. This platform should be smart enough to build hierarchies between users, and also to provide statistics about their activities and their interests.

We built this application using the .NET Framework 4.0, which comes with a much improved version of Entity Framework compared with the one from .NET Framework 3.5 (a data access library that lives in the System.Data.Entity namespace).

Entity Framework enables a more code-centric option called “code first development”, which makes it possible to develop without ever opening an XML mapping file or creating and configuring a database by hand.

During development, Entity Framework was used to store web users and their profiles (from social networks like Facebook or Twitter) and to ensure authentication. We also store the access tokens for profile information in the database.

Besides .NET Framework 4.0, we used many other tools and technologies in order to ensure strong modularity: stable, independent modules that can easily be modified or updated without any impact on the others. Because some of them are exposed as web services, they also offer strong code reusability.

The core of the application works like this: a user wants to create a new account on the platform; thanks to OAuth, he can use his Facebook or Twitter account to access it. Once he is logged in, some processing starts in the cloud, and all the information gathered is stored as an RDF graph on a remote server. This remote server runs a Virtuoso open source distribution, which stores RDF graphs and supports SPARQL queries over them.

In the next sections we describe the most important technologies we used while developing this application, a workflow for the application, and some future work directions.

2 Overview

We present an overview of our application and the connections between its main parts in Fig. 1.
Fig. 1. Workflow of the SUP application

3 Technologies

Right from the start we realized the magnitude of the project and that a sound software engineering approach was needed to manage it well. Following that approach, and to give an overview of the workflow, we present the technologies in the order in which they appear in the engineering scheme, which matches the normal workflow.

3.1 Source control

In order to work on the same project at the same time without conflicts between sources, and to keep track of all the modifications made to the project, we used source control.
Source control is the management of changes to documents, programs, large websites and other information stored as computer files. It is most commonly used in software development, where a team of people may modify the same files.

Changes are usually identified by a number or letter code, called the “revision number” or simply “revision”, which makes finding an older version of a source file much simpler.

We used Visual SVN for source control. The project and all its sources are available at the following address: https://svn.info.uaic.ro/repos/sup/. It is not a public repository: read/write access is allowed only after user access is granted, and only with the credentials of an info.uaic.ro account.

Fig. 2. Source control using Visual SVN

3.2 OAuth

The SUP project supports both OAuth 1.0a and 2.0 and is fully configurable, so new authentication endpoints can be added. In the current version of the solution we have configured Twitter (which uses OAuth 1.0a) and Facebook (which uses OAuth 2.0).
OAuth allows users to hand out tokens instead of credentials to their data hosted by a given service provider. Each token grants access to a specific site (e.g., a video editing site) for specific resources (e.g., just videos from a specific album) and for a defined duration (e.g., the next 2 hours).

This allows a user to grant a third-party site access to their information stored with another service provider, without sharing their access permissions or the full extent of their data.

During the development process we had two major difficulties:
1. Allowing a user to add multiple data sources and to log in with any of the added providers.
2. Because of the load balancer used in the cloud, the authentication process could start on one instance and finish on another. To handle that case we had to store all the temporary information in a table. We also had no way to determine the current URI in order to set the callback used in the process. The solution was to take advantage of the different configuration files that Visual Studio and Windows Azure use: we set one callback URL in the web.config file for development and another for deployment.

3.3 Windows Azure

Windows Azure is the platform used for hosting the project because it offers the possibility to scale each role individually. Another advantage for us was that Visual Studio offers an emulator for this cloud platform.

We used a Web role to host the website and two Worker roles: one that gets the data from the sources and one that processes the information into RDF format using an OWL schema.

For storing data we used SQL Azure and Azure Tables. The first was used for storing persistent data such as user access tokens. Azure Tables is used for storing temporary data for the DataDigest role.
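The load-balancer difficulty described in Section 3.2, together with the use of a table for temporary data, can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's .NET code. The dictionary `SHARED_TABLE`, the callback URLs, and the function names `begin_login` and `finish_login` are all hypothetical; the only idea taken from the text is that the pending authentication state lives in storage shared by every instance, and that the callback URL depends on the environment.

```python
import secrets
import time

# Hypothetical stand-in for the shared table. In a deployed system this would
# be storage reachable from every web instance, not process memory.
SHARED_TABLE = {}

# Hypothetical callback URLs, one per environment, mirroring the idea of
# keeping a different value in each configuration file.
CALLBACK_URLS = {
    "development": "http://localhost:8080/oauth/callback",
    "deployment": "https://sup.example.org/oauth/callback",
}


def begin_login(provider, environment, ttl_seconds=600, now=None):
    """Instance A: create a pending login and persist it in the shared table."""
    now = time.time() if now is None else now
    state = secrets.token_urlsafe(16)  # unguessable key for the pending login
    SHARED_TABLE[state] = {
        "provider": provider,
        "callback": CALLBACK_URLS[environment],
        "expires_at": now + ttl_seconds,
    }
    return state


def finish_login(state, now=None):
    """Instance B: look the pending login up by its key and consume it."""
    now = time.time() if now is None else now
    pending = SHARED_TABLE.pop(state, None)
    if pending is None or pending["expires_at"] < now:
        return None  # unknown, already used, or expired
    return pending
```

Because `finish_login` reads only from the shared store, it does not matter which instance handled `begin_login`. Removing the entry on first use also prevents the same pending login from being completed twice.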
We could give other developers access to the Azure Table store so that they can use that information to create their own ontologies. That information could be accessed using the OData protocol.

3.4 Ontologies and Protégé

In computer science, and particularly in our case, an ontology represents knowledge formalized as a set of concepts, properties and rules within a domain, together with the relationships between the concepts.

An ontology was needed to manage and store the information that is sent by the Windows Azure WorkerRoles from the cloud and then classified in a consistent way in the Virtuoso database. For creating the ontology we chose the OWL/RDF model because of its well-known compatibility with a large number of web services, and especially because RDF maps directly onto the triples used by the Virtuoso storage model.

To build the OWL/RDF model we used Protégé, a well-known tool that let us create classes and properties and connect them through a user-friendly GUI.

Fig. 3. Protégé GUI interface
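To make the correspondence between ontology concepts and stored triples concrete, here is a small sketch in Python that describes one user profile as (subject, predicate, object) triples and serializes them as N-Triples. The namespace `http://example.org/sup/`, the property names `hasProfile` and `displayName`, and the helper names are assumptions made for illustration; they are not taken from the project's actual OWL schema.

```python
from dataclasses import dataclass

BASE = "http://example.org/sup/"  # hypothetical namespace for the ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Literal:
    """Marks an object value as a plain string literal rather than an IRI."""
    value: str


def iri(local_name):
    return BASE + local_name


def profile_triples(user_id, network, display_name):
    """Describe one user and one social-network profile as triples."""
    user = iri("user/" + user_id)
    profile = iri("profile/" + network + "/" + user_id)
    return [
        (user, RDF_TYPE, iri("User")),
        (profile, RDF_TYPE, iri("Profile")),
        (user, iri("hasProfile"), profile),
        (profile, iri("displayName"), Literal(display_name)),
    ]


def _escape(text):
    # Escape the characters that N-Triples requires inside string literals.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize triples, one statement per line, in N-Triples syntax."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            rendered = '"' + _escape(obj.value) + '"'
        else:
            rendered = "<" + obj + ">"
        lines.append("<%s> <%s> %s ." % (subject, predicate, rendered))
    return "\n".join(lines)
```

Each class in the ontology becomes the object of an `rdf:type` statement and each property becomes a predicate, which is why a triple store can hold this kind of model without a separate mapping layer.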
The ontology was created using a top-down engineering approach. The first step was asking ourselves which questions the ontology should be able to answer; once we had the answers we modeled the basic classes (User, Information, Profile) and then refined the model in more detail.

Fig. 4. Ontology graph

The relationships between classes and properties were designed at the same time as the database tables, taking into account the information that we can retrieve from the different social networks integrated in our system. We encountered various problems in designing the ontology because the rules and classes were constantly changing to follow the new requirements imposed by the social networks.

3.5 Virtuoso

OpenLink Virtuoso is a cross-platform universal server that implements Web, File, and Database server functionality alongside Native XML Storage and Universal Data Access Middleware, as a single server solution. It includes support for key Internet, Web, and Data Access standards such as XML, XPath, XSLT, SOAP, WSDL, UDDI, SMTP, ODBC, JDBC, etc.
It provides a high-performance virtual database engine for the Distributed Computing Age. It is a core universal data access technology set to accelerate our advances into the emerging Information Age. It provides transparent access to existing data sources, which are typically databases from different database vendors.

Fig. 5. Virtuoso endpoint service

We installed a Virtuoso open source distribution on a remote server in order to store the RDF graphs with information about all the users. Virtuoso provides an endpoint for SPARQL queries, which can be run against any of the graphs stored on the remote server. Because this remote server is dedicated to this service alone (graph storage and SPARQL querying through the endpoint it provides), it can be used not only by the social platform we developed, but also by any other application that needs RDF data storage and SPARQL queries over that data.
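As an illustration of how any application could query such an endpoint, the following Python sketch builds a SPARQL protocol GET request using only the standard library. The endpoint address, the graph IRI and the function names are hypothetical. The `query` and `default-graph-uri` parameters come from the SPARQL protocol, and the `Accept` header asks for the standard JSON results format. The project itself accesses Virtuoso from .NET, so this is only a sketch of the protocol, not of the project's code.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query, graph_iri=None):
    """Build a GET request for a SPARQL endpoint without sending it."""
    params = [("query", query)]
    if graph_iri is not None:
        # Restrict the query to one named graph stored on the server.
        params.append(("default-graph-uri", graph_iri))
    url = endpoint + "?" + urllib.parse.urlencode(params)
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_select(endpoint, query, graph_iri=None, timeout=10):
    """Send a SELECT query and return the list of result bindings."""
    request = build_sparql_request(endpoint, query, graph_iri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

A call such as `run_select("http://localhost:8890/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 5", "http://example.org/graph/alice")` would return one dictionary per result row, assuming an endpoint is listening at that (hypothetical) address.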
We encountered some problems when installing this software, because the remote server has a 64-bit processor architecture and the Virtuoso distributions for 64-bit were not really stable. Once installed, the distribution we worked with (version 6.3) provided many sample test databases and many useful usage examples.

3.6 WCF WebService

As mentioned in the previous section, the Virtuoso server is installed on a remote machine, and its functionality is available to any other application, not only to the social platform we wanted to build.

All this functionality is accessed via web services built using WCF (Windows Communication Foundation). WCF is a unified programming model for building service-oriented applications. It enables developers to build secure, reliable, transacted solutions that integrate across platforms and interoperate with existing investments.

The WCF web services implement the following operations:
- Create a new RDF graph
- Insert a new RDF triple into an existing RDF graph
- Delete an RDF triple from an existing RDF graph
- Initialize an RDF graph
- Execute SPARQL queries against existing RDF graphs

The communication between the WCF web service code and Virtuoso is done using a publicly available library, dotNetRDF (dotNetRDF.dll), which makes it possible to insert a new RDF graph into the database, to update an existing one, or to run a SPARQL query.

All these web services are published using the Internet Information Services (IIS) Manager. Being public web services, they can be used by any client, not only by the WorkerRoles in the cloud.
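The five operations listed above map naturally onto standard SPARQL 1.1 statements. The Python sketch below shows one possible mapping as plain string builders. It is an assumption about how such operations could be expressed, not a description of the dotNetRDF calls the services actually make. In particular, interpreting "initialize" as emptying the graph with `CLEAR GRAPH` is a guess, and all function names are hypothetical. For simplicity, triple objects are restricted to IRIs.

```python
def _iri(value):
    """Wrap a value as an IRI reference, rejecting characters IRIs cannot hold."""
    if any(ch in value for ch in '<>"{}|^`\\ '):
        raise ValueError("not a valid IRI: %r" % value)
    return "<" + value + ">"


def create_graph(graph):
    return "CREATE GRAPH " + _iri(graph)


def insert_triple(graph, subject, predicate, obj):
    return "INSERT DATA { GRAPH %s { %s %s %s . } }" % (
        _iri(graph), _iri(subject), _iri(predicate), _iri(obj))


def delete_triple(graph, subject, predicate, obj):
    return "DELETE DATA { GRAPH %s { %s %s %s . } }" % (
        _iri(graph), _iri(subject), _iri(predicate), _iri(obj))


def initialize_graph(graph):
    # Assumption: "initialize" means removing every triple from the graph.
    return "CLEAR GRAPH " + _iri(graph)


def select_all(graph, limit=100):
    """Build a query that lists triples of one graph, capped at `limit` rows."""
    return "SELECT ?s ?p ?o WHERE { GRAPH %s { ?s ?p ?o } } LIMIT %d" % (
        _iri(graph), int(limit))
```

Validating IRIs before they are placed into the statement keeps a caller from injecting extra SPARQL through a graph or resource name, which matters for services that are publicly reachable.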
4 Workflow

In this section we present a usage scenario. Whenever a new user wants to join our social web platform, all he has to do is enter his Facebook or Twitter credentials. He then has to wait a few moments while the WorkerRoles in the cloud inspect his Facebook and Twitter accounts for friends, posts, likes, and statuses. All this information is gathered and stored as an RDF graph in a database on the Virtuoso server via a web service. After this step, a score is computed for the user using SPARQL queries run through another web service published from the server where the RDF graph is stored. The user can see his own score and also the scores of his friends who already have an account on our platform.

Also through the web service that gives access to the endpoint on the server where Virtuoso is installed, the user has access to several statistics regarding his activity on the social networks where he has an account.

5 Scalability

The SUP project is designed so that each component can be scaled as needed. For example, at some point there may be a large amount of data to transform into RDF format. In this case we need to increase the number of instances of the DataDigest Worker Role.

Windows Azure offers an API with which we can monitor the load on each module and increase or decrease the number of instances programmatically.

In the presented solution the Virtuoso server is the only component that is not scalable, but we could install it on Amazon EC2, the cloud solution from Amazon.

6 Future work

The project is open to improvements and new features in any of the main parts of the application.

One way to improve the SUP application is to make suggestions to a user from different perspectives. We can add artificial intelligence algorithms in order to build groups of users who may have common interests, based on the tags from their posts and from the posts they "like".

Regarding authentication, future work will add a LinkedIn authentication method and will also gather the information from this social network.

In order to extend our platform quickly and to increase the number of users, it is necessary to implement a module that invites our users' friends from the other social networks to join our platform.
7 Bibliography

Hosting a web service using IIS webserver: http://beyondrelational.com/blogs/dhananjaykumar/archive/2011/02/11/walkthrough-on-creating-wcf-4-0-service-and-hosting-in-iis-7-5.aspx

Module for communication between C# and Virtuoso: http://www.dotnetrdf.org/content.asp?pageID=Using%20Virtuoso%20Universal%20Server

Virtuoso documentation: http://wikis.openlinksw.com/dataspace/owiki/wiki/MetaWiki/

OAuth Login: http://oauth.net/

Azure Cloud: http://www.windowsazure.com/en-us/

OWL modeling: http://protege.stanford.edu/

WCF services: http://msdn.microsoft.com/en-us/library/ms731082.aspx