Public Transportation Path Finder



Finding the Shortest Path in Transit Networks

Victor Chircu
Faculty of Computer Science, “Alexandru Ioan Cuza” University, Iasi, Romania
victor.chircu@info.uaic.ro

Abstract. More and more governments provide SPARQL endpoints for querying public data, which sometimes includes the route configuration of transit agencies. Knowing the route configuration and the geolocation of each route stop in a transit network, one can run an algorithm to find the (k-)shortest path(s) from point A to point B. Since the Romanian Government does not provide this kind of information, this article describes a way to access this data, store it in a triple store, and expose it through a SPARQL endpoint.

Keywords: semantic web, RDF, SPARQL, triple store, Virtuoso Universal Server, dotNetRDF, transit network, public transportation network, k-shortest path, A*

1 Introduction

The focus of this paper is to provide a way to acquire, store and expose transit data when this data is not openly available on the Web. In recent years, public transportation agencies have been publishing transit data freely on the Internet, data that developers can use to build useful applications. In some cases¹, the agencies even encourage developers to do so by organizing application contests. This trend started in 2005, when TriMet² and Google developed the Google Transit Feed Specification (GTFS)³, the format used to find routes with the Google Transit Trip Planner⁴. Over the years, more transit agencies have incorporated trip planners into their websites and have provided data in GTFS format for use with the Google Transit Trip Planner.

Unfortunately, this is not the case in Romania, where public transportation (linked) data is still closed. For example, the website of RATB⁵ (the main bus transit agency in Bucharest) does not provide a trip planner, but it does provide an HTML view of all the routes and of the stops on each route.
On the other hand, Metrorex⁶ (the subway transit agency in Bucharest) does provide a trip planner, but offers its route data only as a visual route map, which cannot be parsed by a computer. Moreover, the Metrorex route planner needs some improvement: it provides routes only from station to station (the average user would prefer to enter the origin and destination by clicking on a map), and it does not connect with the RATB routes. Thus, there is a need for a city-level transit route planner that uses the routes of all the agencies in a city. This is the main purpose of my dissertation, but this paper focuses only on getting the data, organizing it in a manner that fits my purpose, and exposing it in a SPARQL endpoint, so that other developers can use it.

¹ http://mtaappquest.com/
² http://trimet.org/
³ https://developers.google.com/transit/gtfs/reference
⁴ http://www.google.com/intl/en/landing/transit/#mdy
⁵ http://www.ratb.ro/
⁶ http://www.metrorex.ro

2 Getting the data

I have managed to obtain part of the data I need as an SQL database. This database contains most of the routes (name, geometry) and stops (name, location) in the Bucharest transit network. The routes are operated by the two biggest transit agencies in the city, RATB and Metrorex. This data is incomplete, because I also need to know which stops belong to each route and in what sequence. To get this information, I can use a technique called web scraping, because the RATB website does provide a way for human users to see which stops are on each route. As defined in [1], web scraping “is a computer software technique of extracting information from websites”. Part of the page for one of the RATB routes on the RATB website is shown in Figure 1.

Fig. 1. Route page on the RATB website.
As you can see, the route is selected from one of the drop-down lists (each list represents a type of route: rail, trolley, bus). The selected route name must be sent to the server as a parameter. To find out how, we can use a network monitoring tool, like Firebug⁷ for Mozilla's Firefox browser. For example, suppose we choose route 106 from the bus route drop-down list (the third from the left). This selection automatically triggers a postback. The POST parameters are shown in Figure 2.

Fig. 2. Using Firebug to detect POST parameters.

As you can see, the route name is indeed sent as a POST parameter, named tlin3. Knowing this, I can build a small (PowerShell) script that makes a GET request to http://www.ratb.ro/v_trasee.php, gets all the options from the HTML select element I am interested in (e.g. the bus route list), and, for each option value, makes an HTTP POST request with the required parameters (e.g. if we choose the third drop-down list, then the tlin1, tlin2, tlin4 and tlin5 parameters are set to 0, and tlin3 is set to the option value). The POST request returns the page in HTML format. Next, I have to parse the page, extract the information from the table and write it to disk. The next step is to write the route stops to the existing SQL database. This can be done manually, or automatically, by matching the stop name from the file with the stop name in the database. Following these steps, I have found the complete route itinerary for three routes.

⁷ http://getfirebug.com/
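The scraping steps above were implemented as a PowerShell script; as a hedged illustration only, the same steps can be sketched in Python with the standard library. The sketch assumes that each drop-down list is a select element whose name matches its POST parameter (tlin1 to tlin5), that the option value "0" is the "nothing selected" placeholder, that the itinerary is a plain HTML table, and that the page is UTF-8 encoded; none of this was checked against the live RATB page. `fetch_route_page` performs the real POST and is not exercised in the example.

```python
import unicodedata
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

ROUTES_URL = "http://www.ratb.ro/v_trasee.php"               # URL given in the text
LIST_PARAMS = ["tlin1", "tlin2", "tlin3", "tlin4", "tlin5"]  # one POST parameter per drop-down list


class OptionCollector(HTMLParser):
    """Collects the option values of the <select> whose name attribute is select_name."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self.inside = attrs.get("name") == self.select_name
        elif tag == "option" and self.inside:
            value = attrs.get("value") or ""
            if value not in ("", "0"):  # assumption: "0" means "no route selected"
                self.values.append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside = False


class TableCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows, self.row, self.cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag == "td" and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.cell is not None and self.row is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            if self.row:
                self.rows.append(self.row)
            self.row, self.cell = None, None


def post_params(list_param, route):
    """The chosen list carries the route value; every other list is set to "0"."""
    return {p: (route if p == list_param else "0") for p in LIST_PARAMS}


def normalize_stop(name):
    """Accent-, case- and whitespace-insensitive key for matching scraped names to DB names."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def fetch_route_page(list_param, route):
    """Sends the postback for one route and returns the HTML (network call)."""
    body = urlencode(post_params(list_param, route)).encode("ascii")
    with urlopen(ROUTES_URL, data=body, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")  # encoding is an assumption
```

For example, feeding a select named tlin3 with the options "0" and "106" to `OptionCollector("tlin3")` collects `["106"]`. Matching on `normalize_stop` rather than on raw strings is one possible way to automate the name matching step, since scraped names and database names may differ in diacritics, case or spacing.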
3 Modeling the data

3.1 General Information

Taking into account the classes of entities and the relationships between these classes, the domain I want to model is shown in Figure 3.

Fig. 3. Transit Domain Model.

Because I want the dataset I am exposing to be extensible and interoperable with other datasets, I have chosen to model the data using RDF triples, store it in a triple store and expose it in a SPARQL endpoint.

In the Semantic Web, a knowledge base contains a set of RDF (Resource Description Framework) [2] triples of the form <subject, predicate, object>. Resources are identified by a unique URI. Subjects and predicates are always resources, while objects can be either resources or literals (having an associated data type). Using this model, any kind of domain can be represented.

Because in the Semantic Web it is strongly advised to reuse existing RDF vocabularies where possible, rather than create new ones, I have searched for one that would suit my needs. I have found the Transit⁸ vocabulary, which models exactly the domain I am trying to describe. Transit is “a vocabulary for describing transit systems and routes” and is based on the General Transit Feed Specification published by Google. The core classes of the Transit vocabulary are shown in Figure 4.

The Transit vocabulary is already used to expose MTA New York City transit data⁹ and bus route data in Southampton, UK¹⁰, which suggests it is the most widely used transit RDF vocabulary.

Fig. 4. Transit Vocabulary Core Classes.

As you can see, the classes in this vocabulary map onto the classes in my domain model. There is one class that I do not use in my domain model: the Schedule class. This is because in most cities in Romania, public transportation does not run according to a fixed schedule. By eliminating this class, I have greatly reduced the amount of data I have to manipulate.

⁸ http://vocab.org/transit/terms/
⁹ http://kasabi.com/dataset/mta-new-york-city-transit
¹⁰ http://data.southampton.ac.uk/bus-routes.html
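To make the <subject, predicate, object> model concrete, here is a minimal Python sketch (the paper's application is written in C#; Python is used here only for illustration) that serializes a few triples in N-Triples syntax. The Transit, RDF and FOAF URIs are the ones the paper cites; the namespace http://example.org/bucharest/ and the resource names are hypothetical, and only string and integer literals are handled.

```python
TRANSIT = "http://vocab.org/transit/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
BASE = "http://example.org/bucharest/"  # hypothetical namespace for the dataset's own resources


class IRI(str):
    """Marks a string as a resource (IRI); plain str values are treated as literals."""


def term(value):
    """Serializes one RDF term in N-Triples syntax."""
    if isinstance(value, IRI):
        return f"<{value}>"
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """One '<s> <p> <o> .' line per triple."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


agency = IRI(BASE + "agency/RATB")
route = IRI(BASE + "route/106")
triples = [
    (agency, IRI(RDF_TYPE), IRI(TRANSIT + "Agency")),
    (agency, IRI(FOAF_NAME), "RATB"),
    (route, IRI(RDF_TYPE), IRI(TRANSIT + "BusRoute")),
    (route, IRI(TRANSIT + "agency"), agency),
    (route, IRI(FOAF_NAME), "106"),
]
print(to_ntriples(triples))
```

Subjects and predicates are always IRIs here, while objects can be either IRIs or literals, mirroring the rule stated above.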
3.2 Transit Core Classes and Properties

In this section, I go over the main Transit classes and properties that I use and define them according to the Transit vocabulary document¹¹. I use only a subset of the classes and properties defined in this document, because my model is slightly less general: it eliminates the Schedule and Service classes.

Note: each resource's URI is given in a footnote.

Classes
• Agency¹² - an organization that oversees public transportation for a city or region (e.g. RATB, Metrorex)
• Route¹³ - a public transportation route; some of its subclasses are:
  ─ Bus Route¹⁴
  ─ Rail Route¹⁵
  ─ Subway Route¹⁶
• Stop¹⁷ - a location where passengers board or disembark from a transit vehicle
• Route Stop¹⁸ - a location where passengers board or disembark from a transit vehicle for a specific route

Properties
• agency¹⁹ - the agency that operates this public transportation route
• route²⁰ - a route associated with the given resource
• routeStop²¹ - links a route to a particular stop and the sequence of that stop in the route
• stop²² - the physical stop associated with this route stop (Note: I do not use this property exactly as the Transit vocabulary defines it. There, having this property implies being a ServiceStop; in my domain, it implies being a RouteStop, because I do not need the ServiceStop class.)

¹¹ http://vocab.org/transit/terms
¹² http://vocab.org/transit/terms/Agency
¹³ http://vocab.org/transit/terms/Route
¹⁴ http://vocab.org/transit/terms/BusRoute
¹⁵ http://vocab.org/transit/terms/RailRoute
¹⁶ http://vocab.org/transit/terms/SubwayRoute
¹⁷ http://vocab.org/transit/terms/Stop
¹⁸ http://vocab.org/transit/terms/RouteStop
¹⁹ http://vocab.org/transit/terms/agency
²⁰ http://vocab.org/transit/terms/route
²¹ http://vocab.org/transit/terms/routeStop
²² http://vocab.org/transit/terms/stop
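The routeStop/stop pair can be illustrated with a small Python sketch that turns an ordered itinerary (as scraped in section 2) into RouteStop triples and reads the order back. The predicate that stores the position, written here as `TRANSIT + "sequence"`, is an assumption to be checked against the Transit vocabulary document; the http://example.org/bucharest/ namespace and the URI patterns are hypothetical.

```python
TRANSIT = "http://vocab.org/transit/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SEQUENCE = TRANSIT + "sequence"         # assumption: predicate holding the stop's position
BASE = "http://example.org/bucharest/"  # hypothetical namespace


def route_stop_triples(route_id, stop_ids):
    """One RouteStop per itinerary position, linking the route, the physical stop and the order."""
    route = f"{BASE}route/{route_id}"
    triples = []
    for position, stop_id in enumerate(stop_ids, start=1):
        route_stop = f"{route}/stop/{position}"
        triples += [
            (route, TRANSIT + "routeStop", route_stop),
            (route_stop, RDF_TYPE, TRANSIT + "RouteStop"),
            (route_stop, TRANSIT + "route", route),
            (route_stop, TRANSIT + "stop", f"{BASE}stop/{stop_id}"),
            (route_stop, SEQUENCE, position),
        ]
    return triples


def itinerary(triples, route):
    """Reads the ordered list of physical stops of a route back from the triples."""
    route_stops = [o for s, p, o in triples if s == route and p == TRANSIT + "routeStop"]
    position = {s: o for s, p, o in triples if p == SEQUENCE}
    stop_of = {s: o for s, p, o in triples if p == TRANSIT + "stop"}
    return [stop_of[rs] for rs in sorted(route_stops, key=position.__getitem__)]
```

An intermediate RouteStop node is what allows the sequence to be recorded at all: a direct route-to-stop link could not say in which position the stop appears, nor represent a stop visited twice on a looping route.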
3.3 Other Vocabularies

Some of the other RDF vocabularies I am using are:
• FOAF²³, for the name and page properties
• RDF²⁴, for the type property
• RDF Schema²⁵, for the label property
• Geometry Ontology²⁶, for describing the geometry of a route in Well-Known Text²⁷ (WKT) format
• WGS84 Geo Positioning²⁸, for representing latitude and longitude information in the WGS84 geodetic reference datum

4 Technical Information

4.1 General Information

Because of my hands-on experience with the Microsoft development stack, I have chosen to develop the application using .NET/C#. This might be a challenge, because most semantic web tools are written for platforms other than .NET, as seen in the pie chart in Figure 5, taken from [3]. The same source [3] shows that this trend is constant, since no new semantic tools were developed for the .NET platform in 2011.

I have considered switching to the Java platform, which provides a set of powerful tools for working with RDF and OWL, most notably the Jena Framework²⁹, developed by Apache, but after some research I have found a tool written for the .NET platform that suits my needs. This tool is presented in detail in section 4.4 of this paper.

²³ http://xmlns.com/foaf/0.1/
²⁴ http://www.w3.org/1999/02/22-rdf-syntax-ns#
²⁵ http://www.w3.org/2000/01/rdf-schema#
²⁶ http://data.ordnancesurvey.co.uk/ontology/geometry/
²⁷ http://edndoc.esri.com/arcsde/9.0/general_topics/wkt_representation.htm
²⁸ http://www.w3.org/2003/01/geo/wgs84_pos#
²⁹ http://incubator.apache.org/jena/
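As a hedged sketch of the two geographic encodings above, the Python code below builds geo:lat/geo:long triples for a stop (those two terms are defined in the WGS84 vocabulary) and a WKT LINESTRING literal for a route. It writes coordinates as "longitude latitude", the usual x y convention for WKT; whether the consuming tools expect that order is an assumption to verify. The Geometry Ontology predicate that attaches the literal to a route is not named in the text, so the sketch only builds the literal, and the coordinates used below are illustrative.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def stop_position_triples(stop_iri, lat, lon):
    """WGS84 latitude and longitude of a stop, in decimal degrees."""
    return [(stop_iri, GEO + "lat", lat), (stop_iri, GEO + "long", lon)]


def route_wkt(points):
    """WKT LINESTRING for a route; points are (lat, lon) pairs, written as 'lon lat' (x y)."""
    if len(points) < 2:
        raise ValueError("a LINESTRING needs at least two points")
    return "LINESTRING (" + ", ".join(f"{lon} {lat}" for lat, lon in points) + ")"


def parse_wkt_linestring(wkt):
    """Inverse of route_wkt: returns the (lat, lon) pairs of a simple LINESTRING."""
    body = wkt.strip()
    if not body.upper().startswith("LINESTRING"):
        raise ValueError("not a LINESTRING")
    inner = body[body.index("(") + 1 : body.rindex(")")]
    pairs = [pair.split() for pair in inner.split(",")]
    return [(float(lat), float(lon)) for lon, lat in pairs]


print(route_wkt([(44.4268, 26.1025), (44.43, 26.11)]))
```

Swapping the axis order is a common source of routes that appear in the wrong place on a map, which is why the sketch keeps (lat, lon) in Python and converts only at the WKT boundary.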
Fig. 5. Semantic Web Tools by Programming Language.

4.2 Knowledge base

One of the disadvantages of using third-party triple stores is that there aren't any open-source products. But because of the nature of my problem, I could not use an in-memory triple store; I needed an efficient one, with a powerful query engine. After researching different options, I have decided to use OpenLink's Virtuoso Universal Server³⁰ as a triple store. My choice was based on Virtuoso's maturity and its RDF Graph Model features³¹:
• Backward-chaining OWL reasoner covering: rdfs:subClassOf, rdfs:subPropertyOf, owl:sameAs, owl:equivalentClass, owl:equivalentProperty, owl:InverseFunctionalProperty, owl:inverseOf, owl:SymmetricProperty, and owl:TransitiveProperty
• SPARQL 1.1 Query Language, Protocol, and Results Serialization support
• SPARQL Create, Update, and Delete (SPARUL)
• Support for a broad range of RDF data representation formats: HTML+RDFa, RDF-JSON, N3, Turtle, TriG, TriX, and RDF/XML
• REST interfaces for Create, Read, Update, and Delete operations
• RDF data is also accessible via ODBC, JDBC, ADO.NET (Entity Framework compatible), OLE DB, and XMLA data providers/drivers

Because the application I am building is non-commercial, I am not interested in acquiring a Virtuoso commercial license at the moment. OpenLink does provide two 15-day trials of the software. I have used the first one while configuring and testing the Virtuoso Server, and will use the second one for demos later on. While working on the application, I will use in-memory triple stores, loaded from and saved to the file system.

³⁰ http://virtuoso.openlinksw.com/
³¹ http://virtuoso.openlinksw.com/rdf-quad-store/

4.3 .NET API for working with RDF

While looking for a .NET API for working with RDF, I have found three possible candidates:
• Linq2RDF³², a LINQ query provider that converts queries into the SPARQL query language. Unfortunately, it is not a mature enough API, and the last update for this project was in August 2008, so it is no longer under development.
• Jena .NET³³, a flexible .NET port of the Jena semantic web toolkit. Unfortunately, this project has been abandoned too, while still at a beta 0.3 release.
• dotNetRDF³⁴, an open-source semantic web/RDF library for C#/.NET. Even though it is only a beta 0.5 release, it is still under development, which is a big advantage over the other two options.
This API is described in the next section.

4.4 dotNetRDF

General Information

Some of the points of interest regarding the API are:
• it is currently a beta release (version 0.5.1)
• it works on .NET 3.5 (but according to the project's issue tracker³⁵, moving the library to .NET 4.0 is a top priority)
• it offers a "simple but powerful API for working with RDF"
• it operates primarily with Triples, Graphs and Triple Stores
• it has limited support for inference
• it has no support for OWL

³² http://code.google.com/p/linqtordf/
³³ http://semanticweb.org/wiki/Jena_.NET
³⁴ http://www.dotnetrdf.org/
³⁵ http://www.dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=22
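To make the Triple and Graph terminology concrete, here is a language-neutral Python sketch of a graph as a set of triples, with the operations the library is said to offer: lookup by a subject/predicate/object pattern, merging, difference and equality. It is not dotNetRDF's API, and it handles only ground triples: RDF graph equality in general must account for blank nodes (graph isomorphism), which plain set equality ignores.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Graph:
    """A set of ground triples (no blank nodes) with pattern matching, merge, difference, equality."""

    def __init__(self, triples=()):
        self.triples = frozenset(triples)

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None):
        """Triples matching the pattern; None acts as a wildcard."""
        return {t for t in self.triples
                if (subject is None or t.subject == subject)
                and (predicate is None or t.predicate == predicate)
                and (obj is None or t.obj == obj)}

    def merge(self, other: "Graph") -> "Graph":
        """Union of both graphs."""
        return Graph(self.triples | other.triples)

    def difference(self, other: "Graph") -> "Graph":
        """Triples in this graph that are not in the other one."""
        return Graph(self.triples - other.triples)

    def __eq__(self, other):
        return isinstance(other, Graph) and self.triples == other.triples

    def __hash__(self):
        return hash(self.triples)
```

For example, merging a graph describing route 106 (RATB) with one describing M1 (Metrorex) and calling `match(predicate="transit:agency")` returns one triple per route.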
Known formats

The library can read RDF fragments (including Graphs and Triple Stores) from strings, files and even URIs. It can also write RDF fragments to files and strings. Reading and writing can be done in all of the most widely used RDF formats: RDF/XML, RDF/JSON, NTriples, Turtle, Notation 3, and XHTML + RDFa.

Graphs

The API supports getting Nodes and Triples from a Graph by given criteria (a combination of subject, predicate and object), merging graphs, and computing graph difference and equality.

Triple Stores

The library can work with both in-memory and native triple stores. It provides support for working with:
• in-memory triple stores, loaded from and saved to disk in two ways:
  ─ a folder, where each file represents a single Graph, plus an additional index file
  ─ a single file, using one of the following formats: TriG, TriX and NQuads
• simple SQL-based stores with MySQL and Microsoft SQL Server databases
• native third-party triple stores: AllegroGraph³⁶, Dydra³⁷, 4store³⁸, Fuseki³⁹, Joseki⁴⁰, Sesame⁴¹ (any Sesame-based store, e.g. BigOWLIM⁴²), stores compliant with the SPARQL Graph Store HTTP Protocol, Stardog⁴³, the Talis Platform⁴⁴ and Virtuoso

By providing an easy way to work with Virtuoso-based triple stores, dotNetRDF proves to be the right choice.

Querying

Using dotNetRDF, one can easily query over:
• an in-memory Graph, using the library's SPARQL implementation
• remote SPARQL endpoints
• third-party triple stores, using native queries (this is very important, since we can rely on the more powerful Virtuoso query engine rather than on the weaker dotNetRDF implementation)

³⁶ http://www.franz.com/agraph/allegrograph/
³⁷ http://dydra.com/
³⁸ http://4store.org/
³⁹ http://incubator.apache.org/jena/documentation/serving_data/index.html
⁴⁰ http://www.joseki.org/
⁴¹ http://www.openrdf.org/
⁴² http://www.ontotext.com/owlim
⁴³ http://stardog.com/
⁴⁴ http://www.talis.com/platform/
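For the "remote SPARQL endpoints" case, the request itself is defined by the W3C SPARQL 1.1 Protocol rather than by dotNetRDF, so it can be sketched in any language. The Python sketch below sends a SELECT query via HTTP GET and flattens the standard JSON results format. The endpoint URL is hypothetical (8890 is the HTTP port a default Virtuoso installation commonly uses), and the query assumes routes are typed transit:BusRoute and named with foaf:name, as in the model from section 3; it has not been run against the paper's dataset.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8890/sparql"  # hypothetical host; 8890 is Virtuoso's usual HTTP port

QUERY = """PREFIX transit: <http://vocab.org/transit/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?route ?name WHERE {
  ?route a transit:BusRoute ;
         foaf:name ?name .
}
ORDER BY ?name"""


def build_request(endpoint, query):
    """SPARQL 1.1 Protocol 'query via GET': the query goes in the 'query' URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results):
    """Flattens a SPARQL 1.1 JSON results document into a list of {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in row.items()}
            for row in results["results"]["bindings"]]


def select(endpoint, query):
    """Runs a SELECT query and returns the flattened bindings (network call)."""
    with urlopen(build_request(endpoint, query), timeout=30) as response:
        return parse_bindings(json.load(response))
```

Long queries may exceed URL length limits; the same protocol also allows sending the query in a POST body instead.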
The query mechanism is compatible with the current draft⁴⁵ of the SPARQL 1.1 standard.

Inference and Reasoning

The current version provides three types of reasoners:
• RDFS Reasoner, which does not apply the full range of possible RDFS-based inferencing, but does the following:
  ─ asserts additional type triples for anything whose type is a subclass of another type
  ─ asserts additional triples where the property (predicate) is a sub-property of another property
  ─ asserts additional type triples based on the domain and range of properties
• SKOS Reasoner, a simple concept hierarchy reasoner which can infer additional triples where the subject has an object which is a skos:Concept in the taxonomy, by following skos:narrower and skos:broader links as appropriate
• Simple N3 Rules Reasoner, a reasoner that is able to apply simple N3 rules

Unfortunately, the API does not support using inference with third-party triple stores. Because of this, the reasoner that comes with the Virtuoso Universal Server cannot be used through dotNetRDF.

Configuration

The library comes with a very useful Configuration API that can be used to dynamically load commonly used objects (such as Graphs, connections to triple stores, etc.), and a couple of tools for deploying RDF-enabled ASP.NET Web Applications. Because of these last two features, exposing a SPARQL endpoint is a trivial task.

5 Implementation Details

5.1 Populating the Virtuoso RDF Triple Store

As mentioned in section 2, I have the data in an MS SQL Server database, and, as mentioned in section 3, I have the RDF vocabulary. What I have to do is migrate the data from the SQL database into the Virtuoso triple store. I have done this in two steps:
• Write the data from the SQL database into a set of files.
  To do this in the simplest way, I have used the RAD features of the Visual Studio 2010 IDE together with Entity Framework: the Entity Framework⁴⁶ generated a set of classes based on the database's tables. In code, I retrieved the data from the database using these classes and wrote it to a set of files.
• Write the data from the files to a Graph. The intermediate files were needed because the data could not be written directly from the database to a Graph: EF 4.1 is not compatible with .NET 3.5, and the dotNetRDF library is built on .NET 3.5. After this step, I wrote the data from the Graph to the Virtuoso triple store (dotNetRDF makes this task easier).

⁴⁵ http://www.w3.org/TR/sparql11-query/
⁴⁶ http://msdn.microsoft.com/en-us/data/ee712906

5.2 Exposing the data through a SPARQL Endpoint

To expose the data in the Virtuoso triple store through a SPARQL endpoint, I created a new ASP.NET Web Application and added a configuration file with the following content to its App_Data folder:

    @prefix dnr: <http://www.dotnetrdf.org/configuration#> .

    # Firstly note that our Handler must have a subject which is a special
    # dotNetRDF URI as discussed in Configuration API - HTTP Handlers
    <dotnetrdf:/sparql> a dnr:HttpHandler ;
      dnr:type "VDS.RDF.Web.QueryHandler" ;  # States that we're using the QueryHandler
      dnr:queryProcessor _:proc .

    _:proc a dnr:SparqlQueryProcessor ;
      dnr:type "VDS.RDF.Query.SimpleQueryProcessor" ;
      dnr:usingStore _:store .

    _:store a dnr:TripleStore ;
      dnr:type "VDS.RDF.NativeTripleStore" ;
      dnr:genericManager _:manager .

    # Register the Virtuoso factory
    _:virtuosoFactory a dnr:ObjectFactory ;
      dnr:type "VDS.RDF.Configuration.VirtuosoObjectFactory, dotNetRDF.Data.Virtuoso" .

    # Now we define the initial dataset
    _:manager a dnr:GenericIOManager ;
      dnr:type "VDS.RDF.Storage.VirtuosoManager, dotNetRDF.Data.Virtuoso" ;
      dnr:server "myIp" ;
      dnr:port "1111" ;
      dnr:database "DB" ;
      dnr:user "user" ;
      dnr:password <appSettings:VirtuosoPassword> .

As you can see, we register an HTTP handler⁴⁷ of type QueryHandler. To this handler, we associate a QueryProcessor. The QueryProcessor uses a native triple store, which is defined by a manager that points to the Virtuoso Universal Server instance.

Now all I have to do is register the handlers in the web application's configuration file (web.config), which is done automatically by the rdfDeploy tool that comes with dotNetRDF.

6 Future Development

As described in [4] and [5], any algorithm that aims to find the shortest path in a transit network needs some sort of pre-processing. This is needed because web usage studies have shown that the path computation time should be less than 7 seconds [6], [7]. The pre-processed data needs to be stored in the knowledge base, and since it is algorithm-dependent, I might have to extend the Transit vocabulary with a couple of new classes and properties in order to store it.

⁴⁷ http://msdn.microsoft.com/en-us/library/aa479332.aspx
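The paper leaves the routing algorithm to future work; A* is one of its keywords, so the Python sketch below shows one possible shape under explicit simplifying assumptions. Stops are nodes, consecutive route stops become directed edges weighted by great-circle distance, and the heuristic is the straight-line distance to the target. Because the edge weights are themselves great-circle distances, the heuristic is consistent (triangle inequality), so the first time the target is popped its distance is optimal. Building the adjacency once is a trivial example of pre-processing. Transfers, waiting times and walking are ignored, and the stop names and coordinates in the example are invented.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_graph(itineraries, coords):
    """Pre-processing: a directed edge between consecutive stops of every route."""
    graph = {}
    for route, stops in itineraries.items():
        for u, v in zip(stops, stops[1:]):
            graph.setdefault(u, []).append((v, haversine_km(coords[u], coords[v]), route))
    return graph


def a_star(graph, coords, source, target):
    """Returns (distance_km, [(from, to, route), ...]) or None if the target is unreachable."""
    best = {source: 0.0}
    parent = {source: None}
    frontier = [(haversine_km(coords[source], coords[target]), source)]
    closed = set()
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == target:
            break
        if u in closed:
            continue
        closed.add(u)
        for v, weight, route in graph.get(u, []):
            cost = best[u] + weight
            if cost < best.get(v, math.inf):
                best[v] = cost
                parent[v] = (u, route)
                heapq.heappush(frontier, (cost + haversine_km(coords[v], coords[target]), v))
    if target not in best:
        return None
    legs, node = [], target
    while parent[node] is not None:
        prev, route = parent[node]
        legs.append((prev, node, route))
        node = prev
    return best[target], legs[::-1]
```

For example, with route "106" serving A, B, C and route "M1" serving A, D, C, `a_star(graph, coords, "A", "C")` returns the two legs of route 106 when that path is geometrically shorter. A k-shortest-paths variant, as discussed in [5], would build on the same graph.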
7 Bibliography

1. ***, Web Scraping, http://en.wikipedia.org/wiki/Web_scraping
2. D. Allemang, J. Hendler, Semantic Web for the Working Ontologist, Morgan Kaufmann, 2008
3. ***, The State of Tooling for Semantic Technologies, http://www.mkbergman.com/991/the-state-of-tooling-for-semantic-technologies/
4. J. Jariyasunant, D. Work, B. Kerkez, R. Sengupta, S. Glaser, A. Bayen, Mobile Transit Trip Planning with Real-Time Data, presented at the Transportation Research Board, 2010
5. Q. Wu, J. Hartley, Using K-Shortest Paths Algorithms to Accommodate User Preferences in the Optimization of Public Transport Travel, Applications of Advanced Technologies in Transportation Engineering, Proceedings of the Eighth International Conference, pp. 181-186, 2004
6. R. Jain, T. Raleigh, C. Graff, M. Bereschinsky, Mobile Internet Access and QoS Guarantees Using Mobile IP and RSVP with Location Registers, IEEE Int. Conf. Commun., vol. 3, pp. 1690-1695, 1998
7. T. Erl, Ed., Service-Oriented Architecture (SOA): Concepts, Technology, and Design, Prentice Hall, 2005
