Linked Open Data Implementation Proposal to iDA Singapore




Project: Effective government data utilization for knowledge discovery and value co-creation (e-government)

• Ontology modeling, interlinked vocabulary derivation, URI formulation, RDF (Resource Description Framework) creation and mapping from RDBMS, and knowledge discovery from Singapore Land Authority and Department of Statistics open data, by applying the linked data migration framework derived for iDA.

• Currently working on a use case that automates the linking of “vacant sites for sale” data with “demographics and consumer trends specific to locality” data, to give entrepreneurs and business executives more insight into purchase deals.
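The locality-based linking in this use case can be sketched as a simple join. The Python sketch below is a minimal illustration that assumes both data sets carry a planning-area name for each record; the field names (`site_id`, `planning_area`, `residents`) and function names are hypothetical and are not taken from the actual URA or Department of Statistics files. Locality names are normalised before matching because agencies may spell them differently.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Normalise a locality name (case and whitespace) so agency spellings can match."""
    return " ".join(name.upper().split())


def link_sites_to_demographics(sites: list[dict], population: list[dict]) -> list[dict]:
    """Attach the total resident count of each site's planning area to the site record.

    Hypothetical input shapes:
      sites:      [{"site_id": ..., "planning_area": ...}, ...]
      population: [{"planning_area": ..., "residents": int}, ...]
    Sites whose planning area has no population rows get None.
    """
    totals = defaultdict(int)  # normalised planning area -> summed residents
    for row in population:
        totals[normalise(row["planning_area"])] += row["residents"]

    linked = []
    for site in sites:
        key = normalise(site["planning_area"])
        # defaultdict.get() does not create missing keys, so unmatched areas yield None.
        linked.append({**site, "resident_population": totals.get(key)})
    return linked
```

In a linked data setting the same join would instead be published once, as shared planning-area URIs in RDF, rather than recomputed by every application.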




NANYANG TECHNOLOGICAL UNIVERSITY
Wee Kim Wee School of Communication & Information
K6299 – Critical Inquiry in Knowledge Management
Proposal for Designing a Linked Data Migrational Framework for Effective Utilization of Singapore Government Data Sets
Under the guidance of Dr. Khoo Soo Guan, Christopher (Assoc Prof)
Submitted by:
SESAGIRI RAAMKUMAR ARAVIND (G1101761F)
THANGAVELU MUTHU KUMAAR (G1101765E)
KALEESWARAN SUDARSAN (G1001065F)
With input from
Introduction
“Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?”
This thought-provoking quote often runs through the minds of information science and computer researchers. In that spirit, this project with iDA (Infocomm Development Authority of Singapore) tries to unveil the knowledge hidden in open government data and information by making effective use of it.
Today, the vast amount of information on the internet, in various structured and unstructured forms, poses two challenges: organizing it intelligently at the source (data) level, and discovering domain-specific knowledge from it to build useful applications. Both can be addressed by standardizing data in a common format and adding more semantics. The evolution of such web technologies is shown in Fig 1.
Fig 1: Evolution of Web Technologies
“…the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize” (Berners-Lee). His concept of the Semantic Web provides the standards for linking and publishing content on the web. This representation offers more scope for interlinking data across domains, so that the same data can be used at multiple points and software applications built on top of it can support knowledge discovery.
The large-scale Semantic Web applications chosen for exploration are the eGovernment (eGov) initiatives of the US, the UK and many other nations, which publish Open Government Data on governance and public affairs to promote transparency and value co-creation and to empower people with relevant knowledge. The recent Open Government Partnership mandates nations to publish their Open Government Data in linked data format. Many nations have started to publish their data as linked data, the latest being the Brazil data portal. The start of the linked data movement spurred the release of new data sets, highlighted by the Linked Open Data cloud maintained by the Comprehensive Knowledge Archive Network (CKAN) registry. The US and UK governments have realized the benefits by releasing selected data sets in linked data format on their respective national portals, such as data.gov. Well-defined relationships between these data sets, together with ready-made applications, guide the public's daily activities related to transport, business and other needs. Existing applications include Numberhood (http://www.Numberhood.net), FixMyTransport, BIS Research Funding Explorer, SemaPlorer and the “Linking Wildland Fire and Government Budget” mashup.
The current Open Government Data scenario in Singapore does not make use of Linked Data standards. This proposal suggests a migrational framework from the existing system of data publishing. As a starting point, the current ecosystem in Singapore is being studied. The Infocomm Development Authority of Singapore maintains the data.gov.sg portal, which handles data collated from different government agencies (Chee Hean, 2011). The portal aims to meet the Singapore public's data needs and to establish a co-creative environment. The data is provided in different structured and unstructured formats, such as txt, Excel, PDF, XML, web pages and maps, and also through agency-specific Application Programming Interfaces (APIs) and web services.
There are multiple endpoints for data consumption. Prominent examples include the OneMap API (http://www.onemap.sg), Singapore Statistics, mytransport.sg (http://mytransport.sg) and Integrated Land Information Services. There is some redundancy in data across the different sources in the current OGD ecosystem, with limited interlinking and reuse capabilities. The vocabularies used by the agencies are agency-specific, with limited standardisation of commonly used terms. As a result, building a mash-up application that uses data from several agencies is complex. This study indicates scope for applying linked data, since linked data requires standardised data representation at the source level and a common interface at the publication level, with the data sets linked by interconnected vocabularies. A high-level view of the implementation is shown in Fig 2.
Fig 2: Linked Data implementation over the current DGS (DATA.GOV.SG) ecosystem
Objectives of the Study
The current study aims to build a linked data migrational framework that iDA and Singapore Government agencies could use to publish their data sets as linked data. A multi-step methodology will be devised, with clearly defined activities and deliverables at each step, based on the current ecosystem of data.gov.sg and other Open Government Data publishing portals in Singapore. A pilot study has been done with geographical and statistical data to describe each step in the framework.
The framework is built on the metadata and specifications provided by iDA and government agencies. The current study focuses on linking internal data sets; additionally, it aims to provide recommendations on a few use cases that leverage external linked data. The framework will be validated with geographical and statistical data provided by the Urban Redevelopment Authority and the Department of Statistics.
Other objectives of the study are as follows:
1) Explore case studies from other countries on implementing Linked Open Government Data, particularly RDF conversion and application building.
2) Prepare an inventory of tools by assessing different linked data tools, technical frameworks and processes for building ontologies, converting data into RDF and publishing data.
3) Develop recommendations for linked data implementation based on iDA's system for publishing open data and the nature of each agency's information feed.
4) Build an Ontology Network model (Haase, Rudolph, Wang et al., 2006) to unify vocabularies from different agency domains.
5) Build a proof-of-concept application based on the devised methodology to validate its applicability. This objective is subject to the availability of sufficient time and infrastructure.
The core objective of this research is ‘migration’ from the existing data architecture to a linked data architecture. This includes building a framework covering data-level Origin to Destination (OD) mapping, agency and iDA touch points, the technology used, RDF conversion, and data publishing and access strategies. The framework realizes the proposed linked data architecture by elaborating the migration steps, with appropriate conversion tools, and applying them to the open data through the selected business use case. This is followed by a final evaluation of the existing and suggested architectures in terms of usability, resource value and other business outcomes, as well as the infrastructure and feasibility constraints of iDA and other agencies.
Literature Review
Linked Data has its roots in the concept of the Semantic Web, which aims to create a web of actionable data standardized in a common format.
In other words, the Semantic Web is a systematic way of discovering knowledge by representing data with relationships defined by an ontology (Berners-Lee, Hendler & Lassila, 2001). To realize it, Berners-Lee put forward simple principles for linking data on the web: identify resources and assign them URIs designed from the perspective of a common user of the data; make these URIs dereferenceable over HTTP; publish data in a common format called RDF (Resource Description Framework); and include RDF statements that link to external URIs so that related things can be discovered (Shadbolt, Hall & Berners-Lee, 2006). Uniform Resource Identifiers (URIs) are short strings that identify resources on the web, such as documents, images, downloadable files, services and electronic mailboxes; they make resources available under a variety of naming schemes and access methods, such as HTTP, FTP and Internet mail, in the same simple way. Publishing data in this ‘linked data format’ became widely popular in libraries and government organizations, which hold data from different domains that needs to be classified in a common format for easy search and retrieval, and to make the data more productive through mash-ups and map-based and real-time applications (Halb, Raimond & Hausenblas, 2008).
RDF is the core standard in Linked Data (Sheridan & Tennison, 2010). It is a standard model for data interchange on the Web, with features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all data consumers to be changed. RDF stores data as triples of subject, predicate and object, which puts the focus on relationships within and between data. To identify these relationships systematically, we adopt an ontology modelling approach that associates properties and values. Ontologies or vocabularies define the concepts and relationships (also referred to as “terms”) used to describe and represent an area of concern. The overall architecture is shown in Fig 3.
Fig 3: Basic Linked Data Architecture
This approach to Semantic Web implementation with linked data standards has encouraged governments to transform their Open Government Data systems. Projects such as the LOD2 initiative aim to build a Linked Open Data stack of products, frameworks and processes that accelerates the implementation of linked data across the globe. W3C has set up two committees to provide best practices and recommendations for governments publishing their Open Government Data in a standardised linked data format. (Bizer, Heath, Idehen & Berners-Lee, 2008), (Villazón-Terrazas, Vilches-Blázquez, Corcho & Gómez-Pérez, 2011) and (Hyland & Wood, 2011) provide cookbooks and guidelines describing the general steps and tools required to convert Open Government Data to Linked Data format and publish it. Governments that are new to Linked Data publication need a migrational framework tailored to their local Open Government Data ecosystem. Such a customized framework could be used by a government steering committee to expedite the migration to Linked Open Government Data.
Methodology
Planning
1. Prior to this proposal, the project team has been in discussions with iDA, SLA and NIIT (the IT vendor supporting the platform) staff to understand the current architecture and workflow, and to identify the components that could accommodate changes as part of this study.
Analysis
2. The mechanisms for data storage and retrieval, the interfaces, and the system integration logic were studied using sections 3, 1 and 4 of the iDA system design document, respectively (Chaudhari, 2011). We inferred the following:
(i) Data from the agencies is stored in the iDA system from the destination perspective (column entities as X and Y dimensions) rather than the source perspective (domain-based classification). Metadata is provided only at the dataset level, not at the data level.
(ii) iDA publishes agency data in various formats, such as PDF, XML, CSV and SHP, to ease application development. A common format for standardizing these existing formats (such as RDF) has not yet been considered.
(iii) iDA provides various interfaces for accessing data, such as the OneMap API, SingStat and MyTransport. A developer cannot use data from all these interfaces together, because there is no common vocabulary (data classification language and metadata). For example, categories are represented differently in the iDA metadata and the OneMap API. The time periods and frequencies of data are also not synchronized across interfaces.
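Findings (i) and (iii) indicate what an RDF conversion of the stored X/Y/cell data has to add: data-level metadata and a shared vocabulary. A minimal Python sketch is shown below, assuming each cell can be read as a record holding its X label, Y label and value. It emits N-Triples using the W3C RDF Data Cube vocabulary (`qb:Observation`, `qb:dataSet`); the base URI `http://example.org/sg/`, the dimension property names and the `value` measure property are placeholders, not part of iDA's system.

```python
import re
from decimal import Decimal

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"          # W3C RDF Data Cube vocabulary
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.org/sg/"                   # placeholder base URI (assumption)


def slug(label: str) -> str:
    """Turn a dimension label such as 'Age 20 - 24' into a URI-safe token 'age-20-24'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def cells_to_ntriples(dataset_id: str, x_dim: str, y_dim: str, cells: list[dict]) -> list[str]:
    """Convert X/Y/cell records into one qb:Observation per cell, as N-Triples lines.

    Each cell is assumed to look like {"x": column label, "y": row label, "value": "1,234"}.
    Non-numeric values raise decimal.InvalidOperation, which surfaces footnote markers
    or suppressed cells that need manual handling.
    """
    dataset = f"<{BASE}dataset/{dataset_id}>"
    lines = []
    for i, cell in enumerate(cells):
        obs = f"<{BASE}dataset/{dataset_id}/obs/{i}>"
        value = Decimal(cell["value"].replace(",", ""))
        lines += [
            f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
            f"{obs} <{QB}dataSet> {dataset} .",
            # The same code URI is reused wherever the same dimension and label occur.
            f"{obs} <{BASE}def/{x_dim}> <{BASE}code/{x_dim}/{slug(cell['x'])}> .",
            f"{obs} <{BASE}def/{y_dim}> <{BASE}code/{y_dim}/{slug(cell['y'])}> .",
            f'{obs} <{BASE}def/value> "{value}"^^<{XSD_DECIMAL}> .',
        ]
    return lines
```

Each cell becomes a self-describing observation, so metadata that is currently held only at the dataset level travels with every data point.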
Implementation
3. The scope of implementation is decided based on these findings. iDA publishes data in different formats with different publication standards, and the data lacks relationships within and across datasets. A linked data approach has been suggested to unify the data publishing and access mechanisms across interfaces on top of the existing system.
4. Primary data for the migration study was provided by iDA, SLA and URA. The selected data sets are listed in Table 1.1. Instead of the entire data sets, only the latest year's data will be used for the study.
Data set | Agency | Category | Data type
Resident Population by DGP Zone/Subzone and Age Group, Type of Dwelling, Ethnic Group | Department of Statistics | Population and Household Characteristics | Textual
Sites Sold by URA - Details | Urban Redevelopment Authority (URA) | Housing and Urban Planning | Textual
Table 1.1: Primary datasets used for the study
5. The secondary data for the research study will be extracted from Linked Open Government Data statistical and geospatial data sets from the portal for building the framework. The migrational framework will be customized to the current architecture of data.gov.sg, because the steps will be devised from an understanding of the different layers in DGS; still, the framework will be generic enough to apply to other cases.
6. The framework formulation will align government agency data publication standards and iDA's existing architecture with the linked data implementation approaches put forth by Linked Open Government Data activists, researchers and practitioners. The steps in the framework will be sequential, each comprising sub-steps that cover its intrinsic activities. For example, object modelling of the different data objects in the selected data sets is a step that precedes the RDF modelling and ontology/vocabulary building steps. The steps will be substantiated with sample implementations using the primary data.
Suggestions from the W3C LOGD steering groups will be taken into account in formulating the framework. The tools identified as part of the inventory will be used for framework activities such as RDF creation, RDF storage and ontology reuse/modelling.
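As an illustration of how the object modelling and RDF modelling steps in item 6 might fit together, the following Python sketch mints URIs for a table row and serialises it as N-Triples. The `{base}/id/{concept}/{reference}` URI pattern is an assumption, modelled loosely on UK public-sector URI guidance; the base domain, the class URI and the column names in the usage are hypothetical, since the proposal does not list the actual fields of the URA “Sites Sold” data set. A sold site and a population record can be interlinked simply by having both conversions mint the same planning-area URI.

```python
import re
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/sg"  # placeholder domain, not an agreed iDA namespace


def mint_uri(concept: str, reference: str) -> str:
    """Mint an identifier URI of the form {BASE}/id/{concept}/{reference-slug}."""
    ref = re.sub(r"[^a-z0-9]+", "-", reference.lower()).strip("-")
    return f"{BASE}/id/{concept}/{ref}"


def literal(text: str) -> str:
    """Serialise a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_ntriples(row: dict, concept: str, key_column: str,
                    mapping: dict[str, tuple[str, Optional[str]]]) -> list[str]:
    """Convert one table row into N-Triples lines.

    mapping: column name -> (predicate URI, target concept or None).
    A target concept turns the cell value into a link to another resource's URI
    (e.g. a planning area); None keeps the value as a plain literal.
    Missing or empty cells are skipped.
    """
    subject = f"<{mint_uri(concept, str(row[key_column]))}>"
    # Placeholder class URI for the modelled object type (hypothetical ontology term).
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}/def/{concept}> ."]
    for column, (predicate, target) in mapping.items():
        value = row.get(column)
        if value is None or value == "":
            continue
        obj = f"<{mint_uri(target, str(value))}>" if target else literal(str(value))
        lines.append(f"{subject} <{predicate}> {obj} .")
    return lines
```

For instance, mapping a hypothetical `planning_area` column to the concept `planning-area` turns the value "Bedok" into the object URI `http://example.org/sg/id/planning-area/bedok`, which a population data set converted with the same function would also reference.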
Difficulties and Issues
1. Agencies do not provide raw data to iDA. Aggregated report data is split into X dimensions representing columns, Y dimensions representing rows, and data points representing cells. These fields are provided in an XML file and sent to iDA periodically. There is no separate master data file, and the hierarchy within master data dimensions is not explicitly set or provided. Therefore, a mechanism must be devised to identify the master data and the relationships between the different levels of the master data dimensions. Because the data representation in the files is implicit, this mechanism may not serve as a generic transformation applicable to all agencies.
2. The data will not be converted to RDF at the agency level; instead, conversion will be done on top of the data model in the iDA data store. This leads to data duplication, because the converted RDF is stored in addition to the existing data.
3. No master data management system is currently in place to standardise dimension values across agencies. Standardisation is required to link common data across the data sets used in the study. This may be complex, because different versions of master data values can occur within a single data set and across data sets.
4. The current OGD ecosystem of Singapore provides users with multiple endpoints, such as APIs, web services and files. A common endpoint in the form of a Linked Data API would mean building different wrappers over these endpoints. A diagram from (Bizer, Heath, Idehen & Berners-Lee, 2008) illustrates the different approaches to implementing linked data over existing systems.
Schedule
The schedule for the study is covered in the embedded Gantt chart (Gantt Chart-iDALinked Data Project.xlsx).
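Difficulty 3 under Difficulties and Issues could be approached, at least initially, with a steward-maintained mapping from agency-specific dimension labels to canonical master data URIs. The Python sketch below assumes such a code list and alias table exist; both, including the example entries, are hypothetical and would have to be agreed between agencies. It normalises case and whitespace, resolves known aliases, and reports labels it cannot map so that they are reviewed rather than silently linked.

```python
import re


def normalise_label(label: str) -> str:
    """Collapse internal whitespace, trim and upper-case a dimension label."""
    return re.sub(r"\s+", " ", label).strip().upper()


def standardise(labels, canonical, aliases):
    """Map agency labels to canonical master-data URIs.

    canonical: normalised canonical label -> URI
    aliases:   normalised variant label -> normalised canonical label
    Returns (mapped, unmapped): a dict from each original label to its URI, and the
    list of original labels that matched neither a canonical label nor an alias.
    """
    mapped, unmapped = {}, []
    for label in labels:
        key = normalise_label(label)
        key = aliases.get(key, key)
        if key in canonical:
            mapped[label] = canonical[key]
        else:
            unmapped.append(label)
    return mapped, unmapped


# Illustrative code list and alias table (hypothetical values, not agency data).
CANONICAL = {
    "ANG MO KIO": "http://example.org/sg/id/planning-area/ang-mo-kio",
    "BEDOK": "http://example.org/sg/id/planning-area/bedok",
}
ALIASES = {"AMK": "ANG MO KIO"}
```

For example, `standardise(["Ang  Mo Kio", "AMK", "Tampines"], CANONICAL, ALIASES)` maps the first two labels to the Ang Mo Kio URI and returns `["Tampines"]` as unmapped, flagging it for the data steward.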
Proposed Report Outline
The proposed final report will be structured as follows:
1. Abstract
2. Introduction
   a. Introduction to Linked Data and its relevance to Open Government Data and eGov
   b. Overview of SG OGD Ecosystem
3. Literature Review
   a. Government Linked Data Implementation Cookbooks, Guidelines and Recommendations
      i. URI formulation
      ii. RDF creation
      iii. Ontology Formulation
      iv. Publication and Exploitation
4. Migrational Framework
   a. Multi-step methodology
      i. Formulation and Description
      ii. Examples
5. Implementation Results and Observations
   a. POC details
   b. Description of issues faced in implementation
6. Limitations
7. Conclusion and Recommendations
A few new sections and sub-sections might be added in the final report.
Dissemination of Results
The migrational framework will be published as a report, subject to review by the NTU supervisor, and then submitted to iDA. The researchers plan to publish the report as a conference paper later in the year.
References
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 34.
Berners-Lee, T. (2006). Linked Data. Available online; last accessed 11 Jan 2012.
Shadbolt, N., Hall, W., & Berners-Lee, T. (2006). The Semantic Web Revisited. IEEE Computer Society, 21(3), 96-101.
Chee Hean, T. (2011). Keynote Address by Mr Teo Chee Hean, Deputy Prime Minister, Coordinating Minister for National Security and Minister for Home Affairs at the e-Gov Global Exchange 2011. Available online; last accessed 11 Jan 2012.
Bizer, C., Heath, T., Idehen, K., & Berners-Lee, T. (2008). Linked Data: Evolving the Web into a Global Data Space. In J. Hendler & F. Van Harmelen (Eds.), Proceedings of the 17th International Conference on World Wide Web (WWW 08) (Vol. 1, p. 1265). ACM Press.
Sheridan, J., & Tennison, J. (2010). Linking UK Government Data. WWW2010 Workshop: Linked Data on the Web (LDOW2010). ACM.
Villazón-Terrazas, B., Vilches-Blázquez, L., Corcho, O., & Gómez-Pérez, A. (2011). Methodological Guidelines for Publishing Government Linked Data. In D. Wood (Ed.), Linking Government Data (chapter 2, pp. 27-49). Springer, New York, NY.
Halb, W., Raimond, Y., & Hausenblas, M. (2008). Building Linked Data for Both Humans and Machines. WWW 2008 Workshop: Linked Data on the Web (LDOW2008).
Hyland, B., & Wood, D. (2011). The Joy of Data: A Cookbook for Publishing Linked Government Data on the Web. In D. Wood (Ed.), Linking Government Data (chapter 1, pp. 3-26). Springer, New York, NY.
Haase, P., Rudolph, S., Wang, Y., Brockmans, S., Palma, R., Euzenat, J., & d'Aquin, M. (2006, November). Networked Ontology Model. Technical Report, NeOn project deliverable D1.1.1.
iDA Reference:
Chaudhari, P. (2011, September). System Design Specification For, Infocomm Development Authority. Document ID: IDA/NIIT/DGS/SDD R1.0, Release 1.0, 28 Sept 2011.