• Basics of Linked Data
• Purpose of this project
• Migrational Framework
• Eight Steps
• Conclusion
What is Linked Data?
• Linked Data is an alternative data representation
format.
• Actually, it's just a repackaging of Semantic Web
elements
• It is different from relational database concepts
such as tables, rows, columns…
RDF
Subject-Predicate-Object
Jurong belongs to the West Zone
Linked Data Representation Format
Subject: http://data.gov.sg/resource/area/Jurong_West
Predicate: http://data.gov.sg/ontology/property/has_zone
Object: http://data.gov.sg/resource/zone/West
Predicate: http://www.w3.org/2003/01/geo/wgs84_pos#lat, Object: 1°20'40.2"N
Predicate: http://www.w3.org/2003/01/geo/wgs84_pos#long, Object: 103°42'24.54"E
Traditional representation - Tables
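To make the subject-predicate-object idea concrete, here is a minimal Python sketch that holds the Jurong West statement as a triple and writes it out as an N-Triples line. The `Triple` type and `to_ntriples` helper are names invented for this illustration, and the rule "anything that does not start with http:// is a plain literal" is a simplifying assumption, not how a full RDF library decides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # always a URI
    predicate: str  # always a URI
    obj: str        # a URI or a plain text value


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    if triple.obj.startswith("http://"):
        # Assumption for this sketch: http:// values are resources.
        obj = f"<{triple.obj}>"
    else:
        # Everything else becomes a quoted literal with basic escaping.
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


jurong_in_west = Triple(
    "http://data.gov.sg/resource/area/Jurong_West",
    "http://data.gov.sg/ontology/property/has_zone",
    "http://data.gov.sg/resource/zone/West",
)
line = to_ntriples(jurong_in_west)
print(line)
```

The same three values would occupy one row and two columns in a table; as a triple, each part is a globally addressable URI.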
Linked Data Components
• Data talks about itself. Humans and Machines
both understand data - How?
• URIs - lots of them
(http://data.gov.sg/PlanningArea/Kallang)
• RDF - Data model (Jurong Point is a location)
• Ontologies - enforce a structure on data (Land
Hierarchy), represented in RDF
• SPARQL - Does the same job as SQL and a bit
more...
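As a rough illustration of what a SPARQL query does, the sketch below matches a triple pattern against a small in-memory list, which is the core idea behind a SPARQL `WHERE` clause. It is plain Python, not a SPARQL engine; the `match` helper is invented for this example, and the Kallang triple is illustrative data rather than something taken from the slides.

```python
from typing import List, Optional, Tuple

DGS = "http://data.gov.sg/"
HAS_ZONE = DGS + "ontology/property/has_zone"

# Illustrative triples; only the Jurong West statement appears in the slides.
triples: List[Tuple[str, str, str]] = [
    (DGS + "resource/area/Jurong_West", HAS_ZONE, DGS + "resource/zone/West"),
    (DGS + "resource/area/Kallang", HAS_ZONE, DGS + "resource/zone/Central"),
]


def match(
    data: List[Tuple[str, str, str]],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Tuple[str, str, str]]:
    """Return triples matching the pattern; None acts like a SPARQL variable."""
    return [
        t for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?area WHERE { ?area <.../has_zone> <.../zone/West> }
areas_in_west = [t[0] for t in match(triples, p=HAS_ZONE, o=DGS + "resource/zone/West")]
print(areas_in_west)
```

Where SQL needs to know table and column names in advance, the pattern here only needs the URIs, which is part of the "bit more" that SPARQL offers.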
See the Difference?
Linked Data Cloud (Web of Data)
Linked Data becomes Linked Open Data (LOD) by
publishing it with an “appropriate” license
Provides opportunity to link with other useful data
sets
Provides variety of information about the same
resource
Linked Data and Government Data - a
natural compatibility
• Why?
• Govt data is used by all
• Govt data needs to be transparent and easily
understandable
• Govt data is mainly factual – a direct fit!
• Standardized representation of Govt data across
the globe can facilitate comparison without
hassles.
• Best way to propagate a useful agenda to the
private arena...
Who has implemented Linked Data?
• UK, US, Brazil Governments
• Private Corporations? Yes
– BBC
– Nature
– World Bank
– New York Times
– FAO
– CIA Factbook
?Provide Links?
http://wheredoesmymoneygo.org/bubbletree-map.html#/~/grand-total--2010-
Sample Linked Data Usecase in UK
ABC Water Proj (R)
Agency Websites
Singstat
publications
MINISTRIES
XLS
HTML
PDF
Accountant-General's Department
Accounting and Corporate Regulatory Authority
Agency For Science, Technology & Research
Attorney-General’s Chambers
Building & Construction Authority
Central Narcotics Bureau
Central Provident Fund Board
Civil Aviation Authority of Singapore
Department of Statistics
Economic Development Board
Energy Market Authority
Health Sciences Authority
Housing & Development Board
Immigration & Checkpoints Authority
Infocomm Development Authority of Singapore
Inland Revenue Authority of Singapore
Institute of Technical Education
Intellectual Property Office of Singapore
JTC Corporation
Judiciary, Subordinate Courts
Judiciary, Supreme Court
Land Transport Authority
Majlis Ugama Islam Singapura
Maritime & Port Authority of Singapore
Monetary Authority of Singapore
Nanyang Polytechnic
National Environment Agency
National Heritage Board
National Library Board
National Parks Board
Ngee Ann Polytechnic
People's Association
Public Service Division
Public Transport Council
Public Utilities Board
Republic Polytechnic
Sentosa Development Corporation
Singapore Civil Defence Force
Singapore Customs
Singapore Land Authority
Singapore Police Force
Singapore Polytechnic
Singapore Sports Council
Singapore Workforce Development Agency
Spring Singapore
Temasek Polytechnic
Urban Redevelopment Authority
Ministry of Community Development, Youth & Sports
Ministry of Education
Ministry of Foreign Affairs
Ministry of Health
Ministry of Law –Community Mediation Unit
Ministry of Manpower
Ministry of Transport
Media Development Authority
BFA Buildings (C)
Green Building (E)
C - Community
Cul - Culture
E - Environment
Emp - Employment
Edu - Education
H - Health
F - Family
R - Recreation
S - Sports
Breast Screen (H)
Cervical Screen (H)
Healthier Dining (H)
Quit Centers (H)
Infocomm Access (C)
Silver infocomm (C)
Wireless Hotspots (R)
Child care (F)
Disability (F)
Elder care (F)
Family (F)
Family Friendly Estab (F)
Student Care (F)
Comm Mediation Center (C)
After Death Facilities (E)
Funeral Parlours (E)
Dengue Cluster (H)
Hawker Center (E)
NEA Offices (E)
Recycling Bins (E)
Waste Disposal Site (E)
Waste Treatment (E)
Heritage sites(Cul)
Monuments(Cul)
Museums(Cul)
Libraries (Cul)
Streets and Places(Cul)
CD Councils (C)
Community Clubs (C)
Constituency offices (C)
Other facilities (C)
Other Pan networks (C)
PA head quarters (C)
Residents Committee(C)
Water Venture (C)
National Parks (R)
Skyrise greenery (E)
Sports clubs (S)
CET Centers(Emp)
WDA Service points(Emp)
Kindergartens (Edu)
Get Token
Address Search
Agency Data Search
Static Map
Get Layer Info
Mashup
Get Related Data
Get Directions
Public Transportation
Reverse Geocode
Map-related APIs from various agencies
Traffic-related APIs from Land Transport Authority
Tourism-related APIs from the Singapore Tourism Board
Environment-related APIs from the National Environment Agency
Library-related data feeds & web services from National Library Board
DGS Eco System
SG DATA
TEXTUAL
SPATIAL
API
THEMES OPERATIONS CATEGORIES
UNSTRUCTURED DATA
STRUCTURED DATA
STRUCTURED DATA
STATUTORY BOARDS
SG Government Data Eco System
Few design issues spotted through the Linked Data lens
• Different levels of granularity
• Multiple end points
• Metadata only at data set level
• Data already cooked!
• Hierarchies not captured
• Vocabulary conflict in spatial and textual data
Benefits of using Linked Data for iDA
Singapore
• An opportunity to standardize common terms
across agencies
• Re-use of resources (through URIs), e.g.
http://data.gov.sg/zone/central
• Centralized control?
• Single endpoint for all govt data - Linked Data
API
• Very convenient for developers to join data
from different agencies, e.g. combining data
from SLA and URA
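The last benefit can be sketched in a few lines of Python: when two agencies refer to the same zone by the same URI, joining their records needs no lookup table or name matching. The records, field names and the `join_on_uri` helper below are all made up for illustration; only the URI pattern comes from the slide.

```python
from typing import Dict, List

CENTRAL = "http://data.gov.sg/zone/central"
WEST = "http://data.gov.sg/zone/west"  # hypothetical sibling URI

# Made-up records standing in for two agencies' data sets.
ura_sites: List[Dict[str, str]] = [
    {"site": "Site A", "zone": CENTRAL},
    {"site": "Site B", "zone": WEST},
]
sla_parcels: List[Dict[str, str]] = [
    {"parcel": "Parcel 1", "zone": CENTRAL},
    {"parcel": "Parcel 2", "zone": CENTRAL},
]


def join_on_uri(left: List[Dict[str, str]], right: List[Dict[str, str]], key: str) -> List[Dict[str, str]]:
    """Inner-join two record lists on a shared URI-valued field."""
    index: Dict[str, List[Dict[str, str]]] = {}
    for record in right:
        index.setdefault(record[key], []).append(record)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joined = join_on_uri(ura_sites, sla_parcels, "zone")
for row in joined:
    print(row)
```

Site B drops out because no parcel shares its zone URI; nothing else about the two data sets has to agree.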
URA Sites for Sales dataset(Urban Planning)
DOS Population and Household Characteristics dataset (Population Demographics)
Age Pyramid of Resident Population
Old Age Support Ratio
Datasets Used for Framework Evaluation
Framework Formulation Process
• Work was split into three phases – Analysis, Design
and Evaluation
• Based on a study of Linked Data migration research
papers and cookbooks published by the World Wide
Web Consortium (W3C)
• Analysis of Linked Data implementations in the UK, US
and Brazil
• Evaluation of Linked Data tools with Singapore data
sets for recommendation in each step of the
framework
• Consideration of probable issues that could be
faced during implementation
Proposed Linked Data Migrational Framework for DGS
Each step has an INPUT, a PROCESS and an OUTPUT.
Step 1: Specification (Identification, Analysis)
• Personnel: Govt Agencies and IDA
• Items: Objectives, Specifications, Project Duration, Dataset Prioritization, Dataset License Setting, Impln Mode Selection, Roadmap, Architecture Overview
• Resource Allocation: 10
Step 2: Object Modeling
• Personnel: Govt Agencies Domain Matter Experts
• Items: Relational Model, Dataset Overview, Drawing Objects in Whiteboard, Conceptual View
• Resource Allocation: 15
Step 3: Ontology Modeling (Re-use, Create)
• Personnel: Ontology Modelers
• Items: Conceptual View, Public Vocabularies, Re-use of Existing Vocabularies, Creation of New Vocabularies, OWL, RDFS, RDF Vocabulary files
• Resource Allocation: 15
Step 4: URI Naming
• Personnel: IDA and Web Architects
• Items: Resources, Class and Properties, Visualization of URI mining process, URI Administration, URI Lifecycle
• Resource Allocation: 5
Step 5: RDF Creation (S2R, D2R, A2R)
• Personnel: Developers
• Items: ER Model, Spreadsheets, DBMS, API, Conversion to RDF triples using Mapping files, RDF Triples
• Resource Allocation: 20
Step 6: External Linking
• Personnel: Developers and Domain Experts
• Items: Government and external data sets, Linking based on Similarity Algorithms, Outbound Links
• Resource Allocation: 5
Step 7: Datasets Publication
• Personnel: Developers
• Items: RDF Triples, Ontologies, SPARQL, API, Data Insertion, VOID Modeling, Data Retrieval, API to SPARQL conversion, VOID Triples, JSON data
• Resource Allocation: 15
Step 8: Discovery & Exploitation
• Personnel: Web Architects
• Items: Actual Data, Existing Apps, Gamification, Crowdsourcing, Catalog Registration, External Reference, New Apps
• Resource Allocation: 15
Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8
Specification
1) Design the High Level Architecture
2) Set the “Migration Potential” for data sets
3) Decide the “Perspective” – Vertical vs Horizontal -> Agency vs Application (We recommend
Agency perspective)
Data set | Data set URL | Data Type | Agency | Utility Level | Interlinking Possibility | Potential Level
Annual Vehicle Population by Type of Fuel Use | URL | Textual (PDF) | LTA | H | L | M
Administrative Data - Employment Statistic | URL | Textual (HTML) | MOM | H | M | H
Specification
4) Setting up of License for data sets
5) Implementation Method – “Linked Data + RDF”
Other options - 1) Just URIs 2) URI for data sets only
Analysis of Data sets
• Study of system specifications, design & integration documents (including database) of
the selected data sets
• Understand Metadata, Schema design and Entity Relationship (ER) models
Data Set | Data Set URL | Data Type | Agency | License | Access Rights | Data Access Modes
Annual Vehicle Population by Type of Fuel Use | URL | Textual (PDF) | LTA | PDDL | R | API, SPARQL, RDF Dump
Administrative Data - Employment Statistic | URL | Textual (HTML) | MOM | PDDL | R | API, SPARQL, RDF Dump
Object Modeling
This is modeling without usage context.
*Requires normalization of the database model to 3NF
Issues
Possibility of applying high abstraction and
high granularity to objects
Key Learning
Ease in identifying the use of common
objects across data sets
Facilitates brainstorming of relationships
between objects
Ontology Modeling
Takes the conceptual diagram from Object Modeling as input.
Design Ontologies
1. Identify classes and subclasses
2. Identify hierarchy structure
3. Connect classes through relationship
4. Create rules for inference (optional)
5. Output OWL vocabulary files
Ontology modelling is carried out in two ways: 1) using and extending public
ontologies, 2) designing a local ontology from scratch
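The design steps above can be sketched in Python as a small generator that turns a class hierarchy and one relationship into a Turtle vocabulary. This is only an illustration of the output of step 5, not the Protege workflow used in the project; the `dgs:` prefix, the class names `Location`, `Zone` and `PlanningArea`, and the `to_turtle` helper are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple

PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix dgs: <http://data.gov.sg/ontology/> .\n"
)

# Steps 1-2: classes and their parents (None means a top-level class).
classes: Dict[str, Optional[str]] = {
    "Location": None,
    "Zone": "Location",
    "PlanningArea": "Location",
}
# Step 3: relationships as property -> (domain class, range class).
properties: Dict[str, Tuple[str, str]] = {
    "has_zone": ("PlanningArea", "Zone"),
}


def to_turtle(class_map: Dict[str, Optional[str]], property_map: Dict[str, Tuple[str, str]]) -> str:
    """Step 5: write the vocabulary as Turtle text."""
    lines = [PREFIXES]
    for name, parent in class_map.items():
        lines.append(f"dgs:{name} a owl:Class .")
        if parent is not None:
            lines.append(f"dgs:{name} rdfs:subClassOf dgs:{parent} .")
    for name, (domain, range_) in property_map.items():
        lines.append(f"dgs:{name} a owl:ObjectProperty ;")
        lines.append(f"    rdfs:domain dgs:{domain} ;")
        lines.append(f"    rdfs:range dgs:{range_} .")
    return "\n".join(lines)


vocabulary = to_turtle(classes, properties)
print(vocabulary)
```

Step 4 (inference rules) is left out, as it is optional in the list above.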
Ontology Modeling
Date fields, location fields and fields related to
measurements in DGS have scope for
vocabulary re-use
Vocabulary for the identified data sets
(developed using Protege) with screenshots
List of vocabularies required for LOGD
implementation
List of tools used for ontology modeling
OUTPUT?
ALLOCATION PERCENTAGE?
PERSONNEL INVOLVED
URI Naming
TBOX (ontology terms) | ABOX (instances)
http://data.gov.sg/ontology/Ministry/ | http://data.gov.sg/ministry/MOH
http://data.gov.sg/ontology/Agency/ | http://data.gov.sg/agency/SLA
http://data.gov.sg/ontology/SiteLocation | http://data.gov.sg/location/pioneer_road_north
http://data.gov.sg/ontology/Race | http://data.gov.sg/race/chinese
Dataset URIs
Dataset ID | URAstaticfile001
Dataset | http://data.gov.sg/dataset/URAstaticfile001/
Class | http://data.gov.sg/terms/class/URAstaticfile001/sitesforsale
Property | http://data.gov.sg/terms/property/URAstaticfile001/time
Row 1 | http://data.gov.sg/dataset/URAstaticfile001/1
Row 1 - A generic column | http://data.gov.sg/dataset/URAstaticfile001/1/columnName
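The URI patterns listed for this step are regular enough to be generated by code. Below is a minimal Python sketch that mints URIs following those patterns; the function names and the slug rule (lower-case, non-alphanumerics collapsed to underscores) are assumptions for illustration, chosen so that "Pioneer Road North" yields the `pioneer_road_north` form shown above.

```python
import re

BASE = "http://data.gov.sg"


def slug(label: str) -> str:
    """Lower-case a label and collapse other characters to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def resource_uri(kind: str, label: str) -> str:
    """Instance URI, e.g. a location or a race."""
    return f"{BASE}/{kind}/{slug(label)}"


def dataset_uri(dataset_id: str) -> str:
    return f"{BASE}/dataset/{dataset_id}/"


def class_uri(dataset_id: str, name: str) -> str:
    return f"{BASE}/terms/class/{dataset_id}/{name}"


def property_uri(dataset_id: str, name: str) -> str:
    return f"{BASE}/terms/property/{dataset_id}/{name}"


def row_uri(dataset_id: str, row_number: int) -> str:
    return f"{BASE}/dataset/{dataset_id}/{row_number}"


def cell_uri(dataset_id: str, row_number: int, column: str) -> str:
    return f"{row_uri(dataset_id, row_number)}/{column}"


print(resource_uri("location", "Pioneer Road North"))
print(cell_uri("URAstaticfile001", 1, "columnName"))
```

Keeping the minting rules in one place is one way to support central URI administration, since every data set then produces URIs in the same shape.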
1) “URI Administration” Mode
Maintained centrally in the DGS platform (resultant URIs will start with http://data.gov.sg/)
-> RECOMMENDED
vs
Maintained by individual agencies (resultant URIs will start with http://ura.gov.sg or
http://sla.gov.sg).
vs
Maintained externally by third party platforms such as Kasabi (resultant URIs will start with
http://data.kasabi.org) – No longer valid as Kasabi service has been shut down
2) Setup of URI Taxonomy
RDF Creation
RDF triples are generated by converting data from source format with the necessary transformation
Type | Nature | Example of Singapore data sets | Source format
S2R (Static Files) | Static | URA Site for Sales, Singstat’s Population Household Characteristics | XLS, CSV, TXT files and other static files
D2R (RDBMS) | Dynamic | DGS tables | RDBMS
A2R (APIs/Web Services) | Dynamic | OneMap API, myTransport API, NLB web services | Application Programming Interface (API) and Web Services (SOAP, REST)
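For the S2R case, the conversion amounts to walking the rows of a static file and emitting one triple per cell. The Python sketch below does this for a tiny in-memory CSV, using row and property URI patterns in the style of the URI Naming step. The sample CSV content, the `BASE` constant and the `rows_to_ntriples` helper are invented for illustration; the project itself used Google Refine for this step.

```python
import csv
import io
from typing import List

BASE = "http://data.gov.sg"

# Made-up sample standing in for a static file such as an XLS/CSV export.
SAMPLE_CSV = "location,time\nPioneer Road North,2012\n"


def rows_to_ntriples(csv_text: str, dataset_id: str) -> List[str]:
    """Emit one N-Triples line per cell: row URI, column property, cell value."""
    lines = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{BASE}/dataset/{dataset_id}/{row_number}"
        for column, value in row.items():
            predicate = f"{BASE}/terms/property/{dataset_id}/{column}"
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{subject}> <{predicate}> "{literal}" .')
    return lines


ntriples = rows_to_ntriples(SAMPLE_CSV, "URAstaticfile001")
for line in ntriples:
    print(line)
```

A real mapping file would also replace values such as the location name with resource URIs, so that the row links to other data sets instead of carrying a bare string.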
RDF Creation
Evaluated three tools, one for each mode of conversion
Google Refine - S2R
RDF Views - D2R
RDF Sponger - A2R
Google Refine Demo for S2R!
• ER models from RDBMS are to be converted into corresponding
vocabularies/ontologies for the D2R process using the StdTrip methodology
• For A2R, External Cartridges (mapping files) are to be created for mapping API
parameters to vocabularies. This can be done in RDF Sponger
“We feel that Linked Data is best suited for data from Static files and not for data that is
real-time and dynamic in nature unless conformity to structure can be trusted”
External Linking
External Linking is connecting with other data sets in the web of data
Data sets shown: Data.gov.sg, World Bank, CIA World Factbook, DBpedia, FAO, Geonames, Supreme Court, Flickr
<http://data.gov.sg/location/bugis> owl:sameAs <http://dbpedia.org/resource/Bugis> .
<http://data.gov.sg/race/malay> owl:sameAs <http://dbpedia.org/resource/Malay_race> .
Issues
• The outbound links made to data sets outside of IDA’s purview can be risky
• Dead links are a real possibility when resource URIs change or during system
downtime
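Candidate `owl:sameAs` links are typically proposed by comparing labels and keeping only close matches. The Python sketch below shows the idea with the standard library's `difflib` as a stand-in similarity measure; it is not how SILK or LIMES work internally. The label dictionaries, the 0.9 threshold and the `propose_links` helper are assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Illustrative URI -> label maps for a local and an external data set.
local_labels: Dict[str, str] = {
    "http://data.gov.sg/location/bugis": "Bugis",
    "http://data.gov.sg/location/pasir_ris": "Pasir Ris",
}
external_labels: Dict[str, str] = {
    "http://dbpedia.org/resource/Bugis": "Bugis",
    "http://dbpedia.org/resource/Jurong": "Jurong",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_links(
    local: Dict[str, str], external: Dict[str, str], threshold: float = 0.9
) -> List[Tuple[str, str, str]]:
    """For each local resource, link to the best external match above the threshold."""
    links = []
    for local_uri, local_label in local.items():
        best_uri, best_label = max(
            external.items(), key=lambda item: similarity(local_label, item[1])
        )
        if similarity(local_label, best_label) >= threshold:
            links.append((local_uri, OWL_SAME_AS, best_uri))
    return links


links = propose_links(local_labels, external_labels)
for link in links:
    print(link)
```

Proposed links still deserve a review by a domain expert, since label similarity alone cannot tell a place from a people or a building with the same name.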
Datasets Publication
• A triple store (RDF store) is the database used to store Linked Data.
  – We used Virtuoso Universal Server’s built-in triple store for evaluation
  – It is envisaged that the triple store will be centralized at iDA
• SPARQL (pronounced "sparkle") will be the main access point for Linked Data
  – SPARQL can be used to query (SELECT) and modify (INSERT, DELETE) data
  – SPARQL is the gateway to any operation on Linked Data. APIs and applications are
built on top of it
Triple Store and SPARQL Demo!
We had some information about External Linked Data Hosting but we had to remove it
as the major provider Talis has closed its own hosting service Kasabi!
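Applications usually reach a triple store over HTTP: the SPARQL 1.1 Protocol lets a client send a query as the `query` parameter of a GET request to the endpoint. The Python sketch below only builds such a request URL and does not contact any server. The endpoint address `http://data.gov.sg/sparql` is hypothetical, and `build_query_url` is a helper invented for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; the slides do not name one.
ENDPOINT = "http://data.gov.sg/sparql"

QUERY = (
    "SELECT ?area WHERE { "
    "?area <http://data.gov.sg/ontology/property/has_zone> "
    "<http://data.gov.sg/resource/zone/West> . }"
)


def build_query_url(endpoint: str, query: str) -> str:
    """Return a SPARQL Protocol GET URL with the query percent-encoded."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

An API layer on top of the store would build URLs like this on the developer's behalf, so that callers never write SPARQL themselves.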
Datasets Publication
Linked Data API is the common API endpoint that will be used by developers and public
users to access government data.
- This solves the problem of maintaining multiple end points!
ex: http://gov.tso.co.uk/transport/api/transport/doc/bus-stop-point.xml
Discovery & Exploitation
Key Theme
1) Internal discovery within Singapore for local citizens – idea4apps (link)
2) External discovery to attract usage of Singapore government data in
international economic & political research and global issues (water scarcity, carbon
footprint etc.)
• Entry in CKAN registry -> http://thedatahub.org/tag/registry
Gamification? Promoted by LinkedGov.org
Interlinked Datasets Post-Migration
• Original data provided by URA
• Similarly, location-based data from OneMap API is retrieved for Pasir Ris
• Possible because of the re-use of the common resource URI Pasir Ris across data sets
Other Interesting Use Cases
Definitely not Science Fiction!
Q & A Engine that works on top of government linked data. Inspired by www.trueknowledge.com
Sense-Making
Question: Which recent year had a growth rate close to 50% for the majority of Singapore-based SMEs?
Step 1: Spot the resources in this query
DBpedia Spotlight does just that! – Semantic Information Extraction
Step 2: Identify the relationships between the resources
• SME is an instance of the Organization class
• Organization class comes under the country Singapore
• Growth rate is a property of the Sales class
• Year is a class by itself
• Majority is a subset of the Group class
Step 3: Use an NLP technique – Syntactic Analysis (Stanford Parser) followed by Focus
Extraction – to understand the question
A syntactic parse tree is generated, followed by an Access Pattern
Step 4: Look for RDF triples that meet the criteria
2010 is returned as the result!
Key Challenges
• Dense data - many additional RDF triples get created along with the
required RDF triples, as a resource belongs to multiple ontologies
Demographics dataset stats:
Rows: ~300, Columns: 16 in the Excel file
Resultant triple count in the RDF/triple store: 13711
Reason: the majority of the generated triples are for machine understanding.
• URI administration could be an intense activity, as dead URIs can cause
damage to applications, e.g. what will happen if
http://data.gov.sg/area/jurong doesn't work?
• Changes to the structure of static files and RDBMS tables require changes in
RDF mapping files - might be a long process if not properly regulated
• Not readily suitable for real-time data
Summary
• Four in-person discussion sessions with IDA, NIIT and SLA
• Analysis of five data.gov.sg system specifications
• Evaluation of four existing migration frameworks
• Prototyping with six core Linked Data tools
Tools by step:
• Dataset Publication: Virtuoso Universal Server, Linked Data API
• External Linking: SILK, LIMES
• RDF Creation: Google Refine, RDF Views, RDF Sponger
• URI Naming: Pubby
• Ontology Modeling: Protégé
• Object Modeling: Concept Map
Summary
• Applicability of the framework to Singapore
Government Data
• Issues identified in existing Data Eco System
• Recommended tools and best practices for each step
• Launchpad for SG Linked Data implementation
Final Thoughts…
• ROI is not a key metric for Linked Data implementation
• Benefits of moving to Linked Data are intangible and may
not be immediately realizable
• Volume of work is huge compared to traditional
systems
We are thankful to Prof Chris Khoo
for his supervision and iDA staff Soy Boon Lim
for providing overview of data.gov.sg and also
for furnishing DGS design documents...
Why are we doing this project?
To prescribe a Linked Data migrational framework for
data.gov.sg (DGS) data sets
First hand view of the required migration activities
Issues anticipated at each step
Evaluation & Recommendation on Linked Data tools
To help IDA realize what more can be done with existing
data, with a closer look at government counterparts in the UK and US
In totality, iDA can use this report as a guide for the various
aspects related to Linked Data implementation
Basic Thought Process of Linked Data
Publishing
• Select data sets that appear apt for Linked
Data
• Identify the data sources for the data sets
• Find out what type of transformations are
needed
• Publish it!
iDA Singapore launched the Data.gov.sg portal and mGov@SG public services in June 2011
Data.gov.sg provides 5000+ public data sets from 50 government agencies
Purpose: research and building applications using the data
Data.Gov.Sg

Semantic web design for www.data.gov.sg - Presentation

  • 2.
    • Basics ofLinked Data • Purpose of this project • Migrational Framework • Eight Steps • Conclusion
  • 3.
    What is LinkedData? • Linked Data is an alternative data representation format. • Actually, its just a repackaging of Semantic Web elements • It is different from relational database concepts such as tables, rows, columns…
  • 4.
    RDF Subject-Predicate -Object Jurong belongsto the West Zone Linked Data Representation Format http://data.gov.sg/resource/area/Jurong_West http://data.gov.sg/ontology/property/has_zone http://data.gov.sg/resource/zone/West Subject Predicate Object http://w3.org/2003/01/geo/wgs84_pos#/lat http://w3.org/2003/01/geo/wgs84_pos#/long 1°20'040.2"N 103°42'24.54"E Traditional representation - Tables
  • 5.
    Linked Data Components •Data talks about itself. Humans and Machines both understand data - How? • URIs - lots of them (http://data.gov.sg/PlanningArea/Kallang) • RDF - Data model (Jurong Point is a location) • Ontologies - Enforces a structure to data (Land Hierarchy) – represented as RDFs • SPARQL - Does the same job as SQL and a bit more...
  • 6.
  • 7.
    Linked Data Cloud(Web of Data) Linked Data becomes Linked Open Data(LOD) by publishing it with “appropriate” license Provides opportunity to link with other useful data sets Provides variety of information about the same resource
  • 8.
    Linked Data andGovernment Data - a natural compatibility • Why? • Govt data is used by all • Govt data needs to be transparent and easily understandable • Govt data is mainly factual – a direct fit! • Standardized representation of Govt data across the globe can facilitate comparison without hassles. • Best way to propagate a useful agenda to the private arena...
  • 9.
    Who have implementedLinked Data? • UK, US, Brazil Governments • Private Corporations? Yes – BBC – Nature – World Bank – New York Times – FAO – CIA Factbook ?Provide Links?
  • 10.
  • 11.
    ABC Water Proj(R) Agency Websites Singstat publicationsMINISTRIES XLS HTML PDF Accountant-General's Department Accounting and Corporate Regulatory Authority Agency For Science, Technology & Research Attorney-General’s Chambers Building & Construction Authority Central Narcotics Bureau Central Provident Fund Board Civil Aviation Authority of Singapore Department of Statistics Economic Development Board Energy Market Authority Health Sciences Authority Housing & Development Board Immigration & Checkpoints Authority Infocomm Development Authority of Singapore Inland Revenue Authority of Singapore Institute of Technical Education Intellectual Property Office of Singapore JTC Corporation Judiciary, Subordinate Courts Judiciary, Supreme Court Land Transport Authority Majlis Ugama Islam Singapura Maritime & Port Authority of Singapore Monetary Authority of Singapore Nanyang Polytechnic National Environment Agency National Heritage Board National Library Board National Parks Board Ngee Ann Polytechnic People's Association Public Service Division Public Transport Council Public Utilities Board Republic Polytechnic Sentosa Development Corporation Singapore Civil Defence Force Singapore Customs Singapore Land Authority Singapore Police Force Singapore Polytechnic Singapore Sports Council Singapore Workforce Development Agency Spring Singapore Temasek Polytechnic Urban Redevelopment Authority Ministry of Community Development, Youth & Sports Ministry of Education Ministry of Foreign Affairs Ministry of Health Ministry of Law –Community Mediation Unit Ministry of Manpower Ministry of Transport Media Development Authority BFABuildings(C) GreenBuilding(E) C- Community Cul - Culture E- Environment Emp- Employment Edu - Education H- Health F- Family R- Recreation S- Sports Breast Screen (H) Cervical Screen (H) Healthier Dining (H) Quit Centers (H) Infocomm Access (C) Silver infocomm (C) Wireless Hotspots (R) Child care (F) Disability (F) Elder care (F) Family (F) Family 
Friendly Estab (F) Student Care (F) Comm Mediation Center (C) After Death Facilities (E) Funeral Palours (E) Dengue Cluster (H) Hawker Center (E) NEA Offices (E) Recycling Bins (E) Waste Disposal Site (E) Waste Treatment (E) Heritage sites(Cul) Monuments(Cul) Museums(Cul) Libraries (Cul) Streets and Places(Cul) CD Councils (C) Community Clubs (C) Constituency offices (C) Other facilities (C) Other Pan networks (C) PA head quarters (C) Residents Committee(C) Water Venture (C) National Parks (R) Skyrise greenery (E) Sports clubs (S) CET Centers(Emp) WDA Service points(Emp) Kindergartens (Edu) Get TokenAddress SearchAgency Data SearchStatic Map Get Layer InfoMashup Get Related Data Get Directions Public Transportation Reverse Geocode Map-related APIs from various agencies Traffic-related APIs from Land Transport Authority Tourism-related APIs from the Singapore Tourism Board Environment-related APIs from the National Environment Agency Library-related data feeds & web services from National Library Board DGS Eco System SG DATA TEXTUAL SPATIAL API THEMES OPERATIONSCATEGORIES UNSTRUCTURED DATA STRUCTURED DATA STRUCTURED DATA STATUTORY BOARDS SG Government Data Eco System
  • 12.
    Different levels of granularity MultipleEnd points Meta data only at data set levels Data already cooked !! Hierarchies not captured Vocabulary Conflict in spatial and textual data Few design issues spotted through the Linked Data lens
  • 13.
    Benefits of usingLinked Data for iDA Singapore • An opportunity to standardize common terms across agencies • Re-use of resources (through URIs) ex: http://data.gov.sg/zone/central • Centralized control? • Single endpoint for all govt data - Linked Data API • Very convenient for developers to join data from different agencies. eg: combining data from SLA and URA
  • 14.
    URA Sites forSales dataset(Urban Planning) DOS Population and Household Characteristics dataset (Population Demographics) Age Pyramid of Resident Population Old Age Support Ratio Datasets Used for Framework Evaluation
  • 15.
    Framework Formulation Process •Work was split into three phases – Analysis, Design and Evaluation • Based on study of Linked Data Migration Research Papers and cookbooks published by the World Wide Web Consortium(W3C) • Analysis of Linked Data implementations in UK ,US and Brazil • Evaluation of Linked Data tools with Singapore data sets for recommendation in each step of the framework • Contemplating on probable issues that could be faced during implementation
  • 16.
    Proposed Linked DataMigrational Framework for DGS Specification Identfication Analysis Object Modeling Ontology Modeling URI Naming RDF Creation External Linking Datasets Publication Discovery & Exploitation Re-use Create S2R D2R A2R Govt Agencies and IDA Govt Agencies Domain Matter Experts Ontology Modelers IDA and Web Architects Developers Developers and Domain Experts Developers Web Architects Objectives Specifications Project Duration Dataset Prioritization Dataset License Setting Impln Mode Selection Roadmap Architecture Overview Relational Model Dataset Overview Drawing Objects in Whiteboard Conceptual View Conceptual View Public Vocabularies Re-use of Existing Vocabularies Creation of New Vocabularies OWL, RDFS, RDF Vocabulary files Resources Class and Properties Visualization of URI mining process URI Administration URI Lifecycle ER Model Spreadsheets, DBMS, API Conversion to RDF triples using Mapping files RDF Triples Government and external data sets Linking based on Similarity Algorithms Outbound Links RDF Triples Ontologies SPARQL, API Data Insertion VOID Modeling Data Retrieval API to SPARQL conversion VOID Triples JSON data Actual Data Existing Apps Gamification Crowdsourcing Catalog Registration External Reference New Apps INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT Resource Allocation 10 Resource Allocation 15 Resource Allocation 15 Resource Allocation 5 Resource Allocation 20 Resource Allocation 5 Resource Allocation 15 Resource Allocation 15 1 2 3 4 5 6 7 8
  • 17.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 Specification Home 1) Design the High Level Architecture 2) Set the “Migration Potential” for data sets 3) Decide the “Perspective” – Vertical vs Horizontal -> Agency vs Application (We recommend Agency perspective) Data set Data set URL Data Type Agency Utility Level Interlinking Possibility Potential Level Annual Vehicle Population by Type of Fuel Use URL Textual (PDF) LTA H L M Administrative Data - Employment Statistic URL Textual (HTML) MOM H M H
  • 18.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 Specification Home 4) Setting up of License for data sets 5) Implementation Method – “Linked Data + RDF” Other options - 1) Just URIs 2) URI for data sets only Analysis of Data sets  Study of System specifications, design & integration documents (including database) of the selected data sets • Understand Metadata, Schema design and Entity Relationship (ER) models Data Set Data Set URL Data Type Agency License Access Rights Data Access Modes Annual Vehicle Population by Type of Fuel Use URL Textual (PDF) LTA PDDL R API, SPARQL, RDF Dump Administrative Data - Employment Statistic URL Textual (HTML) MOM PDDL R API, SPARQL, RDF Dump
  • 19.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 Object Modeling This is modeling without usage context. *Requires normalization of database model in 3NF form Issues Possibility of applying high abstraction and high granularity to objects Key Learning Ease in identifying the use of common objects across data sets Facilitates brainstorming of relationships between objects Home
  • 20.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 Ontology Modeling Takes the conceptual diagram from Object Modeling as input. Design Ontologies 1. Identify classes and subclasses 2. Identify hierarchy structure 3. Connect classes through relationship 4. Create rules for inference (optional) 5. Output OWL vocabulary files Ontology modelling is carried out in two ways:- 1) Using and extending public ontologies 2) Designing a local ontology from scratch Home
  • 21.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 Ontology Modeling Date fields, location fields and fields related to measurements in DGS have scope for vocabulary re-use Vocabulary for the identified data sets (developed using Protege) with screenshots List of vocabularies required for LOGD implementation List of tools used for ontology modeling OUTPUT? ALLOCATION PERCENTAGE? PERSONNEL INVOLVED Home
  • 22.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 URI Naming ABOX TBOX http://data.gov.sg/ontology/Ministry/ http://data.gov.sg/ministry/MOH http://data.gov.sg/ontology/Agency/ http://data.gov.sg/agency/SLA http://data.gov.sg/ontology/SiteLocation http://data.gov.sg/location/pioneer_road_north http://data.gov.sg/ontology/Race http://data.gov.sg/race/chinese Dataset ID URAstaticfile001 Dataset http://data.gov.sg/dataset/ URAstaticfile001/ Class http://data.gov.sg/terms/class/URAstaticfile001/sitesforsale Property http://data.gov.sg/terms/property/URAstaticfile001/time Row 1 http://data.gov.sg/dataset/URAstaticfile001/1 Row 1 - A generic column http://data.gov.sg/dataset/URAstaticfile001/1/columnName Dataset URIs Home 1) “URI Administration” Mode Maintained centrally in the DGS platform (resultant URIs will start with http://data.gov.sg/) -> RECOMMENDED vs Maintained by individual agencies (resultant URIs will start with http://ura.gov.sg or http://sla.gov.sg). vs Maintained externally by third party platforms such as Kasabi (resultant URIs will start with http://data.kasabi.org) – No longer valid as Kasabi service has been shut down 2) Setup of URI Taxonomy
  • 23.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 RDF Creation Home RDF triples are generated by converting data from source format with the necessary transformation Type Nature Example of Singapore data sets Source format S2R (Static Files) Static URA Site for Sales, Singstat’s Population Household Characteristics XLS, CSV, TXT files and other static files D2R (RDBMS) Dynamic DGS tables RDBMS A2R (APIs/Web Services) Dynamc OneMap API, myTransport API, NLB web services Application Programming Interface (API) and Web Services(SOAP, REST)
  • 24.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 RDF Creation Home Evaluated 3 tools for each mode of conversion Google Refine - S2R RDF Views - D2R RDF Sponger - A2R Google Refine Demo for S2R!  ER models from RDBMS are to be converted into corresponding vocabularies/Ontologies for D2R process using STDTrip methodology  For A2R, External Cartridges (mapping files) are to be created for mapping API parameters to vocabularies. This can be done in RDF Sponger “We feel that Linked Data is best suited for data from Static files and not for data that is real-time and dynamic in nature unless conformity to structure can be trusted”
  • 25.
    Step 1 Step2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8 External Linking External Linking is connecting with other data sets in the web of data Data.gov.sg WorldBank CIA World Factbook DBpedia FAO Geonames Supreme Court Flickr <http://data.gov.sg/location/bugis> <owl:sameAs> <http://www.dbpedia.org/resource/Bugis> <http://data.gov.sg/race/malay> <owl:sameAs> <http://www.dbpedia.org/resource/Malay_race> Issues •The outbound links made to data sets outside of IDA’s purview can be risky •Dead links are a vivid possibility during the change of resource URIs or system downtime Home
  • 26.
    Datasets Publication
    • A triple store (or RDF store) is the data store used to hold Linked Data
      • We used Virtuoso Universal Server’s built-in triple store for evaluation
      • It is envisaged that the triple store will be centralized at iDA
    • SPARQL (pronounced “sparkle”) will be the main query interface for Linked Data
      • SPARQL can be used to SELECT, INSERT, DELETE and UPDATE data
      • SPARQL is the gateway to any operation on Linked Data. APIs and applications are built on top of it
    Triple Store and SPARQL Demo!
    We had some information about external Linked Data hosting, but we had to remove it because the major provider, Talis, has closed its hosting service Kasabi.
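To make the idea of a triple store and pattern-based querying concrete, here is a toy in-memory version. It is neither Virtuoso nor a SPARQL engine; it only mimics a single triple pattern, where `None` plays the role of a SPARQL variable. The class and method names are hypothetical, and the sample triple reuses the Jurong West example from earlier in the deck.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def insert(self, s, p, o):
        self._triples.add((s, p, o))

    def delete(self, s, p, o):
        self._triples.discard((s, p, o))

    def select(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


JURONG_WEST = "http://data.gov.sg/resource/area/Jurong_West"
HAS_ZONE = "http://data.gov.sg/ontology/property/has_zone"
WEST = "http://data.gov.sg/resource/zone/West"

if __name__ == "__main__":
    store = TripleStore()
    store.insert(JURONG_WEST, HAS_ZONE, WEST)
    # Roughly the pattern: SELECT ?area WHERE { ?area <has_zone> <West> }
    for subject, _, _ in store.select(p=HAS_ZONE, o=WEST):
        print(subject)
```

A real SPARQL endpoint adds joins across several patterns, filters, and named graphs, but the match-by-pattern idea is the same.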
  • 27.
    Datasets Publication
    The Linked Data API is the common API endpoint that will be used by developers and public users to access government data.
    • This solves the problem of maintaining multiple endpoints!
    e.g. http://gov.tso.co.uk/transport/api/transport/doc/bus-stop-point.xml
  • 28.
    Discovery & Exploitation
    Key Theme
    1) Internal discovery within Singapore for local citizens: idea4apps (link)
    2) External discovery, to attract use of Singapore government data in international economic and political research and on global issues (water scarcity, carbon footprint, etc.)
    • Entry in CKAN registry -> http://thedatahub.org/tag/registry
    Gamification? Promoted by LinkedGov.org
  • 29.
    Interlinked Datasets Post-Migration
    • Original data provided by URA
    • Similarly, location-based data from the OneMap API is retrieved for Pasir Ris
    • Possible because of the re-use of the common resource URI for Pasir Ris across data sets
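The interlinking shown on this slide comes down to a join on a shared resource URI. The sketch below illustrates that idea with two small dictionaries standing in for the URA and OneMap data. The Pasir Ris URI, the property names, and all values are placeholders invented for the example.

```python
# Assumed URI for Pasir Ris, following the /location/ pattern used elsewhere.
PASIR_RIS = "http://data.gov.sg/location/pasir_ris"

# Placeholder records keyed by resource URI; values are not real data.
ura_data = {PASIR_RIS: {"site_for_sale": "example-site-A"}}
onemap_data = {PASIR_RIS: {"nearby_amenity": "example-amenity-B"}}


def merge_by_uri(*datasets):
    """Combine properties from several data sets that share resource URIs."""
    merged = {}
    for dataset in datasets:
        for uri, properties in dataset.items():
            merged.setdefault(uri, {}).update(properties)
    return merged


if __name__ == "__main__":
    combined = merge_by_uri(ura_data, onemap_data)
    print(combined[PASIR_RIS])
```

If the two sources used different identifiers for Pasir Ris, the join would need a mapping step first, which is the work that a common URI taxonomy avoids.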
  • 30.
    Other Interesting Use Cases
    Definitely not science fiction!
    A Q&A engine that works on top of government linked data. Inspired by www.trueknowledge.com
  • 31.
    Sense-Making
    Question: Which recent year had a growth rate close to 50% for the majority of Singapore-based SMEs?
    Step 1: Spot the resources in this query
    • DBpedia Spotlight does just that! (Semantic information extraction)
    Step 2: Identify the relationships between the resources
    • SME is an instance of the Organization class
    • The Organization class comes under the country Singapore
    • Growth rate is a property of the Sales class
    • Year is a class by itself
    • Majority is a subset of the Group class
    Step 3: Use an NLP technique, syntactic analysis (Stanford Parser) followed by focus extraction, to understand the question
    • A syntactic parse tree is generated, followed by the access pattern
    Step 4: Look for RDF triples that meet the criteria
    • 2010 is returned as the result!
  • 32.
    Key Challenges
    • Dense data: many additional RDF triples are created along with the required ones, because a resource belongs to multiple ontologies
      Demographics dataset stats: about 300 rows and 16 columns in the Excel file; resultant triple count in the RDF/triple store: 13711
      Reason: the majority of the generated triples are for machine understanding
    • URI administration could be an intensive activity, as dead URIs can damage applications, e.g. what will happen if http://data.gov.sg/area/jurong doesn't work?
    • Changes to the structure of static files and RDBMS tables require changes in the RDF mapping files, which might be a long process if not properly regulated
    • Not readily suitable for real-time data
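The density figures quoted for the demographics dataset can be put in proportion with a quick calculation. The inputs are the numbers from the slide (the row count is approximate, so the ratios are too); the variable names are only for illustration.

```python
# Figures quoted on the slide; the row count is approximate (~300).
rows = 300
columns = 16
total_triples = 13711

cell_values = rows * columns                   # values present in the spreadsheet
triples_per_row = total_triples / rows         # average triples generated per row
triples_per_cell = total_triples / cell_values # triples generated per cell value

if __name__ == "__main__":
    print(f"Cell values in the file: {cell_values}")
    print(f"Triples per row:  {triples_per_row:.1f}")
    print(f"Triples per cell: {triples_per_cell:.2f}")
```

Under these figures the store holds nearly three triples for every cell in the original file, which is the overhead the slide attributes to triples generated for machine understanding.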
  • 33.
    Summary
    • Four in-person discussion sessions with IDA, NIIT and SLA
    • Analysis of five data.gov.sg system specifications
    • Evaluation of four existing migration frameworks
    • Prototyping with six core Linked Data tools
    Tools by step
    • Dataset Publication: Virtuoso Universal Server, Linked Data API
    • External Linking: SILK, LIMES
    • RDF Creation: Google Refine, RDF Views, RDF Sponger
    • URI Naming: Pubby
    • Ontology Modeling: Protégé
    • Object Modeling: Concept Map
  • 34.
    Summary
    • Applicability of the framework to Singapore Government data
    • Issues identified in the existing data ecosystem
    • Recommended tools and best practices for each step
    • Launchpad for SG Linked Data implementation
    Final Thoughts…
    • ROI is not a key metric for Linked Data implementation
    • The benefits of moving to Linked Data are intangible and may not be immediately realizable
    • The volume of work is huge compared to traditional systems
  • 35.
    We are thankful to Prof Chris Khoo for his supervision, and to iDA staff member Soy Boon Lim for providing an overview of data.gov.sg and for furnishing the DGS design documents...
  • 36.
    Why are we doing this project?
    To prescribe a Linked Data migrational framework for data.gov.sg (DGS) data sets
    • First-hand view of the required migration activities
    • Issues anticipated at each step
    • Evaluation and recommendation of Linked Data tools
    To help IDA in realizing
    • What more can be done with existing data?
    • A closer look at government counterparts: UK and US!
    In totality, iDA can use this report as a guide for the various aspects related to Linked Data implementation
  • 37.
    Basic Thought Process of Linked Data Publishing
    • Select data sets that appear apt for Linked Data
    • Identify the data sources for the data sets
    • Find out what type of transformations are needed
    • Publish it!
  • 38.
    Data.Gov.Sg
    • iDA Singapore launched the Data.gov.sg portal and mGov@SG public services in June 2011
    • Data.gov.sg provides 5000+ public data sets from 50 government agencies
    • Purpose: research and building applications using the data

Editor's Notes

  • #26 DBpedia: places and events; CIA and World Bank: economic analysis; Flickr: places; FAO: export and import commodities; Supreme Court: facts