Linked Open Government Data
(LOGD): Ontology Usage
Experimental Results
Second Presentation
Nooshin Allahyari
Outline
• Categorizing data providers
• Dataset collection
• Dataset characteristics
▫ Namespace
▫ Ontology Usage
▫ Annotation property
• Concept Coverage
• Case-Based Analysis
• Conclusion
Categorizing data providers
• Data providers are US government agencies
• Agencies are divided according to the US Federal Government Reference Model
• Each agency is in charge of publishing its related datasets
• The Data.gov catalog also provides topic-based categorization
Dataset Collection
• All 25 datasets were collected from Data.gov
• Datasets are in RDF format
• Very large datasets were difficult to load and query
• Using different tools as the endpoint
▫ Virtuoso (commercial version) as the SPARQL endpoint
▪ Easy to install
▪ GUI
▪ Many visual tools
▪ SQL support, SQL tools, and connection tools
• Increasing the number of datasets for more reliable results
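A minimal sketch of how a query could be addressed to the SPARQL endpoint over HTTP. It assumes a local Virtuoso instance listening on its default port (8890) at the `/sparql` path; the endpoint URL and the helper name `build_sparql_request` are illustrative assumptions, not details taken from the slides. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Virtuoso instance on its default HTTP port.
ENDPOINT = "http://localhost:8890/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    # The SPARQL Protocol passes the query text in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would execute the query against a running endpoint.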
Namespace
• The same namespace is used across all datasets
Ontology Vocabulary Usage
• FEA Reference Model Ontology (RMO)
• Vocabulary related to the government context
▫ General vocabulary:
▪ Country
▪ State
▪ City
▫ Government programs and services:
▪ Health Program
▪ Cultural Program
Annotation Property
• Useful for providing additional information about datasets. All datasets have:
▫ rdfs:label
▫ rdfs:comment
▫ No language tag or metadata
▪ Some datasets from the Italy dataset catalog in TWC LOGD contain a language tag.
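As an illustration of what checking for a language tag involves, the sketch below inspects literals in their N-Triples/Turtle serialized form, where a tag follows the closing quote (for example `"Stato"@it`). The sample labels and the helper name `language_tag` are hypothetical and are not taken from the datasets.

```python
import re

# A serialized literal with a trailing language tag, e.g. "Stato"@it
LANG_TAGGED = re.compile(r'^"(?:[^"\\]|\\.)*"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)$')


def language_tag(literal: str):
    """Return the language tag of a serialized literal, or None if absent."""
    match = LANG_TAGGED.match(literal.strip())
    return match.group(1) if match else None


# Hypothetical rdfs:label values, for illustration only.
labels = ['"State"', '"Stato"@it', '"Veterans Affairs"@en-US']
untagged = [label for label in labels if language_tag(label) is None]
print(untagged)
```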
Concept Coverage
• The same concepts appear in all datasets
• Metadata for the Data.gov wiki and TWC LOGD

Prefix    Concept
foaf      Homepage
rdfs      isDefinedBy
dcterms   Source
dgtwc     uses-property
dgtwc     number-of-triples
dgtwc     number-of-properties
dgtwc     number-of-entries
Concept Coverage
• General concepts related to government
• Low coverage of concepts
• Concepts with multiple names

Concept               Coverage (percentage)
State                 48%
City                  32%
State-Abbreviation    16%
Region                12%
Zip                   12%
Country               8%
Country origin code   8%
Area code             8%
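The coverage figures can be read as the share of datasets in which a concept appears. The sketch below assumes the percentages are taken over the 25 collected datasets; the per-concept dataset counts are back-derived from the reported percentages under that assumption and are not stated on the slides.

```python
TOTAL_DATASETS = 25  # from the Dataset Collection slide

# Assumption: counts back-derived from the reported percentages.
datasets_with_concept = {
    "State": 12,
    "City": 8,
    "State-Abbreviation": 4,
    "Region": 3,
    "Zip": 3,
    "Country": 2,
    "Country origin code": 2,
    "Area code": 2,
}


def coverage(count: int, total: int = TOTAL_DATASETS) -> int:
    """Percentage of datasets that contain a concept, rounded to an integer."""
    return round(100 * count / total)


for concept, count in datasets_with_concept.items():
    print(f"{concept:<20} {coverage(count):>3}%")
```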
Case-Based Analysis
• Three datasets from the same agency in the same category
▫ Department of Veterans Affairs
▪ dataset1213
▪ dataset1288
▪ dataset1290
• The query results for each dataset show that all three have similar concepts
▫ State
▫ City
▫ VISN
▫ Station
Case-Based Analysis-1288
• The query lists every station, with its VISN code, in each city and determines the state in which the city is located:

SELECT DISTINCT ?city ?station ?visn ?st
WHERE
{
  ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> ?city
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#station> ?station }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#visn> ?visn }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> ?st }
}

State   VISN   Station   City
"NJ"    "3"    "561"     "East Orange"
"NY"    "3"    "620"     "Montrose"
"NY"    "3"    "630"     "New York Harbor"
"NY"    "3"    "632"     "Northport"
"DE"    "4"    "460"     "Wilmington"
"PA"    "4"    "503"     "Altoona"
"PA"    "4"    "529"     "Butler"
"WV"    "4"    "540"     "Clarksburg"
Case-Based Analysis-1290
• The query lists every station, with its VISN code, in each city and determines the state in which the city is located:

SELECT DISTINCT ?city ?station ?visn ?st
WHERE
{
  ?s <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city> ?city
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#station> ?station }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#visn> ?visn }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#st> ?st }
}

State   VISN   Station   City
"ME"    "1"    "402"     "Togus"
"VT"    "1"    "405"     "White River Junction"
"MA"    "1"    "518"     "Bedford"
"MA"    "1"    "523"     "West Roxbury"
"NH"    "1"    "608"     "Manchester"
"MA"    "1"    "631"     "Northampton"
"RI"    "1"    "650"     "Providence"
"CT"    "1"    "689"     "West Haven"
Case-Based Analysis-1213
• The query lists the VISN code for each city and the state in which the city is located:

SELECT DISTINCT ?visn ?city ?state
WHERE
{
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#visn> ?visn .
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#city> ?city .
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#state> ?state
}

State   VISN   City
"CT"    "1"    "West Haven"
"MA"    "1"    "Bedford"
"MA"    "1"    "West Roxbury"
"MA"    "1"    "Northampton"
"ME"    "1"    "Togus"
"NH"    "1"    "Manchester"
"RI"    "1"    "Providence"
"VT"    "1"    "White River Junction"
Case-Based Analysis-1206
• Dataset 1206 shows similar concepts:

SELECT DISTINCT ?state ?facilityname ?city ?visn
WHERE
{
  ?s <http://www.data.gov/semantic/data/alpha/1206/dataset-1206.rdf#visn> ?visn .
  ?s <http://www.data.gov/semantic/data/alpha/1206/dataset-1206.rdf#state> ?state .
  ?s <http://www.data.gov/semantic/data/alpha/1206/dataset-1206.rdf#city> ?city .
  ?s <http://www.data.gov/semantic/data/alpha/1206/dataset-1206.rdf#facility_name> ?facilityname
}

VISN   State   Facility-name                                      City
"1"    "CT"    "VA Connecticut HCS"                               "West Haven"
"1"    "MA"    "Edith Nourse Rogers Memorial Veterans Hospital"   "Bedford"
"1"    "MA"    "VA Boston HCSW Roxbury Brockton Jamaica Plns"     "West Roxbury"
"1"    "MA"    "VAMC"                                             "Northampton"
"1"    "ME"    "VAMC/RO"                                          "Togus"
"1"    "NH"    "VAMC"                                             "Manchester"
"1"    "RI"    "VAMC"                                             "Providence"
"1"    "VT"    "VAM/ROC"                                          "White River Junction"
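The per-dataset queries above all follow one shape: each property URI is the dataset namespace plus a local name. As a rough sketch, that shape can be generated from a dataset number and a list of property names. The helper `build_query` and the `BASE` template are illustrative assumptions built from the URIs shown on the slides, not part of the original experiment.

```python
# Namespace pattern taken from the property URIs shown in the queries.
BASE = "http://www.data.gov/semantic/data/alpha/{id}/dataset-{id}.rdf#"


def build_query(dataset_id, required, optional=()):
    """Build a SELECT query with one required and any number of OPTIONAL properties."""
    ns = BASE.format(id=dataset_id)
    names = [required, *optional]
    lines = [
        "SELECT DISTINCT " + " ".join(f"?{name}" for name in names),
        "WHERE {",
        f"  ?s <{ns}{required}> ?{required} .",
    ]
    # Each optional property is wrapped so that rows lacking it are still returned.
    for name in optional:
        lines.append(f"  OPTIONAL {{ ?s <{ns}{name}> ?{name} }}")
    lines.append("}")
    return "\n".join(lines)


query_1288 = build_query(1288, "city", ["station", "visn", "st"])
print(query_1288)
```

Because the property local names differ between datasets (for example `st` versus `state`), the caller still has to know each dataset's vocabulary, which is the problem the comparison slide addresses.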
Case-Based Analysis-Comparison
• To obtain query results across datasets, owl:sameAs must be stated explicitly between the corresponding properties:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?state ?city
WHERE
{
  GRAPH <http://localhost:8890/vad/dataset1288>
  {
    ?s1 <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> ?state .
    ?s1 <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> ?city .
    <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st>
      owl:sameAs
      <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#st> .
    <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city>
      owl:sameAs
      <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city> .
  }
  GRAPH <http://localhost:8890/vad/dataset1290>
  {
    ?s2 <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#st> ?st .
    ?s2 <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city> ?city .
    <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#st>
      owl:sameAs
      <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> .
    <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city>
      owl:sameAs
      <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> .
  }
}
ORDER BY ?state

State   City
"CT"    "West Haven"
"MA"    "Bedford"
"MA"    "West Roxbury"
"MA"    "Northampton"
"ME"    "Togus"
"NH"    "Manchester"
"RI"    "Providence"
"VT"    "White River Junction"
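The need for an explicit alignment in the comparison can be illustrated outside SPARQL. The sketch below maps the differing property names (`st` in one dataset, `state` in another) onto one shared concept before comparing rows. The `ALIGNMENT` table and the helper `normalize` are hypothetical, and only two sample rows per dataset are copied from the result tables.

```python
# Hypothetical alignment of property local names to a shared concept name.
ALIGNMENT = {"st": "state", "state": "state", "city": "city", "visn": "visn"}


def normalize(row):
    """Rename aligned keys to their shared concept; drop unaligned keys."""
    return {ALIGNMENT[key]: value for key, value in row.items() if key in ALIGNMENT}


# Sample rows copied from the result tables for datasets 1290 and 1213.
rows_1290 = [
    {"st": "ME", "visn": "1", "station": "402", "city": "Togus"},
    {"st": "CT", "visn": "1", "station": "689", "city": "West Haven"},
]
rows_1213 = [
    {"state": "CT", "visn": "1", "city": "West Haven"},
    {"state": "ME", "visn": "1", "city": "Togus"},
]


def state_city_pairs(rows):
    return {(r["state"], r["city"]) for r in map(normalize, rows)}


shared = state_city_pairs(rows_1290) & state_city_pairs(rows_1213)
print(sorted(shared))
```

Without the alignment table, the two row sets share no state key and the comparison would find nothing in common.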
Conclusion
• No government ontology has been used in the experimental datasets
• Weak vocabulary usage in US government datasets
• Multiple vocabularies are used for the same concept
• Multiple vocabularies are used within the same government agency
• Lack of a well-defined, coherent, and consistent government ontology
Thank you