Linked Census Data
                                    Rinke Hoekstra
                               CEDAR Kickoff, 26 January 2012




Thursday 26 January 2012
Overview
             “Can Linked Data make a difference for historical analysis?”


              Problem

              Procedure (as I understand it)

              Step-by-step

              Vocabularies, tools

              Conclusion


Problem
              ~519 Excel spreadsheets (more?... I heard 1200)

              Want to do analysis over time and space, but...

              Structure

                    Excel sheets cannot be readily imported into a database

              Contents

                    Excel sheets are not normalised (age) nor harmonised (occupations/places)

                    Excel sheets contain errors (both original and data-entry)

              Want to preserve all stages of data cleansing/harmonisation


Procedure
              Archiving
                    Verbatim import of sheets to database/triple store

              Correcting/Interpreting
                    Add missing information (headers)
                    Add corrected information (data)

              Normalising
                    Interpret and correct objective information

              Harmonising
                    Link information across sheets
                    Link information to other datasets (e.g. locations)

              Visualising
                    Build (generic) visualisations of results

              Documenting (shown alongside the stages above)



... a bit about Linked Data

              “Just another Data Model”
              RDF ≠ Ontology (OWL)
              RDF ≠ Taxonomy (RDFS/SKOS)


              Globally Unique Identifiers (URI) for all entities

              Dereferenceable on the Web (URI = URL)

              HTTP-accessible databases (triple stores, SPARQL)

              Triples all the way: <subject, predicate, object>
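
The triple model can be sketched in a few lines of Python: every statement is a (subject, predicate, object) tuple, and a query is a pattern with wildcards. The `EX` namespace, the example statements and the `match` helper are assumptions made for illustration; none of them come from the talk or from a particular triple store.

```python
from typing import List, Optional, Tuple

EX = "http://example.com/"  # hypothetical namespace, for illustration only

# Each statement is one (subject, predicate, object) triple.
TRIPLES = {
    (EX + "obs1", EX + "populationSize", "1"),
    (EX + "obs1", EX + "sex", EX + "M"),
    (EX + "M", EX + "label", "male"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    for triple in match(s=EX + "obs1"):
        print(triple)
```

A SPARQL endpoint answers the same kind of pattern query over HTTP; the tuple set above only mimics the data model.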



Spreadsheet ≠ Database

                          Primary Keys are entities

                          Column names are attributes

                          Cell values are attribute values

                          Foreign keys are relations to
                          other entities




Spreadsheet ≠ Database
                          No Primary Keys!

                          Anything can be an entity

                          Column headers are “types”

                          Row headers are “types”

                          Hierarchies!

                          Cell values are entity “values”

                          No relations to other entities


Anatomy of a Spreadsheet

              Workbook

                    Sheet
                          Cell   Cell   Cell
                          Cell   Cell   Cell
                          Cell   Cell   Cell

                    Sheet
                          Cell   Cell   Cell
                          Cell   Cell   Cell
                          Cell   Cell   Cell




Anatomy of a Spreadsheet

              Workbook1.xls

                    Sheet1
                          Sheet1:A1   Sheet1:B1   Sheet1:C1
                          Sheet1:A2   Sheet1:B2   Sheet1:C2
                             ...         ...         ...

                    Sheet2
                          Sheet2:A1   Sheet2:B1   Sheet2:C1
                          Sheet2:A2   Sheet2:B2   Sheet2:C2
                             ...         ...         ...




Anatomy of a Spreadsheet

              Workbook1.xls

                    Sheet1
                          workers            agriculture   12
                                             industry      6
                                             ...           ...

                    Sheet2
                          diamond cutters    A             34
                                             B             67
                          ...                ...           ...




              NB: all URIs are scoped to the sheet!
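
One way to picture sheet-scoped URIs is a small helper that builds an identifier from workbook, sheet and cell (or cell label). This is a sketch under assumptions: the base URI and the `scoped_uri` function are hypothetical and not taken from TabLinker.

```python
from urllib.parse import quote

BASE = "http://example.com/"  # hypothetical base URI


def scoped_uri(workbook: str, sheet: str, name: str) -> str:
    """Build a URI for a cell address or cell label, scoped to its sheet."""
    # Percent-encode each part so labels with spaces stay valid URI segments.
    parts = (quote(part, safe="") for part in (workbook, sheet, name))
    return BASE + "/".join(parts)


if __name__ == "__main__":
    print(scoped_uri("Workbook1.xls", "Sheet1", "A1"))
    print(scoped_uri("Workbook1.xls", "Sheet2", "diamond cutters"))
```

Because the sheet name is part of the URI, the same label on two sheets yields two different identifiers; linking them is left to a later harmonisation step.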



Data Cube

              How to best represent numeric data, in a flexible way?

              SDMX (Eurostat, World Bank, CBS, etc.)



              Every data item is an observation

              Every observation has a value

              Every observation has one or more dimensions
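
The three statements above can be sketched as a small data structure. The `Observation` class, the `total` helper and the sample values are illustrative assumptions; the dimension names reuse Dutch headers that appear elsewhere in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Observation:
    """One data item: a value plus the dimensions that locate it."""
    value: int
    dimensions: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.dimensions:
            raise ValueError("an observation needs at least one dimension")


def total(observations: List[Observation], **fixed: str) -> int:
    """Sum the values of all observations that match the fixed dimensions."""
    return sum(
        o.value for o in observations
        if all(o.dimensions.get(k) == v for k, v in fixed.items())
    )


# Hypothetical sample data, for illustration only.
OBSERVATIONS = [
    Observation(1, {"beroep": "pannenbakkers", "positie": "D", "geslacht": "M"}),
    Observation(3, {"beroep": "pannenbakkers", "positie": "A", "geslacht": "M"}),
    Observation(2, {"beroep": "steenbakkers", "positie": "D", "geslacht": "M"}),
]

if __name__ == "__main__":
    print(total(OBSERVATIONS, beroep="pannenbakkers"))
```

Keeping dimensions as open key/value pairs is what makes the representation flexible: a sheet with an extra header adds a key without changing the structure.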


Data Cube

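              [Figure: a single observation drawn as a value surrounded by its dimensions. Labels are the Dutch census headers, with English glosses added; the pairing of labels and values is inferred from the slide layout.]

              Value: 1

              Dimensions without a question mark on the slide:
                    nummer der beroepsklasse (occupational class number): I
                    letter der beroepsklasse (occupational class letter): E
                    beroep (occupation): pannenbakkers (roof-tile makers)
                    positie (position): D

              Dimensions marked "?" on the slide:
                    leeftijd (age): 12
                    geboortejaar (year of birth): 1878
                    geslacht (sex): M
                    huwelijkse staat (marital status): O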




Anatomy of a Spreadsheet

                           Properties   Headers




                          RowHeaders     Data




                          http://github.com/Data2Semantics/TabLinker
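
To make the four regions concrete, the sketch below assigns a cell to a region purely by its position. This is an assumption for illustration: the `region` function and its parameters are hypothetical, and TabLinker itself may mark regions in a different way.

```python
def region(row: int, col: int, header_rows: int, header_cols: int) -> str:
    """Classify a zero-based cell position into one of the four regions.

    header_rows: number of rows at the top that hold column headers
    header_cols: number of columns at the left that hold row headers
    """
    top = row < header_rows
    left = col < header_cols
    if top and left:
        return "Properties"   # top-left corner: names of the row-header columns
    if top:
        return "Headers"      # column headers
    if left:
        return "RowHeaders"   # row headers
    return "Data"             # everything else holds the actual values


if __name__ == "__main__":
    # A sheet with 2 header rows and 3 row-header columns.
    for position in [(0, 0), (1, 5), (7, 2), (7, 5)]:
        print(position, region(*position, header_rows=2, header_cols=3))
```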
              [Figure: RDF graph for one data cell, shown in two builds. Only part of the edge structure can be recovered from the slide layout.]

              Observation
                    _:x  d2s:populationSize  "1"^^xsd:int
                    _:x  d2s:dimension       :14--15_1875--1874
                    further d2s:dimension edges; other nodes shown include :M, :O, :D, :5, :10 and Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers

              Hierarchy (skos:broader)
                    nodes: Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers, :I/E, :I

              Header nodes
                    :Nummer_der_beroepsklasse
                    :Letter__Onderdeel_beroepsklasse_
                    :BENAMING_van_de_onderdeelen_der_onderscheidene_beroepsklassen__met_de_daartoe_behoorende_beroepen
                    :Positie_in_het_beroep__aangeduid_met_A__B__C_of_D
                    :Regelnummer (second build only)

              Cell types (rdf:type, second build)
                    d2s:HierarchicalRowHeader: Sheet1:E15, Sheet1:C14, Sheet1:B8
                    d2s:RowHeader: Sheet1:F15
                    d2s:DataCell: Sheet1:L15
                    d2s:Header: Sheet1:L3, Sheet1:L4, Sheet1:L5
                    d2s:Metadata: Sheet1:D15, Sheet1:L6

              Links between cells and graph nodes (second build)
                    d2s:isObservation: between the data cell and _:x
                    d2s:isDimension: between header cells and dimension nodes
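
The observation part of the graph can be sketched as a function that turns one cell value and its dimension values into triples. The predicate names `d2s:populationSize` and `d2s:dimension` are the ones shown on the slide; the function names and the flat text output are assumptions, and the prefixes are left unexpanded because the slides do not give their namespace URIs.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def observation_triples(node: str, value: int, dimensions: List[str]) -> List[Triple]:
    """Describe one data cell as an observation with a value and dimensions."""
    triples: List[Triple] = [(node, "d2s:populationSize", f'"{value}"^^xsd:int')]
    # One d2s:dimension edge per dimension value that applies to the cell.
    triples += [(node, "d2s:dimension", dim) for dim in dimensions]
    return triples


def to_lines(triples: List[Triple]) -> List[str]:
    """Render triples as simple 'subject predicate object .' lines."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    cell = observation_triples("_:x", 1, [":14--15_1875--1874", ":M", ":O"])
    print("\n".join(to_lines(cell)))
```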




What TabLinker can’t do
              Annotations (“footnote”-style, on a separate sheet)

              Interpret functions (e.g. automatic sums)

              Integrate/harmonise across sheets/files

              Additional useful functionality:

                    “checksum” functionality

                    Export to database tables
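
A "checksum" check could look roughly like the sketch below: recompute a total from its parts and compare it with the total reported in the sheet. The function name and the reported totals are hypothetical; the part values 12 and 6 reuse the example cells from the earlier slide.

```python
from typing import Iterable, Tuple


def check_total(parts: Iterable[int], reported_total: int) -> Tuple[bool, int]:
    """Compare a reported total with the sum of its parts.

    Returns (matches, computed_total) so a mismatch can be reported
    together with the value that was expected.
    """
    computed = sum(parts)
    return computed == reported_total, computed


if __name__ == "__main__":
    print(check_total([12, 6], 18))   # total agrees with the parts
    print(check_total([12, 6], 81))   # e.g. digits swapped during data entry
```

A mismatch does not say which cell is wrong, only that the row or column needs a second look.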

Normalising & Correcting

              [Figure: the observation as imported]
                    _:x  d2s:populationSize  "1"^^xsd:int
                    _:x  d2s:dimension       :14--15_1875--1874




              [Figure, second build: the same observation after normalising and correcting; the subject node is inferred from the slide layout]
                    _:x  d2s:populationSize  "11"^^xsd:int
                    _:x  d2s:censusYear      "1889"^^xsd:int
                    _:x  d2s:birthYears      :1875--1874
                    _:x  d2s:gemeente        :Assendelft
                    _:x  d2s:ageGroup        :14-15
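
Splitting the combined header `14--15_1875--1874` into an age group and a birth-year range can be sketched as follows. The regular expression, the function names and the plausibility rule (birth year = census year minus age) are assumptions for illustration; only the label and the resulting values `14-15` and `1875--1874` come from the slide.

```python
import re
from typing import Dict

# Pattern for labels such as "14--15_1875--1874": age range, then birth years.
LABEL = re.compile(r"^(\d+)--(\d+)_(\d{4})--(\d{4})$")


def split_label(label: str) -> Dict[str, str]:
    """Split a combined age/birth-year header into two normalised values."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unexpected header format: {label!r}")
    age_from, age_to, year_a, year_b = m.groups()
    return {"ageGroup": f"{age_from}-{age_to}", "birthYears": f"{year_a}--{year_b}"}


def birth_years_consistent(label: str, census_year: int) -> bool:
    """Check that each birth year equals the census year minus the age."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unexpected header format: {label!r}")
    age_from, age_to, year_a, year_b = (int(g) for g in m.groups())
    return year_a == census_year - age_from and year_b == census_year - age_to


if __name__ == "__main__":
    print(split_label("14--15_1875--1874"))
    print(birth_years_consistent("14--15_1875--1874", 1889))
```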




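Documenting

              [Figure: original and corrected data in separate graphs, with provenance for the correction. Edges are reconstructed from the slide layout.]

              Graph <http://example.com/workbook1/sheet1>
                    _:x  d2s:populationSize  "1"^^xsd:int
                    _:x  d2s:dimension       :14--15_1875--1874

              Graph <http://example.com/workbook1/sheet1/corrected>
                    _:x  d2s:populationSize  "11"^^xsd:int
                    _:x  d2s:censusYear      "1889"^^xsd:int
                    _:x  d2s:birthYears      :1875--1874
                    _:x  d2s:gemeente        :Assendelft
                    _:x  d2s:ageGroup        :14-15

              Provenance
                    <http://example.com/workbook1/sheet1/corrected>  provo:wasGeneratedBy  :curation20120126
                    :curation20120126  rdf:type         provo:Activity
                    :curation20120126  provo:hadAgent   :RinkeHoekstra
                    :curation20120126  provo:startedAt  _:a
                    :curation20120126  provo:endedAt    _:b
                    _:a  time:inXSDDateTime  "20120126T08:30:00"
                    _:b  time:inXSDDateTime  "20120126T09:00:00"
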




                                                               http://www.w3.org/TR/prov-o/
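
A provenance record of this shape could be generated with a helper like the one below. This is only a sketch: the function is hypothetical, the property names are copied as they appear on the slide (the PROV-O vocabulary at the URL above may name them differently), and the timestamp format simply mirrors the slide.

```python
from datetime import datetime
from typing import List, Tuple

Triple = Tuple[str, str, str]
STAMP = "%Y%m%dT%H:%M:%S"  # timestamp layout as written on the slide


def curation_record(graph: str, activity: str, agent: str,
                    started: datetime, ended: datetime) -> List[Triple]:
    """Describe which activity, by which agent, produced a corrected graph."""
    if ended < started:
        raise ValueError("activity cannot end before it starts")
    return [
        (graph, "provo:wasGeneratedBy", activity),
        (activity, "rdf:type", "provo:Activity"),
        (activity, "provo:hadAgent", agent),
        (activity, "provo:startedAt", "_:a"),
        (activity, "provo:endedAt", "_:b"),
        ("_:a", "time:inXSDDateTime", f'"{started.strftime(STAMP)}"'),
        ("_:b", "time:inXSDDateTime", f'"{ended.strftime(STAMP)}"'),
    ]


if __name__ == "__main__":
    record = curation_record(
        "<http://example.com/workbook1/sheet1/corrected>",
        ":curation20120126",
        ":RinkeHoekstra",
        datetime(2012, 1, 26, 8, 30),
        datetime(2012, 1, 26, 9, 0),
    )
    for s, p, o in record:
        print(s, p, o, ".")
```

Attaching the record to the corrected graph rather than to individual triples keeps the original graph untouched, which is what preserving every stage of cleansing requires.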


Harmonising

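              [Figure: occupation hierarchy expressed with skos:broader]

                    I
                          D, E, A (each linked to I by skos:broader)
                                Fabricage van kalk
                                Fabricage van steen (molensteen, steenbakkers, tegelbakkers)
                                Fabricage van dakpannen (pannenbakkers)
                                Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)

                    Each of the four occupation groups is linked by skos:broader to one of D, E or A.
                    English glosses (approximate): manufacture of lime; of stone and brick; of roof tiles; of earthenware.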
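
Walking up such a hierarchy is a simple loop over skos:broader links. In the sketch below the `BROADER` table and the `ancestors` function are illustrative assumptions; the one leaf entry follows the identifier Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers shown on an earlier slide, which places that group under E and E under I.

```python
from typing import Dict, List

# concept -> its skos:broader concept (illustrative subset of the hierarchy)
BROADER: Dict[str, str] = {
    "Fabricage van dakpannen (pannenbakkers)": "E",
    "D": "I",
    "E": "I",
    "A": "I",
}


def ancestors(concept: str, broader: Dict[str, str] = BROADER) -> List[str]:
    """Follow skos:broader links upward and return the chain of ancestors."""
    chain: List[str] = []
    seen = {concept}
    while concept in broader:
        concept = broader[concept]
        if concept in seen:
            raise ValueError("cycle in skos:broader links")
        seen.add(concept)
        chain.append(concept)
    return chain


if __name__ == "__main__":
    print(ancestors("Fabricage van dakpannen (pannenbakkers)"))
```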




              [Figure, second build: the same hierarchy with SKOS mapping links to HISCO codes. Pairings follow the column alignment on the slide.]

                    Fabricage van kalk                          skos:exactMatch  HISCO:23811
                    Fabricage van steen (...)                   skos:broadMatch  HISCO:25281
                    Fabricage van dakpannen (pannenbakkers)     skos:broadMatch  HISCO:25281
                    Fabricage van aardewerk (...)               skos:closeMatch  HISCO:26345

                    Three further skos:exactMatch links point to HISCO:23810, HISCO:25281 and HISCO:26340; their source nodes cannot be recovered from the layout.




              [Figure, third build: the same hierarchy shown twice, once with plain labels and once with every node scoped to its sheet]

                    Sheet1:I
                          Sheet1:D, Sheet1:E, Sheet1:A (each linked to Sheet1:I by skos:broader)
                                Sheet1:Fabricage van kalk
                                Sheet1:Fabricage van steen (molensteen, steenbakkers, tegelbakkers)
                                Sheet1:Fabricage van dakpannen (pannenbakkers)
                                Sheet1:Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)

              [Figure: the 1889 hierarchy linked to the 1899 hierarchy]

                    Both years share the structure I; D, E, A; four occupation groups (skos:broader throughout).

                    Labels that differ between the years:
                          1889: Fabricage van steen (molensteen, steenbakkers, tegelbakkers)
                          1899: Fabricage van steen (steenbakkers, tegelbakkers)
                          1889: Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)
                          1899: Fabricage van aardewerk (incl. porcelein, kachelbakkers, pottenbakkers, enz.)

                    Cross-year links shown: skos:exactMatch, skos:narrowMatch (twice), skos:closeMatch. Which groups each link connects cannot be recovered from the layout.




Is SKOS sufficient?

              [Figure: the occupation hierarchies of the 1889 and 1899 censuses, drawn one above the other. In each hierarchy the occupation concepts point to classes D, E and A with skos:broader, and D, E and A point to class I with skos:broader. SKOS mapping properties link each 1889 concept to its 1899 counterpart:]

              skos:exactMatch: "Fabricage van kalk" (1889) to "Fabricage van kalk" (1899)

              skos:narrowMatch: "Fabricage van steen (molensteen, steenbakkers, tegelbakkers)" (1889) to "Fabricage van steen (steenbakkers, tegelbakkers)" (1899)

              skos:closeMatch: "Fabricage van dakpannen (pannenbakkers)" (1889) to "Fabricage van dakpannen (pannenbakkers)" (1899)

              skos:narrowMatch: "Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)" (1889) to "Fabricage van aardewerk (incl. porcelein, kachelbakkers, pottenbakkers, enz.)" (1899)

              NB: These are not strings, but globally unique URIs, scoped within their spreadsheet (graph!) of origin.
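A minimal sketch of what the NB means in practice, assuming a hypothetical `http://example.org/census/` namespace and short local names (`kalk`, `steen`, ...) that are not in the slides. The parent assignments (which concept falls under D, E or A) are read off the figure layout and should be treated as illustrative. Each statement is stored as a quad, so the same label in 1889 and 1899 yields two distinct URIs, each in the graph of its own spreadsheet, with the mappings kept in a separate graph.

```python
from collections import namedtuple

# A quad is a triple plus the named graph (here: the source spreadsheet) it belongs to.
Quad = namedtuple("Quad", "graph subject predicate obj")

BASE = "http://example.org/census/"  # hypothetical namespace, not from the slides
SKOS = "http://www.w3.org/2004/02/skos/core#"


def uri(year, local_name):
    """Mint a URI scoped to the census year (spreadsheet) of origin."""
    return f"{BASE}{year}/{local_name}"


def hierarchy(year):
    """skos:broader links for one census year, stored in that year's graph."""
    graph = f"{BASE}graph/{year}"
    # (child, parent) pairs; assumed from the figure layout.
    links = [("kalk", "D"), ("steen", "E"), ("dakpannen", "E"),
             ("aardewerk", "A"), ("D", "I"), ("E", "I"), ("A", "I")]
    return [Quad(graph, uri(year, child), SKOS + "broader", uri(year, parent))
            for child, parent in links]


# Mappings between the two years live in their own graph.
MAPPINGS_GRAPH = f"{BASE}graph/mappings"
mappings = [
    Quad(MAPPINGS_GRAPH, uri(1889, "kalk"), SKOS + "exactMatch", uri(1899, "kalk")),
    Quad(MAPPINGS_GRAPH, uri(1889, "steen"), SKOS + "narrowMatch", uri(1899, "steen")),
    Quad(MAPPINGS_GRAPH, uri(1889, "dakpannen"), SKOS + "closeMatch", uri(1899, "dakpannen")),
    Quad(MAPPINGS_GRAPH, uri(1889, "aardewerk"), SKOS + "narrowMatch", uri(1899, "aardewerk")),
]

store = hierarchy(1889) + hierarchy(1899) + mappings


def counterparts(concept, quads):
    """Return (mapping property, target URI) pairs for a concept."""
    return [(q.predicate.rsplit("#", 1)[1], q.obj)
            for q in quads
            if q.subject == concept and q.predicate.endswith("Match")]


print(counterparts(uri(1889, "steen"), store))
```

Keeping the mappings in their own graph means the original hierarchies stay untouched, which fits the goal of preserving every stage of harmonisation.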

Vocabularies, Tools
              Vocabularies
                    Data Cube, SKOS, W3C Time, PROV-O

              Excel + TabLinker
                    Semi-automatic conversion of Excel sheets to RDF

              ProvTracer
                    Creates a PROV-O provenance trail for shell/Python scripts

              Visualisation prototype
                    Sgvizler (SPARQL + Google Visualization API)
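To make "conversion of Excel sheets to RDF" concrete, here is a rough sketch of the idea. It is not TabLinker's algorithm or API. The `http://example.org/` namespace, the property names `rowHeader`, `columnHeader` and `value`, the column headers and the cell counts are all invented for illustration; only `rdf:type` and `qb:Observation` are real vocabulary terms.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
EX = "http://example.org/census/1889/"  # hypothetical namespace


def sheet_to_triples(column_headers, rows):
    """Turn a sheet with one row-header column into observation triples.

    rows is a list of (row_header, [cell values...]) pairs.
    """
    triples = []
    for r, (row_header, cells) in enumerate(rows):
        for c, value in enumerate(cells):
            if value is None:  # empty cell: nothing to assert
                continue
            obs = f"{EX}cell/{r}_{c}"  # every cell becomes its own entity
            triples.append((obs, RDF_TYPE, QB_OBSERVATION))
            triples.append((obs, EX + "rowHeader", row_header))
            triples.append((obs, EX + "columnHeader", column_headers[c]))
            triples.append((obs, EX + "value", value))
    return triples


# Invented sample data: two occupations, two columns, one empty cell.
columns = ["men", "women"]
rows = [("Fabricage van kalk", [12, None]),
        ("Fabricage van dakpannen (pannenbakkers)", [40, 3])]
triples = sheet_to_triples(columns, rows)

for triple in triples:
    print(triple)
```

Each cell is addressed by its position, so a later correction can be attached to the cell without overwriting the verbatim import.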


Discussion
              Advantages of Linked Data approach

                    Straightforward transformation from spreadsheets

                    Seamless integration of original, corrected and harmonised data

                    Ingestion of external (linked) data

                    Powerful documentation (provenance)

                    Everything is transparently query-able (SPARQL)

                    ... on the Web


Discussion


              Disadvantages of Linked Data approach (subject to research)

                    Size? (300k triples × 519 sheets ≈ 156M triples)

                    Only rudimentary support for arithmetical operations in queries

                    No dynamic/conditional ‘view’-like graphs
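The size estimate is simple arithmetic, sketched below with the figures quoted in the slides: roughly 300k triples per sheet, 519 sheets, and the higher count of 1200 sheets mentioned on the "Problem" slide. The function name is made up for the example.

```python
TRIPLES_PER_SHEET = 300_000  # per-sheet figure quoted on the slide
SHEETS_LOW = 519             # sheet count quoted on the slide
SHEETS_HIGH = 1200           # higher count mentioned on the "Problem" slide


def estimate_triples(sheets, per_sheet=TRIPLES_PER_SHEET):
    """Total triples if every sheet yields about per_sheet triples."""
    return sheets * per_sheet


low = estimate_triples(SHEETS_LOW)    # 155_700_000
high = estimate_triples(SHEETS_HIGH)  # 360_000_000
print(f"{low / 1e6:.1f}M to {high / 1e6:.1f}M triples")
```

Corrections, mappings and provenance add further triples on top of the verbatim import, so these numbers are a lower bound rather than a forecast.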




SPARQL vs. SQL?


              Middle ground?

              Expose database through D2RQ
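The idea behind the middle ground can be sketched as follows: keep the data in a relational database and derive triples from the rows on demand. This is not D2RQ itself, which has its own mapping language and server; it only illustrates the row-to-triple view. The table, its columns and the `http://example.org/` namespace are assumptions made for the example.

```python
import sqlite3

EX = "http://example.org/census/"  # hypothetical namespace


def rows_as_triples(conn, table, key_column):
    """Yield one triple per non-key, non-NULL column of every row in table."""
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key_column}"')
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        # The primary key identifies the entity; other columns are its attributes.
        subject = f"{EX}{table}/{record[key_column]}"
        for column, value in record.items():
            if column != key_column and value is not None:
                yield (subject, f"{EX}{table}#{column}", value)


# Invented sample table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occupation (id INTEGER PRIMARY KEY, label TEXT, year INTEGER)")
conn.executemany("INSERT INTO occupation VALUES (?, ?, ?)",
                 [(1, "Fabricage van kalk", 1889), (2, "Fabricage van kalk", 1899)])

triples = list(rows_as_triples(conn, "occupation", "id"))
for triple in triples:
    print(triple)
```

Arithmetic and aggregation can then stay in SQL, while links to other datasets are expressed over the derived triples.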




Fin




Linked Census Data

  • 1. Linked Census Data. Rinke Hoekstra. CEDAR Kickoff, 26 January 2012 (Thursday).
  • 2. Overview. “Can Linked Data make a difference for historical analysis?” Problem; Procedure (as I understand it); Step-by-step; Vocabularies, tools; Conclusion.
  • 3. Problem. ~519 Excel spreadsheets (more? I heard 1200). Want to do analysis over time and space, but... Structure: Excel sheets cannot be readily imported in a database. Contents: Excel sheets are not normalised (age) nor harmonised (occupations/places); Excel sheets contain errors (both original and data-entry). Want to preserve all stages of data cleansing/harmonisation.
  • 4. Procedure. Archiving: verbatim import of sheets to database/triple store. Correcting/Interpreting: add missing information (headers); add corrected information (data). Normalising: interpret and correct objective information. Harmonising: link information across sheets; link information to other datasets (e.g. locations). Visualising: build (generic) visualisations of results. Documenting: shown alongside the stages above.
  • 5. ... a bit about Linked Data. “Just another Data Model”: RDF ≠ Ontology (OWL); RDF ≠ Taxonomy (RDFS/SKOS). Globally Unique Identifiers (URI) for all entities. Dereferenceable on the Web (URI = URL). HTTP-accessible databases (triple stores, SPARQL). Triples all the way: <subject, predicate, object>.
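The <subject, predicate, object> model on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, not a triple store: the http://example.com/ namespace and the hasSheet, hasCell and value predicates are hypothetical, and `match` only mimics a single SPARQL-style triple pattern.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and predicates, for illustration only.
EX = "http://example.com/"

triples: List[Triple] = [
    (EX + "workbook1", EX + "hasSheet", EX + "workbook1/sheet1"),
    (EX + "workbook1/sheet1", EX + "hasCell", EX + "workbook1/sheet1/A1"),
    (EX + "workbook1/sheet1/A1", EX + "value", "agriculture"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in store:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


if __name__ == "__main__":
    # All cells of any sheet: fix the predicate, leave subject and object open.
    for t in match(triples, p=EX + "hasCell"):
        print(t)
```

Because every entity is a full URI, the same pattern keeps working after triples from other workbooks are added to the list.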
  • 6. Spreadsheet ≠ Database. Database: primary keys are entities; column names are attributes; cell values are attribute values; secondary keys are relations to other entities.
  • 9. Spreadsheet ≠ Database. Spreadsheet: no primary keys! Anything can be an entity. Column headers are “types”. Row headers are “types”. Hierarchies! Cell values are entity “values”. No relations to other entities.
  • 10. Anatomy of a Spreadsheet. [Diagram: a Workbook contains Sheets; each Sheet contains Cells.]
  • 11. Anatomy of a Spreadsheet. [Diagram: Workbook1.xls contains Sheet1 with cells Sheet1:A1, Sheet1:B1, Sheet1:C1, Sheet1:A2, Sheet1:B2, Sheet1:C2, ... and Sheet2 with cells Sheet2:A1, Sheet2:B1, Sheet2:C1, Sheet2:A2, Sheet2:B2, Sheet2:C2, ...]
  • 12. Anatomy of a Spreadsheet. [Diagram: Workbook1.xls. Sheet1 holds “workers”, “agriculture” 12, “industry” 6, ... Sheet2 holds “diamond cutters”, “A” 34, “B” 67, ...]
  • 13. Anatomy of a Spreadsheet. [Diagram: Workbook1.xls. Sheet1 holds “workers”, “agriculture” 12, “industry” 6, ... Sheet2 holds “diamond cutters”, “A” 34, “B” 67, ...] NB: all URIs scoped to sheet!
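The note that all URIs are scoped to their sheet can be illustrated with a small URI-minting helper. This is a sketch under assumptions: the base http://example.com/ and the workbook/sheet/label path layout are hypothetical and not necessarily the scheme TabLinker uses.

```python
from urllib.parse import quote

# Hypothetical base URI; the path layout below is an assumption.
BASE = "http://example.com/"


def sheet_uri(workbook: str, sheet: str) -> str:
    """URI for one sheet inside one workbook."""
    return f"{BASE}{quote(workbook)}/{quote(sheet)}"


def scoped_uri(workbook: str, sheet: str, label: str) -> str:
    """URI for a header or cell label, scoped to the sheet it came from."""
    local_name = quote(label.strip().replace(" ", "_"))
    return f"{sheet_uri(workbook, sheet)}/{local_name}"


if __name__ == "__main__":
    # The same label on two sheets yields two distinct entities.
    print(scoped_uri("Workbook1.xls", "Sheet1", "A"))
    print(scoped_uri("Workbook1.xls", "Sheet2", "A"))
```

Scoping keeps identical labels from different sheets apart until an explicit harmonisation step links them.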
  • 14. Data Cube. How to best represent numeric data, in a flexible way? SDMX (Eurostat, World Bank, CBS, etc.). Every data item is an observation. Every observation has a value. Every observation has one or more dimensions.
  • 16. Data Cube. How to best represent numeric data, in a flexible way? SDMX (Eurostat, World Bank, CBS, etc.). Every data item is an observation. Every observation has a value. Every observation has one or more dimensions. [Diagram: an example census cell. Values shown: 12, 1878, M, O, I, E, pannenbakkers, D, 1. Dimension labels shown: leeftijd (age), geboortejaar (year of birth), geslacht (sex), huwelijkse staat (marital status), nummer der beroepsklasse (occupational class number), letter der beroepsklasse (occupational class letter), beroep (occupation), positie (position).]
  • 17. Data Cube. Same text and diagram as slide 16, with question marks added next to several of the dimension labels.
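The three Data Cube statements (every data item is an observation, with a value and one or more dimensions) map onto a very small data structure. The sketch below is an assumption-laden illustration, not the RDF Data Cube vocabulary itself: the `Observation` class and `total` helper are hypothetical, the first observation reuses values that appear on slide 25 of this deck, and the second is made up so that the aggregation has something to add.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Observation:
    value: int                  # the measured number (here: population size)
    dimensions: Dict[str, str]  # dimension name -> dimension value


observations = [
    # Values as shown on slide 25.
    Observation(11, {"censusYear": "1889", "gemeente": "Assendelft",
                     "ageGroup": "14-15"}),
    # Made-up observation, present only so the sum below has two terms.
    Observation(7, {"censusYear": "1889", "gemeente": "Assendelft",
                    "ageGroup": "16-17"}),
]


def total(obs: Iterable[Observation], **fixed: str) -> int:
    """Sum the values of all observations whose dimensions match `fixed`."""
    return sum(o.value for o in obs
               if all(o.dimensions.get(k) == v for k, v in fixed.items()))


if __name__ == "__main__":
    print(total(observations, gemeente="Assendelft"))
```

Fixing some dimensions and summing over the rest is the "slice" operation that makes this representation flexible: adding a new dimension does not change the shape of existing observations.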
  • 18. Anatomy of a Spreadsheet. Regions of a sheet: Properties, Headers, RowHeaders, Data.
  • 20. Anatomy of a Spreadsheet. Regions of a sheet: Properties, Headers, RowHeaders, Data. http://github.com/Data2Semantics/TabLinker
  • 21. [Graph diagram of one observation. Nodes: _:x, Sheet1:D15, "1"^^xsd:int, :I, :I/E, :M, :O, :D, :14--15_1875--1874, Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers, :Nummer_der_beroepsklasse, :Letter__Onderdeel_beroepsklasse_, :BENAMING_van_de_onderdeelen_der_onderscheidene_beroepsklassen__met_de_daartoe_behoorende_beroepen, :Positie_in_het_beroep__aangeduid_met_A__B__C_of_D. Edge labels: d2s:populationSize, d2s:dimension, skos:broader.]
  • 22. [The graph diagram of slide 21 with cell types added. Additional nodes: Sheet1:E15, Sheet1:C14, Sheet1:B8, Sheet1:L15, Sheet1:L3, Sheet1:L4, Sheet1:L5, Sheet1:L6, Sheet1:F15, :Regelnummer, :5, :10. Classes attached through rdf:type: d2s:HierarchicalRowHeader, d2s:DataCell, d2s:Header, d2s:RowHeader, d2s:Metadata. Additional edge labels: d2s:isDimension, d2s:isObservation.]
  • 23. What TabLinker can’t do. Annotations: “footnote”-style on separate sheet. Interpret functions, e.g. automatic sums. Integrate/harmonise across sheets/files. Additional useful functionality: “checksum” functionality; export to database tables.
  • 24. Normalising & Correcting. [Graph diagram: _:x d2s:populationSize "1"^^xsd:int; _:x d2s:dimension :14--15_1875--1874.]
  • 25. Normalising & Correcting. [Graph diagram showing versions of the observation _:x: the original with d2s:populationSize "1"^^xsd:int and d2s:dimension :14--15_1875--1874, and a corrected, normalised version with d2s:populationSize "11"^^xsd:int, d2s:censusYear "1889"^^xsd:int, d2s:birthYears :1875--1874, d2s:gemeente (municipality) :Assendelft and d2s:ageGroup :14-15.]
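The normalisation step on slide 25 splits the combined header :14--15_1875--1874 into a separate age group and birth-year range. A minimal sketch of that split follows. The function name, the regular expression and the consistency check (birth year = census year minus age) are assumptions added for illustration; real census tables may need a looser check depending on the census date.

```python
import re
from typing import Dict

# Assumed label shape: "<age>--<age>_<birth year>--<birth year>".
LABEL = re.compile(r"^(\d+)--(\d+)_(\d+)--(\d+)$")


def normalise_age_label(label: str, census_year: int) -> Dict[str, str]:
    """Split a combined age/birth-year header into separate dimensions."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    age_a, age_b, birth_a, birth_b = (int(g) for g in m.groups())
    # Assumed sanity check: the birth years should follow from the ages.
    expected = {census_year - age_a, census_year - age_b}
    if {birth_a, birth_b} != expected:
        raise ValueError(f"birth years in {label!r} do not fit {census_year}")
    return {
        "ageGroup": f"{age_a}-{age_b}",
        "birthYears": f"{birth_a}--{birth_b}",
        "censusYear": str(census_year),
    }


if __name__ == "__main__":
    print(normalise_age_label("14--15_1875--1874", 1889))
```

A check like this is one way to flag data-entry errors automatically while leaving the original label untouched in the archived graph.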
  • 26. Documenting. [Graph diagram: two named graphs, <http://example.com/workbook1/sheet1> and <http://example.com/workbook1/sheet1/corrected>, holding the original observation (d2s:populationSize "1"^^xsd:int, d2s:dimension :14--15_1875--1874) and the corrected one (d2s:populationSize "11"^^xsd:int, d2s:censusYear "1889"^^xsd:int, d2s:birthYears :1875--1874, d2s:gemeente :Assendelft, d2s:ageGroup :14-15). An activity :curation20120126 with rdf:type provo:Activity. Edge labels: provo:wasGeneratedBy, provo:hadAgent (to :RinkeHoekstra), provo:startedAt, provo:endedAt. Time nodes _:a and _:b with time:inXSDDateTime values "20120126T08:30:00" and "20120126T09:00:00".] http://www.w3.org/TR/prov-o/
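The documenting step keeps the original and the corrected data in separate named graphs and attaches a provenance record to the correction. The sketch below models that with plain (subject, predicate, object, graph) tuples. Several things are assumptions: the direction of each edge is read off the flattened diagram, the http://example.com/provenance graph name is hypothetical, the start and end times are attached directly as literals instead of through time:inXSDDateTime nodes, and 08:30 is taken to be the start time. The property names are the ones printed on the slide.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)

ORIGINAL = "http://example.com/workbook1/sheet1"
CORRECTED = "http://example.com/workbook1/sheet1/corrected"
PROVENANCE = "http://example.com/provenance"  # hypothetical graph name


def record_correction(subject: str, predicate: str, new_value: str,
                      activity: str, agent: str,
                      started: str, ended: str) -> List[Quad]:
    """Return the corrected statement plus a provenance trail for it."""
    return [
        (subject, predicate, new_value, CORRECTED),
        (activity, "rdf:type", "provo:Activity", PROVENANCE),
        (CORRECTED, "provo:wasGeneratedBy", activity, PROVENANCE),
        (activity, "provo:hadAgent", agent, PROVENANCE),
        (activity, "provo:startedAt", started, PROVENANCE),
        (activity, "provo:endedAt", ended, PROVENANCE),
    ]


# The verbatim import stays untouched; corrections are only ever added.
original: List[Quad] = [
    ("_:x", "d2s:populationSize", '"1"^^xsd:int', ORIGINAL),
]
store = original + record_correction(
    "_:x", "d2s:populationSize", '"11"^^xsd:int',
    activity=":curation20120126", agent=":RinkeHoekstra",
    started="20120126T08:30:00", ended="20120126T09:00:00",
)

if __name__ == "__main__":
    for quad in store:
        print(quad)
```

Because the correction lives in its own graph, a query can choose to see the original value, the corrected value, or both together with who changed it and when.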
  • 27. Harmonising. [Diagram: occupational hierarchy linked by skos:broader. Top class I; below it D, E and A; leaves: “Fabricage van steen (molensteen, steenbakkers, tegelbakkers)” (manufacture of stone: millstone, brick makers, tile makers), “Fabricage van dakpannen (pannenbakkers)” (manufacture of roof tiles: pantile makers), “Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)” (manufacture of earthenware, incl. porcelain, terracotta, stove makers, potters, etc.), “Fabricage van kalk” (manufacture of lime).]
  • 28. Harmonising. [The hierarchy of slide 27 with mappings to HISCO added. Mapping edge labels shown: skos:exactMatch, skos:broadMatch, skos:broadMatch, skos:closeMatch, skos:exactMatch, skos:exactMatch, skos:exactMatch. HISCO codes shown: HISCO:23811, HISCO:25281, HISCO:25281, HISCO:26345, HISCO:23810, HISCO:25281, HISCO:26340.]
  • 29. Harmonising. [The hierarchy of slide 27 shown twice: once with plain labels and once with every node prefixed by its sheet of origin: Sheet1:I; Sheet1:D, Sheet1:E, Sheet1:A; Sheet1:Fabricage van steen (molensteen, steenbakkers, tegelbakkers), Sheet1:Fabricage van dakpannen (pannenbakkers), Sheet1:Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.), Sheet1:Fabricage van kalk.]
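Following skos:broader links upward is what lets a query roll detailed occupations up into their classes. A minimal sketch follows, using the one chain that can be read from the URI Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers on slide 21; the intermediate name Sheet1:I/E and the single-parent dictionary are simplifying assumptions (SKOS itself allows several broader concepts per concept).

```python
from typing import Dict, List

# concept -> its broader concept (single parent assumed for this sketch).
broader: Dict[str, str] = {
    "Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers": "Sheet1:I/E",
    "Sheet1:I/E": "Sheet1:I",
}


def ancestors(concept: str, links: Dict[str, str]) -> List[str]:
    """Return all broader concepts of `concept`, nearest first."""
    seen = {concept}
    chain: List[str] = []
    while concept in links:
        concept = links[concept]
        if concept in seen:
            raise ValueError("cycle in skos:broader links")
        seen.add(concept)
        chain.append(concept)
    return chain


if __name__ == "__main__":
    print(ancestors("Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers",
                    broader))
```

Since every name is scoped to Sheet1, the same traversal over another sheet's hierarchy yields a disjoint set of concepts until mapping links are added.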
  • 30. [Diagram: the occupational hierarchy for 1889 (I; D, E, A; leaves “Fabricage van steen (molensteen, steenbakkers, tegelbakkers)”, “Fabricage van dakpannen (pannenbakkers)”, “Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)”, “Fabricage van kalk”) above the hierarchy for 1899 (I; D, E, A; leaves “Fabricage van steen (steenbakkers, tegelbakkers)”, “Fabricage van dakpannen (pannenbakkers)”, “Fabricage van aardewerk (incl. porcelein, kachelbakkers, pottenbakkers, enz.)”, “Fabricage van kalk”). Mapping edge labels between the two: skos:narrowMatch, skos:closeMatch, skos:exactMatch, skos:narrowMatch.]
  • 31. Is SKOS sufficient? [Same diagram as slide 30: the 1889 and 1899 hierarchies linked by skos:narrowMatch, skos:closeMatch, skos:exactMatch and skos:narrowMatch.] NB: These are not strings, but globally unique URIs, scoped within their spreadsheet (graph!) of origin.
  • 34. Vocabularies, Tools. Vocabularies: Data Cube, SKOS, W3C Time, PROV-O. Excel + TabLinker: semi-automatic conversion of Excel sheets to RDF. ProvTracer: create PROV-O provenance trail for shell/python scripts. Visualization prototype: SGVizler (SPARQL + Google Graph API).
  • 35. Discussion. Advantages of Linked Data approach: straightforward transformation from spreadsheets; seamless integration of original, corrected and harmonised data; ingestion of external (linked) data; powerful documentation (provenance); everything is transparently query-able (SPARQL) ... on the Web.
  • 36. Discussion. Disadvantages of Linked Data approach (subject to research): Size? (300k triples per sheet × 519 sheets ≈ 156M triples). Only rudimentary support for arithmetical operations in queries. No dynamic/conditional ‘view’-like graphs.
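The size estimate is simple multiplication, worked out below. The 300k triples per sheet and the 519 sheets come from the slides; applying the same rate to the 1200 spreadsheets mentioned on slide 3 is an extrapolation added here, not a figure from the talk.

```python
TRIPLES_PER_SHEET = 300_000  # figure quoted on the slide


def estimated_triples(sheets: int, per_sheet: int = TRIPLES_PER_SHEET) -> int:
    """Rough store size, assuming every sheet yields the same triple count."""
    return sheets * per_sheet


if __name__ == "__main__":
    print(estimated_triples(519))   # the ~519 sheets counted so far
    print(estimated_triples(1200))  # if the collection really has 1200 sheets
```

The first figure is 155.7 million, which the slide rounds to 156M.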
  • 37. SPARQL vs. SQL? Middle ground? Expose database through D2RQ.
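The "middle ground" idea is to keep the numbers in a relational database, where arithmetic is easy, and expose the rows as triples. The sketch below shows that idea with Python's built-in sqlite3 module. It is not D2RQ and does not use D2RQ's mapping language: the table layout, the http://example.com/ base and the `rows_as_triples` helper are all hypothetical, and the second row is made up.

```python
import sqlite3
from typing import Iterator, Tuple

BASE = "http://example.com/"  # hypothetical base URI

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation ("
    "id INTEGER PRIMARY KEY, gemeente TEXT, age_group TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        (1, "Assendelft", "14-15", 11),  # values as on slide 25
        (2, "Assendelft", "16-17", 7),   # made-up row for illustration
    ],
)


def rows_as_triples(db: sqlite3.Connection, table: str,
                    key: str) -> Iterator[Tuple[str, str, object]]:
    """Expose each row as triples: row URI, column URI, cell value."""
    # The table name is interpolated directly; only pass trusted names.
    cursor = db.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{BASE}{table}#{column}", value)


triples = list(rows_as_triples(conn, "observation", "id"))

# Arithmetic stays in SQL, where aggregates are well supported.
total = conn.execute(
    "SELECT SUM(population) FROM observation WHERE gemeente = ?",
    ("Assendelft",),
).fetchone()[0]

if __name__ == "__main__":
    for t in triples:
        print(t)
    print(total)
```

The design choice is a trade-off: the triple view gives linkable, Web-addressable entities, while sums and other aggregates run in SQL.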