STI Summit
       July 6th, 2011, Riga, Latvia




Global Data Integration
and Global Data Mining

       Prof. Dr. Christian Bizer
        Freie Universität Berlin
               Germany



                         Christian Bizer: Global Data Integration – STI Summit, Riga (6/7/2011)
Outline



          1. Topology of the Web of Data
              What data is out there?


          2. Global Data Integration
              How to split the integration effort


          3. Global Data Mining
               The logical next step




Linked Data Deployment on the Web

  Year   Datasets          Triples   Growth
  2007       12        500.000.000
  2008       45      2.000.000.000   300%
  2009       95      6.726.000.000   236%
  2010      203     26.930.509.703   300%
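
The growth column can be checked against the triple counts. The following is a minimal sketch, assuming that "Growth" means the year-over-year percentage increase in triples; the variable and function names are illustrative and not part of the original slides.

```python
# Triple counts per year, taken from the table above.
triples_by_year = {
    2007: 500_000_000,
    2008: 2_000_000_000,
    2009: 6_726_000_000,
    2010: 26_930_509_703,
}


def growth_percent(previous, current):
    """Year-over-year increase, expressed as a percentage of the previous value."""
    return (current - previous) / previous * 100


years = sorted(triples_by_year)
growth = {
    year: growth_percent(triples_by_year[prev], triples_by_year[year])
    for prev, year in zip(years, years[1:])
}

for year, pct in growth.items():
    print(f"{year}: {pct:.0f}%")
```

Under that assumption the rounded results agree with the 300%, 236%, and 300% shown in the table.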




Uptake in the Government Domain




  The EU is starting to publish Linked Data (LOD2, LATC)
  Various other national efforts
  W3C eGovernment Interest Group

Uptake in the Libraries Community

  Institutions publishing Linked Data
     Library of Congress (subject headings)
     German National Library (PND dataset and subject headings)
     Swedish National Library (Libris catalog)
     Hungarian National Library (OPAC and Digital Library)
     Europeana project just released data about 4 million artifacts


  Growth of Library Linked Data (2009-2010): 1000%
  W3C Library Linked Data Incubator Group
  Goals:
    1. Integrate Library Catalogs on global scale.
    2. Interconnect resources between repositories
       (by topic, by location, by historical period, by ...).


LOD data set statistics as of November 2010


 Domain          Data Sets          Triples   Percent     RDF Links   Percent
 Cross-domain        20       1,999,085,950      7.42    29,105,638      7.36
 Geographic          16       5,904,980,833     21.93    16,589,086      4.19
 Government          25      11,613,525,437     43.12    17,658,869      4.46
 Media               26       2,453,898,811      9.11    50,374,304     12.74
 Libraries           67       2,237,435,732      8.31    77,951,898     19.71
 Life sciences       42       2,664,119,184      9.89   200,417,873     50.67
 User Content         7          57,463,756      0.21     3,402,228      0.86
 Total              203      26,930,509,703             395,499,896


 LOD Cloud Data Catalog on CKAN
 http://www.ckan.net/group/lodcloud

 More statistics
 http://www4.wiwiss.fu-berlin.de/lodcloud/state/
What are the big players doing?




Structured Data becomes an SEO Topic


                                                      Data Snippets




                                                    Query Answer




Result: Further growth …


 Usage of RDFa increased 510%
  between March 2009 and October 2010
 430 million webpages contain RDFa




 Source: Yahoo
 http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/
The Structural Continuum


    The Web of Data is interwoven with the classic Web.


         Unstructured text: HTML
         Structured data:
            RDFa embedded in HTML (Open Graph)
            Microdata embedded in HTML (Schema.org)
            Microformats embedded in HTML

         Linked data: RDF/XML




Topology of the Web of Data




How to get the data?


   Download the Billion Triples Challenge Dataset
      2 billion triples (20GB gzipped)
      crawled from the public Web of Linked Data in May/June 2011
      http://challenge.semanticweb.org/


   Download the Sindice Dump
      12 billion triples (164GB gzipped, ~1.16TB uncompressed)
      crawled from the public Web of Linked Data and
      includes RDFa, Microformat, and wrapped API data
      http://data.sindice.com/trec2011/download.html
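
Dumps of this size are usually processed as a stream rather than loaded into memory. The following is a minimal sketch, assuming the dump is a gzipped, line-based N-Triples or N-Quads file (an assumption, the slides do not state the serialization). The regular expression is a deliberate simplification that skips blank-node subjects, and the function names and sample data are hypothetical.

```python
import gzip
import io
import re
from collections import Counter

# Matches lines whose subject and predicate are IRIs; the rest of the line
# (object and optional graph) is not interpreted. Simplification: lines with
# blank-node subjects and comment lines are skipped.
_LINE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def count_predicates(lines):
    """Count how often each predicate IRI occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = _LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def count_predicates_in_gzip(fileobj):
    """Stream a gzipped dump (path or binary file object) line by line."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8", errors="replace") as lines:
        return count_predicates(lines)


# Tiny in-memory stand-in for a real dump file (hypothetical data).
sample = (
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" <http://example.org/g1> .\n'
    '<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" <http://example.org/g1> .\n'
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> <http://example.org/g2> .\n'
    '# a comment line, ignored\n'
)
buffer = io.BytesIO(gzip.compress(sample.encode("utf-8")))
predicate_counts = count_predicates_in_gzip(buffer)
print(predicate_counts.most_common())
```

For a real dump, a file path would be passed instead of the in-memory buffer. A proper RDF parser would be needed for anything beyond rough profiling.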




2. Global Data Integration


           Applications hate heterogeneity!




     The wild wild west                      My little world
The Dataspace Vision

 Alternative to classic data integration systems in
 order to cope with a growing number of data sources.

 Properties of dataspaces
     no upfront investment into a global schema
     rely on pay-as-you-go data integration
     give best-effort answers to queries


   Franklin, M., Halevy, A., and Maier, D.: From Databases to Dataspaces:
   A new Abstraction for Information Management, SIGMOD Rec. 2005.

   Madhavan, J., et al.: Web-scale Data Integration: You Can Only Afford
   to Pay As You Go, CIDR 2007




Linked Data relies on Pay-as-You-Go Idea

  for Identity Management
  for Schema/Vocabulary Management




Publish Identity Links on the Web


                                                                      Identity Link
    <http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4>
    owl:sameAs
    <http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer> .




     You publish links pointing at other data sources.
     Somebody else publishes links pointing at your
     data source.
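
Because identity links come from many publishers, a consumer typically has to merge them into groups of URIs that denote the same entity. The following is a minimal sketch of that step using a union-find structure. It assumes owl:sameAs is treated as symmetric and transitive, and the `example.org` URIs are hypothetical additions to the link shown on the slide.

```python
def sameas_clusters(links):
    """Group URIs connected by identity links into clusters (union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


PERSON4 = "http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4"
DBLP = "http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer"

identity_links = [
    (PERSON4, DBLP),                                     # link from the slide
    ("http://example.org/people/cbizer", DBLP),          # hypothetical third-party link
    ("http://example.org/a/x", "http://example.org/b/x"),  # hypothetical, unrelated
]

clusters = sameas_clusters(identity_links)
for cluster in clusters:
    print(sorted(cluster))
```

The third-party link joins the same cluster as the publisher's link even though neither party knows about the other, which is the point of publishing such links on the Web.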




Effort Distribution between Publisher and Consumer




Consumer data mines
    identity links

              Effort
           Distribution

 Publishers or third
  parties provide
   identity links


Vocabularies on the Web of Data

  Everyone can use whatever vocabularies she likes
   to publish data on the Web.
  Or invest effort and reuse Common Vocabularies
     Friend-of-a-Friend for describing people and their social network
     SIOC for describing forums and blogs
     SKOS for representing topic taxonomies
     Organization Ontology for describing the structure of organizations
     GoodRelations provides terms for describing products and business entities
     Music Ontology for describing artists, albums, and performances
     Review Vocabulary provides terms for representing reviews

  Many Linked Data sources use a mixture of common and
   proprietary vocabulary terms.


Publish Vocabulary Links on the Web


                                                                 Vocabulary Link
    <http://xmlns.com/foaf/0.1/Person>
    owl:equivalentClass
    <http://dbpedia.org/ontology/Person> .



     Simple Mappings: RDFS, OWL
         rdfs:subClassOf, rdfs:subPropertyOf
         owl:equivalentClass, owl:equivalentProperty

     Complex Mappings: R2R
         provides value transformation functions
         structural transformations
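
A consumer can use simple equivalence mappings to translate data into a single target vocabulary. The following is a minimal sketch of that idea. It is not the R2R language or its API, it handles only equivalences (not rdfs:subClassOf or value transformations), and the `example.org` property and resource are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DBO_PERSON = "http://dbpedia.org/ontology/Person"

# owl:equivalentClass link from the slide, read as "rewrite DBpedia's class to FOAF's".
CLASS_MAP = {DBO_PERSON: FOAF_PERSON}
# owl:equivalentProperty link for a hypothetical proprietary property.
PROPERTY_MAP = {"http://example.org/vocab/fullName": FOAF_NAME}


def normalize(triples, class_map, property_map):
    """Rewrite predicates and rdf:type objects into the target vocabulary."""
    for subject, predicate, obj in triples:
        predicate = property_map.get(predicate, predicate)
        if predicate == RDF_TYPE:
            obj = class_map.get(obj, obj)
        yield (subject, predicate, obj)


source_triples = [
    ("http://example.org/people/1", RDF_TYPE, DBO_PERSON),
    ("http://example.org/people/1", "http://example.org/vocab/fullName", "Christian Bizer"),
]

normalized = list(normalize(source_triples, CLASS_MAP, PROPERTY_MAP))
for triple in normalized:
    print(triple)
```

After this step an application only needs to understand the target terms, regardless of which vocabulary each source used.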




Deployment of Vocabulary Links




Source: Linked Open Vocabularies,
http://labs.mondeca.com/dataset/lov
Effort Distribution between Publisher and Consumer




Consumer defines or
data mines mappings




               Effort
            Distribution



  Publisher reuses
   vocabularies

Publisher or third party
 publishes mappings


Somebody-Pays-As-You-Go

  The overall data integration effort is
  split between the data publisher, the
  data consumer and third parties.

   Data Publisher
      publishes data as RDF
      sets identity links
      reuses terms or publishes mappings

   Third Parties
      set identity links pointing at your data
      publish mappings to the Web

   Data Consumer
      has to do the rest
      using record linkage and schema matching
       techniques

   [Figure: the fixed overall data integration effort, divided into
    Publisher's Effort, Third Party Effort, and Consumer's Effort]
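
Where no identity links have been published, the consumer has to find matching records on its own. The following is a minimal record-linkage sketch, assuming each source offers one label per URI and that string similarity on normalized labels is a good enough signal. The labels, the `example.org` URIs, and the 0.9 threshold are illustrative assumptions, not values from the talk.

```python
import re
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, drop punctuation, and sort tokens so word order does not matter."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def similarity(a, b):
    """String similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def candidate_links(source_a, source_b, threshold=0.9):
    """Compare every pair of labels and keep pairs at or above the threshold."""
    links = []
    for uri_a, label_a in source_a.items():
        for uri_b, label_b in source_b.items():
            if similarity(label_a, label_b) >= threshold:
                links.append((uri_a, uri_b))
    return links


# Hypothetical labels for two sources.
source_a = {
    "http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4": "Christian Bizer",
    "http://example.org/a/p2": "Tom Heath",
}
source_b = {
    "http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer": "Bizer, Christian",
    "http://example.org/b/x": "Tim Berners-Lee",
}

links = candidate_links(source_a, source_b)
print(links)
```

Comparing all pairs is quadratic in the number of records, so at Web scale a blocking or indexing step would be needed before pairwise comparison.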
Research Directions


 1. More research on pay-as-you-go data integration is needed.


 2. More research on data mining mappings and
    identity resolution heuristics is needed.
     Identity links make it easier to mine vocabulary links.
     Vocabulary links make it easier to mine identity links.



 3. More research on spam detection and data quality
    assessment is needed.




LDIF – Linked Data Integration Framework

  Combines vocabulary normalization and identity resolution
     Currently only in-memory implementation
     Next release: Hadoop-based implementation

  http://www4.wiwiss.fu-berlin.de/bizer/ldif/

  [Figure: pipeline with the stages "Normalize vocabularies" and "Identity Resolution"]




What can we do afterwards …

   … build better entity search engines




3. Global Data Mining




Think about interesting questions …

 … that you can answer based on the Web of Data
 … that require
     aggregation
     summarization
     classification
     association rule mining

 … combined with
     text mining
     sentiment analysis




Everybody has the tools to find the answers




Research Directions


 1. More research on data space profiling is needed.


 2. More research on global data mining is needed.



  Google, Yahoo, Microsoft, Facebook will get there soon.




Semantic Web Challenge

  Submission Statistics

    Year       Open Track          Billion Triple Track
    2008            13                      9
    2009            16                      3
    2010            14                      4



  Do something interesting with the Billion Triple Data
     and submit your results to the challenge by October 1st
     present your results at the 10th International Semantic Web Conference
      (ISWC2011), October 2011, Koblenz, Germany




Conclusions

  The Web of Data is there
     Linked Data, Microdata, RDFa, Microformats


  Upcoming research topics
     pay-as-you-go data integration
     mapping discovery, schema clustering
     identity resolution heuristics discovery
     probabilistic data integration
     data quality assessment
     data space profiling
     global data mining




Thanks!




References
   Textbook: Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global
    Data Space. http://linkeddatabook.com/
   Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far
    http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf

tolgahangng
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 

Recently uploaded (20)

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

STI Summit 2011 - Global data integration and global data mining

  • 1. STI Summit, July 6th, 2011, Riga, Latvia. Global Data Integration and Global Data Mining. Prof. Dr. Christian Bizer, Freie Universität Berlin, Germany. Christian Bizer: Global Data Integration – STI Summit, Riga (6/7/2011)
  • 2. Outline: 1. Topology of the Web of Data (What data is out there?) 2. Global Data Integration (How to split the integration effort) 3. Global Data Mining (The logical next step)
  • 3. Linked Data Deployment on the Web
       Year   Datasets   Triples          Growth
       2007   12         500.000.000
       2008   45         2.000.000.000    300%
       2009   95         6.726.000.000    236%
       2010   203        26.930.509.703   300%
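The Growth column can be checked with a few lines of arithmetic. The sketch below assumes that "Growth" means the relative increase in triples over the previous year, (current - previous) / previous; the slide does not state the formula, but this reading reproduces the listed figures of roughly 300%, 236%, and 300%. The function and variable names are illustrative only.

```python
# Triple counts per year, copied from the slide's table.
triples_by_year = {
    2007: 500_000_000,
    2008: 2_000_000_000,
    2009: 6_726_000_000,
    2010: 26_930_509_703,
}


def growth_percent(previous: int, current: int) -> float:
    """Relative year-over-year increase in percent (assumed definition)."""
    return (current - previous) / previous * 100.0


years = sorted(triples_by_year)
# Pair each year with its predecessor and compute the growth for the later year.
growth = {
    year: growth_percent(triples_by_year[prev], triples_by_year[year])
    for prev, year in zip(years, years[1:])
}

for year, pct in growth.items():
    print(f"{year}: {pct:.1f}%")
```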
  • 4. Uptake in the Government Domain: The EU is starting to publish Linked Data (LOD2, LATC); various other national efforts; W3C eGovernment Interest Group
  • 5. Uptake in the Libraries Community. Institutions publishing Linked Data: Library of Congress (subject headings); German National Library (PND dataset and subject headings); Swedish National Library (Libris catalog); Hungarian National Library (OPAC and Digital Library); Europeana project just released data about 4 million artifacts. Growth of Library Linked Data (2009-2010): 1000%. W3C Library Linked Data Incubator Group. Goals: 1. Integrate library catalogs on a global scale. 2. Interconnect resources between repositories (by topic, by location, by historical period, by ...).
  • 6. LOD data set statistics as of November 2010
       Domain          Data Sets   Triples          Percent   RDF Links     Percent
       Cross-domain    20          1,999,085,950    7.42      29,105,638    7.36
       Geographic      16          5,904,980,833    21.93     16,589,086    4.19
       Government      25          11,613,525,437   43.12     17,658,869    4.46
       Media           26          2,453,898,811    9.11      50,374,304    12.74
       Libraries       67          2,237,435,732    8.31      77,951,898    19.71
       Life sciences   42          2,664,119,184    9.89      200,417,873   50.67
       User Content    7           57,463,756       0.21      3,402,228     0.86
       Total           203         26,930,509,703             395,499,896
       LOD Cloud Data Catalog on CKAN: http://www.ckan.net/group/lodcloud
       More statistics: http://www4.wiwiss.fu-berlin.de/lodcloud/state/
  • 7. What are the big players doing?
  • 8. Structured Data becomes an SEO Topic. Figure labels: "Data Snippets", "Query Answer".
  • 9. Result: Further growth. "… usage of RDFa has increased 510% between March, 2009 and October, 2010". 430 million webpages contain RDFa. Source: Yahoo, http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/
  • 10. The Structural Continuum. The Web of Data is interwoven with the classic Web. Unstructured text: HTML. Structured data: RDFa embedded into HTML (Open Graph); Microdata embedded into HTML (Schema.org); Microformats embedded into HTML. Linked data: RDF/XML.
  • 11. Topology of the Web of Data
  • 12. How to get the data? Download the Billion Triples Challenge Dataset: 2 billion triples (20GB gzipped); crawled from the public Web of Linked Data in May/June 2011; http://challenge.semanticweb.org/. Download the Sindice Dump: 12 billion triples (164GB gzipped, ~1.16TB uncompressed); crawled from the public Web of Linked Data; includes RDFa, Microformat, and wrapped API data; http://data.sindice.com/trec2011/download.html
  • 13. 2. Global Data Integration. Applications hate heterogeneity! Figure labels: "The wild wild west", "My little world".
  • 14. The Dataspace Vision. Alternative to classic data integration systems in order to cope with the growing number of data sources. Properties of dataspaces: no upfront investment into a global schema; rely on pay-as-you-go data integration; give best-effort answers to queries. Franklin, M., Halevy, A., and Maier, D.: From Databases to Dataspaces: A New Abstraction for Information Management, SIGMOD Rec. 2005. Madhavan, J., et al.: Web-scale Data Integration: You Can Only Afford to Pay As You Go, CIDR 2007.
  • 15. Linked Data relies on the Pay-as-You-Go Idea: for Identity Management; for Schema/Vocabulary Management
  • 16. Publish Identity Links on the Web. Identity Link: <http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4> owl:sameAs <http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer> . You publish links pointing at other data sources. Somebody else publishes links pointing at your data source.
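A consumer can combine identity links published by different parties by treating owl:sameAs as symmetric and transitive and grouping all connected URIs. The following is a minimal sketch of that idea using a union-find structure over plain (subject, predicate, object) tuples; it is not taken from the talk or from any particular library. The first link is the one shown on the slide, while the `http://example.org/...` URI and the function name `identity_clusters` are hypothetical and only serve as illustration.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def identity_clusters(triples):
    """Group URIs connected by owl:sameAs links (symmetric, transitive closure)."""
    parent = {}

    def find(uri):
        # Return the representative URI of the cluster that `uri` belongs to.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            root_s, root_o = find(subject), find(obj)
            if root_s != root_o:
                parent[root_o] = root_s  # merge the two clusters

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


links = [
    # Identity link from the slide, set by the publisher.
    ("http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4",
     OWL_SAME_AS,
     "http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer"),
    # Hypothetical link published by a third party (example.org is a placeholder).
    ("http://example.org/people/cbizer",
     OWL_SAME_AS,
     "http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4"),
]

clusters = identity_clusters(links)
for members in clusters:
    print(sorted(members))
```

Because the closure is computed on the consumer side, links set by the publisher and links set by somebody else end up in the same cluster without either party knowing about the other.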
  • 17. Effort Distribution between Publisher and Consumer. Figure: effort distribution scale ranging from "Consumer data mines identity links" to "Publishers or third parties provide identity links".
  • 18. Vocabularies on the Web of Data. Everyone can use whatever vocabularies she likes to publish data on the Web. Or invest effort and reuse common vocabularies: Friend-of-a-Friend for describing people and their social network; SIOC for describing forums and blogs; SKOS for representing topic taxonomies; Organization Ontology for describing the structure of organizations; GoodRelations provides terms for describing products and business entities; Music Ontology for describing artists, albums, and performances; Review Vocabulary provides terms for representing reviews. Many Linked Data sources use a mixture of common and proprietary vocabulary terms.
  • 19. Publish Vocabulary Links on the Web. Vocabulary Link: <http://xmlns.com/foaf/0.1/Person> owl:equivalentClass <http://dbpedia.org/ontology/Person> . Simple Mappings: RDFS, OWL (rdfs:subClassOf, rdfs:subPropertyOf; owl:equivalentClass, owl:equivalentProperty). Complex Mappings: R2R (provides value transformation functions; structural transformations).
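To illustrate how a consumer might apply simple vocabulary links, the sketch below rewrites data triples into a target vocabulary using owl:equivalentClass and owl:equivalentProperty statements. It is an assumption-laden illustration, not the R2R or LDIF implementation: triples are plain tuples, equivalence links are applied in one direction only (subject term to object term), and rdfs:subClassOf, rdfs:subPropertyOf, and value transformations are not handled. The function names `build_term_map` and `normalize` are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"


def build_term_map(mapping_triples):
    """Collect source-term -> target-term pairs from equivalence links."""
    term_map = {}
    for subject, predicate, obj in mapping_triples:
        if predicate in (OWL_EQUIVALENT_CLASS, OWL_EQUIVALENT_PROPERTY):
            term_map[subject] = obj
    return term_map


def normalize(data_triples, term_map):
    """Rewrite properties, and classes used in rdf:type statements, to target terms."""
    normalized = []
    for subject, predicate, obj in data_triples:
        predicate = term_map.get(predicate, predicate)
        if predicate == RDF_TYPE:
            obj = term_map.get(obj, obj)
        normalized.append((subject, predicate, obj))
    return normalized


# Vocabulary link from the slide.
mappings = [
    ("http://xmlns.com/foaf/0.1/Person",
     OWL_EQUIVALENT_CLASS,
     "http://dbpedia.org/ontology/Person"),
]

# Illustrative data triple (not from the talk) typed with the FOAF class.
data = [
    ("http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4",
     RDF_TYPE,
     "http://xmlns.com/foaf/0.1/Person"),
]

normalized = normalize(data, build_term_map(mappings))
for triple in normalized:
    print(triple)
```

Choosing one direction for the rewrite is a design decision of this sketch; since equivalence is symmetric, a consumer would pick whichever vocabulary its application expects as the target.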
  • 20. Deployment of Vocabulary Links. Source: Linked Open Vocabularies, http://labs.mondeca.com/dataset/lov
  • 21. Effort Distribution between Publisher and Consumer. Figure: effort distribution scale with the positions "Consumer defines or data mines mappings", "Publisher reuses vocabularies", and "Publisher or third party publishes mappings".
  • 22. Somebody-Pays-As-You-Go. The overall data integration effort is split between the data publisher, the data consumer and third parties. Data Publisher: publishes data as RDF; sets identity links; reuses terms or publishes mappings. Third Parties: set identity links pointing at your data; publish mappings to the Web. Data Consumer: has to do the rest, using record linkage and schema matching techniques. Figure: fixed overall data integration effort divided into publisher's effort, third party effort, and consumer's effort.
  • 23. Research Directions. 1. More research on pay-as-you-go data integration is needed. 2. More research on data mining mappings and identity resolution heuristics is needed: identity links make it easier to mine vocabulary links; vocabulary links make it easier to mine identity links. 3. More research on SPAM detection and data quality assessment is needed.
  • 24. LDIF – Linked Data Integration Framework. Combines vocabulary normalization and identity resolution. Currently only an in-memory implementation. Next release: Hadoop-based implementation. http://www4.wiwiss.fu-berlin.de/bizer/ldif/ Figure labels: "Normalize vocabularies", "Identity Resolution".
  • 25. What can we do afterwards … … build better entity search engines
  • 26. 3. Global Data Mining
  • 28. Think about interesting questions … that you can answer based on the Web of Data … that require aggregation, summarization, classification, association rule mining … combined with text mining and sentiment analysis
  • 29. Everybody has the tools to find the answers
  • 30. Research Directions. 1. More research on data space profiling is needed. 2. More research on global data mining is needed. Google, Yahoo, Microsoft, Facebook will get there soon.
  • 31. Semantic Web Challenge: Submission Statistics
       Year   Open Track   Billion Triple Track
       2008   13           9
       2009   16           3
       2010   14           4
       Do something interesting with the Billion Triple Data, submit your results to the challenge until October 1st, and present your results at the 10th International Semantic Web Conference (ISWC2011), October 2011, Koblenz, Germany.
  • 32. Conclusions. The Web of Data is there: Linked Data, Microdata, RDFa, Microformats. Upcoming research topics: pay-as-you-go data integration (mapping discovery, schema clustering; identity resolution heuristics discovery; probabilistic data integration; data quality assessment); data space profiling; global data mining.
  • 33. Thanks! References. Textbook: Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. http://linkeddatabook.com/ Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far. http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf