LOD Exemplar - LOD Museum -


The third talk at 2012 INTERNATIONAL ASIAN SUMMER SCHOOL IN LINKED DATA 
IASLOD 2012, August 13-17, 2012, KAIST, Daejeon, Korea


  1. 2012 International Asian Summer School in Linked Data (IASLOD 2012), August 13-17, 2012, KAIST, Daejeon, Korea
     LOD Application Exemplar - A case study: LODAC Museum
     Hideaki Takeda, Fumi Kato / National Institute of Informatics
     takeda@nii.ac.jp
  2. Aim of this talk
     • How to plan, design, and implement LOD?
     • Learn from the case
  3. LODAC Project (http://lod.ac/)
     • Open Social Semantic Web Platform for Academic Resources
       – Providing platforms for Linked Open Data
       – Practicing data accumulation and publishing
     • Areas of interest
       – Museum information
       – Geographical information, especially geographical names
       – Local information
       – Taxonomic information on species
       – …
  4. Linked Open Data Initiative (http://linkedopendata.jp/)
     • Non-profit organization
       – (Application for approval pending)
     • Academia + IT people + local people
     • Aim: facilitate LOD activities among local people
  5. Museum data as LOD
     • The current state of museum information in Japan (nearly 6,000 museums)
       – Distributed
         • Self-maintained
         • Isolated
       – Opaque
         • Self-designed
         • Messy
     • Aggregating and associating museum information
       – LODAC Museum
  6. LODAC Museum – Main work
     • Gathering of data
       – Thesauri, museum collections, etc.
     • Standardization of data
       – Representing data from different sources in a single, uniform form
     • Integration of data
       – Identifying data
       – Associating data that refer to the same entity
     • Consuming of data
  7. LODAC Museum Architecture
     [Figure: architecture diagram]
     • Stages: Gathering of data → Standardization of data → Integration of data → Consuming of data
     • Labels in the diagram: Museum Websites, Crawl / Scrape, Extracted data (JSON), ID Management (MySQL), Map to RDF, Import, OWLIM (RDF), SPARQL, unicorn (ruby), nginx, thttpd (python), Semantic MediaWiki
  8. Gathering data
     • No museums publish data as LOD!
     • We use data published as Web pages
       – Scrape and translate data
       – The license is not clear
         • This is a serious problem
         • In principle, we need permission from every site
         • We have permission from some data publishers, but not all
  10. Dataset
      Type | No. | Data source
      Art work (lodac:Work) | ca. 80,000 | Catalog of the collections of 3 National Art Museum (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums; Database for National Treasure & Important Cultural Property of National Designated (915); The Japanese Art Thesaurus (266)
      Specimen (lodac:Speciment) | ca. 1,690,000 | (100+ Museum collections); Science Net (National Science Museum)
      Person (foaf:Person) | ca. 8,800 | The Japanese Art Thesaurus
      Facilities (incl. Museum) | ca. 200,000 | The Japanese Art Thesaurus; Cultural Heritage Online; GIS data National and Regional Planning Bureau
  11. Extracting collection data from museum websites
      [Figure: Extract]
  12. Extracting collection data from museum websites
      [Figure: Extract, producing tables of Property / Value pairs]
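The extraction step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a collection page presents each item as a two-column HTML table of property and value; the sample page, the class name `PropertyValueParser`, and the function `extract_pairs` are hypothetical and not taken from the LODAC scraper.

```python
from html.parser import HTMLParser


class PropertyValueParser(HTMLParser):
    """Collect (property, value) pairs from two-cell table rows."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._cells = None      # cells of the row being read; None outside a row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("th", "td") and self._cells is not None:
            self._cells.append("")
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            cells = [c.strip() for c in self._cells]
            # Keep only rows that look like "property | value".
            if len(cells) == 2 and all(cells):
                self.pairs.append((cells[0], cells[1]))
            self._cells = None


def extract_pairs(html):
    """Return the property/value pairs of one page as a dict."""
    parser = PropertyValueParser()
    parser.feed(html)
    parser.close()
    return dict(parser.pairs)


# Hypothetical page fragment standing in for a museum collection page.
SAMPLE_PAGE = """
<table>
  <tr><th>Title</th><td>Example Work</td></tr>
  <tr><th>Creator</th><td>Example Artist</td></tr>
  <tr><th>Material</th><td>Color on silk</td></tr>
</table>
"""

record = extract_pairs(SAMPLE_PAGE)
```

Real museum pages differ from site to site, so in practice each source would likely need its own extraction rules; the dict produced here corresponds to the "Extracted data (JSON)" stage of the architecture.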
  13. Standardization of data
      • Re-organized common metadata
        [Figure: Raw Data → Re-organized Metadata (dc:title, crm:P45_consists_of, skos:prefLabel, …, lodac:era)]
      • Current organizing policies
        – Use existing metadata
        – Define our own metadata
  14. Namespaces
      Prefix | Metadata name
      crm | CIDOC-CRM
      dc11 | Dublin Core 1.1
      dc | DCMI Terms
      skos | Simple Knowledge Organization System
      rdfs | Resource Description Framework Schema
      foaf | Friend of a Friend
      rda2 | Resource Description and Access
      lodac | LODAC Project
  15. Metadata schema for works (lodac:Work)
      Item | Property
      Genre | lodac:genre
      Type of cultural assets | lodac:culturalAssets
      Creator | dc:creator / dc11:creator
      Nationality | crm:P7_took_place_at
      Title | dc:title / skos:prefLabel
      Title Pronunciation (yomi) | dc:title @ja-hrkt / skos:altLabel
      Title in English | dc:title @en / skos:altLabel
      Inscription | crm:P62I_is_depicted_by
      Seal | crm:P65_shows_visual_item
      No. of parts | crm:P57_has_number_of_parts
      Collection | dc:isPartOf
      Created year | dc:created
      Estimated starting year | lodac:estimatedStartYear
      Material | dc:medium / crm:P45_consists_of
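As a rough illustration of the "Map to RDF" step, the sketch below turns an extracted property/value record into N-Triples lines using a small part of the schema above. The namespace URIs are the ones that appear in the RDF and SPARQL examples elsewhere in the deck; the field names, the subject URI `http://example.org/ref/1`, and the helper names are hypothetical, and every value is emitted as a plain language-tagged literal, which is a simplification.

```python
# Prefixes as used in the deck's RDF and SPARQL examples.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "dc11": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "lodac": "http://lod.ac/ns/lodac#",
}

# Hypothetical source field names mapped to schema properties.
FIELD_TO_PROPERTIES = {
    "Title": ["dc:title", "skos:prefLabel"],
    "Creator": ["dc11:creator"],
    "Material": ["dc:medium"],
    "Genre": ["lodac:genre"],
}


def expand(curie):
    """Expand a prefixed name such as dc:title into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject_uri, record, lang="ja"):
    """Emit one N-Triples line per mapped property; unmapped fields are skipped."""
    lines = []
    for field, value in record.items():
        for curie in FIELD_TO_PROPERTIES.get(field, []):
            lines.append('<%s> <%s> "%s"@%s .' % (
                subject_uri, expand(curie), escape_literal(value), lang))
    return lines


triples = to_ntriples(
    "http://example.org/ref/1",                 # hypothetical subject URI
    {"Title": "Example Work", "Size": "30 x 40"},
    lang="en",
)
```

Keeping the mapping in a table like `FIELD_TO_PROPERTIES` makes the "use existing metadata, define own metadata" policy explicit: adding a source means adding rows, not code.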
  16. Integrating Data
      • How to integrate data from different sources
        – Sharing of responsibility
          • Each source is responsible for its data
            – Identifying IDs for data and managing data with the IDs
          • LODAC is only responsible for integration
            – Assigning original IDs and associating other IDs to them
      [Figure: an ID-resource (creator's information) linked by dc:references to Ref-resources (creator's references) on either side]
  17. Integrating Data
      [Figure: three columns]
      • Data from Source A: raw data for entities
      • Integrated data: minimum data to identify entities
      • Data from Source B: raw data for entities
      • Entities shown: Work, Museum, Creator
      • Links shown: dc:references (between integrated data and source data), crm:P55_has_current_location, dc:creator
  18. Integration of Person Data
      • Matching of creators
        – Base: list of artists from the Thesaurus of Japanese Art
        – Target: creators of museum collections + DBpedia
        – Method: string match of names
        – Result: links from artist nodes to work nodes are added
      [Figure: LODAC data (basic information for creators) with links to works and to DBpedia]
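A minimal sketch of name-based matching follows. The slide says only "string match of names"; the normalization used here (Unicode NFKC folding plus whitespace removal) is an assumption added for illustration, as are the function names and the work identifiers. The artist URI and name are the ones shown in the deck's RDF example.

```python
import unicodedata


def normalize_name(name):
    """Fold width variants (NFKC) and drop all whitespace before comparing."""
    folded = unicodedata.normalize("NFKC", name)
    return "".join(folded.split()).casefold()


def match_creators(base_artists, target_creators):
    """Link works to artists whose normalized names are identical.

    base_artists:    {artist_uri: name} from the base list
    target_creators: {work_id: creator name as written by the museum}
    Returns {artist_uri: [work_id, ...]} for names that match.
    """
    index = {normalize_name(name): uri for uri, name in base_artists.items()}
    links = {}
    for work_id, creator_name in target_creators.items():
        artist_uri = index.get(normalize_name(creator_name))
        if artist_uri is not None:
            links.setdefault(artist_uri, []).append(work_id)
    return links


base = {"http://lod.ac/id/359": "下村観山"}
targets = {
    "work-1": "下村\u3000観山",   # same name written with a full-width space
    "work-2": "作者不詳",          # no counterpart in the base list
}
links = match_creators(base, targets)
```

Exact matching of this kind cannot distinguish different people who share a name, and it misses variant spellings, so its output is best treated as candidate links.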
  19. Integrating Data
      Integrated item | Sources of data (amount of data) | Integrated
      Facilities | A. Japanese Art Thesaurus (648); B. Cultural Heritage Online (915) | 77
      Title of important cultural properties | A. Japanese Art Thesaurus (Art work) (3,800); B. DB for National Treasure (Art work) (10,115) | 74
      Creator information and Work Title | A. Japanese Art Thesaurus (Creator) (1,332); B. All of art work (Work title string) (61,861) | 15,020
      Creator name | A. Japanese Art Thesaurus (Creator) (1,332); B. All of art work title (using creator name) (61,861) | 615
  20. Publishing data as RDF
      <?xml version="1.0" encoding="UTF-8"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:foaf="http://xmlns.com/foaf/0.1/"
               xmlns:lodac="http://lod.ac/ns/lodac#"
               xmlns:dc="http://purl.org/dc/terms/"
               xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
               xmlns:skos="http://www.w3.org/2004/02/skos/core#">
        <foaf:Person rdf:about="http://lod.ac/id/359">
          <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
          <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
          <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
          <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
          <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
          ……
          <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
          <dc:references rdf:resource="http://lod.ac/ref/359"/>
          <rdfs:label xml:lang="ja">下村観山</rdfs:label>
          <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
          <foaf:name xml:lang="ja">下村観山</foaf:name>
        </foaf:Person>
      </rdf:RDF>
      Annotations on the slide:
      • ID-resource URI (own address): http://lod.ac/id/359
      • lodac:creates: links to her/his work URIs
      • External link: DBpedia Japanese
      • Ref-resource URI: http://lod.ac/ref/359
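To show how a client might read such a record, here is a small sketch using Python's standard `xml.etree.ElementTree`. It is not a general RDF/XML parser: it assumes the document has exactly the flat shape shown on the slide (one `foaf:Person` element with property children). The sample is an abbreviated copy of the slide's example, and `summarize_person` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "lodac": "http://lod.ac/ns/lodac#",
    "dc": "http://purl.org/dc/terms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Abbreviated version of the record on the slide.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:dc="http://purl.org/dc/terms/">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>
"""


def summarize_person(rdf_xml_bytes):
    """Pull the ID-resource URI, name, works and references out of one record."""
    root = ET.fromstring(rdf_xml_bytes)
    person = root.find("foaf:Person", NS)
    return {
        "uri": person.get(RDF_ABOUT),
        "name": person.findtext("foaf:name", namespaces=NS),
        "works": [e.get(RDF_RESOURCE) for e in person.findall("lodac:creates", NS)],
        "references": [e.get(RDF_RESOURCE) for e in person.findall("dc:references", NS)],
    }


info = summarize_person(SAMPLE.encode("utf-8"))
```

For anything beyond a fixed layout like this one, a proper RDF library would be the safer choice, since RDF/XML allows many equivalent serializations of the same graph.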
  21. LODAC Museum Architecture (same diagram as slide 7)
  22. LODAC Applications
      • Photo BURARI Pro
      • Yokohama Art Spot
      • Go2Museum
      • http://lod.ac/apps
  23. Photo BURARI Pro (C) ATR-Promotions, Inc.
      Photo app with SPARQL
  24. Photo BURARI Pro (C) ATR-Promotions, Inc.
      • SPARQL endpoints
        – DBpedia
        – Linked Geo Data
        – LODAC
      • Other data source
        – Sinsai.info
      • Using JSON results
        – JSON Framework for Objective-C
  25. An example in Objective-C
      // SPARQL query for labelled resources inside a bounding box.
      // NW_lat, NW_long, SE_lat and SE_long are placeholders on the slide.
      NSString *sparql =
          @"PREFIX dct: <http://purl.org/dc/terms/> "
          @"PREFIX omgeo: <http://www.ontotext.com/owlim/geo#> "
          @"PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> "
          @"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
          @"SELECT DISTINCT ?link ?title ?lat ?long WHERE { "
          @"?link dct:references ?ref . "
          @"?ref rdfs:label ?title . "
          @"?ref geo:lat ?lat . "
          @"?ref geo:long ?long . "
          @"?ref omgeo:within(NW_lat NW_long SE_lat SE_long) . "
          @"} LIMIT 30";
      // Percent-escape the query text.
      NSString *query = (NSString *)CFURLCreateStringByAddingPercentEscapes(
          kCFAllocatorDefault, (CFStringRef)sparql, NULL,
          CFSTR(";,/?:@=+$#"), kCFStringEncodingUTF8);
      // As on the slide, the escaped query is used directly as the URL;
      // the endpoint part of the URL is not shown.
      NSURL *url = [NSURL URLWithString:query];
      NSMutableURLRequest *req = [NSMutableURLRequest requestWithURL:url];
      // Ask the endpoint for SPARQL results in JSON.
      [req setValue:@"application/sparql-results+json" forHTTPHeaderField:@"Accept"];
      NSURLResponse *resp;
      NSError *err;
      NSData *data = [NSURLConnection sendSynchronousRequest:req returningResponse:&resp error:&err];
      NSString *result = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
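The same request pattern (query sent as a URL parameter, JSON results requested through the Accept header) can be sketched in Python with only the standard library. The endpoint `http://example.org/sparql` is a placeholder, not a LODAC address, and `build_sparql_request` is a hypothetical helper; the sketch builds the request without sending it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE { ?s rdfs:label ?label } LIMIT 5"""


def build_sparql_request(endpoint, query):
    """Build a GET request carrying the query in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Placeholder endpoint; replace with a real SPARQL endpoint before sending.
request = build_sparql_request("http://example.org/sparql", QUERY)

# Round-trip check: the parameter decodes back to the original query text.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
```

Passing `request` to `urllib.request.urlopen` and decoding the body with `json.loads` would complete the round trip; in the JSON results format the rows are found under `results` and then `bindings`.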
  26. Yokohama Art Spot (http://lod.ac/apps/yas/)
      LODAC Museum × Yokohama Art LOD × PinQA
      – Application using museum and local data
      – Data related to art in Yokohama
        • Collections
        • Events
        • Q&A
  27. System Architecture
      ‣ Python + SPARQLWrapper
      ‣ Geolocation
      [Figure: Yokohama Art Spot sends SPARQL queries to three data sources and receives JSON]
      • PinQA: Question, Answer, User
      • Yokohama Art LOD: Event, Artist, Institution
      • LODAC Museum: Work, Artist, Institution
  28. PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX event: <http://lod.ac/ns/event#>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      PREFIX dc: <http://purl.org/dc/terms/>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX lodacid: <http://lod.ac/id/>
      PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
      PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
      PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
      SELECT distinct ?event ?lat ?long ?title ?location_name ?location ?fee ?dtstart ?dtend
      WHERE {
        ?event a event:Event ;
               rdfs:label ?title ;
               event:fee ?fee ;
               ical:location ?location ;
               ical:dtstart ?dtstart ;
               ical:dtend ?dtend .
        ?location rdfs:label ?location_name ;
                  dc:references ?locRef .
        ?locRef omgeo:within(%(NE_lat)s %(NE_long)s %(SW_lat)s %(SW_long)s) ;
                vcard:postal-code ?postalcode ;
                geo:lat ?lat ;
                geo:long ?long .
        FILTER ((?dtstart > "%(dtstart)s"^^xsd:dateTime && ?dtstart < "%(dtend)s"^^xsd:dateTime)
             || (?dtend > "%(dtstart)s"^^xsd:dateTime && ?dtend < "%(dtend)s"^^xsd:dateTime)
             || (?dtstart < "%(dtstart)s"^^xsd:dateTime && ?dtend > "%(dtend)s"^^xsd:dateTime))
      }
      ORDER BY (omgeo:distance(?lat, ?long, %(C_lat)s, %(C_long)s))
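The FILTER in the query above selects events that overlap a requested time window. The sketch below restates its three clauses as a plain Python function so the logic can be checked in isolation; the function name and the sample dates are illustrative only.

```python
from datetime import datetime


def event_matches_window(dtstart, dtend, win_start, win_end):
    """Mirror the three clauses of the FILTER, using the same strict comparisons."""
    starts_inside = win_start < dtstart < win_end
    ends_inside = win_start < dtend < win_end
    spans_window = dtstart < win_start and dtend > win_end
    return starts_inside or ends_inside or spans_window


window = (datetime(2012, 8, 13), datetime(2012, 8, 17))

inside = event_matches_window(datetime(2012, 8, 14), datetime(2012, 8, 15), *window)
before = event_matches_window(datetime(2012, 8, 1), datetime(2012, 8, 5), *window)
spanning = event_matches_window(datetime(2012, 8, 1), datetime(2012, 8, 31), *window)
exact = event_matches_window(window[0], window[1], *window)
```

Because every comparison is strict, an event whose start and end coincide exactly with the window boundaries is not matched. The slide does not say whether that is intended; a condition of the form "start before window end and end after window start" would cover the same cases more compactly apart from such boundary ties.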
  29. PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX event: <http://lod.ac/ns/event#>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      PREFIX dc: <http://purl.org/dc/terms/>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX lodacid: <http://lod.ac/id/>
      PREFIX dc11: <http://purl.org/dc/elements/1.1/>
      SELECT * WHERE {
        ?link a event:Event ;
              rdfs:label ?title ;
              event:fee ?fee ;
              ical:categories ?cat ;
              ical:location %(museum_id)s ;
              ical:dtstart ?dtstart ;
              ical:dtend ?dtend .
        ?cat dc11:title ?category .
        OPTIONAL {
          ?link event:Credit ?crd .
          ?crd dc11:description ?credit .
        }
      }
  30. PREFIX dc: <http://purl.org/dc/terms/>
      PREFIX dc11: <http://purl.org/dc/elements/1.1/>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX lodac: <http://lod.ac/ns/lodac#>
      PREFIX lodacid: <http://lod.ac/id/>
      SELECT ?link ?title ?creator ?created ?genre ?material ?size
      WHERE {
        %(museum_id)s lodac:isProviderOf ?link .
        ?link rdfs:label ?title ;
              dc:references ?workRef .
        ?workRef lodac:genre %(genre)s ;
                 dc11:creator ?creator ;
                 dc:medium ?material ;
                 dc:extent ?size .
        OPTIONAL { ?workRef dc:created ?created . }
      }
      LIMIT 100
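The `%(museum_id)s` and `%(genre)s` markers in the query above are Python string-formatting placeholders. A minimal sketch of filling one of them is given below, using a shortened version of the query. The slides do not show what value is substituted; this sketch assumes a prefixed name such as `lodacid:359` (suggested by the declared `lodacid:` prefix) and validates it before substitution so that arbitrary text cannot be spliced into the query. The names `WORKS_QUERY` and `build_works_query` are hypothetical.

```python
import re

# Shortened version of the works query, keeping one placeholder.
WORKS_QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX lodac: <http://lod.ac/ns/lodac#>
PREFIX lodacid: <http://lod.ac/id/>
SELECT ?link ?title WHERE {
  %(museum_id)s lodac:isProviderOf ?link .
  ?link rdfs:label ?title .
} LIMIT 100"""

# Assumed shape of a museum identifier: the lodacid: prefix plus digits.
_LODAC_ID = re.compile(r"lodacid:[0-9]+")


def build_works_query(museum_id):
    """Fill the template after checking that the id has the expected shape."""
    if not _LODAC_ID.fullmatch(museum_id):
        raise ValueError("unexpected museum id: %r" % (museum_id,))
    return WORKS_QUERY % {"museum_id": museum_id}


query_text = build_works_query("lodacid:359")
```

Plain string substitution is simple but offers no protection by itself, which is why the identifier is checked against a strict pattern first.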
  31. Go2Museum
      http://160.193.95.58/~ueda/go2museum/
  32. [Screenshots: iPhone and Android]
  33. Museum data from various web sites
      [Figure: Go2Museum data sources]
      • Labels in the diagram: NDL, CiNii, Google, LODAC Museum, LODAC Location, Yahoo! Web/Map/Route
      • Edge labels: Search, Location, Link
  34. Twitter: @go2museum
      • “Today’s museum”
      • Recommendation based on the latitude and longitude of tweets
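The slide does not describe how the location-based recommendation is computed. One simple possibility, shown here only as an assumption, is to pick the museum closest to the tweet's coordinates using the haversine great-circle distance. The museum names and coordinates below are made up for the example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_museum(lat, lon, museums):
    """Return the museum record closest to the given position."""
    return min(museums, key=lambda m: haversine_km(lat, lon, m["lat"], m["long"]))


# Hypothetical museum records; in the application these would come from LODAC.
MUSEUMS = [
    {"name": "Museum A", "lat": 35.45, "long": 139.63},
    {"name": "Museum B", "lat": 35.71, "long": 139.77},
]

suggestion = nearest_museum(35.44, 139.64, MUSEUMS)
```

With a geo-enabled store, the same ranking could instead be pushed into the query, as the `ORDER BY omgeo:distance(...)` clause on slide 28 does.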
  35. Summary
      • A life cycle of data is described
        – Scraping, standardizing, integrating, and publishing
      • Important issues
        – Recognizing data
        – Designing schema
          • Good for data
          • Good for RDF store and SPARQL
        – Developing applications
          • More people can be involved
          • Next cycle of data
  37. • Please submit papers
      • Meet at Nara
