Analytical processing for Linked Data using OLAP


  1. Analytical processing of Linked Data using OLAP. Hiroyuki Inoue, SWIM Seminar, 16th May 2012
  2. Agenda: Background, Objective, Related work, Proposal, Experiments, Conclusion, Future work
  3. Semantic Web and RDF
     - Semantic Web: a movement to let computers understand the meaning of resources on the Web, as humans do, by attaching metadata. Resources: documents, images, …
     - RDF (Resource Description Framework): a basic model for describing information. A triple consists of Subject, Predicate, Object: the predicate describes an attribute or property of the subject, and the object holds its value. Triples together form a graph structure.
     [Figure: example triple http://www.tsukuba.ac.jp/ --has_title--> "Univ. of Tsukuba"; ovals denote resources, rectangles denote literal values]
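To make the triple model concrete, here is a minimal Python sketch (nothing from the slides beyond the has_title example itself) that represents statements as (subject, predicate, object) tuples:

```python
# A minimal sketch of the RDF triple model: each statement is a
# (subject, predicate, object) tuple, and a set of triples forms a graph.
triples = {
    # resource ------------------- predicate --- value (literal)
    ("http://www.tsukuba.ac.jp/", "has_title", "Univ. of Tsukuba"),
}

for subject, predicate, obj in triples:
    # The predicate names an attribute of the subject,
    # and the object carries its value.
    print(f"{subject} --{predicate}--> {obj!r}")
```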
  4. Linked Data
     - Objective: enable people and organizations to share structured data on the Web.
     - Linked Data uses RDF to describe the attributes and properties of data, and encourages people to use data more effectively.
     - Features: uses URIs and HTTP, so data can be referenced on the Web; uses standardized technologies (RDF, URI, SPARQL).
     - Linked Open Data: various data sets contain numerical and statistical values (people, companies, biology, medicine, music, weather, …); 295 data sets, 31 billion triples, about 540 million links¹.
     1: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
  5. It is necessary to apply analytical operations to the numerical and statistical data that are published as Linked Data.
  6. OLAP (On-Line Analytical Processing)
     - An analytical method for huge volumes of accumulated data; it can answer complex, statistical queries.
     - Multi-dimensional model (data cube): numerical values (facts), axes for analysis (dimensions), and hierarchical structures on those axes.
     [Figure: a sales-volume data cube with dimensions Category (PC > Laptop/Desktop, OS > Mac/Win), Place (East > Aomori/Sendai/Tokyo, West > Osaka/Hiroshima/Fukuoka), and Time (first half/last half > Q1-Q4). Figure: http://www.atmarkit.co.jp/fwin2ktutor/sql02/sql02_03.html]
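A roll-up over such a cube can be illustrated in a few lines of Python; the city-to-region and quarter-to-half mappings below follow the figure, while the numeric facts are invented for the sketch:

```python
from collections import defaultdict

# Facts: (place, quarter, sales volume) -- illustrative numbers only.
facts = [
    ("Tokyo", "Q1", 386), ("Sendai", "Q1", 8),
    ("Osaka", "Q2", 128), ("Fukuoka", "Q2", 64),
]

# Dimension hierarchies: Place -> region, Time -> half year.
region = {"Aomori": "East", "Sendai": "East", "Tokyo": "East",
          "Osaka": "West", "Hiroshima": "West", "Fukuoka": "West"}
half = {"Q1": "f-half", "Q2": "f-half", "Q3": "l-half", "Q4": "l-half"}

# Roll-up: aggregate the facts to the coarser (region, half-year) level.
rollup = defaultdict(int)
for place, quarter, volume in facts:
    rollup[(region[place], half[quarter])] += volume

for (r, h), total in sorted(rollup.items()):
    print(r, h, total)
```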
  7. OLAP with a Relational Database
     - Uses data stored in a relational database.
     - Star schema: numerical values go into a fact table; the axes for analysis go into dimension tables.
     [Figure: star schema for the sales cube. Fact table Result (ID, category_id, time_id, store_id, sales volume) references dimension tables Category (ID, large category, small category), Time (ID, half, quarter), and Place (ID, N/S, city name). Figure: http://www.atmarkit.co.jp/fwin2ktutor/sql02/sql02_03.html]
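A star schema like the one in the figure can be sketched with any relational database; the hedged example below uses Python's sqlite3, with table and column names modelled on the figure and invented rows, and runs one OLAP-style roll-up over the join of fact and dimension tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables (axes for analysis).
    CREATE TABLE time_dim  (id INTEGER PRIMARY KEY, half TEXT, quarter TEXT);
    CREATE TABLE place_dim (id INTEGER PRIMARY KEY, ns TEXT, city TEXT);
    -- Fact table (numerical values) referencing the dimensions.
    CREATE TABLE result (id INTEGER PRIMARY KEY, time_id INTEGER,
                         place_id INTEGER, sales_volume INTEGER);
""")
con.executemany("INSERT INTO time_dim VALUES (?,?,?)",
                [(1, "f-half", "Q1"), (2, "f-half", "Q2")])
con.executemany("INSERT INTO place_dim VALUES (?,?,?)",
                [(1, "East", "Tokyo"), (2, "West", "Osaka")])
con.executemany("INSERT INTO result VALUES (?,?,?,?)",
                [(1, 1, 1, 386), (2, 2, 2, 128)])

# Roll up sales volume to the (half year, East/West) level.
for row in con.execute("""
        SELECT t.half, p.ns, SUM(r.sales_volume)
        FROM result r JOIN time_dim  t ON r.time_id  = t.id
                      JOIN place_dim p ON r.place_id = p.id
        GROUP BY t.half, p.ns"""):
    print(row)
```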
  8. Objective and problems
     - Objective: propose a method for applying OLAP to typical numerical or statistical data published as Linked Data.
     - Problems: it is difficult to analyze RDF with OLAP directly, so the graph data must be converted for analysis on an existing OLAP system; axes and hierarchies must also be prepared, because OLAP needs them for analysis.
  9. Related work
     - Kämpgen and Harth¹ proposed a conversion method of Linked Data for OLAP.
     - It uses the RDF Data Cube (QB) vocabulary, an RDF vocabulary for describing a data cube in RDF.
     - Linked Data that follows the QB vocabulary is converted to an RDB; the hierarchical structures are written in RDF.
     - However, only a few Linked Data sets follow the QB vocabulary.
     1. B. Kämpgen and A. Harth. Transforming Statistical Linked Data for Use in OLAP Systems, I-SEMANTICS 2011, 7th Int. Conf. on Semantic Systems, 2011.
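For readers unfamiliar with the QB vocabulary, a single observation looks roughly like the Turtle below (parsed here with rdflib, assumed to be installed); the names ex:sales and ex:refPeriod are illustrative, not taken from the cited paper:

```python
from rdflib import Graph

# A minimal RDF Data Cube (QB) observation in Turtle.
# ex:sales and ex:refPeriod are invented names for illustration.
ttl = """
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix ex:  <http://example.org/> .

ex:obs1 a qb:Observation ;
    qb:dataSet ex:sales ;
    ex:refPeriod "2011-Q1" ;
    sdmx-measure:obsValue 190000 .
"""

g = Graph()
g.parse(data=ttl, format="turtle")
print(len(g), "triples")  # 4 triples describing one observation
```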
  10. In this research, we propose a method to apply OLAP to TYPICAL Linked Data.
  11. Approaches
     - Map RDF (graph structure) to a relational schema.
     - Create hierarchical structures from the data and its data links semi-automatically, using features of RDF and Linked Data.
  12. From data retrieval to schema creation: 1. Retrieval of RDF data, 2. Storage of RDF data in an RDB, 3. Selection of the analysis target, 4. Creation of dimension tables, 5. Creation of a schema for OLAP
  13. Step 1: Retrieval of RDF data
     - Retrieve the RDF data for analysis: use an RDF dump, or dereference URIs.
     - Retrieve resources on the same host recursively; outside resources that are used as objects are retrieved as well.
     [Figure: resources for analysis (www.example.com) linking to outside resources (www.w3.org)]
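A rough sketch of this retrieval strategy, assuming rdflib is available and the resources are dereferenceable as RDF; the same-host rule and the one-off fetch of outside objects follow the slide, while the function and parameter names are invented:

```python
from urllib.parse import urlparse
from rdflib import Graph, URIRef

def crawl(seed_uri, limit=100):
    """Fetch RDF by dereferencing URIs, recursing only inside the seed's host.
    Outside resources that appear as objects are fetched once, not recursed."""
    host = urlparse(seed_uri).netloc
    data, seen, queue = Graph(), set(), [seed_uri]
    while queue and len(seen) < limit:
        uri = queue.pop()
        if uri in seen:
            continue
        seen.add(uri)
        g = Graph()
        try:
            g.parse(uri)            # HTTP GET with RDF content negotiation
        except Exception:
            continue                # skip resources that are not RDF
        for triple in g:
            data.add(triple)
        if urlparse(uri).netloc == host:
            # Recurse only from resources on the target host; objects on
            # other hosts are still queued so they get fetched once.
            for _, _, o in g:
                if isinstance(o, URIRef):
                    queue.append(str(o))
    return data

# Usage sketch (hypothetical seed URI):
# graph = crawl("http://www.example.com/data/result_1")
```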
  14. Step 2: Store the RDF data in an RDB (1/2)
     - Store the triples in the RDB grouped by rdf:type.
     - A table schema cannot be fixed in advance, because RDF data need not follow any schema, so triples are stored in a vertical representation.¹
     Vertical "Result" table:
       subject  | predicate | object    | type
       result_1 | time      | time_1    | resource
       result_1 | category  | computer  | resource
       result_1 | volume    | "190,000" | literal
       result_1 | store     | Tsukuba   | resource
       result_2 | time      | time_2    | resource
       result_2 | volume    | "4,000"   | literal
       result_2 | store     | Mito      | resource
     1. D. J. Abadi et al., 2007.
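A hedged sqlite3 sketch of this vertical representation, using the slide's Result example as sample rows (the table and column names are illustrative, not the paper's actual implementation):

```python
import sqlite3

# Sample triples from the slide: (subject, predicate, object, object kind).
triples = [
    ("result_1", "time",     "time_1",   "resource"),
    ("result_1", "category", "computer", "resource"),
    ("result_1", "volume",   "190,000",  "literal"),
    ("result_1", "store",    "Tsukuba",  "resource"),
    ("result_2", "time",     "time_2",   "resource"),
    ("result_2", "volume",   "4,000",    "literal"),
    ("result_2", "store",    "Mito",     "resource"),
]

con = sqlite3.connect(":memory:")
# One vertical table per rdf:type; here every subject has rdf:type Result.
con.execute("""CREATE TABLE result_vertical (
                   subject TEXT, predicate TEXT, object TEXT, type TEXT)""")
con.executemany("INSERT INTO result_vertical VALUES (?,?,?,?)", triples)

for row in con.execute("SELECT * FROM result_vertical"):
    print(row)
```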
  15. Step 2: Store the RDF data in an RDB (2/2)
     - Convert the vertical table to a horizontal one, so that the available attributes and properties become visible.
     - The resulting schema is Sales result (subject[PK], time[FK], category[FK], volume, store[FK]). PK: primary key, FK: foreign key.
     Horizontal "Sales result" table:
       subject[PK] | time[FK] | category[FK] | volume    | store[FK]
       result_1    | time_1   | computer     | "190,000" | Tsukuba
       result_2    | time_2   | null         | "4,000"   | Mito
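The vertical-to-horizontal conversion is essentially a pivot on the predicate column; a minimal, dependency-free sketch of that step, reusing the same sample rows:

```python
# Pivot the vertical (subject, predicate, object) rows into one horizontal
# row per subject, so the discovered properties become columns.
vertical = [
    ("result_1", "time", "time_1"), ("result_1", "category", "computer"),
    ("result_1", "volume", "190,000"), ("result_1", "store", "Tsukuba"),
    ("result_2", "time", "time_2"), ("result_2", "volume", "4,000"),
    ("result_2", "store", "Mito"),
]

columns = sorted({pred for _, pred, _ in vertical})   # discovered schema
rows = {}
for subject, predicate, obj in vertical:
    rows.setdefault(subject, {})[predicate] = obj

print(["subject"] + columns)
for subject, props in rows.items():
    # Properties a subject lacks (e.g. result_2 has no category) become None.
    print([subject] + [props.get(col) for col in columns])
```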
  16. Step 3: Selection of the analysis target
     - The user chooses which value will be the target (measure) of the analytical operation.
     [Figure: candidate tables such as Sales volume (subject, time_ID, category_ID, store_ID, volume), Stock (subject, category_ID, store_ID, stock numbers), and Visitor (subject, store_ID, time_ID, visitor counter), together with Category (subject, product_name), Time (subject, date, time), and Store (subject, location_ID, gn:location, name)]
  17. Step 4: Creation of dimension tables
     - We propose three creation methods:
       1. Use literals that are written directly in the data.
       2. Use a layer structure that is written directly in the data.
       3. Use a layer structure from outside data.
     [Figure: fact table Sales volume (subject, time_ID, category_ID, store_ID, volume) with dimension tables Time (subject, date, time), Category (subject, name), and Store (subject, location_ID, name); the Store table's gn:location points to outside resources, shown in blue]
  18. Method 3: use a layer structure from outside data
     - Use other datasets whose layer structure is available, e.g. GeoNames, which provides a geographically layered structure such as Tsukuba / Ibaraki / Japan / Asia.
     [Figure: resources in the target dataset (Tsukuba, Mito) linked to outside resources (Ibaraki, Japan) that supply the layer structure]
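A small sketch of how an outside dataset's layer structure can be turned into dimension columns; the parent links below are hard-coded stand-ins imitating GeoNames' parent-feature chain, not real GeoNames data:

```python
# Hard-coded stand-in for GeoNames parent links (child -> parent).
parent = {
    "Tsukuba": "Ibaraki", "Mito": "Ibaraki",
    "Ibaraki": "Japan", "Japan": "Asia",
}

def layers(place, depth=4):
    """Walk up the parent chain to build hierarchy columns
    (e.g. district / prefecture / country / continent)."""
    chain = [place]
    while chain[-1] in parent and len(chain) < depth:
        chain.append(parent[chain[-1]])
    # Pad so every row has the same number of layer columns.
    return chain + [None] * (depth - len(chain))

for city in ("Tsukuba", "Mito"):
    print(city, "->", layers(city))
```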
  19. Step 5: Creation of a schema for OLAP
     - Example: when "sales volume" is selected as the measure, the fact table is Results and the dimension tables are Time, Category, and Store-gn:place.
     Fact Results (subject, category_subject, time_subject, store_subject, volume)
     Dim. Time (subject, hour, day, month, quarter, year)
     Dim. Category (subject, name)
     Dim. Store-gn:place (subject, L1 district, L2 prefecture, L3 country, L4 continent)
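Once a star schema like this exists, roll-up queries become plain SQL; the sqlite3 sketch below uses table and column names modelled on this slide's schema and invented rows (an illustration, not the paper's implementation):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Schema named after the slide; the rows below are invented examples.
    CREATE TABLE time_dim  (subject TEXT PRIMARY KEY, month TEXT, year TEXT);
    CREATE TABLE store_dim (subject TEXT PRIMARY KEY,
                            l1_district TEXT, l2_prefecture TEXT,
                            l3_country TEXT, l4_continent TEXT);
    CREATE TABLE results   (subject TEXT PRIMARY KEY, time_subject TEXT,
                            store_subject TEXT, volume INTEGER);
""")
con.executemany("INSERT INTO time_dim VALUES (?,?,?)",
                [("time_1", "03", "2011"), ("time_2", "04", "2011")])
con.executemany("INSERT INTO store_dim VALUES (?,?,?,?,?)",
                [("Tsukuba", "Tsukuba", "Ibaraki", "Japan", "Asia"),
                 ("Mito",    "Mito",    "Ibaraki", "Japan", "Asia")])
con.executemany("INSERT INTO results VALUES (?,?,?,?)",
                [("result_1", "time_1", "Tsukuba", 190000),
                 ("result_2", "time_2", "Mito",      4000)])

# Roll up sales volume to the (year, prefecture) level.
query = """SELECT t.year, s.l2_prefecture, SUM(r.volume)
           FROM results r
           JOIN time_dim  t ON r.time_subject  = t.subject
           JOIN store_dim s ON r.store_subject = s.subject
           GROUP BY t.year, s.l2_prefecture"""
for row in con.execute(query):
    print(row)   # e.g. ('2011', 'Ibaraki', 194000)
```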
  20. Experiments
     - Objective: convert the data and create a schema for OLAP from numerical/statistical data published as Linked Data.
     - Exp. 1) Radiation observation data: National Radioactivity Stat as Linked Data¹, a dataset created by converting to RDF the Environmental Radioactivity Level Survey² published by MEXT (the Japanese Ministry of Education, Culture, Sports, Science and Technology).
     - Exp. 2) Weather observation data: Linked Sensor/Observation Data³, observatory metadata from more than 200,000 places and hurricane observation data from those observatories.
     1 http://www.kanzaki.com/works/2011/stat/ra/
     2 http://radioactivity.mext.go.jp/ja/monitoring_by_prefecture/
     3 http://wiki.knoesis.org/index.php/LinkedSensorData
  21. Exp. 1) Results (1/2)
     - Obtained radiation observation data by crawling: 1,003,410 triples (March-December 2011); geographic information taken from a GeoNames dump.
     - Candidates for the measure: observation instance, time, location.
     [Figure: an observation instance ra:20110315/p02/t20 with rdf:value "0.040", ev:place gn:2111833 (location), and ev:time time:20110315T22PT1H, whose tl:at is "2011-04-14T00:00:00"^^xsd:dateTime (time)]
     Prefixes: "ra" is http://www.kanzaki.com/works/2011/stat/ra/, "time" is http://www.kanzaki.com/works/2011/stat/dim/d/, "gn" is http://sws.geonames.org/, "ev" is http://purl.org/NET/c4dm/event.owl#, "tl" is http://purl.org/NET/c4dm/timeline.owl#, "rdf" is http://www.w3.org/1999/02/22-rdf-syntax-ns#
  22. Exp. 1) Results (2/2)
     - Created a schema with "value" as the measure: the fact table is an integrated table that contains the observed value; the dimension tables are 1. Time (hierarchy created from the observation time) and 2. Location (hierarchy obtained from GeoNames).
     Fact observation_instance (subject, place_subject, time_subject, value)
     Dim. Time (subject, sec, min, hour, day, month, year)
     Dim. Location (subject, layer_1 district, layer_2 prefecture, layer_3 country, layer_4 continent)
     - The observed value was stored as a string type, so the user had to complete the data type by hand.
  23. Exp. 2) Results (1/2)
     - Used dumped data for Hurricane Bill (17th-22nd Sept. 2009): 231,021,108 triples, 21,272,790 observations; geographic information from a GeoNames dump.
     - Candidates for measures: time, value, coordinates (latitude, longitude).
     - Created a schema with "value" as the measure: the fact table is an integrated table (observation instance and value); the dimension tables are 1. observation time (hierarchy obtained from the time values) and 2. observatory (hierarchy obtained from GeoNames).
  24. [Figure: structure of Linked Sensor Data and Linked Observation Data. An observatory (om-owl:System) has an om-owl:processLocation pointing to a wgs84:Point with wgs84:lat "45.0397"^^xsd:float, wgs84:long "-121.6736"^^xsd:float, and wgs84:alt "3780"^^xsd:float, and an om-owl:hasLocatedNearRel link to a gn:Feature. The observatory's om-owl:generatedObservation links to an observation instance whose om-owl:result is an om-owl:MeasureData with om-owl:floatValue "81.3"^^xsd:float (observed value), and whose om-owl:samplingTime is a time:Instant with time:inXSDDateTime "2004-08-10T16:10:00-06:00"^^xsd:dateTime (observation time)]
     Prefixes: "wgs84" is short for http://www.w3.org/2003/01/geo/wgs84_pos#, "time" for http://www.w3.org/2006/time#, "gn" for http://www.geonames.org/ontology#, "om-owl" for http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#, "xsd" for http://www.w3.org/2001/XMLSchema#
  25. Exp. 2) Results (2/2)
     - Created schema:
       Fact Observation_MeasureData (subject, procedure (system_subject), samplingTime (Instant_subject), floatValue)
       Dim. Instant (Time) (subject, sec, min, hour, day, month, year)
       Dim. System_LocatedNearRel (observatory, surroundings, others) (subject, ID (name), source URI, wgs84:alt, wgs84:lat, wgs84:long, layer_1 town, layer_2 district, layer_3 prefecture, layer_4 country, layer_5 continent)
     - Problem: a referenced ontology had disappeared (it could not be accessed): the "weather:" ontology* (* http://knoesis.wright.edu/ssw/ont/weather.owl#).
  26. Conclusion
     - Proposed a method to apply OLAP to typical Linked Data (numerical and statistical data): it processes the data using features of RDF and Linked Data, obtains layered structures from inside and outside the data, and creates hierarchies (dimension tables) for OLAP.
     - Applied the method to two observation datasets: once a target is chosen, the data is converted and axes for analysis are prepared, and a star schema for OLAP is created.
  27. Future work
     - Revise the method for the cases where a subject has many objects described with the same predicate, where a subject has several rdf:type values, and where a subject has no rdf:type.
     - In order to analyze more datasets: handle a larger number of ontologies, and provide a mechanism for using outside resources, since each defines its layered structure differently.
     - Apply and verify the method on many other kinds of data in other domains.
  28. Thank you for your attention!
