Inquiry Optimization Technique for a Topic Map Database

This paper presents a query optimization technique for a topic map database.

Published in: Education

  1. Inquiry Optimization Technique for a Topic Map Database. Yuki Kuribara (Graduate School of Engineering, Shibaura Institute of Technology) and Masaomi Kimura (Information Engineering, Shibaura Institute of Technology).
  2. Contents (Data Engineering Lab, 2010/10/6)
     • Background
     • Research contents
     • Experiment
     • Conclusion
  3. Topic maps
     • Many kinds of topic maps have been created recently, for example for web portal sites and for application development.
     • Large topic maps need a database behind them, since a database can handle data larger than physical memory.
     • [Figure: a topic map that fits in memory versus one that does not]
  4. The role of the database
     • The database system should take responsibility for managing the information in topic maps: query optimization, transaction management, and hiding the physical data structure.
     • [Figure: a query passes through the database system, which provides query optimization, transaction management, and physical data structure hiding, and returns information from the topic map]
  5. The physical data model for databases
     • We propose using the object-oriented model for topic map databases.
     • Several data models are available; the relational model (tables) and the object-oriented model are the ones mainly used in topic map databases.
     • When crawling a topic map to retrieve information, an object-oriented model does not need to join tables repeatedly, unlike a relational model.
     • [Figure: a relational model next to an object-oriented model in which Object A refers to Object B]
  6. The logical data model for databases
     • We assume the topic map data structure defined by the Topic Maps Data Model (TMDM), since topic maps should conform to it.
     • The data model consists of seven types of information items and 19 types of named properties.
     • We implemented these items as classes whose instances hold references to the corresponding information item objects.
     • Class diagram (multiplicities in parentheses):
       TopicMap (1) +parent / +associations (0..*) Association
       TopicMap (1) +parent / +topics (0..*) Topic
       Association (1) +parent / +roles (0..*) AssociationRole
       Topic (1) +player / +roles (0..*) AssociationRole
       Topic (1) +parent / +topicNames (0..*) TopicName
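The reference structure of the class diagram can be sketched in a few lines of Python. This is an illustration only, not the TOME implementation: it covers the five classes drawn on the slide rather than all seven TMDM item types, the `type` field on `Association` is the TMDM [type] property (not drawn in the diagram), and the helpers `add_topic` and `associate` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)
class TopicMap:
    topics: list["Topic"] = field(default_factory=list)
    associations: list["Association"] = field(default_factory=list)


@dataclass(eq=False, repr=False)
class Topic:
    parent: "TopicMap"
    topic_names: list["TopicName"] = field(default_factory=list)
    roles: list["AssociationRole"] = field(default_factory=list)


@dataclass(eq=False, repr=False)
class TopicName:
    parent: "Topic"
    value: str


@dataclass(eq=False, repr=False)
class Association:
    parent: "TopicMap"
    type: "Topic"  # TMDM [type] property: the topic that names the association
    roles: list["AssociationRole"] = field(default_factory=list)


@dataclass(eq=False, repr=False)
class AssociationRole:
    parent: "Association"
    player: "Topic"


def add_topic(tm, name):
    """Hypothetical helper: create a topic with one name and register it."""
    topic = Topic(parent=tm)
    topic.topic_names.append(TopicName(parent=topic, value=name))
    tm.topics.append(topic)
    return topic


def associate(tm, type_topic, *players):
    """Hypothetical helper: link the players through one association."""
    assoc = Association(parent=tm, type=type_topic)
    for player in players:
        role = AssociationRole(parent=assoc, player=player)
        assoc.roles.append(role)   # Association -> roles
        player.roles.append(role)  # Topic -> roles (back reference)
    tm.associations.append(assoc)
    return assoc


tm = TopicMap()
doyle = add_topic(tm, "Conan Doyle")
study = add_topic(tm, "A Study in Scarlet")
write = add_topic(tm, "write")
wrote = associate(tm, write, doyle, study)
```

Because every object holds direct references to its neighbours, moving from a topic to its associations is a pointer walk rather than a table join, which is the property the previous slide relies on.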
  7. The possibility of plural retrieval routes
     • The database system needs to select the most suitable retrieval route (query optimization).
     • When retrieving information from a topic map, there may be more than one way to reach the same objects.
     • Choosing the right search method lets us retrieve objects efficiently.
     • [Figure: the TMDM class diagram of TopicMap, Topic, TopicName, Association, and AssociationRole]
  8. Query optimization
     • The database should take responsibility for query optimization.
     • The database system needs to estimate a suitable execution plan; without query optimization, retrieval may take a very long time.
     • Some topic map database systems exist, but they do not seem to take optimization into consideration.
  9. Objective
     • We propose an optimization technique based on the estimation of execution cost.
     • In this presentation we focus on retrieving the topic objects that a specific association links to a particular topic.
     • Example: what did Conan Doyle write? The particular topic specified in the query is ‘Conan Doyle’, the specific association is ‘write’, and the intended topic is ‘A Study in Scarlet’.
  10. Retrieval plan: the association route
      • Example: what did Conan Doyle write?
      • Step 1: search the association objects ‘write’.
      • Step 2: search the topic objects ‘Conan Doyle’ on those associations and find the intended topic object (‘A Study in Scarlet’).
  11. Retrieval plan: the topic route
      • Example: what did Conan Doyle write?
      • Step 1: search the topic object ‘Conan Doyle’.
      • Step 2: from that topic, search the association objects ‘write’.
      • Step 3: find the intended topics referred to by the association role objects (‘A Study in Scarlet’).
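The two retrieval plans can be compared on a toy in-memory map. The sketch below is an assumption-laden illustration, not TOME code: associations are reduced to a name plus two player names, the `read` and `born in` entries are invented filler data, and the per-topic index `ASSOCS_BY_TOPIC` stands in for the role references a real topic object would hold.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assoc:
    name: str       # name of the topic that types the association
    players: tuple  # names of the two topics playing its roles


# Toy data; only the two Conan Doyle titles come from the slides.
ASSOCS = [
    Assoc("write", ("Conan Doyle", "A Study in Scarlet")),
    Assoc("write", ("Conan Doyle", "The Sign of Four")),
    Assoc("read", ("Hypothetical Reader", "A Study in Scarlet")),
    Assoc("born in", ("Conan Doyle", "Hypothetical Town")),
]

# Stand-in for Topic -> roles -> Association references.
ASSOCS_BY_TOPIC = defaultdict(list)
for assoc in ASSOCS:
    for player in assoc.players:
        ASSOCS_BY_TOPIC[player].append(assoc)


def other_player(assoc, topic_name):
    first, second = assoc.players
    return second if first == topic_name else first


def association_route(assocs, assoc_name, topic_name):
    # Step 1: scan every association and keep those with the given name.
    named = [a for a in assocs if a.name == assoc_name]
    # Step 2: keep those played by the given topic; report the other side.
    return sorted(other_player(a, topic_name) for a in named if topic_name in a.players)


def topic_route(assocs_by_topic, assoc_name, topic_name):
    # Step 1: find the topic and follow its roles to its associations.
    played = assocs_by_topic.get(topic_name, [])
    # Step 2: keep the associations with the given name.
    named = [a for a in played if a.name == assoc_name]
    # Step 3: report the topic on the other role of each association.
    return sorted(other_player(a, topic_name) for a in named)


via_association = association_route(ASSOCS, "write", "Conan Doyle")
via_topic = topic_route(ASSOCS_BY_TOPIC, "write", "Conan Doyle")
```

Both routes return the same topics; they differ only in how many objects are touched on the way, which is what the cost formulae estimate.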
  12. Estimation of execution cost
      • We define estimation formulae for the retrieval cost of each plan.
      • The system has to choose the most suitable plan.
      • This requires a cost definition that effectively estimates the retrieval time (cost estimation).
      • [Figure: a query can reach the topic map information through Route A (cost: 10) or Route B (cost: 100)]
  13. Cost of objects: definition of cost
      • We measured the total execution time of a retrieval and the time spent retrieving objects.
      • Object retrieval accounts for more than 99% of the processing time; everything else takes less than 1%.
      • Measuring the object retrieval time is therefore enough to evaluate the cost of query processing.

      Route               Execution time (A)   Object retrieval time (B)   B/A
      Association route   6.025×10^8 ns        5.991×10^8 ns               99.44%
      Topic route         1.035×10^8 ns        1.033×10^8 ns               99.81%
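The B/A column can be reproduced from the two measured columns. The short check below uses only the numbers on the slide; the names `MEASURED_NS` and `retrieval_share` are introduced here for illustration.

```python
# (execution time A, object retrieval time B) in nanoseconds, from the slide.
MEASURED_NS = {
    "association route": (6.025e8, 5.991e8),
    "topic route": (1.035e8, 1.033e8),
}


def retrieval_share(execution_ns, retrieval_ns):
    """Percentage of the execution time spent retrieving objects (B/A)."""
    return 100.0 * retrieval_ns / execution_ns


shares = {
    route: round(retrieval_share(a, b), 2)
    for route, (a, b) in MEASURED_NS.items()
}
```

Both shares come out above 99%, which is why the cost model counts object retrievals and ignores everything else.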
  14. Cost estimation formula for the association route
      $$ C_{\mathrm{assoc\_route}} = C_a \times N + \left(2C_{ar} + C_t\right) \times \frac{N}{Q} $$
      • Step 1, $C_a \times N$: we need to retrieve all associations, since multiple associations may have the same name.
      • Step 2, $2C_{ar}$: the cost is doubled since we retrieve both sides of the association.
      • Step 2, $N/Q$: we approximate the number of associations with the specified name by the average number of associations per unique name.
      • $C_a$, $C_{ar}$, $C_t$: the retrieval costs of association, association role, and topic objects; $N$: the number of association objects; $Q$: the number of unique association names.
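As a worked instance, the association-route estimate can be written as a small function. This is a sketch under two assumptions: the formula is read as $C_a N + (2C_{ar} + C_t)\,N/Q$ (the slide's layout was lost in extraction), and the example uses $Q = 3$, which is a made-up value since the slides do not report $Q$. The unit costs are the object size scale factors given later in the deck.

```python
def assoc_route_cost(n_assoc, n_names, c_a, c_ar, c_t):
    """Estimated cost of the association route.

    n_assoc: N, number of association objects
    n_names: Q, number of unique association names
    """
    matching = n_assoc / n_names          # N/Q: associations with the queried name
    scan_all = c_a * n_assoc              # step 1: retrieve every association
    follow = (2 * c_ar + c_t) * matching  # step 2: both roles, then the topic
    return scan_all + follow


# N = 15 is the Rampo Edogawa map; Q = 3 is a hypothetical value.
example_cost = assoc_route_cost(n_assoc=15, n_names=3, c_a=2.94, c_ar=1.0, c_t=4.75)
```

The first term grows linearly with the size of the map, so this route gets expensive on maps with many associations regardless of how selective the query is.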
  15. Cost estimation formula for the topic route
      $$ C_{\mathrm{topic\_route}} = C_t \times \frac{M}{2} + \left(C_{ar} + C_a\right) \times \frac{2N}{M} + C_{ar} \times \frac{2N}{MQ} $$
      • Step 1, $M/2$: the average number of topic retrievals (note that each topic must have a unique name).
      • Step 2, $2N/M$: the average number of associations per topic.
      • Step 3, $2N/(MQ)$: the average number of associations that have the name specified by the query.
      • $M$: the number of topic objects; $N$: the number of association objects; $Q$: the number of unique association names.
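The topic-route estimate can be sketched the same way. The function below assumes the formula reads $C_t\,M/2 + (C_{ar} + C_a)\,2N/M + C_{ar}\,2N/(MQ)$, and again uses a hypothetical $Q = 3$ because the slides do not state $Q$.

```python
def topic_route_cost(n_topics, n_assoc, n_names, c_a, c_ar, c_t):
    """Estimated cost of the topic route.

    n_topics: M, n_assoc: N, n_names: Q (unique association names)
    """
    find_topic = c_t * n_topics / 2                  # step 1: M/2 topics on average
    per_topic = 2 * n_assoc / n_topics               # 2N/M: associations per topic
    follow_roles = (c_ar + c_a) * per_topic          # step 2: role, then association
    matching = 2 * n_assoc / (n_topics * n_names)    # 2N/(MQ): those with the name
    reach_targets = c_ar * matching                  # step 3: the opposite role
    return find_topic + follow_roles + reach_targets


# M = 29 and N = 15 are the Rampo Edogawa map; Q = 3 is a hypothetical value.
example_cost = topic_route_cost(
    n_topics=29, n_assoc=15, n_names=3, c_a=2.94, c_ar=1.0, c_t=4.75
)
```

Here the dominant term depends on the number of topics rather than the number of associations, so the two routes rank differently on maps with different topic-to-association ratios.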
  16. Experiment
      • To demonstrate our method, we applied our technique to TOME, a prototype topic map database developed by the authors.
      • As target topic maps, we selected the following two, which differ in size:
        • Rampo Edogawa topic map: 29 topics (his name, his works, and his hometown) and 15 associations (his works and his hometown). Rampo Edogawa is a famous Japanese mystery writer.
        • Pokemon topic map: 174 topics (Pokemon names and their attributes) and 432 associations (evolutionary and attribute relationships).
  17. Evaluation of cost estimation formulae
      • To evaluate our cost estimation formulae, we measured the execution time of a query and compared it with the trend of the estimated cost.
      • The trend is visible: the lower the estimated cost, the shorter the execution time.

      Topic map       Average query execution time (nano sec)   Evaluated cost of each execution plan
                      Association route   Topic route           Association route   Topic route
      Rampo Edogawa   31              <   157                   133.2           <   164.0
      Pokemon         297             >   31                    2533            >   697.7
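The table implies a simple selection rule: pick the route with the smaller evaluated cost. The sketch below applies that rule to the numbers on the slide and checks it against the measured times; `choose_route` is a name introduced here, not part of TOME.

```python
# Evaluated costs and measured times, copied from the table on the slide.
EVALUATED_COST = {
    "Rampo Edogawa": {"association route": 133.2, "topic route": 164.0},
    "Pokemon": {"association route": 2533.0, "topic route": 697.7},
}
MEASURED_TIME = {
    "Rampo Edogawa": {"association route": 31, "topic route": 157},
    "Pokemon": {"association route": 297, "topic route": 31},
}


def choose_route(costs):
    """Return the route with the lowest estimated cost."""
    return min(costs, key=costs.get)


chosen = {name: choose_route(costs) for name, costs in EVALUATED_COST.items()}
fastest = {name: choose_route(times) for name, times in MEASURED_TIME.items()}
```

For both topic maps the route chosen by cost is also the route that ran faster, although two data points only show agreement in ranking, not in magnitude.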
  18. Conclusion
      • We proposed an optimization technique based on the estimation of execution cost.
        • We showed that there may be more than one way to retrieve the same objects.
        • We defined cost estimation formulae for the retrieval cost of each plan.
      • We evaluated our optimization technique.
        • The experiment shows that the retrieval time tends to be proportional to the object size.
        • It also shows that the estimated cost tends to be small when the execution time is short.
  19. Thank you for your kind attention.
  20. The effect of buffers
      • When a requested object is already in memory, the buffer shortens its retrieval time, so the cost estimated by the formulae needs to be reduced accordingly.
      • In our target query the buffer is used in two cases:
        • A topic that is already in memory is loaded from the buffer.
        • The topic for an association name that is already in memory is also loaded from the buffer.
      • [Figure: the topics ‘The Sign of Four’, ‘A Study in Scarlet’, and ‘Conan Doyle’ with the association ‘write’]
  21. The coefficients of buffer
      • In our target query we need two coefficients (written here as $\alpha$ and $\beta$, since the original symbols are missing from this transcript).
      • For retrieval of a topic, where $M/(2N)$ is the probability that the topic does not exist in the buffer:
        $$ \alpha = \frac{M}{2N} + r\left(1 - \frac{M}{2N}\right) $$
      • For retrieval of the topic for an association name, where $Q/N$ is the probability that this topic does not exist in the buffer:
        $$ \beta = \frac{Q}{N} + r\left(1 - \frac{Q}{N}\right) $$
      • $r$: the effective ratio of the retrieval cost when the buffer is used; $N$: the number of association objects; $M$: the number of topic objects; $Q$: the number of unique association names.
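Both coefficients share one shape: the miss probability plus $r$ times the hit probability. A minimal sketch, assuming that reading of the slide; the function names and the value $r = 0.1$ are illustrative only.

```python
def buffer_coefficient(p_miss, r):
    """Expected cost factor: full cost on a miss, a fraction r on a buffer hit."""
    return p_miss + r * (1.0 - p_miss)


def topic_coefficient(n_topics, n_assoc, r):
    # M / (2N): probability that the topic is not yet in the buffer.
    return buffer_coefficient(n_topics / (2 * n_assoc), r)


def assoc_name_coefficient(n_names, n_assoc, r):
    # Q / N: probability that the topic naming the association is not buffered.
    return buffer_coefficient(n_names / n_assoc, r)


# Pokemon map sizes from the slides (M = 174, N = 432); r = 0.1 is hypothetical.
pokemon_topic_factor = topic_coefficient(174, 432, r=0.1)
```

With $r = 1$ the buffer gives no benefit and each coefficient equals 1; with $r = 0$ only misses cost anything. Note that $M/(2N)$ exceeds 1 when a map has more than twice as many topics as associations, a case the slides do not discuss.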
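  22. The modified cost estimation formulae
      • Taking the buffering effect into consideration, we modify the cost estimation formulae as follows.
      • The contribution of loading topic name objects ($C_{tn}$) is also taken into consideration.
      $$ C_{\mathrm{assoc\_route}} = \left(C_a + \beta\,(C_t + C_{tn})\right) \times N + \left(2C_{ar} + \alpha\,(C_t + C_{tn})\right) \times \frac{N}{Q} $$
      $$ C_{\mathrm{topic\_route}} = (C_t + C_{tn}) \times \frac{M}{2} + \left(C_{ar} + C_a + \beta\,(C_t + C_{tn})\right) \times \frac{2N}{M} + \left(C_{ar} + \alpha\,(C_t + C_{tn})\right) \times \frac{2N}{MQ} $$
      • Here $\alpha = \frac{M}{2N} + r\left(1 - \frac{M}{2N}\right)$ is the buffer coefficient for topic retrieval and $\beta = \frac{Q}{N} + r\left(1 - \frac{Q}{N}\right)$ the one for the topics that name associations ($\alpha$ and $\beta$ stand in for symbols missing from this transcript).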
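The modified formulae can be evaluated side by side. The sketch below rests on several assumptions: the operators and coefficient symbols are missing from the transcript, so the placement of the two buffer coefficients (the association-name coefficient on the association-name topics, the topic coefficient on the topics reached through roles) is a reconstruction; $Q = 3$ and $r = 0.1$ are hypothetical; the unit costs are the object size scale factors from the deck.

```python
def buffer_coefficient(p_miss, r):
    """Full cost on a buffer miss, a fraction r of it on a hit."""
    return p_miss + r * (1.0 - p_miss)


def modified_costs(n, m, q, r, c_a, c_ar, c_t, c_tn):
    """Return (association-route cost, topic-route cost) with buffering.

    n: associations (N), m: topics (M), q: unique association names (Q).
    Coefficient placement is an assumption, see the lead-in.
    """
    topic_factor = buffer_coefficient(m / (2 * n), r)  # topics reached via roles
    name_factor = buffer_coefficient(q / n, r)         # topics naming associations
    named_topic = c_t + c_tn                           # a topic plus its topic name

    assoc_route = (c_a + name_factor * named_topic) * n + (
        2 * c_ar + topic_factor * named_topic
    ) * n / q
    topic_route = (
        named_topic * m / 2
        + (c_ar + c_a + name_factor * named_topic) * 2 * n / m
        + (c_ar + topic_factor * named_topic) * 2 * n / (m * q)
    )
    return assoc_route, topic_route


SCALE = dict(c_a=2.94, c_ar=1.0, c_t=4.75, c_tn=2.94)
no_buffer = modified_costs(n=15, m=29, q=3, r=1.0, **SCALE)
with_buffer = modified_costs(n=15, m=29, q=3, r=0.1, **SCALE)
```

Setting $r = 1$ switches the buffer effect off, which gives a convenient baseline: any $r < 1$ can only lower both estimates.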
  23. Cost estimation formula for the association route
      • We define the cost estimation formula as follows:
      $$ C_1 = \left(C_a + \beta\,(C_t + C_{tn})\right) \times N + \left(2C_{ar} + \alpha\,(C_t + C_{tn})\right) \times \frac{N}{Q} $$
      • First term: retrieval of Association objects ($C_a$), of the Topic objects that are defined as the association name ($C_t$), and of the TopicName objects that are defined as the association name ($C_{tn}$).
      • Second term: retrieval of AssociationRole objects ($2C_{ar}$), of Topic objects ($C_t$), and of the TopicName objects that are defined as the topic name ($C_{tn}$).
      • $\beta = \frac{Q}{N} + r\left(1 - \frac{Q}{N}\right)$: TMDM permits the redundant existence of multiple associations that have the same name.
      • $\alpha = \frac{M}{2N} + r\left(1 - \frac{M}{2N}\right)$: we assume that the association roles are uniformly assigned to associations.
      • The slide also labels the retrieval of TopicMap objects; its position in the formula is not recoverable from this transcript.
      • $N$: the number of association objects; $M$: the number of topic objects; $Q$: the number of unique association names ($\alpha$ and $\beta$ stand in for symbols missing from this transcript).
  24. The accurate cost estimation formula for the association route
      • Basic formula:
        $$ C_{\mathrm{assoc\_route}} = C_a \times N + \left(2C_{ar} + C_t\right) \times \frac{N}{Q} $$
      • Accurate formula:
        $$ C_{\mathrm{assoc\_route}} = \left(C_a + \beta\,(C_t + C_{tn})\right) \times N + \left(2C_{ar} + \alpha\,(C_t + C_{tn})\right) \times \frac{N}{Q} $$
      • In both terms we have to consider the retrieval cost of topic and topic name objects and the effect of the buffer.
      • Buffer coefficients: $\beta = \frac{Q}{N} + r\left(1 - \frac{Q}{N}\right)$ and $\alpha = \frac{M}{2N} + r\left(1 - \frac{M}{2N}\right)$ ($\alpha$ and $\beta$ stand in for symbols missing from this transcript).
      • $C_a$: the retrieval cost of association objects; $C_{ar}$: the retrieval cost of association role objects; $C_t$: the retrieval cost of topic objects; $C_{tn}$: the retrieval cost of topic name objects.
      • $N$: the number of association objects; $M$: the number of topic objects; $Q$: the number of unique association names.
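  25. Cost estimation formula for the topic route
      • We define the cost estimation formula as follows:
      $$ C_2 = (C_t + C_{tn}) \times \frac{M}{2} + \left(C_{ar} + C_a + \beta\,(C_t + C_{tn})\right) \times \frac{2N}{M} + \left(C_{ar} + \alpha\,(C_t + C_{tn})\right) \times \frac{2N}{MQ} $$
      • First term: retrieval of Topic objects ($C_t$) and of the TopicName objects that are defined as the topic name ($C_{tn}$). The factor is $M/2$ because TMDM permits only one topic to have a given topic name.
      • Second term: retrieval of AssociationRole objects ($C_{ar}$), of Association objects ($C_a$), of the Topic objects that are defined as the association name ($C_t$), and of the TopicName objects that are defined as the association name ($C_{tn}$). Regarding the topic map as a graph, the factor $2N/M$ is equal to the average degree.
      • Third term: retrieval of AssociationRole objects ($C_{ar}$), of Topic objects ($C_t$), and of the TopicName objects that are defined as the topic name ($C_{tn}$). We assume that the association roles are uniformly assigned to associations.
      • The slide also labels the retrieval of TopicMap objects; its position in the formula is not recoverable from this transcript.
      • $\alpha = \frac{M}{2N} + r\left(1 - \frac{M}{2N}\right)$ and $\beta = \frac{Q}{N} + r\left(1 - \frac{Q}{N}\right)$ are the buffer coefficients ($\alpha$ and $\beta$ stand in for symbols missing from this transcript).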
  26. The accurate cost estimation formula for the topic route
      • Basic formula:
        $$ C_{\mathrm{topic\_route}} = C_t \times \frac{M}{2} + \left(C_{ar} + C_a\right) \times \frac{2N}{M} + C_{ar} \times \frac{2N}{MQ} $$
      • Accurate formula:
        $$ C_{\mathrm{topic\_route}} = (C_t + C_{tn}) \times \frac{M}{2} + \left(C_{ar} + C_a + \beta\,(C_t + C_{tn})\right) \times \frac{2N}{M} + \left(C_{ar} + \alpha\,(C_t + C_{tn})\right) \times \frac{2N}{MQ} $$
      • In the first term we have to consider the retrieval cost of topic name objects.
      • In the second and third terms we have to consider the retrieval cost of topic objects and topic name objects and the effect of the buffer.
      • Buffer coefficients: $\beta = \frac{Q}{N} + r\left(1 - \frac{Q}{N}\right)$ and $\alpha = \frac{M}{2N} + r\left(1 - \frac{M}{2N}\right)$ ($\alpha$ and $\beta$ stand in for symbols missing from this transcript).
      • $C_a$: the retrieval cost of association objects; $C_{ar}$: the retrieval cost of association role objects; $C_t$: the retrieval cost of topic objects; $C_{tn}$: the retrieval cost of topic name objects.
      • $N$: the number of association objects; $M$: the number of topic objects; $Q$: the number of unique association names.
  27. Result: cost estimation of an object of each class
      • We can see a similar tendency between the retrieval time and the object size.
      • Normalized values are relative to the association role object, whose retrieval time and size are each set to 1.

      Topic map       Object            Retrieval time (nano sec)   Normalized time   Object size (byte)   Normalized size
      Rampo Edogawa   topic             969200                      3.34              608                  4.75
                      topicname         496700                      1.71              376                  2.94
                      associationrole   289900                      1                 128                  1
                      association       562600                      1.94              376                  2.94
      Pokemon         topic             1053000                     5.5               608                  4.75
                      topicname         501600                      2.62              376                  2.94
                      associationrole   191400                      1                 128                  1
                      association       577700                      3.02              376                  2.94
  28. Retrieval cost of each object
      • We measured the retrieval time and the object size of each kind of object.
      • The result tells us that the retrieval time is almost proportional to the object size.
      • Based on this, we define the cost as an object size scale factor (the ratio of the object size to that of an association role object).

      Topic map   Object                    Normalized retrieval time   Object size scale factor
      Pokemon     Topic object              5.5                         4.75
                  Topic name object         2.62                        2.94
                  Association role object   1                           1
                  Association object        3.02                        2.94
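The scale factors follow directly from the measured sizes and times. The check below uses the Pokemon figures reported in the deck; `normalize` is a helper name introduced here.

```python
# Object sizes (bytes) and Pokemon retrieval times (ns) as reported in the deck.
SIZE_BYTES = {
    "topic": 608,
    "topicname": 376,
    "associationrole": 128,
    "association": 376,
}
POKEMON_TIME_NS = {
    "topic": 1053000,
    "topicname": 501600,
    "associationrole": 191400,
    "association": 577700,
}


def normalize(values, unit="associationrole"):
    """Divide every value by the value of the unit object."""
    base = values[unit]
    return {name: value / base for name, value in values.items()}


scale_factor = normalize(SIZE_BYTES)        # used as the unit costs
relative_time = normalize(POKEMON_TIME_NS)  # what the factors approximate
```

The size ratios (4.75, about 2.94, 1, about 2.94) track the time ratios (about 5.5, 2.62, 1, 3.02) closely enough that the size can serve as a cost proxy without timing each object.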
  29. Future perspective
      • We will apply our method to other, much larger topic maps.
        • Our target topic maps have fewer than 1,000 topics.
        • We need to confirm the generality of the cost estimation formulae by evaluating them on various topic maps.
      • We will develop a mechanism to measure the size of objects in a topic map.
        • Since the size of objects depends on the topic map, we have to measure it in order to set cost values suitable for evaluating execution plans.
  30. References
      • M. Naito: An Introduction to Topic Maps. Tokyo Denki University Press, 2006.
      • Yuki Kuribara, Takeshi Hosoya, Masaomi Kimura: TOME: The Topic Map Database Extended, 2009.
      • Ontopia: tolog Language tutorial. http://www.ontopia.net/
      • ISO/IEC JTC1/SC34: Topic Map – Data Model. http://www.isotopicmaps.org/sam/sam-model/
      • Pokemon Topic Map. http://www.ontopia.net/omnigator/models/topicmap_complete.jsp?tm=pokemon.ltm
      • Pajek. http://vlado.fmf.uni-lj.si/pub/networks/pajek/
