Towards a virtual laboratory over bio-dataspaces


Published in: Technology, Education
  • Good afternoon. My name is Gabriela Montiel Moreno, and I will present our work on the autonomic management of dataspaces.
  • The growth of information has been driven by new web technologies, and scientific organizations have benefited from this large-scale production of information. This information is stored in distributed resources and comes in different formats such as images, documents, web resources and applications. However, these organizations lack a convenient, integrated way to manage such resources according to their needs. To cover these needs, the dataspace has arisen as a dynamic environment that makes it possible to publish, access and manage a set of heterogeneous resources. The objective of a dataspace is to provide functionality over the resources regardless of their format and metadata. A dataspace is fed when new resources are subscribed and when new information is generated through its exploitation. Dataspaces are used by the scientific community. Scientists are interested in obtaining information from only certain resources of the dataspace, which they use to run computational processes and study scientific problems. For example, consider a biologist who presents a case study: he wants to analyze the behavior and possible side effects of the drug Vinorelbine in the treatment of breast cancer in women from 35 to 60 years old. Today, the biologist must manually analyze every resource in the dataspace to determine which information is relevant. The relevant resources found through this analysis define a subspace. The analysis becomes long and complex as the number of resources increases, and resources described under different terms may be overlooked. Scientists may also explore the same problem using different subspaces, which can be stored in the dataspace. Different subspaces give biologists different perspectives on the same problem, and new subspaces can be generated from stored ones, reducing computation time.
For this reason, it is necessary to build a mechanism that automatically identifies relevant resources and generates an integrated subspace satisfying the requirements.
  • Subspaces must adapt to the evolution of the dataspace and of the users' requirements. When new resources are produced in the dataspace, the mechanism must identify the subspaces where these resources could be useful and then execute the operations needed to include them. When researchers modify their requirements, the mechanism must identify all related subspaces; it can then apply operations between subspaces such as fusion, merge, and split. Finally, the mechanism must materialize subspaces that are continuously accessed by a group of experts. Consequently, a scientific community can visualize all the problems treated in the dataspace and their associated subspaces, which promotes information sharing and collaboration between organizations. We call the mechanism that provides these functionalities a view dataspace manager. It gives users the illusion of working with a virtual system that integrates all distributed information and provides operations to enrich the information contained in the dataspace.
  • As I mentioned before, resources in a dataspace come in different formats and structures. To propose a first organization of heterogeneous resources, we assume that the resources are indexed according to their content. This indexing is defined over a vocabulary representing a knowledge domain such as Biology. We rely on a knowledge representation formalism, the description logic SHIQ-D, to represent resources and relate them to a set of terms. The dataspace vocabulary is enriched when new resources are published, when new knowledge is generated, or when scientists specify or modify their requirements. Scientific communities use their own vocabulary to express their requirements for studying a given problem. Information in the dataspace is organized in views. A view is the representation of a scientific problem, defined semantically by a set of terms and associated with a subspace. Researchers define the semantics of their problems using a set of terms from their vocabulary. The dataspace view manager determines the semantic mappings required to identify relevant resources. In this way, we provide transparent access to resources.
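The role of the semantic mappings can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the mappings are a hand-written dictionary rather than the result of SHIQ-D reasoning, and the names `Resource`, `MAPPINGS` and `resolve_view`, as well as the sample terms, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    terms: frozenset  # dataspace-vocabulary terms the resource is indexed under


# Hypothetical mappings from a researcher's vocabulary to the dataspace
# vocabulary. A real system would derive these with a reasoner.
MAPPINGS = {
    "breast cancer": {"breast carcinoma", "mammary neoplasm"},
    "vinorelbine": {"vinorelbine", "navelbine"},
}


def resolve_view(user_terms, resources, mappings):
    """Return the resources whose index terms match any mapped user term."""
    wanted = set()
    for term in user_terms:
        # Fall back to the term itself when no mapping is known.
        wanted |= mappings.get(term, {term})
    return [r for r in resources if r.terms & wanted]


resources = [
    Resource("trial-42", frozenset({"navelbine", "dose"})),
    Resource("img-7", frozenset({"mri"})),
    Resource("paper-3", frozenset({"mammary neoplasm"})),
]
view = resolve_view(["vinorelbine", "breast cancer"], resources, MAPPINGS)
print([r.name for r in view])
```

The researcher only states terms from his or her own vocabulary; the lookup hides the fact that the resources are indexed under different terms.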
  • Due to the heterogeneity of resources, integration within dataspaces has been addressed from two perspectives: indexing of resources and pay-as-you-go integration. Indexing of resources consists of indexing resources according to the data items they contain (attributes, associations). This approach also defines a hierarchy of index structures to support complex query processing. Pay-as-you-go integration organizes resources according to a set of metadata schemas organized by topic. It defines approximate mappings in order to provide an approximate integration of pertinent resources. For this reason, results obtained from pay-as-you-go integration are ranked according to the confidence of the approximate mappings. Pay-as-you-go integration supports basic keyword queries over resources, and queries are refined over time as more information about the sources is obtained.
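The ranking idea behind pay-as-you-go integration can be illustrated as follows. This is a minimal sketch, not the method of the cited works: the confidence scores, source names and the function `ranked_keyword_search` are assumptions made for the example.

```python
def ranked_keyword_search(keyword, sources, mappings):
    """Rank sources by the confidence of their best approximate mapping.

    sources:  dict of source name -> set of attribute names
    mappings: dict of (keyword, attribute) -> confidence in [0, 1]
    """
    hits = []
    for source, attributes in sources.items():
        # Best confidence among the attributes this source exposes.
        best = max((mappings.get((keyword, a), 0.0) for a in attributes),
                   default=0.0)
        if best > 0:
            hits.append((source, best))
    # Most trustworthy mappings first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sources and approximate mappings.
sources = {
    "trials_db": {"drug_name", "dose"},
    "papers": {"compound"},
    "images": {"modality"},
}
mappings = {("drug", "drug_name"): 0.9, ("drug", "compound"): 0.6}

print(ranked_keyword_search("drug", sources, mappings))
```

As more is learned about the sources, the confidence values can be revised, which changes the ranking without changing the query.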
  • Because the dataspace changes frequently, views must be adapted autonomically. We draw on autonomic computing to address this problem. Autonomic computing involves computing systems that can manage themselves. This is known as self-management and involves the following aspects: self-configuration, self-optimization, self-healing and self-protection. We are currently working on the self-configuration of views according to the evolution of the dataspace. The other three aspects are not covered in this presentation.
  • Given a view whose semantics is defined by a set of concepts and which is composed of a set of resources, the objective is to define strategies for autonomically managing views according to the evolution of the dataspace. To achieve this objective, it is necessary to define a set of operators between materialized views, to continuously monitor events within the dataspace and among its users, and to execute predefined rules in response to the events that occur in the dataspace.
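A view as a pair of concepts and resources, together with two candidate operators, might be sketched like this. The talk leaves the operator algebra to future work, so the semantics chosen here (merge as a union, split by concept subset) are assumptions for illustration only, as are the names `View`, `merge`, `split` and `index`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    concepts: frozenset   # C: concepts defining the problem
    resources: frozenset  # R: identifiers of the resources in the view


def merge(a, b):
    """Assumed semantics: union of both concept sets and resource sets."""
    return View(a.concepts | b.concepts, a.resources | b.resources)


def split(view, concepts, index):
    """Assumed semantics: partition the concepts and reselect the resources.

    index: dict of resource id -> set of concepts it is indexed under.
    A resource indexed under concepts of both parts appears in both views.
    """
    kept = frozenset(concepts) & view.concepts
    rest = view.concepts - kept

    def select(part):
        return frozenset(r for r in view.resources
                         if index.get(r, set()) & part)

    return View(kept, select(kept)), View(rest, select(rest))


a = View(frozenset({"drug"}), frozenset({"r1"}))
b = View(frozenset({"cancer"}), frozenset({"r2", "r3"}))
merged = merge(a, b)

index = {"r1": {"drug"}, "r2": {"cancer"}, "r3": {"cancer", "drug"}}
left, right = split(merged, {"drug"}, index)
print(sorted(left.resources), sorted(right.resources))
```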
  • Next, I will present our approach to the autonomic management of resources within a dataspace through the definition and administration of views according to the evolution of the dataspace.
  • Suppose that we have a dataspace composed of distributed and heterogeneous resources indexed with respect to a vocabulary. Users define their own vocabulary to express their requirements. These requirements are processed by a dataspace view manager, which exploits the users' vocabulary and the resource index to produce, maintain and delete integrated views of resources. A view manager has three main components. The monitor continuously observes the evolution of the dataspace (insertion, update, and removal of resources) and the evolution of the users' requirements (modification). The evaluator receives the event captured by the monitor and checks whether the conditions related to this event are satisfied in order to determine the operations to be executed. The executor applies the operators to the corresponding views according to the satisfied rule identified during evaluation. View management considers three main operations: definition of new views, materialization of views and update of views. When a researcher defines a new problem in the dataspace by specifying his or her requirements, the monitor identifies the type of event produced and sends the event to the evaluator. The evaluator verifies the rules related to this type of event and determines that a new view has to be generated. According to this rule, the executor analyzes the indexing structure and identifies the appropriate mappings between the vocabulary representing the user's requirements and the vocabulary describing the content of relevant resources. Then, the system generates a new view of resources in the dataspace. When new resources are published in the dataspace, the monitor detects these events and sends them to the evaluator in order to verify the rules they satisfy. The evaluator identifies the set of operations the executor must carry out. The executor must then index the resources with respect to the vocabulary used to describe their content. 
When a resource uses terms not defined in the index, these terms must be incorporated and the mappings between new and existing resources must be generated. Once the system has defined the appropriate indexing, it must identify the views related to the terms describing the content of the new resources and incorporate the resources into those integrated views. When the requirements of researchers are modified, for example when new terminology is incorporated, the monitor must detect this event and the evaluator must check the rules associated with this type of event. If the rules are satisfied, the evaluator identifies the set of operations the executor must execute. The executor then identifies the view associated with the previous requirements and the resources related to the new terms used by the researcher. It incorporates the new resources into the existing view and determines whether other views can be merged with the modified one. If such views exist, the pertinent operations are executed. The operations are merge, join, and split. On the other hand, the system monitors the queries executed over the dataspace to determine whether or not to materialize views. Views are materialized when similar conditions recur over time and when researchers explicitly request their materialization.
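The monitor, evaluator and executor cycle described above resembles an event-condition-action loop, which can be sketched as follows. This is an illustrative skeleton under assumptions, not the system presented in the talk: the event names, the rule set `RULES` and the dictionary layout of a view are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str      # e.g. "view_defined" or "resource_added" (assumed names)
    payload: dict


@dataclass
class Rule:
    event_kind: str
    condition: Callable  # (manager, event) -> bool
    action: Callable     # (manager, event) -> None


class ViewManager:
    def __init__(self, rules):
        self.rules = rules
        self.views = {}  # view name -> {"concepts": set, "resources": set}

    def monitor(self, event):
        """Monitor: hand each observed event to the evaluator."""
        for action in self.evaluate(event):
            self.execute(action, event)

    def evaluate(self, event):
        """Evaluator: select the actions of rules whose condition holds."""
        return [rule.action for rule in self.rules
                if rule.event_kind == event.kind
                and rule.condition(self, event)]

    def execute(self, action, event):
        """Executor: apply the selected operation to the views."""
        action(self, event)


def related_views(manager, event):
    terms = set(event.payload["terms"])
    return [v for v in manager.views.values() if v["concepts"] & terms]


def create_view(manager, event):
    manager.views[event.payload["name"]] = {
        "concepts": set(event.payload["concepts"]),
        "resources": set(),
    }


def add_resource(manager, event):
    for view in related_views(manager, event):
        view["resources"].add(event.payload["id"])


RULES = [
    Rule("view_defined",
         lambda m, e: e.payload["name"] not in m.views, create_view),
    Rule("resource_added",
         lambda m, e: bool(related_views(m, e)), add_resource),
]

manager = ViewManager(RULES)
manager.monitor(Event("view_defined", {
    "name": "vinorelbine-study",
    "concepts": {"vinorelbine", "breast cancer"},
}))
manager.monitor(Event("resource_added",
                      {"id": "trial-42", "terms": {"vinorelbine", "dose"}}))
manager.monitor(Event("resource_added", {"id": "img-7", "terms": {"mri"}}))
print(manager.views["vinorelbine-study"]["resources"])
```

Keeping rules as data separates what is observed from what is done about it, so new behaviour such as a materialization policy could be added as another rule without changing the manager.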
  • Finally, I will present our conclusions and the future work related to the autonomic management of a dataspace.
  • In this presentation we described our proposal for managing resources within a dataspace according to the requirements of a group of experts. Our approach is a first attempt at resource integration: it defines an index over resources according to their content, using a specific vocabulary that represents a knowledge domain. We also propose representing scientific problems using a vocabulary. We define a dataspace view manager that defines the mappings needed to relate scientific problems and resources, without forcing researchers to use particular concepts to relate them. The view manager develops query processing techniques based on the concept of view to provide transparent access to resources and to execute operations over views in response to the events occurring in the dataspace and to the users' requirements. We propose the use of knowledge representation as an efficient model for indexing resources according to their content within a specific knowledge domain. This indexing promotes the integration of data and resources according to a specification. We use reasoning mechanisms, which make it possible both to retrieve explicit knowledge and to automatically discover implicit knowledge. Consequently, it is possible to exploit the knowledge associated with resources.
  • Future work focuses on the construction of a view management mechanism that adapts automatically to the evolution of the dataspace. To complete this task, it is necessary to define an algebra of operators to be executed between materialized views. It is also necessary to define view management rules triggered by certain events in the dataspace or by changes in the users' requirements. We have not yet made extensive use of inference mechanisms for view management and query processing; doing so would reduce the computational cost of managing large amounts of data. In addition, it is necessary to define techniques to automatically verify the consistency of terms with respect to a given vocabulary. This can be achieved by building a knowledge base that contains a set of rules describing the principles of the domain. Finally, it is important to consider that reasoning is computationally expensive, especially with large amounts of information. For this reason, we want to explore possible optimization strategies to reduce these computational costs.
  • Thank you for your attention. I am ready for your questions.

    1. Towards a virtual laboratory over bio-dataspaces. Gabriela Montiel Moreno; Research Center of Information and Automation Technologies (CENTIA), UDLAP; French-Mexican Laboratory of Computer Science and Automatic Control (LAFMIA), UMI 3175
    2. Context and motivation: Analyze the behavior and possible side effects of Vinorelbine in the treatment of breast cancer in women from 35 to 60 years old.
    3. Context and motivation: View dataspace manager
    4. Viewing the dataspace: User's vocabulary; Views; Dataspace vocabulary. A view is the representation of a problem defined by a set of concepts and associated to a set of resources in the dataspace.
    5. Related works: Dataspaces, resources integration. Indexation of resources [Dong, Halevy]: indexing of data items contained in resources (attributes, associations); definition of hierarchy in the index structure. Pay-as-you-go [Madhavan et al., Maier and Rayner]: repositories of metadata organized by topic; definition of approximate mappings; keyword queries supporting query refinement; heterogeneous results are ranked.
    6. Related works: Autonomic Computing. Self-configuration: systems configure themselves automatically in accordance with high-level policies representing business-level objectives; when a new component is introduced, it incorporates itself seamlessly and the rest adapt to its presence. Self-optimization: systems continually seek ways to improve their operation, identifying opportunities to become more efficient in performance and cost. Self-healing: systems detect, diagnose and repair localized problems resulting from bugs or failures in software and hardware. Self-protection: systems defend themselves as a whole against large-scale, correlated problems arising from malicious attacks or cascading failures; they anticipate problems based on early reports from sensors and take steps to avoid or mitigate them.
    7. 7. Objective Given a view V= (C, R) defined by a set of concepts C={c1, c2, …, cn} and composed by a set of resources R= {r1, r2, …, rn}: Define strategies for autonomically managing views according to the evolution of the dataspace.  Operators between materialized views.  Monitoring of events within a dataspace and users.  Execution of predefined rules respect to the presence of events in a dataspace. 7
    8. Roadmap: Context and motivation (Resources integration; Objective); Towards autonomic dataspace management; Conclusions and future work
    9. Towards autonomic dataspace management: View Manager (Monitor, Evaluator, Executor)
    10. Execution concerns: Homogeneous representation of the dataspace. Vocabulary defined by expert communities, used to express requirements. Views adapted to the requirements of researchers to study problems: relations between the experts' vocabulary and the dataspace vocabulary. Execution of view operations: determined with respect to events in the dataspace and users' requirements; executed according to the current occupation of the involved views. Materialization of views with respect to the dataspace storage capabilities.
    11. Roadmap: Context and motivation (Resources integration; Objective); Towards autonomic dataspace management; Conclusions and future work
    12. Conclusions: Semantic exploitation of resources (resource content, knowledge domain, problems; semantic correspondences between resources and problems). Query processing for transparent access to resources (adapted to users' requirements; modified with respect to the evolution of the dataspace). Knowledge representation and reasoning facilitate resource exploitation (inference of implicit knowledge; resource exploitation through associated knowledge).
    13. Future Work: Construction of a view management mechanism automatically adapted to the evolution of the dataspace (specification of operations between materialized views; definition of view management rules for materialization of views and execution of operations). Verification of the consistency of terms describing the content of resources according to a certain domain (e.g. Biology). Optimization of access to resources.
    14. Thank You!