Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

iBioSearch: The Integrated Biological Database Search


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

iBioSearch: The Integrated Biological Database Search

  1. 1. iBioSearch: The Integrated Biological Database Search Ritu Khare and Yuan An PROBLEM Presence, of a very large number of biological Web databases and their interfaces, makes it difficult for biologists to search for any biological entity (See Fig. 1). Currently, the only option biologists have is to search each of these numerous interfaces individually. WI Metamodel: We observe that all input Web Interfaces (WIs) have an underlying global model. We created this global model manually and termed it as the "WI Metamodel". See Fig. 2. WI: Every Web Interface (WI) can be represented as an instance of the metamodel. Fig. 1: Problem - biologist searching for an entity META-SEARCH INTERFACE GENERATION OF GLOBAL BIOLOGICAL WI SCHEMA RE VE RS CLUSTERING SEARCH ENTITIES AND LABELS FUTURE WORK EE INE NG In future, we intend to dynamically update biological databases repository, maintain semantic mappings when base databases evolve, translate user queries, and consolidate, reconcile, and rank the query results using data cleansing and relevance computing algorithms. In addition to this, our plan includes performing usability testing of iBioSearch system with the help of biologists. ER MAPPING WI WITH METAMODEL WI MetaModel ING We aim to provide a unified search interface with capability of searching multiple (1000+) biological databases. This interface would be a representation of the biological search interface ontology. For finding the global search ontology, we take a novel approach of reverse engineering individual search interface into a conceptual model, and then finding an integrated model that would be consistent with all the interfaces up to a level of significance. HYPOTHESIS & ASSUMPTIONS Fig.2: WI Metamodel INFORMATION RETRIEVAL INFORMATION EXTRACTION OUR SOLUTION OLDB OLDB OLDB The GBWS or ontology could be represented as a meta-search interface for biologists wherein they can search for most of the biological entities on several search criteria available on different databases. Eventually, we aim to find the answers to other research questions such as: 1. Differences between commercial and biological databases. 2. Automatic identification of biological search interfaces. 3. Reverse Engineering of a WI into an ER diagram. 4. Integration of multiple ER diagrams 5. Extracting relationships between biological search entities. METHODOLOGY Which interface to search? Which database to access? What all search criteria do I have? How many sources to consider? CURRENT AND PREDICTED RESULTS OLDB OLDB Fig. 3: Methodology REFERENCES 1. Web Interface (Wis) Collection: Collect WIs to biological databases. 2. Information Extraction: For each WI, extract attributes corresponding to the WI metamodel. Broadly, a WI can be represented as a collection of search entities and their respective labels (search criteria). 3. Mapping WI- metamodel: Map each WI to the WI metamodel to generate the instances of the metamodel. Then, we have a list of search entities and their respective criteria (labels). For a given search entity Si , there will be label set (li1, li2, li3,…, lim). 4. Clustering: Find non-overlapping classes of search entities representing synonyms, and for each class, find a list of non-redundant labels. 5. Generation of GBWS: Eventually, we generate another conceptual model that we call as a “Global Biological WI Schema“ (GBWS). It would represent all possible input WIs in a non-redundant manner, and capture matchings between individual instances of the WI metamodel. 1. Arasu, A., & Garcia-Molina, H. (2003). Extracting structured data from web pages. Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data , San Diego, California. 337-348. 2. Barbosa, L., Tandon, S., & Freire, J. (2007). Automatically constructing a directory of molecular biology databases. Proceedings of the International Workshop on Data Integration in the Life Sciences 2007 (DILS), Philadelphia, PA. 3. He, B., & Chang, K. C. (2003). Statistical schema matching across web query interfaces. 2003 ACM SIGMOD International Conference on Management of Data , San Diego, Californi. 217-228. 4. Wang, J., Wen, J., Lochovsky, F., & Ma, W. (2004). Instance-based schema matching for web databases by domain-specific query probing. Thirtieth International Conference on very Large Data Bases, 30, 408 419.