The document proposes a solution called iBioSearch that aims to provide a unified search interface for biologists to search over 1000 biological databases. It does this by first collecting interfaces from various biological databases and then reverse engineering them to generate a global schema (metamodel) that represents common search capabilities across interfaces. It maps each interface as an instance of this metamodel to extract search entities and criteria. It then clusters entities and consolidates criteria to generate a non-redundant global biological search interface (GBWS) for biologists. Future work involves testing this approach with biologists and expanding the methodology.
CNIC Information System with Pakdata Cf In Pakistan
iBioSearch: The Integrated Biological Database Search
1. iBioSearch: The Integrated Biological Database Search
Ritu Khare and Yuan An
PROBLEM
Presence, of a very large number of biological Web databases and
their interfaces, makes it difficult for biologists to search for any
biological entity (See Fig. 1). Currently, the only option biologists
have is to search each of these numerous interfaces individually.
WI Metamodel: We observe that all input Web Interfaces (WIs) have an
underlying global model. We created this global model manually and termed
it as the "WI Metamodel". See Fig. 2.
WI: Every Web Interface (WI) can be represented as an instance of the
metamodel.
Fig. 1: Problem - biologist
searching for an entity
META-SEARCH
INTERFACE
GENERATION OF
GLOBAL
BIOLOGICAL WI
SCHEMA
RE
VE
RS
CLUSTERING
SEARCH ENTITIES
AND LABELS
FUTURE WORK
EE
INE
NG
In future, we intend to dynamically update biological databases
repository, maintain semantic mappings when base
databases evolve, translate user queries, and consolidate,
reconcile, and rank the query results using data cleansing and
relevance computing algorithms. In addition to this, our plan
includes performing usability testing of iBioSearch system with
the help of biologists.
ER
MAPPING WI
WITH
METAMODEL
WI MetaModel
ING
We aim to provide a unified search interface with capability of
searching multiple (1000+) biological databases. This interface
would be a representation of the biological search interface
ontology. For finding the global search ontology, we take a novel
approach of reverse engineering individual search interface into a
conceptual model, and then finding an integrated model that would
be consistent with all the interfaces up to a level of significance.
HYPOTHESIS & ASSUMPTIONS
Fig.2: WI Metamodel
www.ischool.drexel.edu
INFORMATION
RETRIEVAL
INFORMATION
EXTRACTION
OUR SOLUTION
OLDB
OLDB
OLDB
The GBWS or ontology could be represented as a meta-search
interface for biologists wherein they can search for most of the
biological entities on several search criteria available on
different databases.
Eventually, we aim to find the answers to other research
questions such as:
1. Differences between commercial and biological databases.
2. Automatic identification of biological search interfaces.
3. Reverse Engineering of a WI into an ER diagram.
4. Integration of multiple ER diagrams
5. Extracting relationships between biological search entities.
METHODOLOGY
Which interface to search?
Which database to access?
What all search criteria do I have?
How many sources to consider?
CURRENT AND PREDICTED RESULTS
OLDB
OLDB
Fig. 3: Methodology
REFERENCES
1. Web Interface (Wis) Collection: Collect WIs to biological databases.
2. Information Extraction: For each WI, extract attributes corresponding to
the WI metamodel. Broadly, a WI can be represented as a collection of
search entities and their respective labels (search criteria).
3. Mapping WI- metamodel: Map each WI to the WI metamodel to generate
the instances of the metamodel. Then, we have a list of search entities and
their respective criteria (labels). For a given search entity Si , there will be
label set (li1, li2, li3,…, lim).
4. Clustering: Find non-overlapping classes of search entities representing
synonyms, and for each class, find a list of non-redundant labels.
5. Generation of GBWS: Eventually, we generate another conceptual model
that we call as a “Global Biological WI Schema“ (GBWS). It would represent
all possible input WIs in a non-redundant manner, and capture matchings
between individual instances of the WI metamodel.
1. Arasu, A., & Garcia-Molina, H. (2003). Extracting structured data from
web pages. Proceedings of the 2003 ACM SIGMOD International
Conference on Management of Data , San Diego, California. 337-348.
2. Barbosa, L., Tandon, S., & Freire, J. (2007). Automatically constructing
a directory of molecular biology databases. Proceedings of the
International Workshop on Data Integration in the Life Sciences 2007
(DILS), Philadelphia, PA.
3. He, B., & Chang, K. C. (2003). Statistical schema matching across web
query interfaces. 2003 ACM SIGMOD International Conference on
Management of Data , San Diego, Californi. 217-228.
4. Wang, J., Wen, J., Lochovsky, F., & Ma, W. (2004). Instance-based
schema matching for web databases by domain-specific query probing.
Thirtieth International Conference on very Large Data Bases, 30, 408 419.