iBioSearch: The Integrated Biological Database Search

iBioSearch: The Integrated Biological Database Search
Ritu Khare and Yuan An
PROBLEM
Presence, of a very large number of biological Web databases and
their interfaces, makes it difficult for biologists to search for any
biological entity (See Fig. 1). Currently, the only option biologists
have is to search each of these numerous interfaces individually.

WI Metamodel: We observe that all input Web Interfaces (WIs) have an
underlying global model. We created this global model manually and termed
it as the "WI Metamodel". See Fig. 2.
WI: Every Web Interface (WI) can be represented as an instance of the
metamodel.

Fig. 1: Problem - biologist
searching for an entity

META-SEARCH
INTERFACE

GENERATION OF
GLOBAL
BIOLOGICAL WI
SCHEMA

RE
VE
RS

CLUSTERING
SEARCH ENTITIES
AND LABELS

FUTURE WORK

EE
INE
NG

In future, we intend to dynamically update biological databases
repository, maintain semantic mappings when base
databases evolve, translate user queries, and consolidate,
reconcile, and rank the query results using data cleansing and
relevance computing algorithms. In addition to this, our plan
includes performing usability testing of iBioSearch system with
the help of biologists.

ER

MAPPING WI
WITH
METAMODEL

WI MetaModel

ING

We aim to provide a unified search interface with capability of
searching multiple (1000+) biological databases. This interface
would be a representation of the biological search interface
ontology. For finding the global search ontology, we take a novel
approach of reverse engineering individual search interface into a
conceptual model, and then finding an integrated model that would
be consistent with all the interfaces up to a level of significance.

HYPOTHESIS & ASSUMPTIONS

Fig.2: WI Metamodel

www.ischool.drexel.edu

INFORMATION
RETRIEVAL

INFORMATION
EXTRACTION

OUR SOLUTION

OLDB

OLDB

OLDB

The GBWS or ontology could be represented as a meta-search
interface for biologists wherein they can search for most of the
biological entities on several search criteria available on
different databases.
Eventually, we aim to find the answers to other research
questions such as:
1. Differences between commercial and biological databases.
2. Automatic identification of biological search interfaces.
3. Reverse Engineering of a WI into an ER diagram.
4. Integration of multiple ER diagrams
5. Extracting relationships between biological search entities.

METHODOLOGY
Which interface to search?
Which database to access?
What all search criteria do I have?
How many sources to consider?

CURRENT AND PREDICTED RESULTS

OLDB

OLDB

Fig. 3: Methodology

REFERENCES
1. Web Interface (Wis) Collection: Collect WIs to biological databases.
2. Information Extraction: For each WI, extract attributes corresponding to
the WI metamodel. Broadly, a WI can be represented as a collection of
search entities and their respective labels (search criteria).
3. Mapping WI- metamodel: Map each WI to the WI metamodel to generate
the instances of the metamodel. Then, we have a list of search entities and
their respective criteria (labels). For a given search entity Si , there will be
label set (li1, li2, li3,…, lim).
4. Clustering: Find non-overlapping classes of search entities representing
synonyms, and for each class, find a list of non-redundant labels.
5. Generation of GBWS: Eventually, we generate another conceptual model
that we call as a “Global Biological WI Schema“ (GBWS). It would represent
all possible input WIs in a non-redundant manner, and capture matchings
between individual instances of the WI metamodel.

1. Arasu, A., & Garcia-Molina, H. (2003). Extracting structured data from
web pages. Proceedings of the 2003 ACM SIGMOD International
Conference on Management of Data , San Diego, California. 337-348.
2. Barbosa, L., Tandon, S., & Freire, J. (2007). Automatically constructing
a directory of molecular biology databases. Proceedings of the
International Workshop on Data Integration in the Life Sciences 2007
(DILS), Philadelphia, PA.
3. He, B., & Chang, K. C. (2003). Statistical schema matching across web
query interfaces. 2003 ACM SIGMOD International Conference on
Management of Data , San Diego, Californi. 217-228.
4. Wang, J., Wen, J., Lochovsky, F., & Ma, W. (2004). Instance-based
schema matching for web databases by domain-specific query probing.
Thirtieth International Conference on very Large Data Bases, 30, 408 419.

iBioSearch: The Integrated Biological Database Search

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to iBioSearch: The Integrated Biological Database Search

Similar to iBioSearch: The Integrated Biological Database Search (20)

Recently uploaded

Recently uploaded (20)

iBioSearch: The Integrated Biological Database Search