We propose a software layer, called GUEDOS-DB, built on top of an Object-Relational Database Management System (ORDBMS). In this work we apply it to molecular biology, specifically to complete organelle genomes. We aim to offer biologists unified access to information spread across heterogeneous genome databanks. The first goal of this paper is to present a visual schema graph through a number of illustrative examples. The adopted human-computer interaction technique for visual design and querying makes it much easier for biologists to formulate database queries than linear textual query representations.
Towards a Query Rewriting Algorithm Over Proteomics XML Resources (CSCJournals)
Querying and sharing Web proteomics data is not an easy task. Given that several data sources can be used to answer the same sub-goals in the global query, many candidate rewritings are possible. The user query is formulated using concepts and properties related to proteomics research (a domain ontology), while semantic mappings describe the contents of the underlying sources. In this paper, we characterize the query rewriting problem by representing the semantic mappings as an associated hypergraph. The generation of candidate rewritings can then be formulated as the discovery of the minimal transversals of a hypergraph. We exploit and adapt algorithms from hypergraph theory to find all candidate rewritings for a query answering problem. In future work, relevant criteria could help determine optimal, high-quality rewritings according to user needs and source performance.
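To make the hypergraph formulation concrete, here is a minimal, self-contained sketch (not the paper's actual algorithm) of a Berge-style computation of minimal transversals. Each hyperedge groups the sources able to answer one sub-goal, so each minimal transversal is a candidate combination of sources; the source names are hypothetical.

```python
def minimal_transversals(hyperedges):
    """Berge-style incremental computation of all minimal transversals
    (minimal hitting sets) of a hypergraph given as a list of edge sets."""
    transversals = [frozenset()]
    for edge in hyperedges:
        extended = set()
        for t in transversals:
            if t & edge:                 # t already hits this edge
                extended.add(t)
            else:                        # extend t with each vertex of the edge
                for v in edge:
                    extended.add(t | {v})
        # keep only the minimal sets
        transversals = [t for t in extended
                        if not any(s < t for s in extended)]
    return transversals

# Toy query: sub-goal 1 answerable by S1 or S2, sub-goal 2 by S2 or S3.
edges = [{"S1", "S2"}, {"S2", "S3"}]
print(sorted(sorted(t) for t in minimal_transversals(edges)))
```

Here {S2} alone covers both sub-goals, while {S1, S3} is the other minimal combination; supersets such as {S1, S2} are pruned.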
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEMS (IJNSA Journal)
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for clinical information systems. We performed a case study in a real-life hospital setting. The results obtained illustrate the feasibility of the proposed approach, which significantly improved the information retrieval process on a large volume of data collected over a long period, from August 2011 until January 2012.
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS... (ijseajournal)
There is a wide range of available biological databases developed by bioinformatics experts, employing different methods to extract biological data. In this paper, we investigate and evaluate the performance of some of these methods in terms of their ability to efficiently access bioinformatics databases through web-based interfaces. These methods retrieve bioinformatics information using structured and semi-structured data tools, which are able to retrieve data from remote database servers. This study distinguishes each of these approaches and contrasts the tools. We used the Sequence Retrieval System (SRS) and Entrez search tools for structured data, while Perl and BioPerl search programs were used for semi-structured data, to answer complex queries combining text and numeric information. The study concludes that the use of semi-structured data tools for accessing bioinformatics databases is a viable alternative to structured tools, though each method has certain inherent advantages and disadvantages.
The article outlines extensions to the model and algorithms of spyware detection procedures; spyware, in particular, presents a potential threat to medical information systems. The approaches and solutions in this paper make it possible to programmatically implement binary file segmentation using the discrete wavelet transform (WT). Unlike existing approaches, the model and algorithm proposed in the article take into account local extrema of the wavelet coefficients (WC). The described solutions allow analysis of the number of bytes, as well as the entropy, of individual segments of potentially dangerous and spyware files submitted for analysis.
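As a rough illustration of the general idea (not the article's actual model), the following sketch applies one level of a Haar wavelet transform to a byte stream, cuts the file at local extrema of the detail coefficients, and reports the Shannon entropy of each resulting segment. The sample data and cut-mapping are purely hypothetical.

```python
import math

def haar_detail(data):
    """One level of the Haar DWT: detail coefficients capture
    local changes between adjacent byte pairs."""
    return [(data[i] - data[i + 1]) / 2 for i in range(0, len(data) - 1, 2)]

def local_extrema(coeffs):
    """Indices where a coefficient is a strict local extremum."""
    return [i for i in range(1, len(coeffs) - 1)
            if (coeffs[i] > coeffs[i - 1] and coeffs[i] > coeffs[i + 1])
            or (coeffs[i] < coeffs[i - 1] and coeffs[i] < coeffs[i + 1])]

def entropy(segment):
    """Shannon entropy (bits per byte) of a byte segment."""
    counts = {}
    for b in segment:
        counts[b] = counts.get(b, 0) + 1
    n = len(segment)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical file: a zero-filled header, a noisy payload, a constant tail.
data = bytes([0] * 16 + [7, 200, 13, 90] * 4 + [255] * 16)
coeffs = haar_detail(list(data))
cuts = [2 * i for i in local_extrema(coeffs)]   # map coefficient index to bytes
segments = [data[a:b] for a, b in zip([0] + cuts, cuts + [len(data)])]
for seg in segments:
    print(len(seg), round(entropy(seg), 2))
```

High-entropy segments (packed or encrypted payloads) stand out against the near-zero entropy of constant regions, which is the kind of per-segment signal such detectors analyze.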
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING (ijbbjournal)
Recent progress in biology, medical science, bioinformatics, and biotechnology has produced tremendous amounts of biodata that demand in-depth analysis. At the same time, recent progress in data mining research has led to the development of numerous efficient and scalable methods for mining interesting patterns in large databases. This paper bridges the two fields, data mining and bioinformatics, for successful mining of biological data. Microarrays constitute a new platform that allows the discovery and characterization of proteins.
ABSTRACT
Scientific publications are considered the most up-to-date resource of ongoing research
activities and scientific knowledge. Efficient practices for accessing biomedical
publications are key to allowing a timely transfer of information from the scientific
research community to peer investigators and other healthcare practitioners. Biomedical
sequence images published within the literature play a central role in life science
discoveries. Whereas advanced text-mining pipelines for information retrieval and
knowledge extraction are now commonplace methodologies for processing documents,
the ongoing challenges associated with knowledge management and utility operations
unique to biomedical image data are only recently gaining recognition. Sequence images
depicting key findings of research papers contain rich information derived from a wide
range of biomedical experiments. Searching for relevant sequence images is, however, error-prone, as images are still opaque to information retrieval and knowledge extraction
engines. Specifically, there is no explicit description or annotation of the sequence image
content. Moreover, traditional biomedical search engines, which search image captions
for relevant keywords only, offer syntactic search mechanisms without regard for the
exact meaning of the query. As proposed in this thesis, semantic enrichment of biomedical
sequence images is a solution which adopts a combination of technologies to harness the
comprehensive information associated with, and contained in, biomedical sequence
images. Extracted information from sequence images is used as seed data to aggregate and
harvest new annotations from heterogeneous online biomedical resources. Comprehensive
semantic enrichment of biomedical images incorporates a variety of knowledge
infrastructure components and services including image feature extraction, semantic web
data services, linked open data and crowd annotation.
Together, these resources make it possible to automatically and/or semi-automatically
discover and semantically interlink new information in a way that supports semantic
search for sequence images. The resulting enriched sequence images are readily reusable
based on their semantic annotations and can be made available for use in ad-hoc data
integration activities. Furthermore, to support image reuse this thesis introduces a
mechanism for identifying similar sequence images based on fuzzy inference and cosine
similarity techniques that can retrieve and classify the related sequence images based on
their semantic annotations. The outcomes of this research work will be relevant to a variety of user groups, ranging from clinicians to researchers working with sequence image data.
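The cosine-similarity part of such an image comparison can be sketched as follows; the annotation terms and image names below are purely hypothetical, and the thesis's actual fuzzy-inference stage is not reproduced here.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bags of annotation terms."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical semantic annotations attached to two sequence images.
img1 = ["BRCA1", "gel", "electrophoresis", "human"]
img2 = ["BRCA1", "gel", "blot", "mouse"]
print(round(cosine_similarity(img1, img2), 2))
```

A similarity of 1.0 means identical annotation profiles; a threshold (or a fuzzy membership function over the score) then decides whether two images are treated as related.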
Images have an irrefutably central role in scientific discovery and discourse.
However, the issues associated with knowledge management and utility operations
unique to image data are only recently gaining recognition. In our previous
work, we have developed the Yale Image Finder (YIF), a novel biomedical image search engine that indexes around two million biomedical images, along with associated metadata. While YIF is considered a veritable source of easily accessible
biomedical images, there are still a number of usability and interoperability challenges
that have yet to be addressed. To overcome these issues and to accelerate the
adoption of the YIF for next generation biomedical applications, we have developed a
publicly accessible semantic API for biomedical images with multiple modalities.
The core API called iCyrus is powered by a dedicated semantic architecture that exposes
the YIF content as linked data, permitting integration with related information
resources and consumption by linked data-aware data services. To facilitate the ad-hoc integration of image data with other online data resources, we also built semantic
web services for iCyrus, such that it is compatible with the SADI semantic web service
framework. The utility of the combined infrastructure is illustrated with a number
of compelling use cases and further extended through the incorporation of Domeo, a
well-known tool for open annotation. Domeo facilitates enhanced search over the
images using annotations provided through crowdsourcing. The iCyrus triplestore
currently holds more than thirty-five million triples and can be accessed and operated
through syntactic or semantic query interfaces. Core features of the iCyrus API, namely data reusability, system interoperability, semantic image search, automatic updating, and a dedicated semantic infrastructure, make iCyrus a state-of-the-art resource for image data discovery and retrieval.
The world is witnessing an unprecedented information revolution and rapid growth of databases in all domains. Databases are interconnected through their content and schemas but use different elements and structures to express the same concepts and relations, which may cause semantic and structural conflicts. This paper proposes a new technique, named XDEHD, for integrating heterogeneous eXtensible Markup Language (XML) schemas. The returned mediated schema contains all concepts and relations of the sources without duplication. The technique divides into three steps. First, it extracts all subschemas from the sources by decomposing the source schemas; each subschema contains three levels: ancestor, root, and leaf. Second, it matches and compares the subschemas and returns the related candidate subschemas; a semantic closeness function measures how similarly the concepts of the subschemas are modelled in the sources. Finally, it creates the mediated schema by integrating the candidate subschemas, obtaining a minimal and complete unified schema; an association strength function computes the closeness of each pair in a candidate subschema across all data sources, and an element repetition function calculates how many times each element is repeated among the candidate subschemas.
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES (csandit)
Relational databases (RDBs) are used as the backend by most information systems. RDBs encapsulate the conceptual model and metadata needed for ontology construction. Schema mapping is the technique used by all existing approaches for building ontologies from RDBs. However, most of those methods use poor transformation rules that prevent advanced database mining for building rich ontologies. In this paper, we propose transformation rules for building OWL ontologies from RDBs that transform all possible cases in RDBs into ontological constructs. The proposed rules are enriched by analyzing stored data to detect disjointness and totality constraints in hierarchies and by calculating the participation level of tables in n-ary relations. In addition, our technique is generic, so it can be applied to any RDB. The proposed rules were evaluated using a normalized, open RDB. The obtained ontology is richer in terms of non-taxonomic relationships.
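A highly simplified sketch of this kind of transformation (not the paper's actual rule set) maps tables to OWL classes, plain columns to datatype properties, and foreign keys to object properties. The table descriptions and emitted Turtle-like strings below are hypothetical.

```python
# Hypothetical minimal schema description: columns plus foreign-key targets.
tables = {
    "Protein": {"columns": {"id": "integer", "name": "string"}, "fks": {}},
    "Interaction": {"columns": {"id": "integer", "source": "integer",
                                "target": "integer"},
                    "fks": {"source": "Protein", "target": "Protein"}},
}

def rdb_to_owl(tables):
    """Emit Turtle-like OWL statements: table -> owl:Class,
    plain column -> owl:DatatypeProperty, foreign key -> owl:ObjectProperty."""
    triples = []
    for name, t in tables.items():
        triples.append(f":{name} rdf:type owl:Class .")
        for col in t["columns"]:
            if col in t["fks"]:
                triples.append(f":{name}_{col} rdf:type owl:ObjectProperty ; "
                               f"rdfs:domain :{name} ; rdfs:range :{t['fks'][col]} .")
            else:
                triples.append(f":{name}_{col} rdf:type owl:DatatypeProperty ; "
                               f"rdfs:domain :{name} .")
    return triples

for line in rdb_to_owl(tables):
    print(line)
```

The richer rules described in the abstract go further, e.g. inspecting stored rows to add disjointness axioms, which a purely schema-driven sketch like this cannot do.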
USING ONTOLOGIES TO OVERCOME DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY (cseij)
Several databases (DBs) exist for any given domain. The evolution from the classical web to the semantic web has contributed to the emergence of ontologies, which provide a shared and consensual vocabulary. For a given domain, it is advantageous to build an ontology from existing databases, since most of the data are already stored there; many DBs can thus be integrated to enable reuse of existing data for the semantic web. Even existing ontologies require regular updating to keep their information relevant, and these databases can be useful sources for enriching them. On the other hand, for these ontologies, the larger the ratio of instance size to working-memory size, the more difficult in-memory management of the instances becomes. Finding a way to store these instances in a structured manner that satisfies the performance and reliability needs of many applications becomes an obligation. As a consequence, defining query languages to support these structures becomes a challenge for the semantic web community. We show in this paper how ontologies can benefit from DBs to increase system performance and facilitate their design cycle. DBs, in their turn, suffer from several drawbacks, namely the complexity of the design cycle and a lack of semantics. Since ontologies are rich in semantics, DBs can profit from this advantage to overcome their drawbacks.
An approach for transforming relational databases to OWL ontologies (IJwest)
The rapid growth of documents, web pages, and other types of text content is a huge challenge for modern content management systems. One of the problems in information storage and retrieval is the lack of semantic data. Ontologies can present knowledge in a sharable and reusable manner and provide an effective way to reduce data volume overhead by encoding the structure of a particular domain. Metadata in relational databases can be used to extract an ontology from a database in a specific domain. To solve the problem of sharing and reusing data, approaches based on transforming relational databases to ontologies have been proposed. In this paper we propose a method for automatic ontology construction based on a relational database. Mining further components from the relational database yields knowledge with higher semantic power and more expressiveness. Triggers are one database component that can be transformed into the ontology model, increasing the power and expressiveness of the knowledge by representing part of it dynamically.
TWO-LEVEL SELF-SUPERVISED RELATION EXTRACTION FROM MEDLINE USING UMLS (IJDKP)
The biomedical research literature is one of many domains that hide precious knowledge, and the biomedical community makes extensive use of this scientific literature to discover facts about biomedical entities such as diseases, drugs, etc. MEDLINE is a huge database of biomedical research papers that remains a significantly underutilized source of biological information. Discovering useful knowledge from such a huge corpus raises various problems related to the type of information, such as the concepts related to the domain of the texts and the semantic relationships associated with them. In this paper, we propose a two-level model for self-supervised relation extraction from MEDLINE using the Unified Medical Language System (UMLS) knowledge base. The model uses a self-supervised approach for relation extraction (RE) by constructing enhanced training examples using information from UMLS. The model shows better results in comparison with the current state of the art and naïve approaches.
Bioinformatics may be defined as the field of science in which biology, computer science, and information technology merge to form a single discipline. Its ultimate goal is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned, by means of bioinformatics tools for storing, retrieving, organizing, and analyzing biological data. Most of these tools possess very distinct features and capabilities, making direct comparison difficult. In this paper we propose a taxonomy for characterizing bioinformatics tools and briefly survey the major tools under each category. We hope this study will help other designers and experienced end users understand the details of particular tool categories and tools, enabling them to make the best choices for their particular research interests.
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A... (IJwest)
Constructing ontologies from relational databases is an active research topic in the Semantic Web domain. While conceptual mapping rules/principles between relational databases and ontology structures are being proposed, several software modules or plug-ins are being developed to enable the automatic conversion of relational databases into ontologies. However, the correlation between the ontologies built automatically with plug-ins from relational databases and the database-to-ontology mapping principles has received little attention. This study reviews and applies two Protégé plug-ins, namely DataMaster and OntoBase, to automatically construct ontologies from a relational database. The resulting ontologies are further analysed to match their structures against the database-to-ontology mapping principles. A comparative analysis of the matching results reveals that OntoBase outperforms DataMaster in applying the database-to-ontology mapping principles for automatically converting relational databases into ontologies.
Information retrieval (IR) is the recovery of items (objects, web pages, documents, and so forth) that satisfy explicit conditions expressed in a query. While IR aims at satisfying a user's information need, generally expressed in natural language, data retrieval aims at determining which records contain the exact terms of the user's query.
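The contrast can be illustrated with a toy sketch: data retrieval demands every query term, while IR ranks documents by partial relevance so near-matches still surface. The documents and the crude overlap score are invented for illustration.

```python
docs = {
    "d1": "gene expression in gastric cancer cells",
    "d2": "expression of spyware detection models",
    "d3": "gastric cancer primer database",
}

def data_retrieval(query, docs):
    """Data retrieval: return only documents containing every query term."""
    terms = query.lower().split()
    return sorted(d for d, text in docs.items()
                  if all(t in text.lower().split() for t in terms))

def information_retrieval(query, docs):
    """IR: rank documents by how many query terms they share
    (a crude relevance score), so partial matches still appear."""
    terms = set(query.lower().split())
    scores = {d: len(terms & set(text.lower().split()))
              for d, text in docs.items()}
    return sorted((d for d in scores if scores[d] > 0),
                  key=lambda d: -scores[d])

print(data_retrieval("gastric cancer expression", docs))
print(information_retrieval("gastric cancer expression", docs))
```

Only one document contains all three terms, so data retrieval returns a single hit, while the IR ranking also surfaces the two documents that match some of the terms.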
GASCAN: A Novel Database for Gastric Cancer Genes and Primers (ijdmtaiir)
GasCan is a specialized and unique database of gastric cancer protein-encoding genes expressed in human and mouse. The features that make GasCan unique are the availability of gene information and of primers for each gene, with their features and conditions, which are useful in PCR amplification, especially in cloning experiments. In addition, a built-in programmed sequence analysis facility analyzes gene sequences within the database itself; the resulting sequence analysis information can be valuable for researchers in different experiments. Furthermore, a DNA sequence analysis tool is provided that can be accessed freely. GasCan will expand in the future to other species and genes and will cover more useful information about other species. Flexible database design, expandability, and easy access to information for all users are the main features of the database. The database is publicly available at http://www.gastric-cancer.site40.net.
While the world is witnessing an information revolution unprecedented and great speed in the growth of databases in all aspects. Databases interconnect with their content and schema but use different elements and structures to express the same concepts and relations, which may cause semantic and structural conflicts. This paper proposes a new technique for integration the heterogeneous eXtensible Markup Language (XML) schemas, under the name XDEHD. The returned mediated schema contains all concepts and relations of the sources without duplication. Detailed technique divides into three steps; First, extract all subschemas from the sources by decompose the schemas sources, each subschema contains three levels, these levels are ancestor, root and leaf. Thereafter, second, the technique matches and compares the subschemas and return the related candidate subschemas, semantic closeness function is implemented to measures the degree how similar the concepts of subschemas are modelled in the sources. Finally, create the medicate schema by integration the candidate subschemas, and then obtain the minimal and complete unified schema, association strength function is developed to compute closely of pair in candidate subschema across all data sources, and elements repetition function is employed to calculate how many times each element repeated between the candidate subschema.
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASEScsandit
Relational Databases (RDB) are used as the backend database by most of information systems.
RDB encapsulate conceptual model and metadata needed in the ontology construction. Schema
mapping is a technique that is used by all existing approaches for ontology building from RDB.
However, most of those methods use poor transformation rules that prevent advanced database
mining for building rich ontologies. In this paper, we propose transformation rules for building OWL ontologies from RDBs. They allow transforming all possible cases in RDBs into ontological constructs. The proposed rules are enriched by analyzing stored data to detect disjointness and totality constraints in hierarchies, and by calculating the participation level of tables in n-ary relations. In addition, our technique is generic; hence it can be applied to any RDB. The proposed rules were evaluated using a normalized and open RDB. The obtained ontology is richer in terms of non-taxonomic relationships.
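As a minimal illustration of the kind of rule involved (not the authors' actual rule set, which is considerably richer), a sketch mapping tables to classes and foreign keys to object properties might look like this. The schema dictionary shape and the axiom strings are assumptions for demonstration.

```python
def rdb_to_owl(tables):
    # tables: {name: {"columns": [...], "fks": {column: referenced_table}}}
    # rule 1: each table becomes a class
    # rule 2: each foreign-key column becomes an object property between classes
    # rule 3: every other column becomes a datatype property
    axioms = []
    for name, t in tables.items():
        axioms.append(f"Class: {name}")
        for col in t["columns"]:
            ref = t.get("fks", {}).get(col)
            if ref:
                axioms.append(f"ObjectProperty: has_{ref} Domain: {name} Range: {ref}")
            else:
                axioms.append(f"DataProperty: {name}_{col} Domain: {name}")
    return axioms
```

Real transformation approaches additionally analyze stored data, as the abstract notes, to derive constraints that the schema alone cannot express.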
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYcseij
Several databases (DBs) exist for the same domain. The evolution from the classical web to the semantic web has contributed to the appearance of the notion of ontology, which provides a shared and consensual vocabulary. For a given domain, it is more interesting to take advantage of existing databases to build an ontology, since most of the data are already stored in these databases. Many DBs can thus be integrated to enable reuse of existing data for the semantic web. Even for existing ontologies, the relevance of the information they contain requires regular updating, and these databases can be useful sources to enrich them. On the other hand, for these ontologies, the larger the ratio of instance size to working-memory size, the more difficult the in-memory management of these instances becomes. Finding a way to store these instances in a structured manner that satisfies the performance and reliability requirements of many applications becomes an obligation. As a consequence, defining query languages to support these structures becomes a challenge for the SW community. We will show through this paper how ontologies can benefit from DBs to increase system performance and facilitate their design cycle. DBs, in their turn, suffer from several drawbacks, namely the complexity of the design cycle and the lack of semantics. Since ontologies are rich in semantics, DBs can profit from this advantage to overcome their drawbacks.
An approach for transforming of relational databases to owl ontologyIJwest
Rapid growth of documents, web pages, and other types of text content is a huge challenge for modern content management systems. One of the problems in the area of information storage and retrieval is the lack of semantic data. Ontologies can present knowledge in a sharable and reusable manner and provide an effective way to reduce the data volume overhead by encoding the structure of a particular domain. Metadata in relational databases can be used to extract an ontology from a database in a specific domain. To solve the problem of sharing and reusing data, approaches based on transforming relational databases to ontologies have been proposed. In this paper we propose a method for automatic ontology construction based on a relational database. Mining and obtaining further components from the relational database leads to knowledge with high semantic power and greater expressiveness. Triggers are one of the database components that can be transformed into the ontology model, increasing the power and expressiveness of the knowledge by presenting part of it dynamically.
TWO LEVEL SELF-SUPERVISED RELATION EXTRACTION FROM MEDLINE USING UMLSIJDKP
The biomedical research literature is one among many domains that hide precious knowledge, and the biomedical community makes extensive use of this scientific literature to discover facts about biomedical entities such as diseases, drugs, etc. MEDLINE is a huge database of biomedical research papers which remains a significantly underutilized source of biological information. Discovering useful knowledge from such a huge corpus leads to various problems related to the type of information, such as the concepts related to the domain of the texts and the semantic relationships associated with them. In this paper, we propose a two-level model for self-supervised relation extraction from MEDLINE using the Unified Medical Language System (UMLS) knowledge base. The model uses a self-supervised approach for Relation Extraction (RE) by constructing enhanced training examples using information from UMLS. The model shows better results in comparison with the current state of the art and naïve approaches.
Bioinformatics may be defined as the field of science in which biology, computer science, and information technology merge to form a single discipline. Its ultimate goal is to enable the discovery of new biological insights, as well as to create a global perspective from which unifying principles in biology can be discerned, by means of bioinformatics tools for storing, retrieving, organizing and analyzing biological data. Most of these tools possess very distinct features and capabilities, making a direct comparison difficult. In this paper we propose a taxonomy for characterizing bioinformatics tools and briefly survey major bioinformatics tools under each category. Hopefully this study will help other designers and experienced end users understand the details of particular tool categories and tools, enabling them to make the best choices for their particular research interests.
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...IJwest
Constructing ontologies from relational databases is an active research topic in the Semantic Web domain.
While conceptual mapping rules/principles of relational databases and ontology structures are being
proposed, several software modules or plug-ins are being developed to enable the automatic conversion of
relational databases into ontologies. However, the correlation between the resulting ontologies built automatically with plug-ins from relational databases and the database-to-ontology mapping principles has been given little attention. This study reviews and applies two Protégé plug-ins, namely DataMaster and OntoBase, to automatically construct ontologies from a relational database. The resulting ontologies are further analysed to match their structures against the database-to-ontology mapping principles. A comparative analysis of the matching results reveals that OntoBase outperforms DataMaster in applying the database-to-ontology mapping principles for automatically converting relational databases into ontologies.
Information retrieval is the recovery of items (objects, Web pages, documents, and so forth) that satisfy explicit conditions set in an ordinary expression such as a query. While IR aims at satisfying part of a user's information need, generally expressed in natural language, data retrieval aims at determining which records contain the exact terms of the user's query.
GASCAN: A Novel Database for Gastric Cancer Genes and Primersijdmtaiir
GasCan is a specialized and unique database of gastric cancer protein-coding genes expressed in human and mouse. The features that make GasCan unique are the availability of gene information and of primers for each gene, with their features and conditions, which are useful in PCR amplification, especially in cloning experiments. To make it more unique, a built-in programmed sequence analysis facility is provided that analyzes gene sequences in the database itself; the resulting sequence analysis information can be valuable for researchers in different experiments. Furthermore, a DNA sequence analysis tool is provided that can be accessed freely. GasCan will expand in the future to other species and genes and will cover more useful information on other species. Flexible database design, expandability and easy access to information for all users are the main features of the database. The database is publicly available at http://www.gastric-cancer.site40.net.
There are many characteristics of biological data, all of which make the management of biological information a particularly challenging problem. Here we will mainly focus on the characteristics of biological information and on the multidisciplinary field called bioinformatics, which nowadays has emerged with graduate degree programs in several universities.
Query Optimization Techniques in Graph Databasesijdms
Graph databases (GDB) have recently emerged to overcome the limits of traditional databases for storing and managing data with graph-like structure. Today, they represent a requirement for many applications that manage graph-like data, such as social networks. Most of the techniques applied to optimize queries in graph databases have been used in traditional databases and distributed systems, or are inspired by graph theory. However, their reuse in graph databases should take into account the main characteristics of graph databases, such as dynamic structure, highly interconnected data, and the ability to efficiently access data relationships. In this paper, we survey the query optimization techniques in graph databases. In particular, we focus on the features they have in common.
Bridging the gap between the semantic web and big data: answering SPARQL que...IJECEIAES
Nowadays, the database field has become much more diverse, and as a result, a variety of non-relational (NoSQL) databases have been created, including JSON-document databases and key-value stores, as well as extensible markup language (XML) and graph databases. Due to the emergence of a new generation of data services, some of the problems associated with big data have been resolved. However, in the haste to address the challenges of big data, NoSQL abandoned several core database features that make databases extremely efficient and functional, for instance the global view, which enables users to access data regardless of how it is logically structured or physically stored in its sources. In this article, we propose a method that allows us to query non-relational databases based on the ontology-based data access (OBDA) framework by delegating SPARQL protocol and RDF query language (SPARQL) queries from the ontology to the NoSQL database. We applied the method to a popular database called Couchbase, and we discuss the results obtained.
Linked Data Generation for the University Data From Legacy Database dannyijwest
The Web was developed to share information among users through the internet as hyperlinked documents. If someone wants to collect data from the web, he has to search and crawl through the documents to fulfil his needs. The concept of Linked Data creates a breakthrough at this stage by enabling links within data. So, besides the web of connected documents, a new web has developed both for humans and machines, i.e., the web of connected data, simply known as the Linked Data Web. Since it is a very new domain, still very few works have been done, especially on the publication of legacy data within a University domain as Linked Data.
The suite of free software tools created within the OpenCB (Open Computational Biology – https://github.com/opencb) initiative makes it possible to efficiently manage large genomic databases.
These tools are not widely used, since there is quite a steep learning curve for their adoption, due to the complexity of the software stack, but they may be really cost-effective for hospitals, research institutions and so on.
The objective of the talk is to show the potential of the OpenCB suite, how to start using it, and the advantages for end users. BioDec is currently deploying a large OpenCGA installation for the Genetic Unit of one of the main Italian hospitals, where data on the order of hundreds of TBs will be managed and analyzed by bioinformaticians.
Data integration in a Hadoop-based data lake: A bioinformatics caseIJDKP
When we work in a data lake, data integration is not easy, mainly because the data is usually stored in raw format. Manually performing data integration is a time-consuming task that requires the supervision of a specialist, who can make mistakes or fail to see the optimal point for data integration among two or more datasets. This paper presents a model to perform heterogeneous in-memory data integration in a Hadoop-based data lake based on a top-k set similarity approach. Our main contribution is the process of ingesting, storing, processing, integrating, and visualizing the data integration points. The algorithm for data integration is based on the Overlap coefficient, since it presented better results when compared with the set similarity metrics Jaccard, Sørensen-Dice, and the Tversky index. We tested our model by applying it to eight bioinformatics-domain datasets. Our model presents better results when compared to the analysis of a specialist, and we expect it can be reused for other domains of datasets.
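For reference, the Overlap coefficient and the Jaccard metric it was compared against can be computed as follows (a generic sketch of the standard definitions, not the authors' code):

```python
def overlap_coefficient(a, b):
    # |A ∩ B| / min(|A|, |B|): reaches 1.0 whenever one set is contained in the other
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|: penalizes size differences more than the overlap coefficient
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0
```

When one column's values are a subset of another's, the overlap coefficient is 1.0 while Jaccard stays below 1, which helps detect integration points between datasets of very different sizes.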
Knowledge Driven User Interfaces for Complex Biological Queriesalexander garcia
With the explosion of biological data in the postgenomic era, there has been a growing need for semantic data integration, supported by ontologies. Semantic integration techniques enable biologists to construct complex biological queries. However, the construction of these queries and analysis of their results can place a high cognitive load on biologists. This paper presents a proposed information visualisation tool, Digr, to aid biologists in these processes within the context of DigraBase, a graph database for semantic data integration. A working example of a query is presented, to illustrate the complexity of the information spaces under consideration. Visualisation techniques that have been applied to similar problems are discussed in the context of their applicability to the problem of aiding the construction of complex queries over DigraBase, and the interpretation of their results.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN...acijjournal
Apriori is one of the key algorithms to generate frequent itemsets. Analysing frequent itemsets is a crucial step in analysing structured data and in finding association relationships between items. This stands as an elementary foundation for supervised learning, which encompasses classifier and feature extraction methods. Applying this algorithm is crucial to understand the behaviour of structured data. Most of the structured data in the scientific domain are voluminous, and processing such data requires state-of-the-art computing machines. Setting up such an infrastructure is expensive, hence a distributed environment such as a clustered setup is employed for tackling such scenarios. The Apache Hadoop distribution is one of the cluster frameworks for distributed environments that helps by distributing voluminous data across a number of nodes in the framework. This paper focuses on the map/reduce design and implementation of the Apriori algorithm for structured data analysis.
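To make the map/reduce decomposition of Apriori concrete, here is a single-machine sketch that mimics the two phases in plain Python. The function names and the candidate-generation step are illustrative assumptions; a real Hadoop job would emit the (itemset, 1) pairs from mappers across nodes and aggregate them in reducers.

```python
from itertools import combinations
from collections import Counter

def map_phase(transactions, k, candidates=None):
    # mapper: emit (itemset, 1) for each k-itemset present in a transaction,
    # optionally restricted to a precomputed candidate set
    for t in transactions:
        items = sorted(set(t))
        for c in combinations(items, k):
            if candidates is None or c in candidates:
                yield c, 1

def reduce_phase(pairs, min_support):
    # reducer: sum counts per itemset and keep only the frequent ones
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: c for s, c in counts.items() if c >= min_support}

def apriori(transactions, min_support):
    frequent, k, candidates = {}, 1, None
    while True:
        level = reduce_phase(map_phase(transactions, k, candidates), min_support)
        if not level:
            break
        frequent.update(level)
        # generate (k+1)-candidates whose k-subsets are all frequent (Apriori property)
        items = sorted({i for s in level for i in s})
        candidates = {c for c in combinations(items, k + 1)
                      if all(sub in level for sub in combinations(c, k))}
        k += 1
    return frequent
```

Each while-loop iteration corresponds to one map/reduce round over the transaction data.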
Information residing in relational databases and delimited file systems is inadequate for reuse and sharing over the web. These file systems do not adhere to commonly set principles for maintaining data harmony. Due to these reasons, resources throughout the web have been suffering from a lack of uniformity, from heterogeneity, and from redundancy. Ontologies have been widely used for solving such problems, as they help in extracting knowledge out of any information system. In this article, we focus on extracting concepts and their relations from a set of CSV files. These files serve as individual concepts and are grouped into a particular domain, called the domain ontology. Furthermore, this domain ontology is used for capturing CSV data, represented in RDF format while retaining links among files or concepts. Datatype and object properties are automatically detected from header fields. This reduces the task of user involvement in generating mapping files. A detailed analysis has been performed on Baseball tabular data, and the result shows a rich set of semantic information.
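A minimal sketch of the header-driven CSV-to-RDF mapping described above might look like the following. The namespace, URI scheme and literal typing are assumptions for illustration; the actual generator also detects object properties that link files to one another.

```python
import csv, io

def csv_to_triples(csv_text, concept, base="http://example.org/"):
    # each CSV file becomes a concept (class); each row becomes an instance;
    # each header field becomes a (here, datatype) property of that instance
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    triples = []
    for i, row in enumerate(rows):
        subj = f"<{base}{concept}/{i}>"
        triples.append((subj, "rdf:type", f"<{base}{concept}>"))
        for field, value in row.items():
            triples.append((subj, f"<{base}{field}>", f'"{value}"'))
    return triples
```

Feeding the Baseball files through such a mapping would yield one typed resource per row, linked to its column values.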
Hassan BADIR, Rachida FISSOUNE & Amjad RATTROUT
International Journals of Biometric and Bioinformatics (IJBB), Volume (5) : Issue (3), 2011 191
A Consistent and Efficient Graphical User Interface Design and
Querying Organelle Genome “GUEDOS”
Hassan BADIR hbadir@gmail.com
Labtic, National School of applied sciences
University Abdelmalek essaadi
90020, Tangier, Morocco
Rachida FISSOUNE fissoune@gmail.com
National School of applied sciences
University Abdelmalek essaadi
Tetouan, Morocco
Amjad RATTROUT ramjad@gmail.com
University Claude Bernard
Lyon, France
Abstract
We propose a software layer called GUEDOS-DB built upon an Object-Relational Database Management System (ORDBMS). In this work we apply it to Molecular Biology, more precisely to complete organelle genomes. We aim to offer biologists the possibility to access, in a unified way, information spread among heterogeneous genome databanks. The goal of this paper is, firstly, to provide a visual schema graph through a number of illustrative examples. The adopted human-computer interaction technique in this visual designing and querying makes it very easy for biologists to formulate database queries compared with linear textual query representation.
Keywords: Graphical User Interface, Complex Object, Bioinformatics, Gene, Genome, UML-BD
1. INTRODUCTION
Many biologists freely and extensively use a large number of databanks collecting a considerable amount of information. Data from sequence databanks are submitted daily worldwide, usually by electronic mail, then manually checked and stored – generally – under an RDBMS. They are then broadcast to the main servers of the community – daily by Internet or quarterly by CD-ROM – under a flat-file format (ASCII files).
There is no agreement on the format and, for the same data, many formats may be available, depending on the databank manager (Europe, US, Japan…). The number and size of databanks are growing rapidly (the size of major sequence databanks doubles each year).
Why Organelle Genome? Organelles (mitochondria and chloroplasts) are of interest for several
reasons, including their:
• Possible bacterial origins,
• Relationship to the evolution of the nuclear genome,
• Central role in eukaryotic cell energy production
• Utility as population markers.
The most important advantage to genomics offered by organelles is the number of completely sequenced genomes already available and currently being sequenced (e.g. by OGMP: Organelle Genome Mega-sequencing Program). No larger collection of completely sequenced genomes exists. Sequenced organelle genomes are information-rich datasets: firstly, most of them belong to a highly analysed set including protein-coding, transfer RNA (tRNA) and ribosomal RNA (rRNA) genes; secondly, the relationships between and among genomes can be determined (at both gene and genome levels), making them ideal for comparative genomic studies.
For many years, computer scientists have provided help to biologists for managing these data. As a result, a lot of retrieval systems, like SRS [11] or ACNUC, have been created, which are based on indexing flat files. These simple tools have obtained wide success in the biologist community.
Web interfaces have also become very familiar and make the data very easy to access for any biologist. The ability to retrieve data from anywhere on the network has progressively led to a new problem: how to interconnect heterogeneous databanks? At present it is difficult for researchers to access all the relevant information associated with a genome sequence. Data are dispersed among a number of sources, and in this present disorganized state, organelle genomic data constitute a major underexploited source of information. A wide discussion on this topic has started within the molecular biology community; integrating such data into a warehouse is considered one of the main problems to face in bioinformatics.
Directly manipulating visual conceptual schemes of complex objects provides users with a clear and powerful means of interaction. An end user of GUEDOS-DB is able to graphically build a database schema or modify an existing one, to load all the information initially provided by a databank such as GENBANK, to browse through the schema of the database in order to construct queries about the data, and to save them for later use. All of these activities are accomplished by using a unique graphical iconic representation.
As a consequence, different types of users, especially novices, can learn and use the query facilities in a more intuitive way, without the necessity to remember the database scheme or the grammar of the query language.
In this paper we describe the visual query interface GUEDOS-DB (DBioMics), developed in an object-oriented [2] software engineering environment. The database schema is based on an object-relational [1] data model using SQL3.
The user interface proposed in this paper:
• Is descriptive: it provides a concise and complete visualization of the data scheme, called the Object Semantic Graph;
• Uses the same medium both for the description of the data scheme and for the representation of the formulated query on the semantic graph;
• Is interactive: the formulation of a query is made by simply designating the nodes and arcs of the displayed semantic graph and the syntactic units, through the technique of direct manipulation.
The remainder of this paper is organized as follows. We first review the previously proposed
biological databases in the next section. In section 3, we review the basic concepts of the object-oriented data model. We then describe our proposed graphical user interface in section 4.
Section 5 describes some graphical queries with a number of illustrative examples. Finally, we
conclude in section 6.
2. RELATED WORK
In fact, biological interfaces are developed upon one or more database management systems or one or more data file management systems. Database models can be relational, object-oriented, object-relational or hierarchical. Some ad-hoc solutions have been proposed:
The Kabat antibody databank [4] is one of the first realizations in this area. The system visualizes hierarchical schemes and includes computational tools. It is developed on P/FDM (Functional Data Model), an in-house OODBMS built at Aberdeen. Data are modelled in the FDM, and queries are made through the DAPLEX language, which is based on PROLOG. Describing database schemes and querying are quite distinct.
GOBASE [3] is implemented under the SYBASE RDBMS on a Sun SPARCstation. Custom software has been developed to make it easy to populate and maintain the database. It uses the Perl language and Sybase's Open-Client development tools. The query interface is supported through WWW forms using the web/General gateway. The user interface allows users to navigate in and to retrieve information from GOBASE, and includes the following entities: gene, intron, RNA, protein, organism, sequence, chromosome and signal. The interface also contains hypertext links to specific information in other internet-accessible databases. In future versions, the user interface will be expanded by adding new entities to the database and by offering analytical tools such as subsequence extraction and neighbouring searches.
Hassan BADIR, Rachida FISSOUNE & Amjad RATTROUT
International Journals of Biometric and Bioinformatics (IJBB), Volume (5) : Issue (3), 2011
AGIS (Agricultural Genome Information System) [12] is a World Wide Web product of a
cooperative effort involving the Department of Plant Biology of the University of Maryland,
College Park, USA. This databank consists of genome information for agricultural
organisms. At present it encompasses mostly crop and livestock or non-commodity animal
species. Also included are a number of databases with related information, e.g.
germplasm and plant gene nomenclature data.
Genomic databases can be viewed with ACeDB [9], a widely used generic genome interface
that offers a specific visual interface. It is worth noting that this interface supports describing
and querying the schema, but the queries it allows are too coarse-grained.
Docking-D [3], a prototype for managing ligands (PDB, HPSS…), is implemented with an
OODBMS prototype, VODAK [8]. According to its authors, the VODAK data model provides
all the standard features of an object-oriented data model; the system includes an SQL-like
query language with optimization and multi-user access modules. VODAK was initially created in
response to the lack of a declarative query language in many object-oriented database
systems such as ObjectStore. It is still under development.
3. GUEDOS DESCRIPTION
3.1 Architecture
3.1.1 Description
The GUEDOS prototype is based on a graphical approach used to represent database
schemas graphically. Schema design is carried out by systematic, direct manipulation of graphs
through a set of available graphic operators. GUEDOS is characterized by several criteria:
Flexible: flexibility implies that the graphical language used should be adaptable to its
possible uses.
Incremental: the practical interest lies in the fact that, during the schema design phase,
the user can construct the schema incrementally by defining further constraints.
Uniform: uniformity imposes identical interaction modes for the different functionalities,
thus enhancing the tool's ergonomics.
Figure 1 describes the general architecture of GUEDOS. It is composed mainly of two
components: a graphical interface and an automatic transformation module.
The GUEDOS prototype was developed in Java. It is the semantic successor of the
LASCI-Complex tool developed earlier within a thesis framework. This tool allows an automatic
visualization of the database structure, thus enabling automatic transformation using algorithms
developed by our team.
3.1.2 Functions
GUEDOS is a platform for editing and designing object and object-relational database schemas.
The systematic aspect of GUEDOS is the key to the guided construction of such schemas,
since its objective is to facilitate the specification of requirements by designers, whether skilled
or not, and to simplify the design task for them. Its functions are to allow:
Users to express their requirements by means of one or several models, such as the
universal relation with inclusion (URI), the object attributes forest and UML DB-stereotyped
class diagrams. The tool handles these models through a logical and syntactic
apprehension of them. The user describes his requirements and refines his constraints to obtain
an initial conceptual schema. The user can also normalize and transform a representation in
the form of relations related by inclusion dependencies into a normalized semantic graph.
This normalization may be partial, leaving unchanged certain complex structures of
objects fixed by the user;
Swapping from one model (among those cited above) to another using transformation
algorithms [6], and generating an SQL3 description or an XML schema;
Personalizing the obtained conceptual schema with respect to the foreseen processing, by
introducing access methods, or even denormalizing it;
And integrating several conceptual schemas into one without losing any information.
FIGURE 1: Architecture of GUEDOS
The GUEDOS interface can be decomposed into three different screens, according to the
user's needs. In the first case, the user edits a set of attributes and functional dependencies to
systematically produce an NSG, using the normalization algorithm; in the second, the user
constructs an OAF, either manually by specifying its constraints or automatically from the
previously obtained NSG, relying on a transformation algorithm developed in [6]. In the third
case, the user designs a database-stereotyped UML class diagram from a previously obtained
OAF, using the transformation algorithm in [7]. GUEDOS has two other forms that display the
SQL3 description and the XML schema generated from the class diagram. Concretely, GUEDOS
avoids the waste of time generally observed in redoing repetitive tasks, and thus allows more
coherent information processing. The user benefits from supervision and control capabilities
over the global tasks he accomplishes. More generally, it allows a better comprehension of the
user's conceptual framework.
3.2 Manipulation and Importing
The first part of this biological application consisted in defining a data schema in the
Graphical Object Data Model (GODM) [13][6].
Figure 2 shows the mitochondrial sequences schema graph. This schema respects the ontology
and allows indexing of the largest quantity of knowledge. The design of a GODM schema is not
the focus of this paper [6][14].
GUEDOS includes a schema editor, allowing designers to build such a GODM schema by picking
graphical symbols from a palette and positioning them in the workspace provided in an ad-hoc
window. An analyst can work simultaneously on several schemas by using a graphical
window for each. Standard editing operations are available through pull-down menus.
The mitochondrial schema [10] comprises seven fundamental classes corresponding to the types
Genome, Gene, Fragment, NotLeafTaxon, Reference, KeyWord and Author. These are depicted
using rectangular nodes, indicating that they correspond to abstract data types. These
classes reference each other using internal identifiers that are not visible to the user. In
contrast with abstract types, attributes may be either atomic or complex.
An atomic attribute is depicted using an oval, such as GeneType of the class Gene in Figure 3.
FIGURE 2: schema graph for mitochondrial sequences
A complex attribute is also depicted using an oval, but decomposed into a tuple of attributes,
which may themselves be atomic or complex. Figure 2 also illustrates attribute-domain
relationships. For example, the Genome class has an attribute NotLeafTaxon referencing the
NotLeafTaxon class. Then:
The NotLeafTaxon class is the domain of the attribute NotLeafTaxon of the class Genome;
An object instance of Genome has an object instance of NotLeafTaxon as the value of
its NotLeafTaxon attribute (single-valued attribute).
In an Object Attributes Forest (OAF) [7], each attributes forest has a color chosen randomly by
GUEDOS or indicated by the user. This color is inherited by the UML class diagram during
transformation. A complex class or an attributes tree can have one or more keys. For example,
in Figure 2, Genome has IdGenome and Locus TaxonNotLeaf as keys. Figure 3 represents the
DB-stereotyped class diagram, with keys derived from the OAF of Figure 2, using the UML-DB
profile [6].
4. GUEDOS MANIPULATION AND QUERIES
There are two ways to extract data from a database. One way is extensional: navigating
through the database at the occurrence level. The other way is intensional: formulating a query
that asserts which types in the database schema are relevant to the query, and which attribute
values and links among objects define the relevant subsets of their populations. A query also
defines which data from the relevant subsets have to be put into the result (typically a
projection operation) and how these data have to be structured in the end result (unless the
result is by definition a flat, first-normal-form relation). In a classical query language such as
OQL/SQL, these different specifications are expressed as Select-From-Where blocks: the FROM
clause names the relevant classes, the WHERE clause restricts them to the relevant subsets,
and finally the SELECT clause specifies the projected data.
In a visual environment, the definition of the relevant object classes (the FROM clause) is
performed by visualizing the database schema on the screen and clicking on the desired
class nodes to lift them out of the schema into the query sub-schema (alternatively, by clicking
on undesired types to remove them from the schema, which gradually reduces it to the target
query sub-schema). The query sub-schema is a sub-graph of the original schema graph.
The definition of the relevant subsets of the classes in the query (the WHERE clause) is
expressed as conditions (predicates) on the desired attributes. Only those objects that match the
conditions will contribute to the result. Conditions in a query are divided into two categories:
simple conditions, which apply to object sets in a single class, and composite conditions, which
apply to object sets in multiple classes.
FIGURE 3: UML-DB schema for mitochondrial sequences
The definition of the data to be presented to the user, and the way they have to be
presented (the SELECT clause), can be seen as the definition of a new, virtual object type. In
the query sub-schema the projected attributes are marked by the symbol "?".
4.1 Complex Queries in the Object-Relational Model
Object-relational queries are queries that exploit the basic object-relational model, allowing the
user to retrieve data from the new data structures introduced in the model. They can be broadly
classified into: queries using REF, queries on nested table structures (aggregate queries),
queries using index clusters, and queries on inheritance relationships [16].
REF Queries
A REF is incorporated in the database by defining an attribute in one table that holds a
reference to an attribute belonging to another table. REF is used when there is an
association relationship between two objects. REF queries are queries that involve a REF in the
projection, join or selection. A REF is a logical pointer that creates a link between two tables so
that data integrity can be maintained between them. The attribute that holds the REF value is
called the ref attribute (e.g. Loginstaff_id in the Login_t table) and the attribute to which the ref
attribute points is called the referred attribute (e.g. staffid in the person_t table). The ref
attribute stores the pointer value of the referred attribute as its actual data value. It is important
to keep in mind that, whenever we use REF, we always give the alias of the table that holds the
referred attribute, not the table name: REF takes as its argument a correlation variable (table
alias) for an object table. Generally, a REF query consists of projection, joins and selection:
SELECT <Projection List>
FROM <table1> <alias1>, <table2> <alias2>, … … …
WHERE <alias2>.<ref attribute> = REF(<alias1>) ;
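Instantiating this template on the example tables mentioned above (person_t and Login_t;
the projected attributes are illustrative), a REF join could look like:

SELECT p.staffid, l.Loginstaff_id
FROM person_t p, Login_t l
WHERE l.Loginstaff_id = REF(p);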
FIGURE 4: General Classification of the ORDBMS Queries
Projection refers to what we want to obtain as the result of executing the query. Joins are the
links responsible for maintaining the integrity of data common to two tables. Selection is the
condition on which the projection is based.
Aggregate Queries
Collection types give the flexibility of storing a series of data entries jointly associated with a
corresponding row in the database. They can be further classified into two kinds: VARRAYs and
nested tables. We will not discuss VARRAYs in this section, since SQL statements cannot be
written to access the elements of a VARRAY; they can only be accessed through a PL/SQL
block, and hence are out of the scope of this section.
The nested table is one way of implementing aggregation. Nested table data are stored in a
single table, which is then associated with the enclosing table or object type. In the nesting
technique, the relationship between "part" and "whole" is of the existence-dependent type: if the
data for the whole object are removed, all of its part objects are removed as well. A nested table
is a user-defined datatype linked to the main table in which it is nested. Generally, the nesting
technique is of two types, homogeneous and heterogeneous, depending on the number of parts
the main table has.
Aggregation is an abstraction concept for building composite objects from their component
objects. The participating entities have a "whole-part" type of relationship, and the part is tightly
coupled with the whole. Aggregation can be implemented in two ways: the nesting technique and
the clustering technique. Both techniques store aggregate data efficiently, but the degree of
cohesiveness between whole and part data is higher in the nesting technique than in the
clustering technique. Both can be further classified into homogeneous and heterogeneous
aggregation, depending on the number of parts involved. A nested-table query takes the
following general form:
SELECT <Nested table attribute1>,
<Nested table attribute2>, … … …
FROM THE (SELECT <nested attribute>
FROM <whole table> <alias1>
WHERE <primary key condition>);
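As a sketch over the mitochondrial schema, assuming the Genome table nests its genes in a
nested-table attribute Genes (a hypothetical design; GeneName and the key value 'G0001' are
also illustrative), the template instantiates as:

SELECT GeneName, GeneType
FROM THE (SELECT g.Genes
          FROM Genome g
          WHERE g.IdGenome = 'G0001');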
Clustering Technique Aggregation Queries
Clustering gives the flexibility of storing "whole-part" information in one table. This
technique is used when there is aggregation (i.e. the whole is composed of parts). It enforces an
existence-dependent relationship, and each object (whether whole or part) has a unique ID [17].
Clustering can be further classified into homogeneous and heterogeneous clustering
aggregation.
The clustering technique implements the participating tables in a "whole-part" relationship. The
part information is tightly coupled with the corresponding whole record. For each whole record
there are many corresponding parts, which is achieved by creating a cluster on the whole's key.
Index creation improves the performance of the whole clustering structure. Clustering can also
be divided into two types depending on the number of participating subtypes: homogeneous
clustering and heterogeneous clustering. A clustering query takes the following general form:
SELECT CURSOR (SELECT <Projection List>
FROM <main table name>
WHERE Join AND[<condition>]),
CURSOR (SELECT <Projection List>
FROM <part table name>
WHERE Join AND [<condition>])
FROM <whole table name> <alias1>,
<part table name> <alias2>
WHERE <alias1>.<attribute name> = <alias2>.< attribute name >
AND [<condition>];
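For illustration, assuming Genome (whole) and Gene (part) are clustered on the key IdGenome
(a hypothetical design in which Gene carries an IdGenome join attribute), the template
instantiates as:

SELECT CURSOR (SELECT w2.IdGenome, w2.Locus
               FROM Genome w2),
       CURSOR (SELECT p2.GeneType
               FROM Gene p2)
FROM Genome w, Gene p
WHERE w.IdGenome = p.IdGenome
AND w.Locus = 'ACU12386';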
Inheritance Queries
Inheritance is a relationship between entities that allows the definition and implementation of
one entity to be based on other existing entities. The entities are typically organized into a
hierarchy of parents and children [17]. A child inherits all the characteristics of its parent and can
also add new characteristics of its own. A parent can have many children, but each child has only
one parent. Inheritance can span multiple levels (i.e. a child can in turn be the parent of other
children). In an object-relational database system, inheritance can be of three types: union
inheritance, mutual-exclusion inheritance and partition inheritance. The three types differ in their
implementation, but for querying purposes they are essentially the same. In this paper we
consider only union inheritance when writing SQL statements, as the only difference between
the types of inheritance is the way they are implemented in the database, not the SQL that is
written.
Inheritance queries generally consist of projection and selection. The SQL for inheritance
queries is the same irrespective of the type of inheritance or the number of levels in the tree
hierarchy, unless mentioned otherwise. The general syntax for inheritance queries is as follows.
SELECT VALUE(<alias>).<supertype attribute name1>,
VALUE(<alias>).<supertype attribute name2>, … … …
FROM <table name> <alias>;
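As an illustration, if the Gene class were implemented as a subtype of a Fragment supertype
stored in a table Gene_tab (a hypothetical mapping; FragmentStart and FragmentStop are
assumed supertype attributes), the query would read:

SELECT VALUE(g).FragmentStart,
       VALUE(g).FragmentStop,
       g.GeneType
FROM Gene_tab g;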
4.2 Query Editor
GUEDOS-ASCK includes an editor for the graphical specification of queries, insertions and
updates. In this section we present the main features of the editor; the discussion is limited to
query formulation. The process of query formulation comprises the following steps:
a. Selecting the query sub-schema:
This is the initial step in all visual query languages. The portion of the database schema relevant
to the query is extracted and displayed in reverse video.
b. Specifying predicates:
Predicates are stated here to be applied to database occurrences, so that only relevant data are
selected. Predicates over complex objects may be rather clumsy. For the simplest ones
(comparison of a mono-valued attribute with a constant), a graphical counterpart may easily be
defined: a simple specification technique is to click on the attribute, select a comparison
operator from a menu, and finally type the value or choose one from a list. For complex
predicates (involving several quantifiers, for instance), there may be no simple way to express
them graphically. Menus are sometimes used for the syntactic editing of predicates. In GQL/ER
[15], QBE-like forms (Kari, 1990) are used to specify conditions on the selected nodes. In
GUEDOS, textual specification of Boolean and arithmetic expressions has been preferred to a
graphical representation: it is simpler for complex expressions.
c. Formatting the output:
The selection of the projected attributes defines the structure of the resulting entity type.
4.3 Query Examples
This section illustrates some examples of graphical queries in GUEDOS-Queries. Let us assume
that the user wants to formulate a query over the schema shown in Figure 2.
For each query we show:
• The query window, which displays two superimposed graphs with different shades:
o The object-relational schema, which is bright and fixed in the background to guide
the user;
o The sub-graph under development, which is in reverse video.
The window has a selection bar which, in addition to the File and Edit menus, contains
the Query menu, used for choosing the type of query to be made on the query graph.
• The SQL3 code window, for displaying the SQL3 code corresponding to the formulated
visual query.
• The result window, for displaying the result of the executed query.
Consider the simple query 'Print the NId and TaxonLeaf of the Genome whose Locus is
"ACU12386"' in Figure 5.
The user selects the class boxes on which the projection and condition are to be specified
(the Genome class in this example), and proceeds as follows:
• Designate the projected attributes NId and TaxonLeaf of the Genome class by choosing the
PROJECT option in the pull-down menu associated with these attributes.
• Establish the selection condition Locus = "ACU12386" on the Genome class. The user
carries out the following actions:
o Double-click on the attribute Locus of the Genome class; the pull-down menu
associated with this attribute is displayed;
o Choose the predicate option in the menu;
o (Implicitly) choose the comparator "=";
o (Implicitly) choose the option "value to be entered";
o Enter the value "ACU12386" in the open box.
The answer to the query is a temporary sub-class of the Genome class, composed of the
appropriate objects. The sub-class name will be either the name of the query or a specific
name given by the user. On the query sub-graph, the user selects the SHOW RESULT option
in the pull-down menu of the class node to visualize the temporary objects.
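For this example, the SQL3 code window would display a statement along the following lines
(the exact table and attribute names generated by GUEDOS may differ):

SELECT g.NId, g.TaxonLeaf
FROM Genome g
WHERE g.Locus = 'ACU12386';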
FIGURE 5: Example of query for mitochondrial sequences
5. CONCLUSION & FUTURE WORK
We have proposed a solution consisting of an integrated environment that facilitates the
cohabitation of several models and techniques to support the user when designing a database
schema. GUEDOS is specific to object-relational database schema design.
We have described the environment as a whole, focusing at the same time on static data
structure and dynamic process modeling. As perspectives, we intend to extend GUEDOS to
tailor the obtained conceptual schema with respect to the foreseen processing, by introducing
access methods or even denormalization, and to integrate several conceptual schemas into one
schema without loss of information. This direction leads us to pay more attention to optimization
and the design process, using a self-optimization approach based on user preferences, by
selecting indexing methods, fragmentation, and views to materialize.
6. REFERENCES
[1] M. Stonebraker, P. Brown, 1999. Object-Relational DBMSs: Tracking the Next Great Wave,
2nd ed., Morgan Kaufmann, San Francisco.
[2] Bertino E. and Martino L., 1993. Object Oriented Database Systems; Concepts and
Architectures. Addison-Wesley Publishing Company Inc, (1993)
[3] Korab-Laskowska M., Pierre Rioux, Nicolas Brossard, Timothy G. Littlejohn, Michael W.
Gray, B. Franz Lang, Gertraud Burger, 1998. The Organelle Genome Database Project
(GOBASE). Nucleic Acids Research, Volume 26, Issue 1: January 1 (1998), 138-144.
[4] Marie-Paule Lefranc, Véronique Giudicelli, Chantal Busin, Julia Bodmer, Werner Müller,
Ronald Bontrop, Marc Lemaitre, Ansar Malik, Denys Chaume, 1998. IMGT, the
international ImMunoGeneTics database. Nucleic Acids Research, Volume 26, Issue 1:
January 1, 297-303.
[5] Moore, R., Lopes, J., 1999. Paper templates. In TEMPLATE’06, 1st International Conference
on Template Production. SciTePress.
[6] Badir, H. and Pichat, E., 2005. An Interactive Tool for creative data modeling, in Database
Technology and Applications for International Conference on Information Technology
ITCC'05, IEEE, Las Vegas, Nevada
[7] Badir H., Tanacescu A., 2007. An efficient interface to handle complex structures for
database design. ICEIS '07, Madeira, Portugal.
[8] Chang W., I.N. Shindyalov, C. Pu and P.E Bourne, 1994. Design and application of PDBlib,
a C++ macromolecular class library. Computer Application in Biosciences, 10(6).
[9] Tateno Y., Kaoru Fukami-Kobayashi, Satoru Miyazaki, Hideaki Sugawara and Takashi
Gojobori, 1998. DNA Data Bank of Japan at work on genome sequence data. Nucleic Acids
Research, 26(1), 16-20.
[10] Lagesen K., et al., 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA
genes. Nucleic Acids Research, 35, 3100-3108.
[11] Etzold T. and P. Argos. SRS, 1993. An indexing and retrieval tool for flat file data libraries.
Computer Applications in the Biosciences. 9(1) 49-57.
[12] Stephen M. Beckstrom-Sternberg and D. Curtis Jamison, 1999. AGIS: Using the Agricultural
Genome Information System, Bioinformatics: Databases and Systems, p 163-174
[13] Kim W., 1989. A model of queries for object-oriented databases. VLDB, pages 423-432.
[14] Kari, S. and Rosenthal, A., 1990. G-WHIA, Conceptual Query Language CQL: a visual
user interface to application databases. IOS Press, pages 608-623.
[15] Lecluse, C., Richard, P. and Velez, F., 1988. O2, an Object-Oriented Data Model. EDBT,
pages 556-562.
[16] Taniar, D., Rahayu, W. and Srivastava, P., 2003. A Taxonomy for Object-Relational
Queries. IRM Press.
[17] Loney, K. & Koch, G. (2002). Oracle 9i: The Complete Reference. Oracle Press.