INSTITUTIONAL
DATABASES
FOUNDATIONS FOR THE LIFE CYCLE
Massimo Buonaiuto
IPGRI – Foundations of Institutional Database Management Life Cycle
Page 2
SUMMARY
INTRODUCTION
INFORMATION AS AN ORGANIZATION ASSET
CURRENT STATUS AND REMARKABLE PROBLEMS
    BAD STRUCTURES AND SCHEMES
    REDUNDANCY
    BAD DATABASE MANAGEMENT SYSTEMS
    PURPOSES NOT ALWAYS DEFINED
    DATA DICTIONARY
    DOCUMENTATION
    REFERENCES
    STANDARD NAMING SCHEMES
    RELATIONSHIPS AMONG DATABASES
    RELATIONSHIPS AMONG TABLES: ABSENCE OF CASCADING FEATURES
    LOOKUP TABLES
    STANDARD DATABASES
    VIEWS
    BACKUP AND RECOVERY
    EXPORTING OF RECORDS BETWEEN TWO USERS
    PRIMARY, SECONDARY AND FOREIGN KEYS
    INDEXING
    PERSISTENT QUERIES
    DISTRIBUTED DATABASES AND REPLICATION
THE PROJECT
    PROJECT DETAILS
    PROJECT TEAM
DATA ARCHITECTURE
    INTRODUCTION AND BACKGROUND
    A SIMPLE DATA ARCHITECTURE
    DATA MODELING OVERVIEW
    METADATA
    DATABASE MANAGEMENT SYSTEM (DBMS) OVERVIEW
    DATA ACCESS MIDDLEWARE OVERVIEW
    DATA ACCESS IMPLEMENTATION OVERVIEW
    DATA SECURITY OVERVIEW
THE DAWN OF A NEW ARCHITECTURE
    ADVANTAGES
    DISADVANTAGES
GENERAL RECOMMENDATIONS
WHY ARE INSTITUTIONAL DATABASES NOT UP TO USERS' EXPECTATIONS?
INSTITUTIONAL DATABASE
    DEFINITION
    IMPLEMENTATION
    NAMING CONVENTION
        DBMS Naming Convention
        Field Naming Convention
        Default Fields
        Primary and Foreign Key Naming Convention
        Data Standard: the Unicode Standard
        Data Sources
APPENDIX A: DATABASE DESIGN
APPENDIX B: DATABASE EVALUATION BY METRICS
APPENDIX C: REDUNDANCY AND NORMALISATION
    NORMALISATION
    EXAMPLE
APPENDIX D: DATABASE STANDARDS
APPENDIX E: GLOSSARY
COMMENTS
INTRODUCTION
IPGRI is an international research institute of the United Nations with a mandate to advance
the conservation and use of genetic diversity for the well-being of present and future
generations.
IPGRI aims to meet three major objectives:
§ Countries, particularly developing countries, can better assess and meet their own
plant genetic resources needs
§ International collaboration in the conservation and use of genetic resources is
strengthened
§ Knowledge and technologies relevant to the improved conservation and use of plant
genetic resources are developed and disseminated
The information collected by scientists and all the staff who work at IPGRI is fundamental
to IPGRI's mission. These data are stored on different kinds of media: databases,
documents, papers, and backup tools.
The primary objective of the Database Inventory Project is to provide a detailed analysis
of the databases considered institutionally important. This aim is achieved through an
in-depth investigation of the context in which the databases are used.
There are various areas of Information and Knowledge Management in IPGRI, which need
to be strengthened. In particular, there is great demand for Intranet access to administrative
information, such as Personnel, Budgeting, Financing etc. The lack of Institutional
Database Management is strongly felt as a weak point for the Institute, which should
instead, lead the way in this area. In addition, the Intranet should acquire the capability to
search in the Institutional documents as agreed in the last several meetings.
Currently, databases are managed by different responsible persons in different groups,
with poor technical support and little integration. Some, such as the Contacts database,
exist in various forms at different sites. Others are not scalable, are not shared among
staff members, or are accessed through obsolete tools. The obvious result is that a
combined search would currently require a great deal of manual work.
In addition, if the databases are ever to become a truly valuable information asset for
IPGRI, mechanisms must be put in place for managing and controlling the quality of the
content stored.
INFORMATION AS AN ORGANIZATION ASSET
Efficient Organizations nowadays base their processes on the quick and flexible access to
proper Information. Information can become so critical and expensive to produce that it
must be made available to other groups in the Organization “Anytime - Anywhere”.
The process required to make Information available to others is expensive. It is not
justifiable to store all possible pieces of Information for Organizational access. Therefore,
in planning its Databases an Organization should look at its objectives and the processes
required to achieve them to quantify the value of the Information created at each stage.
Once the Organization has decided which Information has a value that justifies making it
available to other staff, that Information becomes an asset and should be treated like any
other asset in the Organization.
Can IPGRI recognize itself in this model?
Information is the base of our work. In IPGRI we would not be going very far without it.
IPGRI is a very dispersed and distributed Institution. Several projects make use of
information that can be reutilised in other projects even at the same time.
However, the Institute has seen such information sets grow without the use of
database tools. For example, word processors and spreadsheets have been, and still are,
used to store tables of different kinds. This is acceptable only as long as the value of the
Information that must be shared is carefully evaluated on a periodic basis.
As a final statement:
All Information that becomes an Institutional Asset will have to follow a life cycle,
whose process is defined in "Appendix A: Database Design", and be based on the
standards described in "Appendix D: Database Standards".
CURRENT STATUS AND REMARKABLE PROBLEMS
In IPGRI, many databases have been developed without any approved standard. Little
documentation exists, but some patterns can be detected in the current status. These
considerations suggested a thorough study of the databases that IPGRI considers
institutionally important, leading to a consolidated view of them gathered in an
inventory. An archive of this information has to be created and maintained. From this
need the Institutional Databases Project arose.
For a definition of the terms used below, please refer to Appendix E.
These are some of the problems that we found in the IPGRI Institutional Databases.
BAD STRUCTURES AND SCHEMES
Many IPGRI databases are built on poor structures, with no database theory applied. The
lack of the following important database properties is evident:
I. Correctness
II. Reliability
III. Maintainability
IV. Flexibility
V. Testability
VI. Reusability
VII. Interoperability
Refer to Appendix B for the definitions of these terms.
REDUNDANCY
IPGRI databases are often duplicated in several versions; maintenance is burdensome and
produces inconsistent records, because updates must be made in each version. Examples:
the Contacts and Publications databases. A good solution to this problem
is normalisation (see Appendix C).
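The normalisation idea can be sketched briefly. The example below is a hypothetical illustration (the table and field names are invented, not taken from the actual IPGRI schema), using SQLite in place of the institutional DBMS: a denormalized contacts table that repeats the country name in every row is split so that each country is stored exactly once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Before: redundant storage, the country name is repeated per contact.
cur.execute("CREATE TABLE ContactsFlat (Surname TEXT, CountryName TEXT)")
cur.executemany("INSERT INTO ContactsFlat VALUES (?, ?)",
                [("Rossi", "Italy"), ("Bianchi", "Italy"), ("Smith", "Kenya")])

# After: each country is stored once and referenced by key.
cur.execute("CREATE TABLE Countries (IDCountry INTEGER PRIMARY KEY, Name TEXT UNIQUE)")
cur.execute("CREATE TABLE Contacts (IDContact INTEGER PRIMARY KEY, Surname TEXT, "
            "IDCountry INTEGER REFERENCES Countries(IDCountry))")
cur.execute("INSERT INTO Countries (Name) SELECT DISTINCT CountryName FROM ContactsFlat")
cur.execute("""INSERT INTO Contacts (Surname, IDCountry)
               SELECT f.Surname, c.IDCountry
               FROM ContactsFlat f JOIN Countries c ON c.Name = f.CountryName""")

# A country name now has a single authoritative copy; an update touches one row.
print(cur.execute("SELECT COUNT(*) FROM Countries").fetchone()[0])  # 2
```

After normalisation, correcting a country name is a single-row update instead of an update into each duplicated record.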
BAD DATABASE MANAGEMENT SYSTEMS
All databases should be developed using dedicated Database Management Systems
(DBMSs), like MS SQL Server, MySQL, MS Access, Oracle, etc. Many databases are
implemented not with these tools but with other kinds of applications, such as word
processors and spreadsheet tools, which makes the Information hard to share and
manipulate.
PURPOSES NOT ALWAYS DEFINED
Every database should satisfy a well-defined purpose. We found many databases without
a general objective accepted by IPGRI as a whole. Reengineering is needed for many
databases, such as the Europe LOA database.
DATA DICTIONARY
There is no data dictionary standard. A data dictionary is a fundamental document that
gives rapid access to many quality measures, such as interoperability, reliability,
maintainability, flexibility, testability and reusability (see Appendix B).
DOCUMENTATION
Most of the databases have no documentation at all, neither user manuals nor technical
papers. Data source documents are of primary importance to database administrators,
because they contain all the information about the settings used for a particular database.
REFERENCES
Producing correct databases requires a source of references. These could be created once
and reused, as standards and directive documents. We did not find any institutional
guidelines on database design.
STANDARD NAMING SCHEMES
IPGRI does not use a naming convention for database structures and interfaces.
RELATIONSHIPS AMONG DATABASES
We noticed the need to establish proper links among the different databases, but the
existence of multiple databases on the same topic (redundancy) and the absence of an
up-to-date database inventory make this task more complex.
RELATIONSHIPS AMONG TABLES: ABSENCE OF CASCADING
FEATURES
Relationships among tables of the same database are necessary for referential integrity,
yet most of the databases do not use them, risking loss of meaning in the stored data.
Referential integrity is a feature provided by relational database management systems
(RDBMSs) that prevents users or applications from entering inconsistent data. Most
RDBMSs offer various referential integrity rules that can be applied when a relationship
is created between two tables. See the Glossary (Appendix E) for the referential integrity
properties.
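The cascading behaviour mentioned in this section's title can be sketched as follows. This is a hedged illustration using SQLite; the table and field names (Institutes, Accessions) are invented for the example and do not reflect the real IPGRI schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

con.execute("CREATE TABLE Institutes (IDInstitute INTEGER PRIMARY KEY, Name TEXT)")
con.execute("""CREATE TABLE Accessions (
                   IDAccession INTEGER PRIMARY KEY,
                   Taxon TEXT,
                   IDInstitute INTEGER,
                   FOREIGN KEY (IDInstitute) REFERENCES Institutes(IDInstitute)
                       ON DELETE CASCADE ON UPDATE CASCADE)""")

con.execute("INSERT INTO Institutes VALUES (1, 'IPGRI HQ')")
con.execute("INSERT INTO Accessions VALUES (10, 'Musa acuminata', 1)")

# An inconsistent row, referencing a non-existent parent, is rejected outright...
try:
    con.execute("INSERT INTO Accessions VALUES (11, 'Oryza sativa', 99)")
except sqlite3.IntegrityError:
    print("orphan row rejected")

# ...and deleting the parent cascades to its children.
con.execute("DELETE FROM Institutes WHERE IDInstitute = 1")
print(con.execute("SELECT COUNT(*) FROM Accessions").fetchone()[0])  # 0
```

With the cascade rule declared once in the schema, the DBMS rather than each application is responsible for keeping the tables consistent.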
LOOKUP TABLES
A table in a database that contains the recurring values for a specific field should be used
as the unique source of the stored data. Updates to these kinds of tables have to be
centrally controlled. This technique is a user convenience that also promotes referential
integrity.
STANDARD DATABASES
The use of data considered a standard source of information by other international
institutes is fundamental to producing a good interface between IPGRI and other
organizations. In fact, the use of structures recognized by other institutes aids the sharing
process and makes IPGRI a good institute to deal with. Agrovoc is an example.
VIEWS
Database views are not used except in certain cases. Views permit the extraction of data
subsets from a database. For example, the Germplasm database stores information about
the institutes that collect a selected taxon; a view could present all Germplasm data
without the institute information that is not useful for a particular application. In data
warehouses, the data mart is an evolution of the view concept.
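A minimal sketch of the view idea, again with SQLite standing in for the institutional DBMS. The Germplasm fields here are invented for illustration; the view name follows the "vws" prefix from the naming convention proposed later in this document.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Germplasm (
                   IDGermplasm INTEGER PRIMARY KEY,
                   Taxon TEXT,
                   CollectingInstitute TEXT)""")
con.execute("INSERT INTO Germplasm VALUES (1, 'Musa acuminata', 'INIBAP')")

# The view exposes only the subset a particular application needs,
# hiding the institute information.
con.execute("""CREATE VIEW vwsTaxa AS
               SELECT IDGermplasm, Taxon FROM Germplasm""")

print(con.execute("SELECT Taxon FROM vwsTaxa").fetchone()[0])  # Musa acuminata
```

The view is stored in the database itself, so every application queries the same definition instead of each re-implementing the subset logic.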
BACKUP AND RECOVERY
Many databases do not implement any backup policy to protect data from crashes and
other failures.
EXPORTING OF RECORDS BETWEEN TWO USERS
The exporting of records from a database, with the intention of sending data between
different users, occurs without prior agreement between the parties involved. As a result,
many hours are spent converting from one database format to another.
PRIMARY, SECONDARY AND FOREIGN KEYS
Each table of every database should use correct keys to uniquely identify tuples in the
data. The Team found many tables without appropriate keys. For example, the current
version of the tip table in the Travel Information Plan database uses an ID field as
primary key; a better key would be the travel code, which uniquely identifies a TIP.
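One common compromise, sketched below under stated assumptions (the tblTIP table and TIPTravelCode field are invented to match the TIP example, not the actual schema), is to keep a surrogate primary key while enforcing uniqueness of the natural key with a UNIQUE constraint, so a duplicate travel code can never be stored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tblTIP (
                   IDTIP INTEGER PRIMARY KEY,          -- surrogate key
                   TIPTravelCode TEXT NOT NULL UNIQUE, -- natural key, kept unique
                   Destination TEXT)""")
con.execute("INSERT INTO tblTIP (TIPTravelCode, Destination) "
            "VALUES ('T-2001-001', 'Nairobi')")

# A second TIP with the same travel code is rejected by the DBMS.
try:
    con.execute("INSERT INTO tblTIP (TIPTravelCode, Destination) "
                "VALUES ('T-2001-001', 'Rome')")
except sqlite3.IntegrityError:
    print("duplicate travel code rejected")
```

The surrogate key stays stable for foreign-key references, while the constraint guarantees that the travel code still uniquely identifies each TIP.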
INDEXING
An index is a data structure that gives rapid, random access to the rows of a relation and
can be used to present a table in a specific order. Indexes are most useful on large
relations, where they can greatly speed up queries. We did not find any policy on
database indexing.
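The effect of an index can be observed directly in a sketch like the one below (hypothetical names, following the "ind" prefix convention proposed later in this document): after the index is created, the SQLite query planner reports an index search instead of a full table scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblContacts (IDContact INTEGER PRIMARY KEY, "
            "ContactSurname TEXT)")
con.executemany("INSERT INTO tblContacts (ContactSurname) VALUES (?)",
                [("Surname%04d" % i,) for i in range(5000)])

con.execute("CREATE INDEX indContactSurname ON tblContacts (ContactSurname)")

# The planner now locates the row through the index rather than scanning
# all 5000 rows.
plan = con.execute("EXPLAIN QUERY PLAN SELECT * FROM tblContacts "
                   "WHERE ContactSurname = 'Surname0042'").fetchall()
print(plan[0][-1])  # the plan detail names indContactSurname
```

A policy would decide which columns deserve indexes (typically those used in WHERE clauses and joins on large tables), since each index also adds write and storage cost.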
PERSISTENT QUERIES
Persistent queries should be created for the queries most used by applications. They
should be implemented directly in the DBMS in order to speed up data access. SQL
Server stored procedures and queries stored in Access are examples of this kind of
query. This facility is rarely used in IPGRI databases.
DISTRIBUTED DATABASES AND REPLICATION
Institutional Databases are not always distributed or replicated. All Institutional Databases
that are remotely accessed should be replicated.
THE PROJECT
For each Institutional Database the Team will provide a recommendation document with
the following topics:
1) Analysis of the data architecture, interfaces and data entry procedures and tools
with Entity Relationship diagram and analysis of all strengths and limitations.
The main objective is finding holes in the data input that would allow inconsistent
data to be produced.
2) A Data Dictionary covering the entire set of data represented.
3) A list of redundancies on the data architecture and data content obtained as a
comparison among the Databases.
4) Improvements to the Data entry process to support multi-site, multi-user updates
5) A list of suggested improvements including development tools standards.
6) Values for the main quality measures, obtained through a simple questionnaire.
7) Skills that were required for the design of the data structure and the interface.
8) A Map of redundancies among Databases and a list of suggested database merges
with a list of steps to be taken
A final presentation will be given to Management with summarized results.
A collaboration web site has been created with SharePoint for quick interaction during
the analysis phase and for final delivery of the reports. All users can discuss the
published documents: databases, documents and spreadsheets can be uploaded and
downloaded, and new forums can be created around them to discuss different topics.
Interaction with IPGRI staff, to collect survey information and the files needed to analyze
the databases, their development and their data entry processes, is fundamental to this
project.
Various databases, such as Contacts, share common problems, such as the need for
updated records to be visible to all other parties without a manual merge process. In
these instances, it may be necessary to implement a different database architecture that
allows users at different sites to share a distributed database supporting selective update
of data with automatic replication. This distributed database architecture will require
partitioning of the data tables for controlled update.
We will give a detailed look at the skills that were required for the development of the
existing Databases in the regions and at HQ. Along with suggestions from the various
parties these will constitute the basis for a recommendation on Development tools
standards.
The findings will provide the basis for Management to understand the extent of usage of
the Institutional Databases in IPGRI and the reliability of the data content. The presentation
and all the documents produced will represent the basis for the successive activities in this
area.
PROJECT DETAILS
The suggested procedures to obtain the above output are as follows:
A. Identify an initial set of Databases that should be analysed and staff members that
should be ready to provide all the information needed.
B. Send a message to all IPGRI staff advising about the activity, giving the initial list
of Databases and Database contacts and asking for suggestions on what additional
Databases/Staff members should be included in the activity.
C. Interact with all Database contacts to collect a sample of the database along with
user, maintenance and development documentation. In addition, a list of questions
will be sent which will make it possible to quantify the quality of the data content
and identify any planned activity or other additions/fixes that would improve the
Database.
D. Creation of the Collaboration web site.
E. Actual analysis takes place.
F. Final presentation.
DATA ARCHITECTURE
The mission of Data Architecture is to establish and maintain an adaptable infrastructure
designed to facilitate the access, definition, management, security, and integrity of data
across the Institute.
INTRODUCTION AND BACKGROUND
Data and information are extremely valuable assets of the institute. Data Architecture
establishes an infrastructure for providing access to high quality, consistent data wherever
and whenever it is needed. This infrastructure is a prerequisite for fulfilling the requirement
for data to be easily accessible and understandable by authorized end users and IPGRI
applications. Data and access to data are focal points for many areas of the Technical
Architecture. Data Architecture influences how data is stored and accessed, including
online input and retrieval, outside application access, backup and recovery, and data
warehouse access. An established Data Architecture is the foundation for many other
components of the IPGRI technical architecture.
Using a good data architecture ensures that data is:
1) Defined consistently across the Institute
2) Re-useable and shareable
3) Accurate and up-to-date
4) Secure
5) Centrally managed
A SIMPLE DATA ARCHITECTURE
The Data Architecture consists of the following technical topics, including the
recommended best practices, implementation guidelines, and standards, as they apply:
1) Data Modeling
2) Metadata
3) Database Management System (DBMS)
4) Data Access Middleware
5) Data Access Implementation
6) Data Security
DATA MODELING OVERVIEW
How data is modeled and designed inside an application can significantly impact the way
an application runs and how other applications can access that data. This topic covers a
basic overview of data modeling.
METADATA
The way to describe or define data is through metadata. Metadata is "information about
data". Metadata is stored in a repository containing detailed descriptions about each data
element. A generic implementation is a data dictionary with a full description of the
database fields. By using the formats described in the metadata repository, the same data
management principles apply whether the data resides in a single location or in multiple
databases across IPGRI.
DATABASE MANAGEMENT SYSTEM (DBMS) OVERVIEW
This topic addresses the Data Architecture recommendations for projects selecting,
designing and implementing database management systems (DBMSs). In order to meet
existing and future database needs, relational database technology is recommended,
particularly for online transactional business applications. An emerging technology in
the database world is object database technology.
DATA ACCESS MIDDLEWARE OVERVIEW
Data access middleware addresses the Data Architecture recommendations for the
implementation of data access middleware. Data access middleware is the communications
layer between data access programs and databases.
DATA ACCESS IMPLEMENTATION OVERVIEW
Implementing data access is a key topic and a fundamental component of every
application. This topic discusses recommendations for implementing data access within
an application and to outside applications.
DATA SECURITY OVERVIEW
Data security is an important piece of the Data Architecture and the application security
model. This topic provides an overview of data security and discusses the best practices
for protecting data.
THE DAWN OF A NEW ARCHITECTURE
We will describe here what is suggested as a new Database Architecture that will be able
to solve most of the problems we have listed in the “Current Status” section from the
standpoint of concurrent access and update to the Institutional Databases.
In the past, consolidating data coming from different locations required a considerable
amount of time spent merging records manually. This is no longer required if we set up
an architecture whereby each location replicates its changes to the other major sites.
ADVANTAGES
1) Staff will be able to export data from the SQL Server database present in their
regional office. The export can be performed using Microsoft Access, Excel or
any other ODBC-compliant tool, depending on the needs. Therefore, people will be
able to run statistics, create graphs or perform any kind of processing using their
preferred tool.
2) The interface to the Database will be the same for all sites. In practical terms each
site will become a mirror of the others. Staff travelling will be able to access any
of the 6 web sites from the Internet.
DISADVANTAGES
Additional administration will be required from a group with the know-how to manage
SQL Server. This can be accomplished to a good extent using remote control tools, such
as VNC. In addition, once the migration to Windows 2000 takes place at the remote
sites, administration will be possible using the remote control features built into the
operating system.
[Figure: replication topology — HQ replicates with the regional sites SSA, APO, Americas and CWANA.]
GENERAL RECOMMENDATIONS
The Database Project Team has defined general recommendations for the databases:
1) A purpose document should exist describing the reasons for the database, who
created it, where it is stored, and how it is accessed and maintained
2) Only DBMS tools should be used to create the Institutional Databases
3) The creation of each Institutional Database should follow these standard
processes, each of which produces certain related documents:
a. Conceptual Design: Defines the interaction between users and the database to be
created using text and graphics. The documents produced are:
i. Requirements Document
ii. Specification Document
iii. Planning Document
b. Logical Design: indicates the data flows between the actors involved in the
interactions. The document produced is:
iv. Entity Relationship Graph
c. Physical Design: includes the physical creation of the database. The documents
produced are:
v. Implementation Document
vi. User and Technical Manuals
vii. Maintenance Document
(For a brief description of the above documents see Appendix A)
4) The exporting of records from a database with the intention of sharing data between
different users should occur with prior agreement between the parties involved. This
rule is meant to simplify the importing procedures at the receiving site.
5) Standardization of the data interface and data dictionary is achieved through a
standard object naming scheme and naming convention.
6) The major properties of each database should be recorded in an inventory. This
inventory should be implemented as a web-enabled database.
7) The use of international standard databases (like the FAO Agrovoc database) is
encouraged. In this manner, the basis for international cooperation will be in place.
8) Planning ahead: when a database is designed, future development cases should be
considered.
Example: the Contacts database should be designed to support mailing lists as well.
9) It is suggested that a special team should be created to support the creation and
maintenance of Institutional Databases.
10) Although not a precise standard, there are some well-defined rules that can be used
for extending the Data Dictionary. See document “How to model People and
Organization” for a sample of this. In addition, initiatives are ongoing to set
standards in this area following the increasing popularity of XML. We will look at
these initiatives and try to find out if they can be of any help in this area.
It can never be stressed enough how important it is to keep up-to-date documentation like
ER diagrams, very useful for showing relationships between tables, and a data dictionary
that describes what each field is used for and any aliases that may exist. Documenting SQL
statements is a must as well. In this manner, the database will be a powerful resource for
all the IPGRI staff.
WHY ARE INSTITUTIONAL DATABASES NOT UP TO USERS' EXPECTATIONS?
It has been verified that various databases are in use at different sites for the same purpose.
One clear example is the Contacts database. The main reasons for this situation are the
following:
a) Several IPGRI sites are badly served from the communications point of view. It is
difficult for most of the sites to work with sufficient efficiency on a centralized
Database, even if the interface is provided through a web browser using the Internet
as a transport.
b) The lack of enterprise-level applications providing reliability and scalability. This
has resulted in considerable work being required to centrally consolidate the data
present at each site.
c) The lack of commitment at HQ on the maintenance of the Institutional Databases
has given rise to independent versions of the databases.
INSTITUTIONAL DATABASE
The Database Project Team has defined rules to be applied to the Institutional Database:
DEFINITION
The Institutional Database is the collection of data defined as important for the IPGRI
Institute.
IMPLEMENTATION
IPGRI uses a relational database management system (RDBMS) as the repository for
institutional databases.
The Institutional Database is implemented as tables in the adopted RDBMS.
Each table is defined through SQL Data Definition Language (DDL) or RDBMS wizards
and must contain primary keys.
A relationship diagram has to be published and be available for internal users.
NAMING CONVENTION
DBMS Naming Convention
The structure of the Institutional Database will follow a naming convention.
Each institutional database should have a full data dictionary as documentation.
Each interface of the Institutional Database will have a default layout and be programmed
using a general naming convention for variables and controls.
The following database naming convention, adapted from the Leszynski naming
convention, is adopted.
Only one exception is accepted: the table name can be
a. an aggregate name
b. in plural form without any prefix
For example:
Data Type      Prefix   Example
Tables         tbl      tblContacts
Views          vws      vwsEurope
Queries        qry      qryLookUp
Forms          frm      frmContacts
Reports        rpt      rptMain
Macros         mcr      mcrMySubs
Modules        mod      modFunction
Stored Proc.   sp_      sp_records
Triggers       trg      trgOnClick
Indexes        ind      indMYField
Primary Keys   ID       IDContacts

Table: Database naming convention
IPGRI – Foundations of Institutional Database Management Life Cycle
Page 18
a. Repository or Warehouse are aggregate names
b. tblContacts and tblCountries can be renamed to Contacts and Countries
Field naming convention
Each field (except primary keys and foreign keys) of the tables will have only the first
letter of each part capitalised and will follow this naming convention:
<type><Singular Table Name><Singular Name>
where
§ Type: one value from the Database Types naming convention table (see below)
§ Singular Table Name: the table name in singular form
§ Singular Name: any name with the first letter in upper case, without symbols or
spaces (underscores, dollars, etc.). Multiple words are joined without spaces: a
“Telephone and Fax” field becomes txtContactTelFax
An example of a Contacts field: the Surname field could be strContactSurname (note that
the table name in the field name is singular).
In this manner the name can easily be decomposed: strContactSurname is a Contact field
and contains a text value that is the surname.
Except for the prefixes, all parts have only the first letter in upper case, as noted
above; abbreviations are always fully capitalised.
Only letters in the ranges [a…z] and [A…Z] must be used; spaces, dots, minus signs and
other ASCII symbols cannot be used in database structures, except for the underscore
symbol (_). Keep in mind that SQL Server and other DBMSs treat the underscore as a
wildcard in pattern matching, and this can cause problems when accessing the data.
Sometimes a field name can become too long, for example
pktxtCollectingMissionInstOriginalInstColumn: this is the trade-off between
transparency, simple rules and unwieldy names.
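The composition rule <type><Singular Table Name><Singular Name> is mechanical enough to sketch in a few lines. The helper below is hypothetical, not part of any IPGRI tool; only the convention itself comes from the text above.

```python
# Sketch of the field-naming rule <type><SingularTableName><Name>.
# Abbreviations (fully upper-case parts such as "PIN") are kept as-is;
# every other part gets only its first letter capitalised.
def field_name(type_prefix, table_singular, *names):
    """Build a field name such as strContactSurname."""
    parts = "".join(n if n.isupper() else n.capitalize() for n in names)
    return type_prefix + table_singular + parts

print(field_name("str", "Contact", "surname"))     # strContactSurname
print(field_name("txt", "Contact", "tel", "fax"))  # txtContactTelFax
print(field_name("pkint", "Contact", "PIN"))       # pkintContactPIN
```

Because every field name is built the same way, it can also be decomposed mechanically when documenting a legacy structure.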
Default fields
Some field names are equivalent or synonyms: “Remarks” and “Comments”, for
example. A good rule is to use only one name for the same data.
It is also good practice to build a field name starting from the context.
For example:
A budget has a “Code” and a brief “Description” in a table “Europe” of the “LOA” database.
These field names are implemented as:
§ txtLOABudgetCode
§ txtLOABudgetDescription
As you can see, they are built around the “Budget” part. An alphabetical ordering makes
it evident that there are two fields about the Budget of LOA: Code and Description. If we
inverted the order we would have txtLOACodeBudget and txtLOADescriptionBudget,
which are not as clear as the first pair.
Here is a brief list of common fields and the strongly suggested synonym to use in data
structures:
§ Notes, Remarks, Comments: Remarks
§ Info, Descriptions: Description
§ [field name]ID, ID[field name]: ID[table name]
§ Starting period: [prefixes]DateFrom
§ Ending period: [prefixes]DateTo
§ Telephone: [prefixes]Tel
§ Email: [prefixes]Email
§ Detail, Details: mem[singular table name]Details
§ Update, Updated, InputDate: dat[singular table name]Update. InputDate may differ
from Update only if the date of the first input of a record is relevant.
§ URL, Website, http, webaddress: txt[singular table name]URL
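A synonym list like the one above is easy to enforce when consolidating legacy structures. The sketch below is hypothetical: the dictionary keys are illustrative, and a real tool would cover the full list.

```python
# Sketch: mapping loose field-name synonyms onto the standard names
# suggested above. The entries are illustrative, not exhaustive.
STANDARD_NAMES = {
    "notes": "Remarks", "remarks": "Remarks", "comments": "Remarks",
    "info": "Description", "descriptions": "Description",
    "telephone": "Tel",
    "url": "URL", "website": "URL", "http": "URL", "webaddress": "URL",
}

def standard_field(name):
    """Return the standard synonym for a field name, or the name itself."""
    return STANDARD_NAMES.get(name.lower(), name)

print(standard_field("Comments"))  # Remarks
print(standard_field("Website"))   # URL
```

Running such a check over an imported schema gives a quick list of fields that need renaming before consolidation.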
Primary and foreign key naming convention
The general primary key is defined as:
ID<Table Name>
It is used when the tuple (also called record in relational databases) has no unique
attribute value (also called field value) that can be used as primary key (see
Entity/Relationship theory).
The type prefix is not necessary because this kind of field is always a counter handled by
the RDBMS.
For example: IDMyTables is the primary key of the MyTables table. Note that here the
table name is in plural form.
When attributes are used as primary keys they can be implemented as
pk<type><Singular Table Name><Singular Name>
For example: Contacts can have PIN (Personal Identification Number) as primary key. It
becomes pkintContactPIN. In this manner a brief analysis of the Contacts database
structure makes it evident that the primary key is the PIN number (PIN is clearly an
abbreviation, being fully capitalised).
Foreign keys take the name of the related primary key.
At first sight this rule might make foreign keys hard to distinguish from primary keys,
but the table name included in each field name excludes this error.
For example, the Contacts table could contain IDCountries as a foreign key (IDCountries
being the primary key of the Countries table).
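The Contacts/Countries example above can be sketched in DDL. This is a hedged illustration, again using SQLite in place of the adopted RDBMS; the column names follow the conventions just described.

```python
import sqlite3

# Sketch: a foreign key carries the name of the primary key it references
# (IDCountries appears in both Countries and Contacts).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Countries (
        IDCountries    INTEGER PRIMARY KEY,
        strCountryName TEXT NOT NULL
    );
    CREATE TABLE Contacts (
        IDContacts        INTEGER PRIMARY KEY,
        strContactSurname TEXT NOT NULL,
        IDCountries       INTEGER REFERENCES Countries(IDCountries)
    );
""")
conn.execute("INSERT INTO Countries VALUES (1, 'Italy')")
conn.execute("INSERT INTO Contacts VALUES (1, 'Buonaiuto', 1)")

# Because the foreign key and primary key share a name, USING works:
row = conn.execute("""
    SELECT strContactSurname, strCountryName
    FROM Contacts JOIN Countries USING (IDCountries)
""").fetchone()
print(row)  # ('Buonaiuto', 'Italy')
```

Sharing the key name between the two tables keeps join conditions obvious and lets tools match relationships by name.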
These field types are adopted:

Data Type   Prefix   SQL Server Type    MS Access Type   Example
Boolean     bln      bit                yes/no           blnAccepted
Byte        byt      binary             byte             bytPixelValue
Counter     idx      uniqueidentifier   counter          idxPrimaryKeys
Currency    cur      money              currency         curMoney
Date        dat      datetime           date/time        datMyDate
DateTime    dtm      datetime           date/time        dtmFirstTime
Double      dbl      double             numeric          dblTotalDistance
Float       flt      float              float            fltValue
Image       img      image              ole object       imgPhoto
Integer     int      smallint           numeric          intCount
Long        lng      int                numeric          lngFreeSpace
Memo        mem      nvarchar           memo             memComments
Object      obj      varbinary          ole object       objListBox
Smallint    sml      smallint           numeric          smlVariable
String      str      nvarchar           text             strAddress
Database Types naming convention

Object Naming Convention for Institutional Database Interface programming
Use the table above in programming whenever you want to reference a database object.
This object naming convention table is adopted for Institutional Database Interface
programming:

Object       Prefix
Connection   conn
Database     db
Field        fld
Group        grp
Index        idx
Property     prop
QueryDef     sql
Recordset    rs
Relation     rel
TableDef     td
User         usr
Password     pwd
Workspace    ws
Objects naming convention

This variable naming convention table is adopted for Institutional Database Interface
programming:

Data Type      Prefix   Example
Boolean        bln      blnAccepted
Byte           byt      bytPixelValue
Date or Time   dat      dtmFirstTime
Double         dbl      dblTotalDistance
Integer        int      intCount
Long           lng      lngFreeSpace
Object         obj      objListBox
Single         sng      sngLength
String         str      strAddress
Variant        vrn      vrnObject
Error          err      errMessage
Variable naming convention

This naming convention table is adopted for the interfaces of the Institutional
Database:

Scope                  Prefix   Example
Browsing               bws      bwsMain
Deleting               del      delMask
Editing                edit     editInterface
Adding New Record      new      newInterface
Confirming Questions   qst      qstIMask
Printing Errors        err      errMessage
Exiting                exi      exiMask
Table Lookup           tbl      tblContacts
Naming convention for interfaces

Data standard: the Unicode Standard
Before the development of the Unicode standard, character data was limited to sets of 256
characters. This limitation came from the one-byte storage space used by a single character;
one byte can represent only 256 different bit combinations. The Unicode standard expands
the number of possible values for character data. By doubling the amount of storage space
used for a single character, the Unicode standard increases the number of possible
character values from 256 to 65,536. With this increased range, the Unicode standard
includes letters, numbers, and symbols used in languages around the world, including all
of the values from the previously existing character sets.
IPGRI will use this encoding for the data inserted in the Institutional Database.

Data Sources
Data, articles and other publications could come from a unique source represented in a
universal format and published on many supports: HTML, paper, etc. This requires rules
whose complexity is inversely proportional to the flexibility obtained. Many publications,
like the annual report, PGR, newsletters, etc., could be represented in a unique database
and published using XML or other descriptive languages in different formats. In this way
an Annual Report issue could have a unique origin and be published on the Internet, as
PDF, etc.
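The single-source idea above can be sketched with the standard XML tools: one record, several renderings. The element names and rendering functions below are invented for illustration; a real pipeline would use a full schema and stylesheets.

```python
import xml.etree.ElementTree as ET

# Sketch: one XML source record rendered to more than one output format.
record = ET.Element("article")
ET.SubElement(record, "title").text = "Annual Report 2001"
ET.SubElement(record, "body").text = "Activities of the year."

def to_html(article):
    """Render the record as an HTML fragment (one of many supports)."""
    return "<h1>%s</h1><p>%s</p>" % (
        article.findtext("title"), article.findtext("body"))

def to_text(article):
    """Render the same record as plain text, e.g. for print layout."""
    return "%s\n\n%s" % (article.findtext("title"), article.findtext("body"))

print(to_html(record))
print(to_text(record))
```

Keeping the content in one descriptive source means a correction is made once and every published format picks it up.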
APPENDIX A: DATABASE DESIGN
Database design is a well-defined standard procedure that arises from different user
needs. The main purpose of a database is to store homogeneous information about a well-
defined subject, with the aim of sharing these data among different users.
Database design is fundamental to obtaining correct specifications and a final database
that matches the initial design. It produces documents that describe various aspects of
the use, the processes and the cases in which the database is used. These aspects are handy
when an IPGRI member would like to know whether there are databases that collect some kind
of data.
As a software product, the database should emerge from a number of standard processes
that produce the documents showing how the database was created.
Many organizations have adopted this kind of design for their databases. It is easy to
understand the purpose of a database, and how its stored data can be accessed, when there
are papers that explain its different aspects. A user searching for particular data
can read the Requirement Document. A programmer who needs access to the database
can read the Specification and Implementation Documents. Finally, accounting staff
can consult the Planning Document to obtain details about the total cost of the database
without requiring any additional documents. All the history of the database is contained in
a few sheets. Database maintainers often do not know whether a database is still used or
not, and technical mistakes often arise from the impossibility of determining the origin
of a previously installed database.
To design a good database these standard processes should be as follows:
§ Requirement Process: analyses the current status of the data to be imported in the
database and evidences the needs to be satisfied by the database
§ Specification Process: defines all database features
§ Planning Process: the technologies are defined (DBMS used, web-enabled or
proprietary interface, etc.) and the cost is indicated
§ Implementation Process: the database is created and the interface is built
§ Maintenance Process: the database has to be maintained
The documents produced for these processes are the following:
1. Requirement Document: It covers WHY an implementation of this database
is being attempted. It contains the needs to be satisfied by the new database, the
current status of the data to be stored and who will benefit from the database.
For example: papers contain a lot of data to be shared among members who need
independent access to it. The document produces a detailed list of the necessities to
be satisfied, with all the advantages gained from the final implemented database. It
is fundamental for anyone who wants to know the purposes of the database, and it
helps avoid redundancy and duplication of stored data.
2. Specification Document: It covers HOW this database will be used. It presents
the characteristics of the final database, its features, and how the data is accessed,
without specifying the technologies used during the implementation. Entity
Relationship, UML and ORM models give a detailed description of the database
structure.
3. Planning Document: shows tools used for hosting the data, with the costs of the
creation and maintenance task. This document clarifies which technology will be
used.
4. Creation and Integration Document or Implementation Document: It contains
a brief description of the implementation (DBMS used, database name, complete
path on the server, names of the tools used to access the data, like ASP applications)
and technical information used by the technicians who maintain the database, such
as technical documentation and the user manual.
5. Maintenance Document: describes how the maintenance is performed; if the
maintenance process is executed outside the Institute, this document will be the
contract of maintenance.
The processes and the documents they produce can be summarised as follows:
Requirement Process → Requirement Document
Specification Process → Specification Document
Planning Process → Planning Document
Implementation Process → C&I Document
Maintenance Process → Maintenance Document
APPENDIX B: DATABASE EVALUATION BY METRICS
Every database can be weighed by defining metric parameters. The quality of a database is
defined by different quality rates.
The most important ones are:
§ Correctness: indicates whether the implementation matches the specification.
§ Reliability: concerns fault tolerance, data coherence, etc.
§ Integrity: evaluates the security of the data against non-authorised access.
§ Maintainability: how the database is maintained.
§ Flexibility: concerns expandability, modularity, etc.
§ Testability: the database structure and its location should be approved by
technical staff.
§ Reusability: the database can be used for other purposes; for example, the Contacts
database can be accessed by mailing-list tools.
§ Interoperability: indicates the relationships with other databases.
Testing the quality attributes above requires breaking them down into engineering
criteria that can be evaluated using checklist methods. Every attribute is judged by
giving a weight to all the sub-attributes that constitute the rates above. These checklists
are simply questionnaires compiled by technical staff.
For example, Flexibility has these sub-attributes:
a) Consistency
b) Complexity
c) Generality
d) Modularity
e) Auto-documentation
The relative checklist questions could be:
i. Is the database produced following IPGRI standard techniques?
ii. Is the structure comprehensible?
iii. Is the database usable for other requirements?
iv. Are the tables decomposable?
v. Can a user comprehend the meaning of the database without access to other
documents?
The value of the flexibility rate is defined as:
Vflex = (sum of answer weights) / (sum of maximum answer weights)
Applying this method to all rates, the quality of a database is completely defined.
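The scoring formula is a simple weighted ratio, which can be sketched directly. The checklist scores below are hypothetical; only the formula comes from the text.

```python
# Sketch of the checklist scoring: each question contributes an answer
# weight and a maximum weight; the rate is their ratio.
def rate(answers):
    """answers: list of (score, max_score) pairs, one per question."""
    return sum(s for s, _ in answers) / sum(m for _, m in answers)

# Hypothetical flexibility checklist: five questions scored 0..5.
flexibility = [(4, 5), (5, 5), (3, 5), (4, 5), (2, 5)]
print(round(rate(flexibility), 2))  # 0.72
```

A rate of 1.0 means every question received its maximum weight; comparing rates across attributes highlights where a database needs work.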
APPENDIX C: REDUNDANCY AND NORMALISATION
One of the objectives in designing a relational database is the reduction of duplication in
the stored data. Duplicated data items represent redundancy. That is, duplicated items take
up more storage space than is absolutely necessary. We might put up with this loss in
storage space if it were not for a more significant consequence of redundancy.
If a data item is stored in more than one place then, when we need to change that item we
must do so in every location that it is to be found. The more copies there are the more
difficult this is. If we miss just one copy then the database is in an invalid state (being
incoherent) and there is no easy way to know which of the stored versions is correct.
Normalisation is, in relational database design, the process of organising data to minimise
redundancy. It usually involves dividing a database into two or more tables and defining
relationships between the tables. The objective is to isolate data so that additions, deletions,
and modifications of a field can be made in just one table and then propagated through the
rest of the database via the defined relationships. There are three main normal forms, each
with an increasing level of normalisation:
a. First Normal Form (1NF): Each field in a table contains different information.
For example, in an employee list, each table would contain only one birth date field.
b. Second Normal Form (2NF): No field values can be derived from another field.
For example, if a table already included a birth date field, it could not also include
a birth year field, since this information would be redundant.
c. Third Normal Form (3NF): No duplicate information is permitted. So, for
example, if two tables both require a birth date field, the birth date information
would be separated into a separate table, and the two other tables would then access
the birth date information via a key field in the birth date table. Any change to a
birth date would automatically be reflected in all tables that link to the birth date table.
NORMALISATION
A technique exists by which we can arrange our data so that redundancy is minimised.
Normalisation arranges the data in a succession of normal forms. Each normal form further
reduces the degree of duplication.
The first step is to make sure data is in First Normal Form (1NF). This is quite easy: we
simply make sure there are no repeating groups. Any groups that repeat are placed in
a separate table. The trick here is to look at the key. The key is the attribute that can be
used to uniquely determine the row of the table we are interested in. If there are any
repeating groups then the key does not adequately determine the contents of a row and the
table is not in 1NF. Since an E/R model of the data is necessary, each entity has a primary
key. The key was chosen so that it uniquely identifies each occurrence, and so the entity
should be in 1NF already.
EXAMPLE
A manufacturing company makes products from a variety of components. Each product
has a unique product number, a name and an assembly time. Each component has a unique
component number, a description, a supplier code and a price. Assume we have an entity
definition from our data modelling which looks like this:
Product (ProdCode, Name, Time, ComponentCode, Description, Quantity, Supplier, Cost)
The primary key is the product code ProdCode but look at some typical occurrences of this
entity and we will see some problems. An example tuple of the relation defined above is:
(325,Trolley,0.35,B1378,Wheel,6,S2341,0.22)
While the product code uniquely identifies each product it does not serve to identify each
occurrence of the Product entity, therefore the entity definition is not in 1NF. The problem
is caused by the fact that some of the (non-key) attributes are not dependent upon the
primary key. For example, the description, supplier and cost are dependent upon the
component code, and the quantity is dependent upon the combination of the product code
and the component code. Dependency in this case is about how we can work out one
attribute once given another. For example, ProductCode 325 tells us we are dealing with a
Trolley but not which supplier(s). If we are given the ComponentCode, though, we can
determine the Supplier.
Thus, the Supplier depends upon the ComponentCode. The reverse is not true: if we
know the Supplier we cannot determine either the ComponentCode or the ProductCode.
We can group together those attributes where there are some dependencies by writing a list
of functional dependencies. In order to get the definition into 1NF we need to extract those
groups which repeat and put them into an entity of their own. In this case the last five
attributes form a repeated group for each instance of a product code and must go into a
separate entity. We must take care however to take a copy of the primary key as that will
be needed to form a link between the two new entities. Our two new entity definitions are:
§ Product (ProductCode, Name, Time)
§ Component (ProductCode, ComponentCode, Description, Quantity, Supplier,
Cost)
You may complain that the new entity Component contains a repeating product code.
However, this is now a necessary part of the primary key of Component and represents the
least duplication we can have and still maintain a link between the two entities. Each
product now has its name and assembly time stored only once so that if that changes we
only have to change it in one place. There is a further step that we can take. You should
notice that, in the occurrence entity, the Description, Supplier and Cost get repeated
because they depend on part of the primary key not all of it. We can transform the
Component entity into two new 2NF entities. A 2NF entity is one where all the (non-key)
attributes depend on all of the primary key. We shall extract the offending attributes and
create two new entities with the following definitions:
§ Parts (ProductCode, ComponentCode, Quantity)
§ Component (ComponentCode, Description, Supplier, Cost)
Notice that Quantity is a function of both ProductCode and ComponentCode and that
Description, Supplier and Cost are functions only of the ComponentCode.
The tables will now look like this:
§ Parts (ProdCode CompCode Quantity ) ; Tuple Example : (325, B1378, 6)
§ Component (CompCode Description Supplier Cost ); Tuple Example: (B1378,
Wheel, S2341, 0.22)
There is a third stage that can be applied, although our data already satisfies its conditions
and there is little else we can do to remove redundancy. The important thing to realise
is that our data is now stored in a way that minimises the amount of duplication.
That will help to maintain the integrity of the database and the quality of the data. The
complete definition is shown below. Compare it with the original definition carefully and
note the differences.
The normalised database has now the following entities:
§ Product (ProductCode, Name, Time)
§ Parts (ProductCode, ComponentCode, Quantity)
§ Component (ComponentCode, Description, Supplier, Cost)
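The three normalised entities, populated with the example tuple from earlier, can be rebuilt and joined to show that no information was lost. This sketch again uses SQLite as a stand-in RDBMS.

```python
import sqlite3

# The three normalised entities from the worked example, with a join
# proving the original wide tuple can still be recovered.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product   (ProductCode INTEGER PRIMARY KEY, Name TEXT, Time REAL);
    CREATE TABLE Parts     (ProductCode INTEGER, ComponentCode TEXT, Quantity INTEGER,
                            PRIMARY KEY (ProductCode, ComponentCode));
    CREATE TABLE Component (ComponentCode TEXT PRIMARY KEY, Description TEXT,
                            Supplier TEXT, Cost REAL);

    INSERT INTO Product   VALUES (325, 'Trolley', 0.35);
    INSERT INTO Parts     VALUES (325, 'B1378', 6);
    INSERT INTO Component VALUES ('B1378', 'Wheel', 'S2341', 0.22);
""")
row = conn.execute("""
    SELECT p.ProductCode, p.Name, p.Time,
           c.ComponentCode, c.Description, pa.Quantity, c.Supplier, c.Cost
    FROM Product p
    JOIN Parts pa    ON pa.ProductCode   = p.ProductCode
    JOIN Component c ON c.ComponentCode  = pa.ComponentCode
""").fetchone()
print(row)  # (325, 'Trolley', 0.35, 'B1378', 'Wheel', 6, 'S2341', 0.22)
```

The join reconstructs exactly the original occurrence, but the product name, assembly time and component details are each stored only once.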
NORMALIZE PROCESS TO ELIMINATE REDUNDANCY
The normalisation process helps eliminate redundancy in a database by ensuring
that all fields in a table are atomic. There are several forms of normalisation, but Third
Normal Form (3NF) is generally regarded as providing the best compromise between
performance, extensibility, and data integrity. Briefly, 3NF states that:
§ Each value in a table is to be represented only once
§ Each row in a table should be uniquely identifiable (it should have a unique key)
§ No non-key information that relies upon another key should be stored in the table
Databases in 3NF are characterised by a group of tables storing related data that is joined
together through keys.
For example, a 3NF database for storing customers and their related orders would likely
have two tables: Customer and Order.
The Order table would not contain any information about an order’s related customer.
Instead, it would store the key that identifies the row containing the customer’s information
in the Customer table.
Higher levels of normalisation exist, but higher is not always better. In fact, for some
projects, even 3NF may introduce too much complexity into the database to be worth the
rewards.
APPENDIX D: DATABASE STANDARDS
Here we will look at the standards to be used in the preparation of the documents listed in
Appendix A on Database Design and the products and interfaces that are to be used for the
implementation itself. Database design can be a very complex task, but it starts with an
important iteration with the final users. In fact, the project leader in a Database project
should be chosen as a champion in the area where the Database is going to be used.
For this purpose the first step in a Database Project is to map the requirements to a
Conceptual Model.
The Conceptual model has nothing to do with technology and a lot with trying to capture
what kind of information we want to store to solve our business problem. Several models
have been created, each with its strengths and weaknesses, but none has emerged as the best
in all possible situations. Therefore, it is most probable that more than one model will have
to be used in this area. See the document “Evaluation of modelling Techniques” for details.
Currently, we are oriented toward using the ER (Entity Relationship) and the ORM (Object
Role Modeling) models.
See “Entity Relationship diagrams documentation and presentation” at
http://dec.bournemouth.ac.uk/staff/kcox/ERDs/index.htm and the document “Modeling,
Data Semantics and Natural Language” for another analysis of the Conceptual models and
for details on ORM.
ER is best for quick reference and maintenance and is widely known among developers,
while ORM is best for interaction with the users. ORM also allows modelling the
business rules that apply to the information.
After the Conceptual model is ready it can be mapped to a Logical model.
At this stage we have to choose the type of database system we are going to use, such as
Hierarchical, Network or Relational. Because the industry has been orienting itself
toward the Relational model for a long time, we really do not have much choice
here. The Relational model is mathematically well founded and has given rise to a
number of important standards that allow the cooperation of different products in the same
application. In particular, SQL (Structured Query Language) is a declarative language that
can be used to work on relational databases and, although very powerful, is oriented
toward the final user. For more information on SQL see the document “Introduction to
Structured Query Language”.
At a third stage we will have to map the Logical model to the Physical model. Now we
have to make our choice of a product. We have come to this point after having created the
most important documentation using models that are product independent. To help us
further, we have to adopt products based on standard interfaces, such as ODBC, which will
simplify the migration to new products in the future. In addition, other requirements will
become important at this stage such as:
a) Network support group
b) Know How
c) The ability to replicate data on slow links
d) Security granularity requirements
e) Decision support components
f) Data Warehousing capabilities
The Conceptual model defines Information at the highest level of abstraction while the
Physical model describes the details at the lowest level of abstraction. Due to their
importance for the success of a Database Design project, the chosen models must always
be kept in sync.
Organizational decision makers can take great advantage of the central consolidation of
information stored in different products. This way they can perform high-level analysis on
information that is already available in different areas of the Organization. This is
the target of data warehouses. However, letting different products talk to each other
requires standards.
For this reason, database vendors started to include a Data Repository that stores all
metadata information about the databases along with the databases themselves. Fortunately,
this area has lately seen the consolidation of standards under the unique CWM
(Common Warehouse Metamodel) from the OMG (Object Management Group).
Go to http://www.omg.org/cwm/ for details on OMG.
See the document “Database Metadata Standard” for details. The CWM standardizes a
complete, comprehensive metamodel that enables data mining across database boundaries
at the enterprise level and beyond. Like a UML profile, but in data space instead of
application space, it forms the MDA mapping to database schemas. The product of a
cooperative effort between OMG and the Meta-Data Coalition (MDC), the CWM does for
data modelling what UML does for application modelling.
The models outlined above focus on the information used by the processes but do not give
any tool to describe the processes themselves. UML (Unified Modelling Language) is the
most widely accepted and supported way of defining business rules and processes. UML is
object based, which allows the model to be easily mapped to modern object-oriented
languages like Java. In fact, UML, along with XML (eXtensible Markup Language) and
XMI (XML Metadata Interchange), is used as a base for the CWM standard mentioned
above.
APPENDIX E: GLOSSARY
ANSI (AMERICAN NATIONAL STANDARDS INSTITUTE): An association formed
by the American Government and industry to produce and disseminate widely used
industrial standards.
ATTRIBUTE: a noun describing a value which will be found in each tuple in a relation,
usually represented as a column of a relation. It is a property that can assume values for
entities or relationships. Entities can be assigned several attributes.
CANDIDATE KEY: one or more attributes which will uniquely identify one tuple in a
relation. A candidate key is a potential primary key.
COLUMN: A component of a table that holds a single attribute of the table.
COMPOSITE KEY: A key in a database table made up of several fields. Same as
concatenated key.
CONCEPTUAL VIEW: The schema of a database
DATA: A recording of facts, concepts, or instructions on a storage medium for
communication, retrieval, and processing by automatic means, and presentation as
information that is understandable by human beings.
DATA AGGREGATE: A collection of Data items.
DATA DICTIONARY: It’s contains definitions of Data, the relationship of one category
of data to another, the attributes and keys of groups of data, and so forth. Software tools
for recording these information are used.
DATA ELEMENT: A uniquely named and well-defined category of data that consists of
data items, and that is included in the record of an activity.
DATA ENTRY: The process of entering data into a computerized database or spreadsheet.
Data entry can be performed by an individual typing at a keyboard or by a machine entering
data electronically.
DATA MINING: Term for a class of database applications that look for hidden patterns
in a group of data. For example, data mining software can help retail companies find
customers with common interests. The term is commonly misused to describe software that
presents data in new ways. True data mining software doesn't just change the presentation,
but actually discovers previously unknown relationships among the data.
DATA MART, DATAMART: A database, or collection of databases, designed to help
managers make strategic decisions about their business. Whereas a data warehouse
combines databases across an entire enterprise, data marts are usually smaller and focus on
a particular subject or department. Some data marts, called dependent data marts, are
subsets of larger data warehouses.
DATA MODEL: 1) the logical data structures, including operations and constraints
provided by a DBMS for effective Database processing. 2) The system used for the
representation of Data (e.g., the ERD or relational model). A data model is an abstract
representation of the data used by an organization, such that a meaningful interpretation of
the data may be made by the model's readers. The data model may be at a conceptual,
external or internal level (as defined by ANSI).
DATA SOURCE: The place where the data to be accessed is stored; a generic name for
data whether stored in a conventional data source (such as Oracle) or in a file system
(such as RMS or VSAM). Also, the name given to a data source in a binding.
DATA WAREHOUSE: A copy of transaction data specifically structured for query and
analysis. A collection of data designed to support management decision-making. Data
warehouses contain a wide variety of data that present a coherent picture of business
conditions at a single point in time. Development of a data warehouse includes
development of systems to extract data from operating systems plus installation of a
warehouse database system that provides managers flexible access to the data. The term
data warehousing generally refers to combining many different databases across an entire
enterprise. Contrast with data mart.
DATABASE: 1) a collection of all the data needed by a person or organization to perform
needed functions 2) a collection of related files 3) any collection of data organized to
answer queries 4) (informally) a database management system
DATABASE MANAGER: 1) the person with primary responsibility for the design,
construction, and maintenance of a database. 2) (informally) a database management
system.
DENORMALISATION: Allowing redundancy in a table so that the table can remain flat,
rather than fully normalized.
DB2: IBM's relational database management system. (The names DB3 and DB4 commonly
refer to the dBASE III and dBASE IV desktop databases, which are not IBM products.)
DBMS (DATABASE MANAGEMENT SYSTEM): Also called a database manager, an
integrated collection of programs designed to allow people to design databases, enter
and maintain data, and perform queries. It provides the tools to manage the data and its
structures through DML and DDL.
DDL (DATA DEFINITION LANGUAGE): the language used to define the table
structures, relationships, triggers and procedures needed to build the skeleton
of the database.
DML (DATA MANIPULATION LANGUAGE): the language used to query and modify
the data in a database (e.g., SELECT, INSERT, UPDATE, DELETE).
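As an illustration of the two roles, the following sketch uses SQLite through Python (the table and field names are invented for the example): DDL builds the skeleton of the database, and DML enters and queries the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (the "skeleton") of the database
conn.execute("CREATE TABLE institute (instcode TEXT PRIMARY KEY, name TEXT)")

# DML: enter and query the data
conn.execute("INSERT INTO institute VALUES ('ITA303', 'IPGRI')")
row = conn.execute("SELECT name FROM institute WHERE instcode = 'ITA303'").fetchone()
print(row[0])  # IPGRI
```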
DISTRIBUTED DATABASE: A database in which the resources are stored on more than one
computer system, often at different physical locations.
ENTITY: a real-world object, observation, transaction, or person about which data are to
be stored in a database.
ENTITY-RELATIONSHIP (ER OR ERD) DIAGRAM: design tool used primarily for
relational databases in which entities are modeled as geometric shapes and the relationships
between them are shown as labeled arcs. It’s a model of an organization’s data in which
the objective has been to remove all repeated values by creating more tables.
FIELD: term used by Access as a synonym for attribute.
FILE: 1) the separately named unit of storage for all data and programs on most computers.
For example, a relation or a whole database may be stored in one file. 2) term used as a
synonym for relation in some (particularly older) database managers, like dBase.
INCOHERENT DATA: a value of an attribute that does not reflect the real state of the
data. An incorrect address for a contact, or two conflicting copies of the same value, are
examples of incoherent data.
INDEX: 1) a method used to reorder tuples or to display them in a specific order 2) a data
structure used to give rapid, random access to relations. Indexes are most often used with
large relations.
JOIN: An operation that takes two relations as operands and produces a new relation by
concatenating the tuples and matching the corresponding columns when a stated condition
holds between the two. It uses data from more than one relation (table). The relations must
have at least one attribute (called the join or linking attribute) in common.
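A minimal join, sketched with SQLite in Python (the country/contact tables and the cty join attribute are illustrative): the two relations share the join attribute, and the join produces a new relation from the matching tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (cty TEXT PRIMARY KEY, name TEXT);
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, cty TEXT);
INSERT INTO country VALUES ('ITA', 'Italy'), ('FRA', 'France');
INSERT INTO contact VALUES (1, 'Rossi', 'ITA'), (2, 'Dupont', 'FRA');
""")

# The join attribute 'cty' is common to both relations; the join
# concatenates the matching tuples into a new relation.
rows = conn.execute("""
    SELECT contact.name, country.name
    FROM contact JOIN country ON contact.cty = country.cty
    ORDER BY contact.id
""").fetchall()
print(rows)  # [('Rossi', 'Italy'), ('Dupont', 'France')]
```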
KEY: an attribute or combination of attributes. A combination of their values will be used
to select tuples from a relation.
MANY-TO-MANY RELATIONSHIP: One or more tuples in one relation may be related
to one or more tuples in a second relation by a common value of a join attribute. This
implies that each value of the join attribute may appear any number of times in either
relation or in both.
NORMAL FORM: 1) a condition of relations and databases intended to reduce data
redundancy and avoid update anomalies 2) the result of normalizing a database. There are
three main normal forms: First, Second, and Third. First Normal Form says that each field
must hold a single, atomic value, with no repeating groups. Second Normal Form says that
every non-key field must depend on the whole primary key, not just part of it. Third
Normal Form says that no non-key field may depend on another non-key field, so that no
information is duplicated within two or more tables. Normalized tables are linked using key
fields.
NORMALIZE: The process of removing redundancy in data by separating the data into
multiple tables, decomposing complex data structures into natural structures.
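The decomposition can be sketched in plain Python (the sample data is invented): the institute name, repeated once per contact in the flat structure, is separated into its own table and linked back by a key.

```python
# A flat (denormalised) table repeats the institute name for every contact:
flat = [
    ("Rossi",   "ITA303", "IPGRI"),
    ("Bianchi", "ITA303", "IPGRI"),
    ("Dupont",  "FRA001", "INRA"),
]

# Normalising separates the repeated data into its own table,
# linked back to the contacts by the key 'instcode':
institutes = {instcode: name for _, instcode, name in flat}
contacts = [(person, instcode) for person, instcode, _ in flat]

print(institutes)  # {'ITA303': 'IPGRI', 'FRA001': 'INRA'}
print(contacts)    # [('Rossi', 'ITA303'), ('Bianchi', 'ITA303'), ('Dupont', 'FRA001')]
```

Each institute name is now stored once; updating it in one place cannot leave conflicting copies behind.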
ODBC (OPEN DATABASE CONNECTIVITY): A standard interface between a
database and an application that is trying to access the data in that database. ODBC builds
on SQL standards defined by international (ISO) and national (ANSI) standards bodies;
the version of the SQL standard current at the time of writing is SQL-92.
ONE-TO-MANY RELATIONSHIP: exactly one tuple in one relation is related by a
common join attribute to many tuples in another relation. This implies that each value of
the join attribute is unique in the first relation but not necessarily unique in the second.
ONE-TO-ONE RELATIONSHIP: exactly one tuple in one relation is related by a
common join attribute to exactly one tuple in another relation. This implies that each value
of the join attribute appears no more than once in each of the relations.
PERSISTENT QUERY: a query which is stored for reuse
PRIMARY KEY: a key such that the value of the key attribute(s) will uniquely identify
any tuple in the relation. A relation must not have more than one primary key.
QUERY: literally, a question. 1) a command, written in a query language, for the database
to present a specified subset of the data in the database. 2) the subset of data produced as
output in response to a query
QUERY LANGUAGE: a computer language which can be used to express queries.
QUERY RESOLUTION: the process of collecting the data needed to answer a query.
RECORD: term used as a synonym for tuple in some (particularly older) database
management systems, like dBase.
RECURSIVE QUERY: a query in which the output of the query is then used as input for
the same query.
RDBMS(RELATIONAL DATABASE MANAGEMENT SYSTEM): see Database
Management System and Relational Database.
RECORD: In database management systems, a complete set of information. Records are
composed of fields, each of which contains one item of information. A set of records
constitutes a file. For example, a personnel file might contain records that have three fields:
a name field, an address field, and a phone number field. In relational database management
systems, records are called tuples.
REDUNDANCY: The practice of storing more than one occurrence of the same data. In
the case where data can be updated, redundancy poses serious problems. In the case
where data is not updated, redundancy is often a valuable and necessary design tool. The
duplication of data in the database to improve the ease and speed of access can raise the
risk that a change made in one place leaves conflicting values elsewhere.
REFERENTIAL INTEGRITY: A feature provided by relational database management
systems (RDBMS's) that prevents users or applications from entering inconsistent data. An
integrity mechanism ensuring vital data in a database, such as the unique identifier for a
given piece of data, remains accurate and usable as the database changes. Referential
integrity involves managing corresponding data values between tables when the foreign
key of a table contains the same values as the primary key of another table. For example,
suppose Table B has a foreign key that points to a field in Table A. Referential integrity
would prevent you from adding a record to Table B that cannot be linked to Table A. In
addition, the referential integrity rules might also specify that whenever you delete a record
from Table A, any records in Table B that are linked to the deleted record will also be
deleted. This is called a cascading delete. Finally, the referential integrity rules could specify
that whenever you modify the value of a linked field in Table A, all records in Table B that
are linked to it will also be modified accordingly. This is called a cascading update.
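A sketch of these rules using SQLite from Python (SQLite supports ON DELETE CASCADE but only enforces foreign keys when the pragma is enabled; the two tables mirror the Table A / Table B example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (id INTEGER PRIMARY KEY,
                a_id INTEGER REFERENCES a(id) ON DELETE CASCADE);
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (10, 1);
""")

# A record in B that cannot be linked to A is rejected:
try:
    conn.execute("INSERT INTO b VALUES (11, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Deleting the parent record in A cascades to the linked records in B:
conn.execute("DELETE FROM a WHERE id = 1")
print(conn.execute("SELECT COUNT(*) FROM b").fetchone()[0])  # 0
```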
RELATION: the basic collection of data in a relational database. Usually represented as
a rectangular array of data, in which each row (tuple) is a collection of data about one entity
RELATIONAL DATABASE: A type of database management system (DBMS) that stores
data in the form of related tables. Relational databases are powerful because they require
few assumptions about how data is related or how it will be extracted from the database.
As a result, the same database can be viewed in many different ways. An important feature
of relational systems is that a single database can be spread across several tables. This differs
from flat-file databases, in which each database is self-contained in a single table.
REPLICATION: Duplication of table schema and data or stored procedure definitions
and calls from a source database to a destination database, usually on separate servers.
ROW: term used by Access as a synonym for tuple
RUNNING A QUERY: Term for query resolution
SCHEMA: 1) a description of a database. It specifies (among other things) the relations,
their attributes, and the domains of the attributes. In some database systems, the join
attributes are also specified as part of the schema. 2) the description of one relation
SECONDARY KEY: a key which is not the primary key for a relation.
SELECT: a query in which only some of the tuples in the source relation appear in the
output
SEQUEL: see SQL
STORED PROCEDURE: In database management systems (DBMSs), an operation that
is stored with the database server. Typically, stored procedures are written in SQL. Stored
procedures execute faster than ordinary SQL requests because they have been compiled
and optimized by the server. By keeping the requests on the SQL server, they don't have to
be coded into the user's front end, thereby allowing the program to load and execute faster.
Stored procedures are an important element in load balancing.
SQL: pronounced 'sequel' or spelled out letter by letter, stands for Structured Query
Language, the most common text-based database query language. SQL is both a DDL and
a DML: its DDL comprises the CREATE and ALTER statements and other commands,
while its DML comprises the SELECT, UPDATE, DELETE and INSERT statements and
so on. There are different standards and dialects of SQL: ANSI SQL, T-SQL, etc.
TABLE: A relation that consists of a set of columns with a heading and a set of rows (i.e.,
tuples). It’s a noun used as a synonym for relation in relational theory.
TECHNICAL REENGINEERING: an organizational restructuring based on a
fundamental re-examination of why a database exists.
TRANSACTION: 1) the fundamental unit of change in many (transaction-oriented)
databases. A single transaction may involve changes in several relations, all of which must
be made simultaneously in order for the database to be internally consistent and correct. 2)
the real-life event which is modeled by the changes to the database.
TRIGGER: A detectable event that causes another action to happen. For instance,
changing a discount rate in a grocery store's inventory database may cause an alert to be
emailed to a manager.
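The grocery-store example can be approximated with an SQLite trigger (writing an alert row rather than sending an email; all table and trigger names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (item TEXT, discount REAL);
CREATE TABLE alert (message TEXT);

-- The trigger fires on a detectable event (an UPDATE of the discount
-- column) and causes another action (inserting an alert row):
CREATE TRIGGER discount_changed
AFTER UPDATE OF discount ON inventory
BEGIN
    INSERT INTO alert VALUES ('discount changed for ' || OLD.item);
END;

INSERT INTO inventory VALUES ('apples', 0.0);
UPDATE inventory SET discount = 0.10 WHERE item = 'apples';
""")

print(conn.execute("SELECT message FROM alert").fetchone()[0])
# discount changed for apples
```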
TUPLE: within a relation, a collection of all the facts related to one entity. Usually
represented as a row of data. In relational database systems, a record. See record.
VALUE: the computer representation of a fact about an entity.
COMMENTS
Database IPGRI_DB:
• Does “Centres” table contain characteristics of Institutes?
• Is the CROPCODE equal to SPECIE in AMS?
• IND_PROF and IND_TAGS not included
• Delete fields of “ipgriregion” from institute and contact
• INST: Name_nat can be considered as “txtInstituteName”
• Does PktxtCountryISOCodeCode correspond to country.cty?
Database IPGRIAddressBook:
• The “Flags” table contains the field names of ipgri_db
“contacts” table
• USERLOG could be treated as “user” table
Database IPGRI-Eur-PopulusClones:
• What are “Clone” and “Accession”?
• “Clones” table skipped
• Is a country defined by the ISOCODE field, the CTY field, or by its name?
• Is an institute defined by the ISOCODE or the INSTCODE field?
• Should the “Main” table be renamed to “Accession”?
• Main: for “INSTITUTION WHERE MAINTAINED”, where is
the source of institutions?
• “Main” table is not completely processed
• “Maintenance” table processed
Database IPGRIWEB:
• ANNUALREPORT skipped
• ANNUALREPORT_TEXT skipped
• conflict_PUBSURVEY skipped
• conflict_TRAINSURVEY skipped
• conflict_WEBSURVEY skipped
• COUNTRY contains the institutional Country table translated
into different languages: useful. The ISO3 field is equal to
Country.cty.
• Do the CROPS_TYPE table and ipgri_db.material contain the
same kind of data?
• Is the CROP table a subset of CROPS_TYPE?
• CROPS table should be renamed to “CropNames”
• EVENT table: forced the primary key to match
ipgri_db.events
• GENEFLOW used in the website: skipped
• Should IPGRI users have only one table for access to all
IPGRI web applications?
• About languages: many tables contain different translations of
a particular name, e.g. a crop can be named in English, Spanish
or French. Could we create an external database of all
international names?
• IPGRIWEB.OFFICE_LOCATION.COUNTRY links to the
COUNTRY name rather than to country.cty.
• Should the OWNER table be merged with the Acronym table?
• PGR skipped
• PGR_ARTICLE skipped
• PUBBLICATION contains articles published on the website
• PUBBLICATION skipped
• PUBBLICATION_OWNER (linked to PUBBLICATION)
is skipped
• PUBSURVEY skipped
• RESOURCE skipped
• SERIE skipped
• STAFF: should id_ownergroup and id_ownergroup2 be linked
to GROUPS in TIP?
• Does STAFF contain only and all IPGRI members?
• STAFF must be revisited
• THEME skipped
• TOPIC skipped
• TRAINING skipped: purpose not clear
• TRUSTEES skipped: purpose not clear
• URL_OWNER skipped
• URL_REGION: are regions the same as ipgriregions?
• URL_TOPIC skipped
• USETYPE skipped
• WEBPAGE, WEBPAGE_AGROVOC,
WEBPAGE_COUNTRY, WEBPAGE_CROP,
WEBPAGE_EVENT, WEBPAGE_INSTITUTIONAL,
WEBPAGE_NETWORK, WEBPAGE_REGION,
WEBPAGE_RESOURCE, WEBPAGE_THEME,
WEBPAGE_TRAINING, WEBSURVEY skipped
IPGRNewsletter skipped
Presupuesto skipped
Proyectos database:
• Pbcatcol skipped: purpose not clear
• Pbcatedt skipped: purpose not clear
• Pbcatfmt skipped: purpose not clear
• Pbcattbl skipped: purpose not clear
• pbcatvld skipped: purpose not clear

Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 

Institutional database analysis

VIEWS .............................................................................................................................. 9
BACKUP AND RECOVERY ............................................................................................ 9
EXPORTING OF RECORDS BETWEEN TWO USERS ................................................... 9
PRIMARY, SECONDARY AND FOREIGN KEYS ........................................................... 9
INDEXING ....................................................................................................................... 9
PERSISTENT QUERIES ................................................................................................... 9
DISTRIBUTED DATABASES AND REPLICATION ....................................................... 9
THE PROJECT ............................................................................................................... 10
PROJECT DETAILS ....................................................................................................... 11
PROJECT TEAM (page reference missing)
DATA ARCHITECTURE ................................................................................................ 12
INTRODUCTION AND BACKGROUND ....................................................................... 12
A SIMPLE DATA ARCHITECTURE .............................................................................. 12
DATA MODELING OVERVIEW .................................................................................... 13
METADATA ................................................................................................................... 13
DATABASE MANAGEMENT SYSTEM (DBMS) OVERVIEW ..................................... 13
DATA ACCESS MIDDLEWARE OVERVIEW ............................................................... 13
DATA ACCESS IMPLEMENTATION OVERVIEW ....................................................... 13
DATA SECURITY OVERVIEW ..................................................................................... 13
THE DAWN OF A NEW ARCHITECTURE ................................................................... 14
ADVANTAGES .............................................................................................................. 14
DISADVANTAGES ........................................................................................................ 14
GENERAL RECOMMENDATIONS ............................................................................... 15
WHY ARE INSTITUTIONAL DATABASES NOT UP TO USERS' EXPECTATIONS? ... 16
INSTITUTIONAL DATABASE ...................................................................................... 17
DEFINITION ................................................................................................................... 17
IMPLEMENTATION ...................................................................................................... 17
NAMING CONVENTION ............................................................................................... 17
DBMS naming convention .............................................................................................. 17
Field naming convention ................................................................................................. 18
Default fields .................................................................................................................. 18
Primary and foreign key naming convention ................................................................... 19
Data standard: the Unicode Standard ............................................................................... 21
Data sources ................................................................................................................... 21
APPENDIX A: DATABASE DESIGN ............................................................................ 23
APPENDIX B: DATABASE EVALUATION BY METRICS .......................................... 25
APPENDIX C: REDUNDANCY AND NORMALISATION ........................................... 27
NORMALISATION ........................................................................................................ 27
EXAMPLE ...................................................................................................................... 28
APPENDIX D: DATABASE STANDARDS ................................................................... 31
APPENDIX E: GLOSSARY ............................................................................................ 33
COMMENTS .................................................................................................................. 39
INTRODUCTION

IPGRI is an international research institute with a mandate to advance the conservation and use of genetic diversity for the well-being of present and future generations. IPGRI aims to meet three major objectives:
§ Countries, particularly developing countries, can better assess and meet their own plant genetic resources needs
§ International collaboration in the conservation and use of genetic resources is strengthened
§ Knowledge and technologies relevant to the improved conservation and use of plant genetic resources are developed and disseminated

The information collected by scientists and everyone who works at IPGRI is fundamental to IPGRI's mission. These data are stored on different kinds of supports: databases, documents, papers, and backup tools.

The primary objective of the Database Inventory Project is to provide a detailed analysis of the databases considered Institutionally important. This aim is reached through a deep investigation of the context in which the databases are used.

There are various areas of Information and Knowledge Management in IPGRI which need to be strengthened. In particular, there is great demand for Intranet access to administrative information, such as Personnel, Budgeting, Financing, etc. The lack of Institutional Database Management is strongly felt as a weak point for the Institute, which should instead lead the way in this area. In addition, the Intranet should acquire the capability to search the Institutional documents, as agreed in the last several meetings.

Currently, databases are managed by various responsible persons in different groups, with poor technical support as well as poor integration. Some of them, such as the Contacts Database, exist in various forms in different sites. Others are not scalable or shared among staff members, or the tools used to interact with them are obsolete. The obvious result is that a combined search would currently require a huge amount of manual work. In addition, if the databases are ever to become a truly valuable information asset in IPGRI, mechanisms must be put in place for managing and controlling the quality of the content stored.
INFORMATION AS AN ORGANIZATION ASSET

Efficient Organizations nowadays base their processes on quick and flexible access to the proper Information. Information can become so critical and expensive to produce that it must be made available to other groups in the Organization "Anytime - Anywhere". The process required to make Information available to others is expensive; it is not justifiable to store every possible piece of Information for Organizational access. Therefore, in planning its Databases an Organization should look at its objectives and the processes required to achieve them, and quantify the value of the Information created at each stage. Once the Organization has decided what Information has a value that justifies its availability to other staff, that Information becomes an asset and should be treated like any other asset in the Organization.

Can IPGRI recognize itself in this model? Information is the basis of our work; in IPGRI we would not get very far without it. IPGRI is a very dispersed and distributed Institution. Several projects make use of information that can be reutilised in other projects, even at the same time. However, the Institute has seen such information sets grow without the use of Database tools. For example, Word processors and Spreadsheets have been, and are, used to store tables of different natures. This is all fine, as long as a careful evaluation of the value of the Information that must be shared is performed on a periodic basis.

As a final statement: all Information that becomes an Institutional Asset will have to follow a life cycle, whose process is defined in "Appendix A: Database Design", and be based on the standards described in "Appendix D: Database Standards".
CURRENT STATUS AND REMARKABLE PROBLEMS

In IPGRI, many databases have been developed without any approved standard. Poor documentation exists, but some patterns can be detected in the current status. These considerations suggested performing a deep study of those databases that IPGRI considers institutionally important, achieving a consolidated view of them gathered in an inventory. An archive of this information has to be created and maintained for the time being. From this necessity the Institutional Databases Project arose. For a definition of the terms used below, please refer to Appendix E.

These are some of the problems that we found in IPGRI Institutional Databases.

BAD STRUCTURES AND SCHEMES

Many IPGRI Databases are built on bad structures; database theory has not been applied. The lack of the following important Database properties is evident:
I. Correctness
II. Reliability
III. Maintainability
IV. Flexibility
V. Testability
VI. Reusability
VII. Interoperability
Refer to Appendix B for the definitions of these terms.

REDUNDANCY

IPGRI Databases are often duplicated in several versions; their maintenance is very heavy and produces incoherent records, because updates must be made to each version. Examples: the Contacts and Publications databases. A good solution to this problem is normalisation (see Appendix C).

BAD DATABASE MANAGEMENT SYSTEMS

All databases should be developed using dedicated DBMSs (Database Management Systems) such as MS SQL Server, MySQL, MS Access, Oracle, etc. Many databases are implemented not with these tools but with different kinds of applications, like Word Processors and Spreadsheet tools. It becomes hard to share and manipulate this Information.

PURPOSES NOT ALWAYS DEFINED

Every database should satisfy a well-defined purpose. We found many databases without a general objective accepted by IPGRI as a whole. Reengineering is needed for many databases, like the Europe LOA database.
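The redundancy problem, and the normalisation cure described in Appendix C, can be illustrated with a small sketch. The table and field names below are hypothetical, not taken from the actual IPGRI Contacts database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalised: the organisation data is repeated on every contact row,
# so changing it means updating many rows -- and missing one produces
# exactly the incoherent records described above.
cur.execute("""CREATE TABLE contacts_flat (
    name TEXT, org_name TEXT, org_city TEXT)""")
cur.executemany("INSERT INTO contacts_flat VALUES (?, ?, ?)", [
    ("A. Rossi", "IPGRI", "Rome"),
    ("B. Bianchi", "IPGRI", "Rome"),   # redundant copy of the org data
])

# Normalised: the organisation is stored once and referenced by key.
cur.execute("""CREATE TABLE organisation (
    org_id INTEGER PRIMARY KEY, org_name TEXT, org_city TEXT)""")
cur.execute("""CREATE TABLE contact (
    contact_id INTEGER PRIMARY KEY, name TEXT,
    org_id INTEGER REFERENCES organisation(org_id))""")
cur.execute("INSERT INTO organisation VALUES (1, 'IPGRI', 'Rome')")
cur.executemany("INSERT INTO contact VALUES (?, ?, ?)", [
    (1, "A. Rossi", 1), (2, "B. Bianchi", 1)])

# A single UPDATE now fixes the organisation data everywhere at once.
cur.execute("UPDATE organisation SET org_city = 'Maccarese' WHERE org_id = 1")
rows = cur.execute("""SELECT c.name, o.org_city FROM contact c
                      JOIN organisation o ON c.org_id = o.org_id""").fetchall()
print(rows)  # both contacts now show the updated city
```

The same update against `contacts_flat` would have to touch every duplicated row, in every duplicated copy of the database.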
DATA DICTIONARY

There isn't any data dictionary standard. A data dictionary is a fundamental document that allows rapid assessment of many quality metrics, such as interoperability, reliability, maintainability, flexibility, testability and reusability (see Appendix B).

DOCUMENTATION

Most of the databases don't have any documentation, neither user manuals nor technical papers. Data Source Documents are of primary importance for database administrators, because they contain all the information about the settings used for a particular database.

REFERENCES

Producing correct Databases requires a source of references. These could be created once and used always, as documents of standards and directives. We didn't find any Institutional guidelines about database design.

STANDARD NAMING SCHEMES

IPGRI doesn't use a naming convention for the structures and interfaces of its databases.

RELATIONSHIPS AMONG DATABASES

We noticed the need to find the proper links among the different databases. However, the existence of multiple Databases on the same topic (redundancy) and the absence of an up-to-date database inventory make this task more complex.

RELATIONSHIPS AMONG TABLES: ABSENCE OF CASCADING FEATURES

Relationships among the tables of a database are necessary for referential integrity; most of the databases don't use them, risking loss of meaning of the stored data. Referential integrity is a feature provided by relational database management systems (RDBMSs) that prevents users or applications from entering inconsistent data. Most RDBMSs have various referential integrity rules that can be applied when a relationship is created between two tables. See the Glossary (Appendix E) for the referential integrity properties.

LOOKUP TABLES

A table in a database that contains recurring values for a specific field should be used as the unique source of the stored data. The updating of these kinds of tables has to be centrally controlled. This technique is a user convenience that also promotes referential integrity.

STANDARD DATABASES

The use of data considered a standard source of information by other international Institutes is fundamental to produce a good interface between IPGRI and other
organizations. In fact, the use of structures recognized by other institutes aids the sharing process and makes IPGRI a good institute to deal with. Agrovoc is an example.

VIEWS

Database Views are not used except in certain cases. Views permit the extraction of subsets of data from a database. For example: the Germplasm Database stores information about the Institutes that collect a selected taxon; a view could expose all Germplasm data without the Institute information, which may not be useful for a particular application. In data warehouses, the Data Mart is an evolution of the view concept.

BACKUP AND RECOVERY

Many databases don't implement any backup policy to preserve data from crashes and other events.

EXPORTING OF RECORDS BETWEEN TWO USERS

The exporting of records from a database, with the intention of sending data between different users, occurs without a prior agreement between the involved parties. In fact, we found that many hours are spent converting from one database format to another.

PRIMARY, SECONDARY AND FOREIGN KEYS

Each table of every database should use correct keys to uniquely identify tuples in the data. The Team found many tables without suitable keys. For example: the current version of the table tip in the Travel Information Plan database uses an ID field as primary key; the correct key would be the travel code, which uniquely identifies a TIP.

INDEXING

Indexes are used to order tables or to display them in a specific order, through a data structure that gives rapid, random access to relations. Indexes are most often used with large relations and can greatly speed up database queries. We didn't find any policy on database indexing.

PERSISTENT QUERIES

Persistent queries should be applied to the queries most used by applications. They should be implemented directly in the DBMS tools in order to speed up data access. SQL Server Stored Procedures and Queries stored in Access are examples of these kinds of queries. This tool is not often used in IPGRI databases.

DISTRIBUTED DATABASES AND REPLICATION

Institutional Databases are not always distributed or replicated. All Institutional Databases that are remotely accessed should be replicated.
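The cascading and lookup-table points above can be sketched concretely. The schema below is hypothetical (a country lookup table referenced by an institute table), using SQLite only as a convenient stand-in for whichever RDBMS a database actually runs on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request

# Lookup table holding the recurring values, centrally controlled.
conn.execute("""CREATE TABLE country (
    country_code TEXT PRIMARY KEY, country_name TEXT)""")

# Child table: the foreign key plus cascading rules provides referential
# integrity -- inconsistent codes are rejected, key renames propagate.
conn.execute("""CREATE TABLE institute (
    institute_id INTEGER PRIMARY KEY,
    name TEXT,
    country_code TEXT REFERENCES country(country_code)
        ON UPDATE CASCADE ON DELETE RESTRICT)""")

conn.execute("INSERT INTO country VALUES ('IT', 'Italy')")
conn.execute("INSERT INTO institute VALUES (1, 'IPGRI HQ', 'IT')")

# 1) The RDBMS rejects a row pointing at a non-existent country.
try:
    conn.execute("INSERT INTO institute VALUES (2, 'Ghost', 'XX')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# 2) ON UPDATE CASCADE keeps child rows coherent when the key changes.
conn.execute("UPDATE country SET country_code = 'ITA' WHERE country_code = 'IT'")
print(conn.execute("SELECT country_code FROM institute").fetchone())
```

Without the foreign key, the 'Ghost' row would be silently accepted and the rename would strand every institute row pointing at the old code -- exactly the inconsistency described above.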
THE PROJECT

For each Institutional Database the Team will provide a recommendation document covering the following topics:
1) Analysis of the data architecture, interfaces, and data entry procedures and tools, with an Entity Relationship diagram and an analysis of all strengths and limitations. The main objective is finding holes in the data input that would allow inconsistent data to be produced.
2) A Data Dictionary covering the entire set of data represented.
3) A list of redundancies in the data architecture and data content, obtained by comparison among the Databases.
4) Improvements to the data entry process to support multi-site, multi-user updates.
5) A list of suggested improvements, including development tool standards.
6) Values of the main quality metrics, obtained with a simple questionnaire.
7) The skills that were required for the design of the data structure and the interface.
8) A map of redundancies among Databases and a list of suggested database merges, with a list of steps to be taken.

A final presentation with summarized results will be given to Management. A Collaboration web site has been created with SharePoint for quick interaction during the analysis phase and the final delivery of the reports. All users can discuss the published documents: databases, documents and spreadsheets can be uploaded and downloaded, and new forums can be created around them to discuss different topics. Interaction with IPGRI staff to collect survey information and the files needed to analyze the databases, their development and their data entry processes is fundamental in this project.

Various databases, such as Contacts, share some common problems, such as the need to update records so that the data can be viewed by all other parties without requiring a manual merge process. In these instances, it may be necessary to implement a different Database architecture that allows users in different sites to share a Distributed Database supporting selective updates of data with automatic replication. This Distributed Database architecture will require partitioning of the data tables for controlled updates.

We will take a detailed look at the skills that were required for the development of the existing Databases in the regions and at HQ. Along with suggestions from the various parties, these will constitute the basis for a recommendation on development tool standards.

The findings will provide the basis for Management to understand the extent of usage of the Institutional Databases in IPGRI and the reliability of the data content. The presentation and all the documents produced will represent the basis for subsequent activities in this area.

PROJECT DETAILS

The suggested procedures to obtain the above output are as follows:
A. Identify an initial set of Databases that should be analysed, and staff members who should be ready to provide all the information needed.
B. Send a message to all IPGRI staff advising about the activity, giving the initial list of Databases and Database contacts, and asking for suggestions on what additional Databases/staff members should be included in the activity.
C. Interact with all Database contacts to collect a sample of each database along with user, maintenance and development documentation. In addition, a list of questions will be sent which will make it possible to quantify the quality of the data content, any projected activity, or any other additions/fixes that would improve the Database.
D. Creation of the Collaboration web site.
E. Actual analysis takes place.
F. Final presentation.
DATA ARCHITECTURE

The mission of Data Architecture is to establish and maintain an adaptable infrastructure designed to facilitate the access, definition, management, security, and integrity of data across the Institute.

INTRODUCTION AND BACKGROUND

Data and information are extremely valuable assets of the Institute. Data Architecture establishes an infrastructure for providing access to high-quality, consistent data wherever and whenever it is needed. This infrastructure is a prerequisite for fulfilling the requirement that data be easily accessible and understandable by authorized end users and IPGRI applications. Data, and access to data, are focal points for many areas of the Technical Architecture. Data Architecture influences how data is stored and accessed, including online input and retrieval, outside application access, backup and recovery, and data warehouse access. An established Data Architecture is the foundation for many other components of the IPGRI technical architecture. Using a good data architecture ensures that data is:
1) Defined consistently across the Institute
2) Reusable and shareable
3) Accurate and up-to-date
4) Secure
5) Centrally managed

A SIMPLE DATA ARCHITECTURE

The Data Architecture consists of the following technical topics, including the recommended best practices, implementation guidelines, and standards, as they apply:
1) Data Modeling
2) Metadata
3) Database Management System (DBMS)
4) Data Access Middleware
5) Data Access Implementation
6) Data Security

DATA MODELING OVERVIEW

How data is modeled and designed inside an application can significantly impact the way the application runs and how other applications can access that data. This topic covers a basic overview of data modeling.

METADATA

The way to describe or define data is through metadata. Metadata is "information about data". Metadata is stored in a repository containing detailed descriptions of each data element. A generic implementation is a data dictionary, with a full description of the database fields. By using the formats described in the metadata repository, the same data management principles apply whether the data resides in a single location or in multiple databases across IPGRI.

DATABASE MANAGEMENT SYSTEM (DBMS) OVERVIEW

The Database Management System (DBMS) topic addresses the Data Architecture recommendations for projects selecting, designing, and implementing database management systems. In order to meet existing and future database needs, relational database technology is recommended, particularly for online transactional business applications. An emerging technology in the database world is object database technology.

DATA ACCESS MIDDLEWARE OVERVIEW

This topic addresses the Data Architecture recommendations for the implementation of data access middleware. Data access middleware is the communications layer between data access programs and databases.

DATA ACCESS IMPLEMENTATION OVERVIEW

Implementing Data Access is a key topic; it is a fundamental component of every application. This topic discusses recommendations for implementing data access within an application and to outside applications.

DATA SECURITY OVERVIEW

Data security is an important piece of the Data Architecture and of the application security model. This topic provides an overview of data security and discusses best practices for protecting data.
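The metadata repository idea need not be built by hand: every DBMS already exposes its own catalogue, from which a first-cut data dictionary can be generated. A minimal sketch (the `accession` schema is illustrative only; SQLite stands in for whichever DBMS holds the real data):

```python
import sqlite3

# Build a tiny example schema (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accession (
    accession_id INTEGER PRIMARY KEY,
    taxon_name   TEXT NOT NULL,
    collect_date TEXT)""")

def data_dictionary(conn):
    """Return {table: [(field, type, nullable, is_pk), ...]} read from the
    DBMS's own metadata (here the SQLite catalogue; other DBMSs expose the
    same information through INFORMATION_SCHEMA views)."""
    dictionary = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        dictionary[table] = [(c[1], c[2], not c[3], bool(c[5])) for c in cols]
    return dictionary

print(data_dictionary(conn))
```

Generating the dictionary from the catalogue keeps it synchronized with the schema, which is the hard part of maintaining metadata by hand.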
THE DAWN OF A NEW ARCHITECTURE

We describe here what is suggested as a new Database Architecture, able to solve most of the problems listed in the "Current Status" section from the standpoint of concurrent access and update to the Institutional Databases. In the past, to consolidate data coming from different locations, a considerable amount of time was spent merging the records manually. This is no longer required if we set up an Architecture whereby each location replicates its changes to the other major sites.

ADVANTAGES

1) Staff will be able to export data from the SQL Server database present in their regional office. The export can be performed using Microsoft Access, Excel or any other ODBC-compliant tool, depending on the needs. Therefore, people will be able to run statistics, create graphs or perform any kind of processing using their preferred tool.
2) The interface to the Database will be the same for all sites. In practical terms, each site will become a mirror of the others. Staff travelling will be able to access any of the 6 web sites from the Internet.

DISADVANTAGES

Additional administration will be required from a group with the know-how to manage SQL Server. This can be accomplished to a good extent using remote control tools, such as VNC. In addition, once the migration to Windows 2000 takes place at the remote sites, administration will be possible using the remote control features built into the Operating System.

[Diagram: replication topology linking HQ with the APO, SSA, Americas and CWANA regional sites]
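The replication idea can be reduced to a toy model: each site keeps its local copy plus a log of timestamped changes, and replication means exchanging logs and applying the newest change per record. This sketch illustrates only the concept; it is not the conflict-resolution algorithm that a real DBMS such as SQL Server uses, and the record keys shown are hypothetical:

```python
def apply_changes(table, changes):
    """Apply a change log to a site's table; the newest change per key wins."""
    for ts, key, value in sorted(changes):      # replay in time order
        current_ts, _ = table.get(key, (-1, None))
        if ts >= current_ts:
            table[key] = (ts, value)

hq, apo = {}, {}

# Each site has made independent edits, logged as (timestamp, key, value).
hq_log  = [(1, "contact:42", "Rome office"), (3, "contact:42", "Maccarese office")]
apo_log = [(2, "contact:42", "Serdang office"), (2, "contact:99", "New record")]

# Replication: every site applies its own log and the other site's log.
for table in (hq, apo):
    apply_changes(table, hq_log)
    apply_changes(table, apo_log)

print(hq == apo)            # both sites hold the same consolidated data
print(hq["contact:42"][1])  # the newest change wins
```

The manual-merge work disappears because the logs, not the people, carry the changes between sites; what remains is choosing a conflict rule (here, latest wins) and partitioning the tables so that updates stay controlled.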
GENERAL RECOMMENDATIONS
The Database Project Team has defined general recommendations for the databases:
1) A purpose document should exist to describe the reasons for the database, who created it, where it is stored and how it is accessed and maintained
2) Only DBMS tools should be used to create the Institutional Databases
3) The creation of each Institutional Database should follow these standard processes, each of which produces certain related documents:
a. Conceptual Design: defines the interaction between users and the database to be created, using text and graphics. The documents produced are:
i. Requirements Document
ii. Specification Document
iii. Planning Document
b. Logical Design: indicates the data flows between the actors involved during the interactions. The document produced is:
iv. Entity Relationship Graph
c. Physical Design: includes the physical creation of the database. The documents produced are:
v. Implementation Document
vi. User and Technical Manuals
vii. Maintenance Document
(For a brief description of the above documents see Appendix A)
4) The export of records from a database with the intention of sharing data between different users should occur with the prior agreement of the parties involved. This rule is meant to simplify the importing procedures at the receiving site.
5) Standardization of the data interface and data dictionary is required, through a standard object naming scheme and naming convention.
6) Each database's major properties should be saved in an inventory. This inventory should be implemented as a web-enabled database.
7) The use of international databases of standards (like the FAO Agrovoc database) is encouraged. In this manner, there will be a basis for international cooperation.
8) Planning ahead: when a database is designed, future development cases should be considered. Example: the Contacts database should be designed for mailing lists as well.
9) It is suggested that a special team be created to support the creation and maintenance of Institutional Databases.
10) Although there is no precise standard, there are some well-defined rules that can be used for extending the Data Dictionary. See the document "How to model People and Organization" for a sample of this. In addition, initiatives are ongoing to set standards in this area, following the increasing popularity of XML. We will look at these initiatives and try to find out whether they can be of any help in this area.
It can never be stressed enough how important it is to keep up-to-date documentation such as ER diagrams, very useful for showing relationships between tables, and a data dictionary that describes what each field is used for and any aliases that may exist. Documenting SQL statements is a must as well. In this manner, the database will be a powerful resource for all IPGRI staff.

WHY ARE INSTITUTIONAL DATABASES NOT UP TO USERS' EXPECTATIONS?
It has been verified that various databases are in use at different sites for the same purpose. One clear example is the Contacts database. The main reasons for this situation are the following:
a) Several IPGRI sites are badly served from the communications point of view. It is difficult for most of the sites to work with sufficient efficiency on a centralized Database, even if the interface is provided through a web browser using the Internet as a transport.
b) The lack of Enterprise-level applications providing reliability and scalability. This has resulted in considerable work being required to centrally consolidate the data present at each site.
c) The lack of commitment at HQ on the maintenance of the Institutional Databases has given rise to independent versions of the databases.
INSTITUTIONAL DATABASE
The Database Project Team has defined rules to be applied to the Institutional Database:

DEFINITION
The Institutional Database is the collection of data defined as important for the IPGRI Institute.

IMPLEMENTATION
IPGRI uses a relational database management system (RDBMS) as the collector of institutional databases. The Institutional Database is implemented as Tables in the adopted RDBMS. Each Table is defined by Data Definition Language (DDL) SQL or RDBMS wizards and must contain primary keys. A relationship diagram has to be published and be available to internal users.

NAMING CONVENTION
DBMS Naming Convention
The structure of the Institutional Database will follow a naming convention. Each institutional database should have a full data dictionary as documentation. Each interface of the Institutional Database will have a default layout and be programmed using a general naming convention for variables and controls. From the Leszynski Database Naming Convention, the following database naming convention is assumed:

Data Type     Prefix  Example
Tables        tbl     tblContacts
Views         vws     vwsEurope
Queries       qry     qryLookUp
Forms         frm     frmContacts
Reports       rpt     rptMain
Macros        mcr     mcrMySubs
Modules       mod     modFunction
Stored Proc.  sp_     sp_records
Triggers      trg     trgOnClick
Indexes       ind     indMYField
Primary Keys  ID      IDContacts
Database naming convention

Only one exception is permitted: the table name can be
a. an aggregate name
b. in plural form without any prefix
For example:
a. Repository or Warehouse are aggregate names
b. tblContacts and tblCountries can be renamed to Contacts and Countries

Field naming convention
Each field of the tables (except for primary keys and foreign keys) will have only the first letter capitalised and will follow this naming convention:
<type><Singular Table Name><Singular Name>
where
§ Type: one value from the Database Types naming convention table (see below)
§ Singular Table Name: the table name in singular form
§ Singular Name: any name with the first letter in upper case, without any symbols or spaces (like underscores, dollars, etc.). Multiple names will be joined without spaces: a "Telephone and Fax" field will become txtContactsTelFax
An example of a Contacts field: the Surname field could be strContactSurname (note that the table name in this field name is singular). In this manner it can easily be decomposed: strContactSurname is a Contacts field and contains a text value that is the surname. Except for the prefixes, all parts have only the first letter in upper case, as noted above; abbreviations are the exception and are always in capitals. Only letters must be used, in the ranges [a…z] and [A…Z]; spaces, dots, minus signs and other ASCII symbols cannot be used for database structures, except for the underscore symbol (_). Keep in mind, though, that SQL Server and other DBMSs treat the underscore as a wildcard, and this could cause problems when accessing the data. Sometimes a field name can become too long, for example pktxtCollectingMissionInstOriginalInstColumn: this is the trade-off between transparency, easy rules and tiresome names.

Default fields
Some field names are equivalent or synonyms: "remarks" and "comments", for example. A good rule is to use only one name for the same data. It is also a good rule to build a field name starting from the context.
For example: a budget has a "Code" and a brief "Description" in a table "Europe" of the "LOA" database. The implementation of these field names is:
§ txtLOABudgetCode
§ txtLOABudgetDescription
As you can see, they are built from the "Budget" prefix. An alphabetic ordering would show that there are two fields about the Budget of an LOA: Code and Description. If we inverted the order we would have txtLOACodeBudget and txtLOADescriptionBudget, which are not as clear as the first set.
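The decomposition described above (strContactSurname → prefix str, table Contact, attribute Surname) can be automated. A minimal sketch (the prefix list is abbreviated and the helper name is hypothetical):

```python
import re

# Abbreviated prefix list from the Database Types naming convention
# table; extend as needed for the full convention.
TYPE_PREFIXES = ("str", "txt", "int", "lng", "dat", "dtm", "mem", "cur")

def decompose(field_name):
    """Split a field name such as 'strContactSurname' into its type
    prefix and the remaining CamelCase parts."""
    for prefix in TYPE_PREFIXES:
        if field_name.startswith(prefix):
            rest = field_name[len(prefix):]
            # Split the CamelCase remainder into individual words.
            parts = re.findall(r"[A-Z][a-z0-9]*", rest)
            return prefix, parts
    return None, []

prefix, parts = decompose("strContactSurname")
print(prefix, parts)  # str ['Contact', 'Surname']
```

A check like this could be run over a schema to verify that field names actually follow the convention before they enter the data dictionary.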
Here is a brief list of common fields and the strongly suggested synonym to use in the data structure:
§ Notes, Remarks, Comments: Remarks
§ Info, Descriptions: Description
§ [field name]ID, ID[field name]: ID[table name]
§ Starting period: [prefixes]DateFrom
§ Ending period: [prefixes]DateTo
§ Telephone: [prefixes]Tel
§ Email: [prefixes]Email
§ Detail, Details: mem[singular table name]Details
§ Update, Updated, InputDate: dat[singular table name]Update. InputDate may differ from Update only when the date of the first input of a record is relevant.
§ URL, Website, http, webaddress: txt[singular table name]URL

Primary and Foreign key naming convention
The general primary key is defined as:
ID<Table Name>
It is used when the tuple (also called a record in relational databases) has no unique attribute value (also called a field value in relational databases) that can be used as the primary key (see Entity/Relationship theory). The type name is not necessary because this kind of field is always a counter handled by the RDBMS.
For example: IDMyTables is the primary key of the MyTables table. Note that the table name is in plural form. When there are attributes usable as primary keys they can be implemented as
pk<type><Singular Table Name><Singular Name>
For example: Contacts can have a PIN (Personal Identification Number) as primary key. It will become pkintContactPIN. In this manner, a brief analysis of the Contacts database structure will show that the primary key is the PIN number (it is evident that PIN is an abbreviation, being in capitals).
The foreign keys will have the name of the related primary key. At first sight, this rule might lead to confusing foreign keys with primary keys, but the table name included in each field name excludes this error.
For example, the Contacts table could contain IDCountries as a foreign key (IDCountries being the primary key of the Countries table). The following field types are adopted:
Object Naming Convention for Institutional Database Interface programming
Use the tables below in programming when you want to reference a database object. This variable naming convention table is adopted for Institutional Database Interface programming:

Data Type  Prefix  SQL Server Type    MS Access Type  Example
Boolean    bln     bit                bool            blnAccepted
Byte       byt     binary             yes/no          bytPixelValue
Counter    idx     uniqueidentifier   counter         idxPrimaryKeys
Currency   cur     money              currency        curMoney
Date       dat     datetime           date/time       datMyDate
DateTime   dtm     datetime           date/time       dtmFirstTime
Double     dbl     double             numeric         dblTotalDistance
Float      flt     float              float           fltValue
Image      img     image              OLE Object      imgPhoto
Integer    int     smallint           numeric         intCount
Long       lng     int                numeric         lngFreeSpace
Memo       mem     nvarchar           memo            memComments
Object     obj     varbinary          OLE Object      objListBox
Smallint   sml     smallint           numeric         smlVariable
String     str     nvarchar           text            strAddress
Database Types naming convention

Objects      Prefix
Connection   conn
Database     db
Field        fld
Group        grp
Index        idx
Property     prop
QueryDef     sql
Recordset    rs
Relation     rel
TableDef     td
User         usr
Password     pwd
Workspace    ws
Objects Naming convention
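A minimal sketch of the table and key naming conventions in practice. It is run here with Python's built-in sqlite3 purely for illustration (the Institutional Databases themselves use SQL Server or Access, and the column choices are hypothetical):

```python
import sqlite3

# Sketch: tables and keys named per the convention (tbl prefix,
# ID<Table Name> primary keys, foreign keys named after the related
# primary key).  Columns are illustrative examples.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblCountries (
    IDCountries       INTEGER PRIMARY KEY,   -- counter-style primary key
    strCountryName    TEXT NOT NULL
);
CREATE TABLE tblContacts (
    IDContacts        INTEGER PRIMARY KEY,
    strContactSurname TEXT NOT NULL,
    IDCountries       INTEGER REFERENCES tblCountries(IDCountries)
);
""")
con.execute("INSERT INTO tblCountries VALUES (1, 'Italy')")
con.execute("INSERT INTO tblContacts VALUES (1, 'Rossi', 1)")

# Because the foreign key carries the name of the related primary key,
# the join condition writes itself.
row = con.execute("""
    SELECT strContactSurname, strCountryName
    FROM tblContacts JOIN tblCountries USING (IDCountries)
""").fetchone()
print(row)  # ('Rossi', 'Italy')
```

Note how the shared IDCountries name lets the join use USING rather than an explicit ON clause: one practical payoff of the foreign-key naming rule described above.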
The naming convention tables shown below are also adopted for Institutional Database Interface programming.

Data standard: the Unicode Standard
Before the development of the Unicode standard, character data was limited to sets of 256 characters. This limitation came from the one-byte storage space used by a single character; one byte can represent only 256 different bit combinations. The Unicode standard expands the number of possible values for character data. By doubling the amount of storage space used for a single character, the Unicode standard increases the number of possible character values from 256 to 65,536. With this increased range, the Unicode standard includes letters, numbers, and symbols used in languages around the world, including all of the values from the previously existing character sets. IPGRI will use this standard for the data inserted in the Institutional Database.

Data Sources
Data, articles and other publications could come from a unique source represented in a universal format and published on many supports: HTML, paper, etc. This requires some rules, with a complexity that is inversely proportional to the flexibility. Many publications, like the annual report, PGR, newsletters, etc., could be represented in a single database and published using XML or other descriptive languages and supports, in different formats.
In this way, an Annual Report issue could have a unique origin and be published on the Internet, in PDF, etc.

Data Type     Prefix  Example
Boolean       bln     blnAccepted
Byte          byt     bytPixelValue
Date or Time  dat     dtmFirstTime
Double        dbl     dblTotalDistance
Integer       int     intCount
Long          lng     lngFreeSpace
Object        obj     objListBox
Single        sng     sngLength
String        str     strAddress
Variant       vrn     vrnObject
Error         err     errMessage
Variable naming convention

Scope                 Prefix  Example
Browsing              bws     bwsMain
Deleting              del     delMask
Editing               edit    editInterface
Adding New Record     new     newInterface
Confirming Questions  qst     qstIMask
Printing Errors       err     errMessage
Exiting               exi     exiMask
Table Lookup          tbl     tblContacts
Naming convention for interfaces
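The storage doubling described in the Unicode paragraph above can be checked directly. A small Python illustration (the sample string is arbitrary):

```python
# One-byte legacy encodings vs. two-byte Unicode storage per character.
# Latin-1 stands in for the pre-Unicode 256-character sets; UTF-16
# uses two bytes per character for the values discussed above.
text = "IPGRI"

latin1_bytes = text.encode("latin-1")    # one byte per character
utf16_bytes = text.encode("utf-16-le")   # two bytes per character

print(len(latin1_bytes), len(utf16_bytes))  # 5 10
```

Doubling the per-character storage is what raises the number of representable values from 2^8 = 256 to 2^16 = 65,536.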
APPENDIX A: DATABASE DESIGN
Database design is a well-defined standard procedure that arises from different user needs. The main purpose of a database is to store homogeneous information about a well-defined subject, with the aim of sharing these data among different users. Database design is fundamental to obtaining correct specifications and a final database that matches the initial design. It produces documents that describe various aspects of the use, the processes and the cases in which the database is used. These aspects are handy when an IPGRI member would like to know whether there are databases that collect some kind of data. As a software product, the database should emerge from standard processes that produce different documents showing how the database was created. Many organizations have adopted this kind of design for their databases. It is easy to understand the sense of a database, and how the stored data can be accessed, when there are documents that explain its different aspects. A user searching for particular data can read the Requirement Document. A programmer who needs to access the database can read the Specification and Implementation Documents. Finally, accounting staff can consult the Planning Document to obtain details about the total cost of the database without requiring any additional documents. All the history of the database is included in a few sheets. Database maintainers often do not know whether a database is still used or not, and technical mistakes often occur because it is impossible to determine the origin of a previously installed database.
To design a good database these standard processes should be followed:
§ Requirement Process: analyses the current status of the data to be imported into the database and identifies the needs to be satisfied by the database
§ Specification Process: defines all database features
§ Planning Process: the technologies are defined (DBMS used, web-enabled or proprietary interface, etc.) and the cost is indicated
§ Implementation Process: the database is created and the interface is built
§ Maintenance Process: the database has to be maintained
The documents produced by these processes are the following:
1. Requirement Document: it covers WHY an implementation of this Database is being attempted. It contains the needs to be satisfied by the new database, the current status of the data to be stored and who will benefit from the database. For example: papers contain a lot of data to be shared among members that need independent access to it. It produces a detailed list of the necessities to be satisfied,
with all the advantages gained from the final implemented database. This document is fundamental for all people who want to know the purposes of the database, and it can help avoid redundancy and duplication of the data to be stored.
2. Specification Document: it covers HOW an implementation of this Database will be used. It shows the characteristics of the final database, its features and how the data is accessed, without specifying the technologies used during the implementation. Entity Relationship, UML and ORM diagrams give a detailed description of the database structure.
3. Planning Document: shows the tools used for hosting the data, with the costs of the creation and maintenance tasks. This document clarifies which technology will be used.
4. Creation and Integration Document or Implementation Document: it contains a brief description of the implementation (DBMS used, database name, complete path on the server, names of tools used to access data, like ASP applications, and technical information used by the technicians who maintain the database, like technical documentation and the user manual).
5. Maintenance Document: describes how the maintenance is performed; if the maintenance process is executed outside the Institute, this document will be the maintenance contract.

[Diagram: each process produces its corresponding document — Requirement, Specification, Planning, C&I (Implementation) and Maintenance Documents]
APPENDIX B: DATABASE EVALUATION BY METRICS
All databases can be weighed by defining metric parameters. The quality of a database is defined by different quality rates. The most important ones are:
§ Correctness: indicates whether the implementation matches the specification.
§ Reliability: concerns fault tolerance, data coherence, etc.
§ Integrity: evaluates the security of the data against unauthorized attack.
§ Maintainability: how the database is maintained.
§ Flexibility: concerns expandability, modularity, etc.
§ Testability: the database structure and its placement should be approved by technical staff.
§ Reusability: the database could be used for other purposes. For example, the Contacts Database should be accessible by mailing-list tools.
§ Interoperability: indicates the relationships with other databases.
Testing the quality attributes mentioned requires breaking them down into engineering criteria that can be evaluated using checklist methods. Every attribute is judged by giving a weight to all the sub-attributes that constitute the above rates. These checklists are simply questionnaires prepared by technical staff.
For example, Flexibility has these sub-attributes:
a) Consistency
b) Complexity
c) Generality
d) Modularity
e) Auto-documentation
The relative checklist questions could be:
i. Is the database produced following IPGRI standard techniques?
ii. Is the structure comprehensible?
iii. Is the database usable for other requirements?
iv. Are the tables decomposable?
v. Can a user comprehend the meaning of the database without access to other documents?
The value of the flexibility rate is defined as:

Vflex = (sum of answer weights) / (sum of maximum answer weights)

Applying this method to all rates, the quality of a database is correctly defined.
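The checklist scoring above can be sketched in a few lines. The questions and weights below are hypothetical examples, not IPGRI's actual checklist:

```python
# Sketch of the checklist-based rate:
#   Vflex = (sum of answer weights) / (sum of maximum answer weights)
# Questions and weights are hypothetical examples.
checklist = [
    # (question, answer weight, maximum weight)
    ("Is the database produced following standard techniques?", 3, 5),
    ("Is the structure comprehensible?",                        4, 5),
    ("Is the database usable for other requirements?",          2, 5),
]

def rate(checklist):
    """Return a quality rate as the ratio of earned to maximum weights."""
    earned = sum(answer for _, answer, _ in checklist)
    maximum = sum(max_weight for _, _, max_weight in checklist)
    return earned / maximum

print(rate(checklist))  # (3 + 4 + 2) / (5 + 5 + 5) = 0.6
```

Running the same computation for each attribute (Correctness, Reliability, and so on) yields the full set of quality rates for a database.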
APPENDIX C: REDUNDANCY AND NORMALISATION
One of the objectives in designing a relational database is the reduction of duplication in the stored data. Duplicated data items represent redundancy. That is, duplicated items take up more storage space than is absolutely necessary. We might put up with this loss in storage space if it were not for a more significant consequence of redundancy: if a data item is stored in more than one place then, when we need to change that item, we must do so in every location in which it is found. The more copies there are, the more difficult this is. If we miss just one copy then the database is in an invalid state (being incoherent) and there is no easy way to know which of the stored versions is correct.
Normalisation is, in relational database design, the process of organizing data to minimize redundancy. It usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships. There are three main normal forms, each with an increasing level of normalization:
a. First Normal Form (1NF): each field in a table contains different information. For example, in an employee list, each table would contain only one birth date field.
b. Second Normal Form (2NF): no field values can be derived from another field. For example, if a table already included a birth date field, it could not also include a birth year field, since this information would be redundant.
c. Third Normal Form (3NF): no duplicate information is permitted.
So, for example, if two tables both require a birth date field, the birth date information would be separated into its own table, and the two other tables would then access the birth date information via an index field in the birth date table. Any change to a birth date would automatically be reflected in all tables that link to the birth date table.

NORMALISATION
A technique exists by which we can arrange our data so that redundancy is minimised. Normalisation arranges the data in a succession of normal forms. Each normal form further reduces the degree of duplication. The first step is to make sure the data is in First Normal Form (1NF). This is quite easy, as we simply make sure there are no repeating groups. Any groups which repeat are placed in a separate table. The trick here is to look at the key. The key is the attribute which can be used to uniquely determine the row of the table that we are interested in. If there are any repeating groups then the key does not adequately determine the contents of a row and the table is not in 1NF. Now, since an E/R model of the data is available, each entity has a primary key. The key was chosen so that it uniquely identifies each occurrence, and so the entity should be in 1NF already.
EXAMPLE
A manufacturing company makes products from a variety of components. Each product has a unique product number, a name and an assembly time. Each component has a unique component number, a description, a supplier code and a price. Assume we have an entity definition from our data modelling which looks like this:
Product (ProdCode, Name, Time, ComponentCode, Description, Quantity, Supplier, Cost)
The primary key is the product code ProdCode, but look at some typical occurrences of this entity and we will see some problems. An example tuple of the relation defined above is:
(325, Trolley, 0.35, B1378, Wheel, 6, S2341, 0.22)
While the product code uniquely identifies each product, it does not serve to identify each occurrence of the Product entity; therefore the entity definition is not in 1NF. The problem is caused by the fact that some of the (non-key) attributes are not dependent upon the primary key. For example, the description, supplier and cost are dependent upon the component code, and the quantity is dependent upon the combination of the product code and the component code. Dependency in this case is about how we can work out one attribute once given another. For example, ProductCode 325 tells us we are dealing with a Trolley but not which supplier(s). If we are given the ComponentCode, though, we can determine the Supplier. Thus, the Supplier depends upon the ComponentCode. The reverse is not true: if we know the Supplier we cannot determine either the ComponentCode or the ProductCode. We can group together those attributes where there are some dependencies by writing a list of functional dependencies. In order to get the definition into 1NF we need to extract those groups which repeat and put them into an entity of their own. In this case the last five attributes form a repeated group for each instance of a product code and must go into a separate entity.
We must take care, however, to take a copy of the primary key, as that will be needed to form a link between the two new entities. Our two new entity definitions are:
§ Product (ProductCode, Name, Time)
§ Component (ProductCode, ComponentCode, Description, Quantity, Supplier, Cost)
You may complain that the new entity Component contains a repeating product code. However, this is now a necessary part of the primary key of Component and represents the least duplication we can have while still maintaining a link between the two entities. Each product now has its name and assembly time stored only once, so that if either changes we only have to change it in one place. There is a further step that we can take. You should notice that, in the occurrence entity, the Description, Supplier and Cost get repeated because they depend on part of the primary key, not all of it. We can transform the Component entity into two new 2NF entities. A 2NF entity is one where all the (non-key)
attributes depend on all of the primary key. We shall extract the offending attributes and create two new entities with the following definitions:
§ Parts (ProductCode, ComponentCode, Quantity)
§ Component (ComponentCode, Description, Supplier, Cost)
Notice that Quantity is a function of both ProductCode and ComponentCode, and that Description, Supplier and Cost are functions only of the ComponentCode. The tables will now look like this:
§ Parts (ProdCode, CompCode, Quantity); example tuple: (325, B1378, 6)
§ Component (CompCode, Description, Supplier, Cost); example tuple: (B1378, Wheel, S2341, 0.22)
There is a third stage that can be applied, although our data now satisfies its conditions and there is little else we can do to remove redundancy. The important thing to realise is that our data is now stored in a way that minimises the amount of duplication. That will help to maintain the integrity of the database and the quality of the data. The complete definition and the occurrence tables are shown on the next page. Compare them with the original definition and table carefully and note the differences. The normalised database now has the following entities:
§ Product (ProductCode, Name, Time)
§ Parts (ProductCode, ComponentCode, Quantity)
§ Component (ComponentCode, Description, Supplier, Cost)

NORMALIZE PROCESS TO ELIMINATE REDUNDANCY
The normalization process helps eliminate redundancy of data in a database by ensuring that all fields in a table are atomic. There are several forms of normalization, but the Third Normal Form (3NF) is generally regarded as providing the best compromise between performance, extensibility and data integrity. Briefly, 3NF states that:
§ Each value in a table is to be represented only once
§ Each row in a table should be uniquely identifiable.
(It should have a unique key)
§ No non-key information that relies upon another key should be stored in the table
Databases in 3NF are characterized by a group of tables storing related data that are joined together through keys.
For example, a 3NF database for storing customers and their related orders would likely have two tables: Customer and Order. The Order table would not contain any information about an order's related customer. Instead, it would store the key that identifies the row containing the customer's information in the Customer table. Higher levels of normalization exist, but more normalization is not always better. In fact, for some projects, even 3NF may introduce too much complexity into the database to be worth the rewards.
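The decomposition worked through in the example above can be checked concretely. The sketch below (using Python's built-in sqlite3 purely for illustration) loads the three normalised entities with the sample occurrence and reconstructs the original unnormalised tuple with a join:

```python
import sqlite3

# The three normalised entities from the example, loaded with the
# sample occurrence (325, Trolley, 0.35, B1378, Wheel, 6, S2341, 0.22).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product   (ProductCode INTEGER PRIMARY KEY, Name TEXT, Time REAL);
CREATE TABLE Component (ComponentCode TEXT PRIMARY KEY, Description TEXT,
                        Supplier TEXT, Cost REAL);
CREATE TABLE Parts     (ProductCode INTEGER, ComponentCode TEXT,
                        Quantity INTEGER,
                        PRIMARY KEY (ProductCode, ComponentCode));
INSERT INTO Product   VALUES (325, 'Trolley', 0.35);
INSERT INTO Component VALUES ('B1378', 'Wheel', 'S2341', 0.22);
INSERT INTO Parts     VALUES (325, 'B1378', 6);
""")

# Joining the three tables reconstructs the original tuple, showing
# that no information was lost in the decomposition.
row = con.execute("""
    SELECT p.ProductCode, p.Name, p.Time, c.ComponentCode,
           c.Description, pa.Quantity, c.Supplier, c.Cost
    FROM Product p
    JOIN Parts pa    ON pa.ProductCode = p.ProductCode
    JOIN Component c ON c.ComponentCode = pa.ComponentCode
""").fetchone()
print(row)
```

Each fact (a product's name, a component's supplier, a quantity per product–component pair) now lives in exactly one row, yet the join recovers the original flat record on demand.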
APPENDIX D: DATABASE STANDARDS
Here we will look at the standards to be used in the preparation of the documents listed in Appendix A on Database Design, and at the products and interfaces that are to be used for the implementation itself. Database design can be a very complex task, but it starts with an important interaction with the final users. In fact, the project leader in a Database project should be chosen as a champion in the area where the Database is going to be used. For this purpose, the first step in a Database Project is to map the requirements to a Conceptual Model. The Conceptual Model has nothing to do with technology and a lot to do with trying to capture what kind of information we want to store to solve our business problem. Several models have been created, each with its strengths and weaknesses, but none has emerged as the best in all possible situations. Therefore, it is most probable that more than one model will have to be used in this area. See the document "Evaluation of modelling Techniques" for details. Currently, we are oriented toward using the ER (Entity Relationship) and ORM (Object Role Modeling) models. See "Entity Relationship diagrams documentation and presentation" at http://dec.bournemouth.ac.uk/staff/kcox/ERDs/index.htm and the document "Modeling, Data Semantics and Natural Language" for another analysis of Conceptual models and for details on ORM. ER is best for quick reference and maintenance, and it is widely known among developers, while ORM is best for interaction with the users. ORM also allows modelling the business rules that apply to the information. After the Conceptual model is ready it can be mapped to a Logical model. At this stage, we have to choose the type of Database system we are going to use, such as Hierarchical, Networked or Relational.
Because the industry has long been orienting itself toward the Relational model, we really do not have much choice here. The Relational model is mathematically well founded and has given rise to a number of important standards that allow the cooperation of different products in the same application. In particular, SQL (Structured Query Language) is a declarative language that can be used to work on Relational Databases and, although very powerful, it is oriented toward the final user. For more information on SQL see the document "Introduction to Structured Query Language". At a third stage we have to map the Logical model to the Physical model. Now we have to make our choice of a product. We have come to this point after having created the most important documentation using models that are product independent. To help us further, we should adopt products based on standard interfaces, such as ODBC, which will simplify migration to new products in the future. In addition, other requirements become important at this stage, such as:
a) Network support group
b) Know-how
c) The ability to replicate data on slow links
d) Security granularity requirements
e) Decision support components
f) Data Warehousing capabilities
The Conceptual model defines Information at the highest level of abstraction, while the Physical model describes the details at the lowest level of abstraction. Due to their importance for the success of a Database Design project, the chosen models must always be kept in sync. Organizational decision makers can take great advantage of the central consolidation of Information stored in different products. This way they can perform high-level analysis on Information which is already available in different areas of the Organization. This is the target of Data Warehouses. However, letting different products talk to each other requires standards. For this reason, Database vendors started to include a Data Repository that stores all metadata information about the databases along with the Databases themselves. Fortunately, we have lately witnessed in this area the consolidation of standards under the unique CWM (Common Warehouse Metamodel) from the OMG (Object Management Group). Go to http://www.omg.org/cwm/ for details on OMG. See the document "Database Metadata Standard" for details. The CWM standardizes a complete, comprehensive metamodel that enables data mining across database boundaries at an enterprise, and goes well beyond. Like a UML profile, but in data space instead of application space, it forms the MDA mapping to database schemas. The product of a cooperative effort between the OMG and the Meta-Data Coalition (MDC), the CWM does for data modelling what UML does for application modelling. The models outlined above focus on the information used by the processes but do not give any tool to describe the processes themselves. UML (Unified Modelling Language) is the most widely accepted and supported way of defining Business rules and processes.
UML is object based, which allows the model to be easily mapped to modern object-oriented languages like Java. In fact, UML, along with XML (eXtensible Markup Language) and XMI (XML Metadata Interchange), is used as a base for the CWM standard mentioned above.
APPENDIX E: GLOSSARY
ANSI (AMERICAN NATIONAL STANDARDS INSTITUTE): An association formed by the American Government and industry to produce and disseminate widely used industrial standards.
ATTRIBUTE: A noun describing a value which will be found in each tuple in a relation. Usually represented as a column of a relation. It is a property that can assume values for entities or relationships. Entities can be assigned several attributes.
CANDIDATE KEY: One or more attributes which will uniquely identify one tuple in a relation. A candidate key is a potential primary key.
COLUMN: A component of a table that holds a single attribute of the table.
COMPOSITE KEY: A key in a database table made up of several fields. Same as concatenated key.
CONCEPTUAL VIEW: The schema of a database.
DATA: A recording of facts, concepts, or instructions on a storage medium for communication, retrieval, and processing by automatic means, and presentation as information that is understandable by human beings.
DATA AGGREGATE: A collection of data items.
DATA DICTIONARY: Contains definitions of data, the relationship of one category of data to another, the attributes and keys of groups of data, and so forth. Software tools are used to record this information.
DATA ELEMENT: A uniquely named and well-defined category of data that consists of data items, and that is included in the record of an activity.
DATA ENTRY: The process of entering data into a computerized database or spreadsheet. Data entry can be performed by an individual typing at a keyboard or by a machine entering data electronically.
DATA MINING: Term for a class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways.
True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data.
DATA MART, DATAMART: A database, or collection of databases, designed to help managers make strategic decisions about their business. Whereas a data warehouse combines databases across an entire enterprise, data marts are usually smaller and focus on a particular subject or department. Some data marts, called dependent data marts, are subsets of larger data warehouses.
DATA MODEL: 1) The logical data structures, including operations and constraints, provided by a DBMS for effective database processing. 2) The system used for the representation of data (e.g., the ERD or relational model). A data model is an abstract representation of the data used by an organization, such that a meaningful interpretation of the data may be made by the model's readers. The data model may be at a conceptual, external, or internal level (as defined by ANSI).
DATA SOURCE: The place where the data to be accessed is stored. A generic name for data, whether stored in a conventional data source (such as Oracle) or in a file system (such as RMS or VSAM). The name given to a data source in the binding.
DATA WAREHOUSE: A copy of transaction data specifically structured for query and analysis. A collection of data designed to support management decision-making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. Development of a data warehouse includes development of systems to extract data from operating systems, plus installation of a warehouse database system that provides managers flexible access to the data. The term data warehousing generally refers to combining many different databases across an entire enterprise. Contrast with data mart.
DATABASE: 1) A collection of all the data needed by a person or organization to perform needed functions. 2) A collection of related files. 3) Any collection of data organized to answer queries. 4) (Informally) a database management system.
DATABASE MANAGER: 1) The person with primary responsibility for the design, construction, and maintenance of a database.
2) (Informally) a database management system.
DENORMALISATION: Allowing redundancy in a table so that the table can remain flat, rather than normalized.
DB2: The IBM relational database system.
DBMS (DATABASE MANAGEMENT SYSTEM): Also called a database manager, it is an integrated collection of programs designed to allow people to design databases, enter and maintain data, and perform queries. It contains the tools to manage the data and the structures through DML and DDL.
DDL (DATA DEFINITION LANGUAGE): The language used to define the table structures, relationships, triggers, and procedures needed to build the skeleton of the database.
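A minimal DDL fragment of the kind the DDL entry describes can be sketched as follows, using SQLite syntax through Python's stdlib sqlite3 module; the table and column names are hypothetical examples, not taken from any IPGRI schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- DDL: define the skeleton of the database
    CREATE TABLE institute (
        inst_code TEXT PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        inst_code  TEXT REFERENCES institute(inst_code),
        full_name  TEXT NOT NULL
    );
    -- DDL also covers changes to existing structures
    ALTER TABLE contact ADD COLUMN email TEXT;
""")
# Inspect the resulting structure (column name is field 1 of table_info rows)
cols = [row[1] for row in conn.execute("PRAGMA table_info(contact)")]
print(cols)  # ['contact_id', 'inst_code', 'full_name', 'email']
```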
DML (DATA MANIPULATION LANGUAGE): The language used to query and modify the data in a database.
DISTRIBUTED DATABASE: A database in which the resources are stored on more than one computer system, often at different physical locations.
ENTITY: A real-world object, observation, transaction, or person about which data are to be stored in a database.
ENTITY-RELATIONSHIP (ER OR ERD) DIAGRAM: A design tool used primarily for relational databases, in which entities are modeled as geometric shapes and the relationships between them are shown as labeled arcs. It is a model of an organization's data in which the objective has been to remove all repeated values by creating more tables.
FIELD: Term used by Access as a synonym for attribute.
FILE: 1) The separately named unit of storage for all data and programs on most computers. For example, a relation or a whole database may be stored in one file. 2) Term used as a synonym for relation in some (particularly older) database managers, like dBase.
INCOHERENT DATA: A value of an attribute that does not reflect the real state of the data. An incorrect address for a contact, or two conflicting copies of similar data, are examples of incoherent data.
INDEX: 1) A method used to reorder tuples or to display them in a specific order. 2) A data structure used to give rapid, random access to relations. Indexes are most often used with large relations.
JOIN: An operation that takes two relations as operands and produces a new relation by concatenating the tuples and matching the corresponding columns when a stated condition holds between the two. It uses data from more than one relation (table). The relations must have at least one attribute (called the join or linking attribute) in common.
KEY: An attribute or combination of attributes. A combination of their values will be used to select tuples from a relation.
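The JOIN entry above can be illustrated with a short runnable sketch (Python stdlib sqlite3); the country and institute tables and their values are hypothetical, with cty as the join attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (cty TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE institute (inst_code TEXT PRIMARY KEY,
                            cty TEXT REFERENCES country(cty));
    INSERT INTO country VALUES ('ITA', 'Italy'), ('FRA', 'France');
    INSERT INTO institute VALUES ('IPGRI', 'ITA');
""")
# DML join: concatenate tuples where the join attribute (cty) matches
rows = conn.execute("""
    SELECT i.inst_code, c.name
    FROM institute AS i JOIN country AS c ON i.cty = c.cty
""").fetchall()
print(rows)  # [('IPGRI', 'Italy')]
```

France drops out of the result because no institute tuple matches it on the join attribute.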
MANY-TO-MANY RELATIONSHIP: One or more tuples in one relation may be related to one or more tuples in a second relation by a common value of a join attribute. This implies that each value of the join attribute may appear any number of times in either relation or in both.
NORMAL FORM: 1) A condition of relations and databases intended to reduce data redundancy and improve performance. 2) The method of normalizing a database. There are three main normal forms: First, Second, and Third. First Normal Form requires that each field hold only a single, atomic value, with no repeating groups. Second Normal Form requires that every non-key field depend on the whole primary key. Third Normal Form requires that no non-key field depend on another non-key field, so the same information is not duplicated within two or more tables. Normalized tables are linked using key fields.
NORMALIZE: The process of removing redundancy in data by separating the data into multiple tables, decomposing complex data structures into natural structures.
ODBC (OPEN DATABASE CONNECTIVITY): A standard interface between a database and an application that is trying to access the data in that database. ODBC is defined by international (ISO) and national (ANSI) standards, and is based on the SQL-92 version of the SQL standard.
ONE-TO-MANY RELATIONSHIP: Exactly one tuple in one relation is related by a common join attribute to many tuples in another relation. This implies that each value of the join attribute is unique in the first relation but not necessarily unique in the second.
ONE-TO-ONE RELATIONSHIP: Exactly one tuple in one relation is related by a common join attribute to exactly one tuple in another relation. This implies that each value of the join attribute appears no more than once in each of the relations.
PERSISTENT QUERY: A query which is stored for reuse.
PRIMARY KEY: A key such that the value of the key attribute(s) will uniquely identify any tuple in the relation. A relation must not have more than one primary key.
QUERY: Literally, a question. 1) A command, written in a query language, for the database to present a specified subset of the data in the database. 2) The subset of data produced as output in response to a query.
QUERY LANGUAGE: A computer language which can be used to express queries.
QUERY RESOLUTION: The process of collecting the data needed to answer a query.
RECORD: Term used as a synonym for tuple in some (particularly older) database management systems, like dBase.
RECURSIVE QUERY: A query in which the output of the query is then used as input for the same query.
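The one-to-many pattern defined above can be sketched concretely (Python stdlib sqlite3); the crop and accession tables are hypothetical examples, with crop_code as the join attribute that is unique on the "one" side and repeated on the "many" side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crop (crop_code TEXT PRIMARY KEY, crop_name TEXT);
    CREATE TABLE accession (
        acc_id    INTEGER PRIMARY KEY,
        crop_code TEXT REFERENCES crop(crop_code)  -- join attribute
    );
    INSERT INTO crop VALUES ('MUS', 'Banana');
    INSERT INTO accession (crop_code) VALUES ('MUS'), ('MUS'), ('MUS');
""")
# One crop tuple relates to many accession tuples via the join attribute
n = conn.execute(
    "SELECT COUNT(*) FROM accession WHERE crop_code = 'MUS'").fetchone()[0]
print(n)  # 3
```

The PRIMARY KEY on crop.crop_code is what guarantees uniqueness on the "one" side; no such constraint exists on accession.crop_code, so it may repeat freely.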
RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM): See Database Management System and Relational Database.
RECORD: In database management systems, a complete set of information. Records are composed of fields, each of which contains one item of information. A set of records constitutes a file. For example, a personnel file might contain records that have three fields: a name field, an address field, and a phone number field. In relational database management systems, records are called tuples.
REDUNDANCY: The practice of storing more than one occurrence of the same data. Where data can be updated, redundancy poses serious problems. Where data is not updated, redundancy is often a valuable and necessary design tool. Duplicating data in the database to improve the ease and speed of access raises the risk that changes may produce conflicting values.
REFERENTIAL INTEGRITY: A feature provided by relational database management systems (RDBMSs) that prevents users or applications from entering inconsistent data. An integrity mechanism ensuring that vital data in a database, such as the unique identifier for a given piece of data, remains accurate and usable as the database changes. Referential integrity involves managing corresponding data values between tables when the foreign key of a table contains the same values as the primary key of another table. For example, suppose Table B has a foreign key that points to a field in Table A. Referential integrity would prevent you from adding a record to Table B that cannot be linked to Table A. In addition, the referential integrity rules might also specify that whenever you delete a record from Table A, any records in Table B that are linked to the deleted record will also be deleted. This is called a cascading delete. Finally, the referential integrity rules could specify that whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it will also be modified accordingly. This is called a cascading update.
RELATION: The basic collection of data in a relational database. Usually represented as a rectangular array of data, in which each row (tuple) is a collection of data about one entity.
RELATIONAL DATABASE: A type of database management system (DBMS) that stores data in the form of related tables. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways. An important feature of relational systems is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table.
REPLICATION: Duplication of table schemas and data, or stored procedure definitions and calls, from a source database to a destination database, usually on separate servers.
ROW: Term used by Access as a synonym for tuple.
RUNNING A QUERY: Term for query resolution.
SCHEMA: 1) A description of a database. It specifies (among other things) the relations, their attributes, and the domains of the attributes. In some database systems, the join attributes are also specified as part of the schema. 2) The description of one relation.
SECONDARY KEY: A key which is not the primary key for a relation.
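The cascading delete described under REFERENTIAL INTEGRITY can be demonstrated with a short sketch (Python stdlib sqlite3); the tables a and b mirror the Table A / Table B example, and note that SQLite enforces foreign keys only when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (
        id   INTEGER PRIMARY KEY,
        a_id INTEGER REFERENCES a(id) ON DELETE CASCADE
    );
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (10, 1), (11, 1);
""")
conn.execute("DELETE FROM a WHERE id = 1")  # cascades to the two rows in b
remaining = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
print(remaining)  # 0
```

With the pragma on, attempting to insert a row into b whose a_id has no match in a would instead raise an integrity error, which is the "prevents inconsistent data" half of the definition.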
SELECT: A query in which only some of the tuples in the source relation appear in the output.
SEQUEL: See SQL.
STORED PROCEDURE: In database management systems (DBMSs), an operation that is stored with the database server. Typically, stored procedures are written in SQL. Stored procedures execute faster than ordinary SQL requests because they have been compiled and optimized by the server. By keeping the requests on the SQL server, they don't have to be coded into the user's front end, thereby allowing the program to load and execute faster. Stored procedures are an important element in load balancing.
SQL: Pronounced 'sequel', stands for Structured Query Language, the most common text-based database query language. SQL comprises both a DDL and a DML. The DDL is represented by the CREATE and ALTER statements and other commands; the DML is represented by the SELECT, UPDATE, DELETE, and INSERT statements, and so on. There are different dialects of SQL: ANSI SQL, T-SQL, etc.
TABLE: A relation that consists of a set of columns with a heading and a set of rows (i.e., tuples). It is a noun used as a synonym for relation in relational theory.
TECHNICAL REENGINEERING: An organizational restructuring based on a fundamental re-examination of why a database exists.
TRANSACTION: 1) The fundamental unit of change in many (transaction-oriented) databases. A single transaction may involve changes in several relations, all of which must be made simultaneously in order for the database to be internally consistent and correct. 2) The real-life event which is modeled by the changes to the database.
TRIGGER: A detectable event that causes another action to happen. For instance, changing a discount rate in a grocery store's inventory database may cause an alert to be emailed to a manager.
TUPLE: Within a relation, a collection of all the facts related to one entity. Usually represented as a row of data. In relational database systems, a record. See record.
VALUE: The computer representation of a fact about an entity.
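The all-or-nothing behaviour described under TRANSACTION can be sketched as follows (Python stdlib sqlite3); the account table and the simulated failure are hypothetical, chosen only to show that both changes roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()
try:
    with conn:  # all changes inside commit together, or roll back together
        conn.execute(
            "UPDATE account SET balance = balance - 60 WHERE name = 'a'")
        conn.execute(
            "UPDATE account SET balance = balance + 60 WHERE name = 'b'")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass  # the context manager rolled both updates back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'a': 100, 'b': 0}
```

Had the block completed without the exception, both balances would have been committed simultaneously, keeping the database internally consistent as the definition requires.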
COMMENTS
Database IPGRI_DB:
• Does the “Centres” table contain the characteristics of Institutes?
• Is CROPCODE equal to SPECIE in AMS?
• IND_PROF and IND_TAGS not included
• Delete the “ipgriregion” fields from institute and contact
• INST: Name_nat can be considered as “txtInstituteName”
• Is PktxtCountryISOCodeCode the same as country.cty?
Database IPGRIAddressBook:
• The “Flags” table contains the field names of the ipgri_db “contacts” table
• USERLOG could be treated as a “user” table
Database IPGRI-Eur-PopulusClones:
• What are Clone and Accession?
• “Clones” table skipped
• Is a country defined by the ISOCODE field, the CTY field, or by name?
• Is an institute defined by the ISOCODE or the INSTCODE field?
• Should the “Main” table be renamed to “Accession”?
• Main: for “INSTITUTION WHERE MAINTAINED”, where is the source of institutions?
• “Main” table is not completely processed
• “Maintenance” table processed
Database IPGRIWEB:
• ANNUALREPORT skipped
• ANNUALREPORT_TEXT skipped
• conflict_PUBSURVEY skipped
• conflict_TRAINSURVEY skipped
• conflict_WEBSURVEY skipped
• COUNTRY contains the institutional Country table translated into different languages: useful. The ISO3 field is equal to Country.cty.
• Do the CROPS_TYPE table and ipgri_db.material contain the same kind of data?
• Is the CROP table a subset of CROPS_TYPE?
• The CROPS table must be renamed to “CropNames”
• EVENT table: forced the primary key to match ipgri_db.events
• GENEFLOW used in website: skipped
• Should IPGRI users have only one table for access to all IPGRI web applications?
• About language: many tables have different translations of a particular name, e.g. a crop can be named in English, Spanish, or French. Could we create an external database of all international names?
• IPGRIWEB.OFFICE_LOCATION.COUNTRY links to the COUNTRY name rather than country.cty.
• Should the OWNER table be merged with the Acronym table?
• PGR skipped
• PGR_ARTICLE skipped
• PUBBLICATION contains articles published on the website
• PUBBLICATION skipped
• PUBBLICATION_OWNER (linked to PUBBLICATION) skipped
• PUBSURVEY skipped
• RESOURCE skipped
• SERIE skipped
• STAFF: should id_ownergroup and id_ownergroup2 be linked to GROUPS in TIP?
• Does STAFF contain only and all IPGRI members?
• STAFF must be revisited
• THEME skipped
• TOPIC skipped
• TRAINING skipped: purpose not clear
• TRUSTEES skipped: purpose not clear
• URL_OWNER skipped
• URL_REGION: are the regions ipgriregions?
• URL_TOPIC skipped
• USETYPE skipped
• WEBPAGE, WEBPAGE_AGROVOC, WEBPAGE_COUNTRY, WEBPAGE_CROP, WEBPAGE_EVENT, WEBPAGE_INSTITUTIONAL, WEBPAGE_NETWORK, WEBPAGE_REGION, WEBPAGE_RESOURCE, WEBPAGE_THEME, WEBPAGE_TRAINING, WEBSURVEY skipped
IPGRNewsletter skipped
Presupuesto skipped
Proyectos database:
• Pbcatcol skipped: purpose not clear
• Pbcatedt skipped: purpose not clear
• Pbcatfmt skipped: purpose not clear
• Pbcattbl skipped: purpose not clear
• pbcatvld skipped: purpose not clear