INSTITUTIONAL
DATABASES
FOUNDATIONS FOR THE LIFE CYCLE
Massimo Buonaiuto
IPGRI – Foundations of Institutional Database Management Life Cycle
Page 2
SUMMARY
INTRODUCTION
INFORMATION AS AN ORGANIZATION ASSET
CURRENT STATUS AND REMARKABLE PROBLEMS
    BAD STRUCTURES AND SCHEMES
    REDUNDANCY
    BAD DATABASE MANAGEMENT SYSTEMS
    PURPOSES NOT ALWAYS DEFINED
    DATA DICTIONARY
    DOCUMENTATION
    REFERENCES
    STANDARD NAMING SCHEMES
    RELATIONSHIPS AMONG DATABASES
    RELATIONSHIPS AMONG TABLES: ABSENCE OF CASCADING FEATURES
    LOOKUP TABLES
    STANDARD DATABASES
    VIEWS
    BACKUP AND RECOVERY
    EXPORTING OF RECORDS BETWEEN TWO USERS
    PRIMARY, SECONDARY AND FOREIGN KEYS
    INDEXING
    PERSISTENT QUERIES
    DISTRIBUTED DATABASES AND REPLICATION
THE PROJECT
    PROJECT DETAILS
    PROJECT TEAM
DATA ARCHITECTURE
    INTRODUCTION AND BACKGROUND
    A SIMPLE DATA ARCHITECTURE
    DATA MODELING OVERVIEW
    METADATA
    DATABASE MANAGEMENT SYSTEM (DBMS) OVERVIEW
    DATA ACCESS MIDDLEWARE OVERVIEW
    DATA ACCESS IMPLEMENTATION OVERVIEW
    DATA SECURITY OVERVIEW
THE DAWN OF A NEW ARCHITECTURE
    ADVANTAGES
    DISADVANTAGES
GENERAL RECOMMENDATIONS
WHY ARE INSTITUTIONAL DATABASES NOT UP TO USERS' EXPECTATIONS?
INSTITUTIONAL DATABASE
    DEFINITION
    IMPLEMENTATION
    NAMING CONVENTION
        DBMS Naming Convention
        Field Naming Convention
        Default Fields
        Primary and Foreign Key Naming Convention
        Data Standard: the Unicode Standard
        Data Sources
APPENDIX A: DATABASE DESIGN
APPENDIX B: DATABASE EVALUATION BY METRICS
APPENDIX C: REDUNDANCY AND NORMALISATION
    NORMALISATION
    EXAMPLE
APPENDIX D: DATABASE STANDARDS
APPENDIX E: GLOSSARY
COMMENTS
INTRODUCTION
IPGRI is an international research institute of the United Nations with a mandate to advance
the conservation and use of genetic diversity for the well-being of present and future
generations.
IPGRI aims to meet three major objectives:
§ Countries, particularly developing countries, can better assess and meet their own
plant genetic resources needs
§ International collaboration in the conservation and use of genetic resources is
strengthened
§ Knowledge and technologies relevant to the improved conservation and use of plant
genetic resources are developed and disseminated
The information collected by scientists and all the staff who work at IPGRI is fundamental
to IPGRI's mission. These data are stored on different kinds of media: databases,
documents, papers, and backup tools.
The primary objective of the Database Inventory Project is to provide a detailed analysis
of the databases considered institutionally important. This aim is achieved through an
in-depth investigation of the context in which the databases are used.
There are various areas of Information and Knowledge Management in IPGRI, which need
to be strengthened. In particular, there is great demand for Intranet access to administrative
information, such as Personnel, Budgeting, Financing etc. The lack of Institutional
Database Management is strongly felt as a weak point for the Institute, which should
instead, lead the way in this area. In addition, the Intranet should acquire the capability to
search in the Institutional documents as agreed in the last several meetings.
Currently, databases are managed by different responsible persons in different groups,
with poor technical support and little integration. Some, such as the Contacts database,
exist in various forms at different sites. Others are not scalable, are not shared among
staff members, or are accessed through obsolete tools. The obvious result is that a
combined search would currently require a great deal of manual work.
In addition, if the databases are ever to become a truly valuable information asset for
IPGRI, mechanisms must be put in place for managing and controlling the quality of the
content stored.
INFORMATION AS AN ORGANIZATION ASSET
Efficient Organizations nowadays base their processes on the quick and flexible access to
proper Information. Information can become so critical and expensive to produce that it
must be made available to other groups in the Organization “Anytime - Anywhere”.
The process required to make Information available to others is expensive. It is not
justifiable to store all possible pieces of Information for Organizational access. Therefore,
in planning its Databases an Organization should look at its objectives and the processes
required to achieve them to quantify the value of the Information created at each stage.
Once the Organization has decided which Information has a value that justifies making it
available to other staff, that Information becomes an asset and should be treated like any
other asset in the Organization.
Can IPGRI recognize itself in this model?
Information is the base of our work. In IPGRI we would not be going very far without it.
IPGRI is a very dispersed and distributed Institution. Several projects make use of
information that can be reutilised in other projects even at the same time.
However, the Institute has seen such information sets grow without the use of
database tools. For example, word processors and spreadsheets have been, and still are,
used to store tables of different kinds. This is acceptable only as long as the value of the
Information that must be shared is carefully evaluated on a periodic basis.
As a final statement:
All Information that becomes an Institutional Asset will have to follow a life cycle,
whose process is defined in "Appendix A: Database Design", and be based on the
standards described in "Appendix D: Database Standards".
CURRENT STATUS AND REMARKABLE PROBLEMS
In IPGRI, many databases have been developed without any approved standard. Little
documentation exists, but some patterns can be detected in the current status. These
considerations suggested a thorough study of the databases that IPGRI considers
institutionally important, leading to a consolidated view of them gathered in an
inventory. An archive of this information has to be created and maintained. From this
need the Institutional Databases Project arose.
For a definition of the terms used below, please refer to Appendix E.
These are some of the problems that we found in the IPGRI Institutional Databases.
BAD STRUCTURES AND SCHEMES
Many IPGRI databases are built on poor structures, with no database theory applied. The
lack of the following important database properties is evident:
I. Correctness
II. Reliability
III. Maintainability
IV. Flexibility
V. Testability
VI. Reusability
VII. Interoperability
Refer to Appendix B for the definitions of these terms.
REDUNDANCY
IPGRI databases are often duplicated in several versions; maintenance is burdensome and
produces inconsistent records, because updates must be made in each version. Examples:
the Contacts and Publications databases. A good solution to this problem
is normalisation (see Appendix C).
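The normalisation idea can be sketched briefly. The example below is a hypothetical illustration (the table and field names are invented, not taken from the actual IPGRI schema), using SQLite in place of the institutional DBMS: a denormalized contacts table that repeats the country name in every row is split so that each country is stored exactly once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Before: redundant storage, the country name is repeated per contact.
cur.execute("CREATE TABLE ContactsFlat (Surname TEXT, CountryName TEXT)")
cur.executemany("INSERT INTO ContactsFlat VALUES (?, ?)",
                [("Rossi", "Italy"), ("Bianchi", "Italy"), ("Smith", "Kenya")])

# After: each country is stored once and referenced by key.
cur.execute("CREATE TABLE Countries (IDCountry INTEGER PRIMARY KEY, Name TEXT UNIQUE)")
cur.execute("CREATE TABLE Contacts (IDContact INTEGER PRIMARY KEY, Surname TEXT, "
            "IDCountry INTEGER REFERENCES Countries(IDCountry))")
cur.execute("INSERT INTO Countries (Name) SELECT DISTINCT CountryName FROM ContactsFlat")
cur.execute("""INSERT INTO Contacts (Surname, IDCountry)
               SELECT f.Surname, c.IDCountry
               FROM ContactsFlat f JOIN Countries c ON c.Name = f.CountryName""")

# A country name now has a single authoritative copy; an update touches one row.
print(cur.execute("SELECT COUNT(*) FROM Countries").fetchone()[0])  # 2
```

After normalisation, correcting a country name is a single-row update instead of an update into each duplicated record.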
BAD DATABASE MANAGEMENT SYSTEMS
All databases should be developed using dedicated Database Management Systems
(DBMSs), like MS SQL Server, MySQL, MS Access, Oracle, etc. Many databases are
implemented not with these tools but with other kinds of applications, such as word
processors and spreadsheet tools, which makes the Information hard to share and
manipulate.
PURPOSES NOT ALWAYS DEFINED
Every database should satisfy a well-defined purpose. We found many databases without
a general objective accepted by IPGRI as a whole. Reengineering is needed for many
databases, such as the Europe LOA database.
DATA DICTIONARY
There is no data dictionary standard. A data dictionary is a fundamental document that
gives rapid access to many quality measures, such as interoperability, reliability,
maintainability, flexibility, testability and reusability (see Appendix B).
DOCUMENTATION
Most of the databases have no documentation at all, neither user manuals nor technical
papers. Data source documents are of primary importance to database administrators,
because they contain all the information about the settings used for a particular database.
REFERENCES
Producing correct databases requires a source of references. These could be created once
and reused, as standards and directive documents. We did not find any institutional
guidelines on database design.
STANDARD NAMING SCHEMES
IPGRI does not use a naming convention for database structures and interfaces.
RELATIONSHIPS AMONG DATABASES
We noticed the need to establish proper links among the different databases, but the
existence of multiple databases on the same topic (redundancy) and the absence of an
up-to-date database inventory make this task more complex.
RELATIONSHIPS AMONG TABLES: ABSENCE OF CASCADING
FEATURES
Relationships among tables of the same database are necessary for referential integrity,
yet most of the databases do not use them, risking loss of meaning in the stored data.
Referential integrity is a feature provided by relational database management systems
(RDBMSs) that prevents users or applications from entering inconsistent data. Most
RDBMSs offer various referential integrity rules that can be applied when a relationship
is created between two tables. See the Glossary (Appendix E) for the referential integrity
properties.
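The cascading behaviour mentioned in this section's title can be sketched as follows. This is a hedged illustration using SQLite; the table and field names (Institutes, Accessions) are invented for the example and do not reflect the real IPGRI schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

con.execute("CREATE TABLE Institutes (IDInstitute INTEGER PRIMARY KEY, Name TEXT)")
con.execute("""CREATE TABLE Accessions (
                   IDAccession INTEGER PRIMARY KEY,
                   Taxon TEXT,
                   IDInstitute INTEGER,
                   FOREIGN KEY (IDInstitute) REFERENCES Institutes(IDInstitute)
                       ON DELETE CASCADE ON UPDATE CASCADE)""")

con.execute("INSERT INTO Institutes VALUES (1, 'IPGRI HQ')")
con.execute("INSERT INTO Accessions VALUES (10, 'Musa acuminata', 1)")

# An inconsistent row, referencing a non-existent parent, is rejected outright...
try:
    con.execute("INSERT INTO Accessions VALUES (11, 'Oryza sativa', 99)")
except sqlite3.IntegrityError:
    print("orphan row rejected")

# ...and deleting the parent cascades to its children.
con.execute("DELETE FROM Institutes WHERE IDInstitute = 1")
print(con.execute("SELECT COUNT(*) FROM Accessions").fetchone()[0])  # 0
```

With the cascade rule declared once in the schema, the DBMS rather than each application is responsible for keeping the tables consistent.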
LOOKUP TABLES
A table in a database that contains the recurring values for a specific field should be used
as the unique source of the stored data. Updates to these kinds of tables have to be
centrally controlled. This technique is a user convenience that also promotes referential
integrity.
STANDARD DATABASES
The use of data considered a standard source of information by other international
institutes is fundamental to producing a good interface between IPGRI and other
organizations. In fact, the use of structures recognized by other institutes aids the sharing
process and makes IPGRI a good institute to deal with. Agrovoc is an example.
VIEWS
Database views are not used except in certain cases. Views permit the extraction of data
subsets from a database. For example, the Germplasm database stores information about
the institutes that collect a selected taxon; a view could present all Germplasm data
without the institute information that is not useful for a particular application. In data
warehouses, the data mart is an evolution of the view concept.
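A minimal sketch of the view idea, again with SQLite standing in for the institutional DBMS. The Germplasm fields here are invented for illustration; the view name follows the "vws" prefix from the naming convention proposed later in this document.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Germplasm (
                   IDGermplasm INTEGER PRIMARY KEY,
                   Taxon TEXT,
                   CollectingInstitute TEXT)""")
con.execute("INSERT INTO Germplasm VALUES (1, 'Musa acuminata', 'INIBAP')")

# The view exposes only the subset a particular application needs,
# hiding the institute information.
con.execute("""CREATE VIEW vwsTaxa AS
               SELECT IDGermplasm, Taxon FROM Germplasm""")

print(con.execute("SELECT Taxon FROM vwsTaxa").fetchone()[0])  # Musa acuminata
```

The view is stored in the database itself, so every application queries the same definition instead of each re-implementing the subset logic.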
BACKUP AND RECOVERY
Many databases do not implement any backup policy to protect data from crashes and
other failures.
EXPORTING OF RECORDS BETWEEN TWO USERS
The exporting of records from a database, with the intention of sending data between
different users, occurs without prior agreement between the parties involved. As a result,
many hours are spent converting from one database format to another.
PRIMARY, SECONDARY AND FOREIGN KEYS
Each table of every database should use correct keys to uniquely identify tuples in the
data. The Team found many tables without appropriate keys. For example, the current
version of the tip table in the Travel Information Plan database uses an ID field as
primary key; a better key would be the travel code, which uniquely identifies a TIP.
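One common compromise, sketched below under stated assumptions (the tblTIP table and TIPTravelCode field are invented to match the TIP example, not the actual schema), is to keep a surrogate primary key while enforcing uniqueness of the natural key with a UNIQUE constraint, so a duplicate travel code can never be stored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tblTIP (
                   IDTIP INTEGER PRIMARY KEY,          -- surrogate key
                   TIPTravelCode TEXT NOT NULL UNIQUE, -- natural key, kept unique
                   Destination TEXT)""")
con.execute("INSERT INTO tblTIP (TIPTravelCode, Destination) "
            "VALUES ('T-2001-001', 'Nairobi')")

# A second TIP with the same travel code is rejected by the DBMS.
try:
    con.execute("INSERT INTO tblTIP (TIPTravelCode, Destination) "
                "VALUES ('T-2001-001', 'Rome')")
except sqlite3.IntegrityError:
    print("duplicate travel code rejected")
```

The surrogate key stays stable for foreign-key references, while the constraint guarantees that the travel code still uniquely identifies each TIP.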
INDEXING
An index is a data structure that gives rapid, random access to the rows of a relation and
can be used to present a table in a specific order. Indexes are most useful on large
relations, where they can greatly speed up queries. We did not find any policy on
database indexing.
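The effect of an index can be observed directly in a sketch like the one below (hypothetical names, following the "ind" prefix convention proposed later in this document): after the index is created, the SQLite query planner reports an index search instead of a full table scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblContacts (IDContact INTEGER PRIMARY KEY, "
            "ContactSurname TEXT)")
con.executemany("INSERT INTO tblContacts (ContactSurname) VALUES (?)",
                [("Surname%04d" % i,) for i in range(5000)])

con.execute("CREATE INDEX indContactSurname ON tblContacts (ContactSurname)")

# The planner now locates the row through the index rather than scanning
# all 5000 rows.
plan = con.execute("EXPLAIN QUERY PLAN SELECT * FROM tblContacts "
                   "WHERE ContactSurname = 'Surname0042'").fetchall()
print(plan[0][-1])  # the plan detail names indContactSurname
```

A policy would decide which columns deserve indexes (typically those used in WHERE clauses and joins on large tables), since each index also adds write and storage cost.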
PERSISTENT QUERIES
Persistent queries should be created for the queries most used by applications. They
should be implemented directly in the DBMS in order to speed up data access. SQL
Server stored procedures and queries stored in Access are examples of this kind of
query. This facility is rarely used in IPGRI databases.
DISTRIBUTED DATABASES AND REPLICATION
Institutional Databases are not always distributed or replicated. All Institutional Databases
that are remotely accessed should be replicated.
THE PROJECT
For each Institutional Database the Team will provide a recommendation document with
the following topics:
1) Analysis of the data architecture, interfaces and data entry procedures and tools
with Entity Relationship diagram and analysis of all strengths and limitations.
The main objective is finding holes in the data input that would allow inconsistent
data to be produced.
2) A Data Dictionary covering the entire set of data represented.
3) A list of redundancies on the data architecture and data content obtained as a
comparison among the Databases.
4) Improvements to the Data entry process to support multi-site, multi-user updates
5) A list of suggested improvements including development tools standards.
6) Values for the main quality measures, obtained through a simple questionnaire.
7) Skills that were required for the design of the data structure and the interface.
8) A Map of redundancies among Databases and a list of suggested database merges
with a list of steps to be taken
A final presentation will be given to Management with summarized results.
A collaboration web site has been created with SharePoint for quick interaction during
the analysis phase and for final delivery of the reports. All users can discuss the
published documents: databases, documents and spreadsheets can be uploaded and
downloaded, and new forums can be created around them to discuss different topics.
Interaction with IPGRI staff, to collect survey information and the files needed to analyze
the databases, their development and their data entry processes, is fundamental to this
project.
Various databases, such as Contacts, share common problems, such as the need for
updated records to be visible to all other parties without a manual merge process. In
these instances, it may be necessary to implement a different database architecture that
allows users at different sites to share a distributed database supporting selective update
of data with automatic replication. This distributed database architecture will require
partitioning of the data tables for controlled update.
We will give a detailed look at the skills that were required for the development of the
existing Databases in the regions and at HQ. Along with suggestions from the various
parties these will constitute the basis for a recommendation on Development tools
standards.
The findings will provide the basis for Management to understand the extent of usage of
the Institutional Databases in IPGRI and the reliability of the data content. The presentation
and all the documents produced will represent the basis for the successive activities in this
area.
PROJECT DETAILS
The suggested procedures to obtain the above output are as follows:
A. Identify an initial set of Databases that should be analysed and staff members that
should be ready to provide all the information needed.
B. Send a message to all IPGRI staff advising about the activity, giving the initial list
of Databases and Database contacts and asking for suggestions on what additional
Databases/Staff members should be included in the activity.
C. Interact with all Database contacts to collect a sample of the database along with
user, maintenance and development documentation. In addition, a list of questions
will be sent which will make it possible to quantify the quality of the data content
and identify any planned activity or other additions/fixes that would improve the
Database.
D. Creation of the Collaboration web site.
E. Actual analysis takes place.
F. Final presentation.
DATA ARCHITECTURE
The mission of Data Architecture is to establish and maintain an adaptable infrastructure
designed to facilitate the access, definition, management, security, and integrity of data
across the Institute.
INTRODUCTION AND BACKGROUND
Data and information are extremely valuable assets of the institute. Data Architecture
establishes an infrastructure for providing access to high quality, consistent data wherever
and whenever it is needed. This infrastructure is a prerequisite for fulfilling the requirement
for data to be easily accessible and understandable by authorized end users and IPGRI
applications. Data and access to data are focal points for many areas of the Technical
Architecture. Data Architecture influences how data is stored and accessed, including
online input and retrieval, outside application access, backup and recovery, and data
warehouse access. An established Data Architecture is the foundation for many other
components of the IPGRI technical architecture.
Using a good data architecture ensures that data is:
1) Defined consistently across the Institute
2) Re-useable and shareable
3) Accurate and up-to-date
4) Secure
5) Centrally managed
A SIMPLE DATA ARCHITECTURE
The Data Architecture consists of the following technical topics, including the
recommended best practices, implementation guidelines, and standards, as they apply:
1) Data Modeling
2) Metadata
3) Database Management System (DBMS)
4) Data Access Middleware
5) Data Access Implementation
6) Data Security
DATA MODELING OVERVIEW
How data is modeled and designed inside an application can significantly impact the way
an application runs and how other applications can access that data. This topic covers a
basic overview of data modeling.
METADATA
The way to describe or define data is through metadata. Metadata is "information about
data". Metadata is stored in a repository containing detailed descriptions about each data
element. A generic implementation is a data dictionary with a full description of the
database fields. By using the formats described in the metadata repository, the same data
management principles apply whether the data resides in a single location or in multiple
databases across IPGRI.
DATABASE MANAGEMENT SYSTEM (DBMS) OVERVIEW
This topic addresses the Data Architecture recommendations for projects selecting,
designing and implementing database management systems (DBMSs). In order to meet
existing and future database needs, relational database technology is recommended,
particularly for online transactional business applications. An emerging technology in
the database world is object database technology.
DATA ACCESS MIDDLEWARE OVERVIEW
Data access middleware addresses the Data Architecture recommendations for the
implementation of data access middleware. Data access middleware is the communications
layer between data access programs and databases.
DATA ACCESS IMPLEMENTATION OVERVIEW
Implementing data access is a key topic and a fundamental component of every
application. This topic discusses recommendations for implementing data access within
an application and to outside applications.
DATA SECURITY OVERVIEW
Data security is an important piece of the Data Architecture and the application security
model. This topic provides an overview of data security and discusses the best practices
for protecting data.
THE DAWN OF A NEW ARCHITECTURE
We will describe here what is suggested as a new Database Architecture that will be able
to solve most of the problems we have listed in the “Current Status” section from the
standpoint of concurrent access and update to the Institutional Databases.
In the past, consolidating data coming from different locations required a considerable
amount of time spent merging records manually. This is no longer required if we set up
an architecture whereby each location replicates its changes to the other major sites.
ADVANTAGES
1) Staff will be able to export data from the SQL Server database present in their
regional office. The export can be performed using Microsoft Access, Excel or
any other ODBC-compliant tool, depending on the needs. Therefore, people will be
able to run statistics, create graphs or perform any kind of processing using their
preferred tool.
2) The interface to the Database will be the same for all sites. In practical terms each
site will become a mirror of the others. Staff travelling will be able to access any
of the 6 web sites from the Internet.
DISADVANTAGES
Additional administration will be required from a group with the know-how to manage
SQL Server. This can be accomplished to a good extent using remote control tools, such
as VNC. In addition, once the migration to Windows 2000 takes place at the remote
sites, administration will be possible using the remote control features built into the
operating system.
[Figure: replication topology — HQ replicates with the regional sites SSA, APO, Americas and CWANA.]
GENERAL RECOMMENDATIONS
The Database Project Team has defined general recommendations for the databases:
1) A purpose document should exist describing the reasons for the database, who
created it, where it is stored, and how it is accessed and maintained
2) Only DBMS tools should be used to create the Institutional Databases
3) The creation of each Institutional Database should follow these standard
processes, each of which produces certain related documents:
a. Conceptual Design: Defines the interaction between users and the database to be
created using text and graphics. The documents produced are:
i. Requirements Document
ii. Specification Document
iii. Planning Document
b. Logical Design: indicates the data flows between the actors involved in the
interactions. The document produced is:
iv. Entity Relationship Graph
c. Physical Design: includes the physical creation of the database. The documents
produced are:
v. Implementation Document
vi. User and Technical Manuals
vii. Maintenance Document
(For a brief description of the above documents see Appendix A)
4) The exporting of records from a database with the intention of sharing data between
different users should occur with prior agreement between the parties involved. This
rule is meant to simplify the importing procedures at the receiving site.
5) Standardization of the data interface and data dictionary is achieved through a
standard object naming scheme and naming convention.
6) The major properties of each database should be recorded in an inventory. This
inventory should be implemented as a web-enabled database.
7) The use of international standard databases (like the FAO Agrovoc database) is
encouraged. In this manner, the basis for international cooperation will be in place.
8) Planning ahead: when a database is designed, future development cases should be
considered.
Example: the Contacts database should be designed to support mailing lists as well.
9) It is suggested that a special team should be created to support the creation and
maintenance of Institutional Databases.
10) Although not a precise standard, there are some well-defined rules that can be used
for extending the Data Dictionary. See document “How to model People and
Organization” for a sample of this. In addition, initiatives are ongoing to set
standards in this area following the increasing popularity of XML. We will look at
these initiatives and try to find out if they can be of any help in this area.
It can never be stressed enough how important it is to keep up-to-date documentation like
ER diagrams, very useful for showing relationships between tables, and a data dictionary
that describes what each field is used for and any aliases that may exist. Documenting SQL
statements is a must as well. In this manner, the database will be a powerful resource for
all the IPGRI staff.
WHY ARE INSTITUTIONAL DATABASES NOT UP TO USERS' EXPECTATIONS?
It has been verified that various databases are in use at different sites for the same purpose.
One clear example is the Contacts database. The main reasons for this situation are the
following:
a) Several IPGRI sites are badly served from the communications point of view. It is
difficult for most of the sites to work with sufficient efficiency on a centralized
Database, even if the interface is provided through a web browser using the Internet
as a transport.
b) The lack of enterprise-level applications providing reliability and scalability. This
has resulted in considerable work being required to centrally consolidate the data
present at each site.
c) The lack of commitment at HQ on the maintenance of the Institutional Databases
has given rise to independent versions of the databases.
INSTITUTIONAL DATABASE
The Database Project Team has defined rules to be applied to the Institutional Database:
DEFINITION
The Institutional Database is the collection of data defined as important for the IPGRI
Institute.
IMPLEMENTATION
IPGRI uses a relational database management system (RDBMS) as the repository for
institutional databases.
The Institutional Database is implemented as tables in the adopted RDBMS.
Each table is defined through SQL Data Definition Language (DDL) or RDBMS wizards
and must contain primary keys.
A relationship diagram has to be published and be available for internal users.
NAMING CONVENTION
DBMS Naming Convention
The structure of the Institutional Database will follow a naming convention.
Each institutional database should have a full data dictionary as documentation.
Each interface of the Institutional Database will have a default layout and be programmed
using a general naming convention for variables and controls.
The following database naming convention, adapted from the Leszynski naming
convention, is adopted.
Only one exception is accepted: the table name can be
a. an aggregate name
b. in plural form without any prefix
For example:
Data Type      Prefix   Example
Tables         tbl      tblContacts
Views          vws      vwsEurope
Queries        qry      qryLookUp
Forms          frm      frmContacts
Reports        rpt      rptMain
Macros         mcr      mcrMySubs
Modules        mod      modFunction
Stored Proc.   sp_      sp_records
Triggers       trg      trgOnClick
Indexes        ind      indMYField
Primary Keys   ID       IDContacts

Table: Database naming convention
IPGRI – Foundations of Institutional Database Management Life Cycle
Page 18
a. Repository or Warehouse are aggregate names
b. tblContacts and tblCountries can be renamed to Contacts and Countries
Field naming convention
Each field (except primary keys and foreign keys) of the tables will have only the first
letter of each part capitalised and will follow this naming convention:
<type><Singular Table Name><Singular Name>
where
§ Type: one value from the Database Types naming convention table (see below)
§ Singular Table Name: the table name in singular form
§ Singular Name: any name with the first letter in upper case, without symbols or
spaces (underscores, dollars, etc.). Multiple words are joined without spaces: a
“Telephone and Fax” field becomes txtContactTelFax
An example of a Contacts field: the Surname field could be strContactSurname (note that
the table name in the field name is singular).
In this manner the name can easily be decomposed: strContactSurname is a Contact field
and contains a text value that is the surname.
Except for the prefixes, all parts have only the first letter in upper case, as noted
above; abbreviations are always fully capitalised.
Only letters in the ranges [a…z] and [A…Z] must be used; spaces, dots, minus signs and
other ASCII symbols cannot be used in database structures, except for the underscore
symbol (_). Keep in mind that SQL Server and other DBMSs treat the underscore as a
wildcard in pattern matching, and this can cause problems when accessing the data.
Sometimes a field name can become too long, for example
pktxtCollectingMissionInstOriginalInstColumn: this is the trade-off between
transparency, simple rules and unwieldy names.
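The composition rule <type><Singular Table Name><Singular Name> is mechanical enough to sketch in a few lines. The helper below is hypothetical, not part of any IPGRI tool; only the convention itself comes from the text above.

```python
# Sketch of the field-naming rule <type><SingularTableName><Name>.
# Abbreviations (fully upper-case parts such as "PIN") are kept as-is;
# every other part gets only its first letter capitalised.
def field_name(type_prefix, table_singular, *names):
    """Build a field name such as strContactSurname."""
    parts = "".join(n if n.isupper() else n.capitalize() for n in names)
    return type_prefix + table_singular + parts

print(field_name("str", "Contact", "surname"))     # strContactSurname
print(field_name("txt", "Contact", "tel", "fax"))  # txtContactTelFax
print(field_name("pkint", "Contact", "PIN"))       # pkintContactPIN
```

Because every field name is built the same way, it can also be decomposed mechanically when documenting a legacy structure.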
Default fields
Some field names are equivalent or synonyms: “Remarks” and “Comments”, for
example. A good rule is to use only one name for the same data.
It is also good practice to build a field name starting from the context.
For example:
A budget has a “Code” and a brief “Description” in a table “Europe” of the “LOA” database.
These field names are implemented as:
§ txtLOABudgetCode
§ txtLOABudgetDescription
As you can see, they are built around the “Budget” part. An alphabetical ordering makes
it evident that there are two fields about the Budget of LOA: Code and Description. If we
inverted the order we would have txtLOACodeBudget and txtLOADescriptionBudget,
which are not as clear as the first pair.
Here is a brief list of common fields and the strongly suggested synonym to use in data
structures:
§ Notes, Remarks, Comments: Remarks
§ Info, Descriptions: Description
§ [field name]ID, ID[field name]: ID[table name]
§ Starting period: [prefixes]DateFrom
§ Ending period: [prefixes]DateTo
§ Telephone: [prefixes]Tel
§ Email: [prefixes]Email
§ Detail, Details: mem[singular table name]Details
§ Update, Updated, InputDate: dat[singular table name]Update. InputDate may differ
from Update only if the date of the first input of a record is relevant.
§ URL, Website, http, webaddress: txt[singular table name]URL
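A synonym list like the one above is easy to enforce when consolidating legacy structures. The sketch below is hypothetical: the dictionary keys are illustrative, and a real tool would cover the full list.

```python
# Sketch: mapping loose field-name synonyms onto the standard names
# suggested above. The entries are illustrative, not exhaustive.
STANDARD_NAMES = {
    "notes": "Remarks", "remarks": "Remarks", "comments": "Remarks",
    "info": "Description", "descriptions": "Description",
    "telephone": "Tel",
    "url": "URL", "website": "URL", "http": "URL", "webaddress": "URL",
}

def standard_field(name):
    """Return the standard synonym for a field name, or the name itself."""
    return STANDARD_NAMES.get(name.lower(), name)

print(standard_field("Comments"))  # Remarks
print(standard_field("Website"))   # URL
```

Running such a check over an imported schema gives a quick list of fields that need renaming before consolidation.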
Primary and foreign key naming convention
The general primary key is defined as:
ID<Table Name>
It is used when the tuple (also called record in relational databases) has no unique
attribute value (also called field value) that can be used as primary key (see
Entity/Relationship theory).
The type prefix is not necessary because this kind of field is always a counter handled by
the RDBMS.
For example: IDMyTables is the primary key of the MyTables table. Note that here the
table name is in plural form.
When attributes are used as primary keys they can be implemented as
pk<type><Singular Table Name><Singular Name>
For example: Contacts can have PIN (Personal Identification Number) as primary key. It
becomes pkintContactPIN. In this manner a brief analysis of the Contacts database
structure makes it evident that the primary key is the PIN number (PIN is clearly an
abbreviation, being fully capitalised).
Foreign keys take the name of the related primary key.
At first sight this rule might make foreign keys hard to distinguish from primary keys,
but the table name included in each field name excludes this error.
For example, the Contacts table could contain IDCountries as a foreign key (IDCountries
being the primary key of the Countries table).
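The Contacts/Countries example above can be sketched in DDL. This is a hedged illustration, again using SQLite in place of the adopted RDBMS; the column names follow the conventions just described.

```python
import sqlite3

# Sketch: a foreign key carries the name of the primary key it references
# (IDCountries appears in both Countries and Contacts).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Countries (
        IDCountries    INTEGER PRIMARY KEY,
        strCountryName TEXT NOT NULL
    );
    CREATE TABLE Contacts (
        IDContacts        INTEGER PRIMARY KEY,
        strContactSurname TEXT NOT NULL,
        IDCountries       INTEGER REFERENCES Countries(IDCountries)
    );
""")
conn.execute("INSERT INTO Countries VALUES (1, 'Italy')")
conn.execute("INSERT INTO Contacts VALUES (1, 'Buonaiuto', 1)")

# Because the foreign key and primary key share a name, USING works:
row = conn.execute("""
    SELECT strContactSurname, strCountryName
    FROM Contacts JOIN Countries USING (IDCountries)
""").fetchone()
print(row)  # ('Buonaiuto', 'Italy')
```

Sharing the key name between the two tables keeps join conditions obvious and lets tools match relationships by name.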
These field types are adopted:

Data Type   Prefix   SQL Server Type    MS Access Type   Example
Boolean     bln      bit                yes/no           blnAccepted
Byte        byt      binary             byte             bytPixelValue
Counter     idx      uniqueidentifier   counter          idxPrimaryKeys
Currency    cur      money              currency         curMoney
Date        dat      datetime           date/time        datMyDate
DateTime    dtm      datetime           date/time        dtmFirstTime
Double      dbl      double             numeric          dblTotalDistance
Float       flt      float              float            fltValue
Image       img      image              ole object       imgPhoto
Integer     int      smallint           numeric          intCount
Long        lng      int                numeric          lngFreeSpace
Memo        mem      nvarchar           memo             memComments
Object      obj      varbinary          ole object       objListBox
Smallint    sml      smallint           numeric          smlVariable
String      str      nvarchar           text             strAddress
Database Types naming convention

Object Naming Convention for Institutional Database Interface programming
Use the table above in programming whenever you want to reference a database object.
This object naming convention table is adopted for Institutional Database Interface
programming:

Object       Prefix
Connection   conn
Database     db
Field        fld
Group        grp
Index        idx
Property     prop
QueryDef     sql
Recordset    rs
Relation     rel
TableDef     td
User         usr
Password     pwd
Workspace    ws
Objects naming convention

This variable naming convention table is adopted for Institutional Database Interface
programming:

Data Type      Prefix   Example
Boolean        bln      blnAccepted
Byte           byt      bytPixelValue
Date or Time   dat      dtmFirstTime
Double         dbl      dblTotalDistance
Integer        int      intCount
Long           lng      lngFreeSpace
Object         obj      objListBox
Single         sng      sngLength
String         str      strAddress
Variant        vrn      vrnObject
Error          err      errMessage
Variable naming convention

This naming convention table is adopted for the interfaces of the Institutional
Database:

Scope                  Prefix   Example
Browsing               bws      bwsMain
Deleting               del      delMask
Editing                edit     editInterface
Adding New Record      new      newInterface
Confirming Questions   qst      qstIMask
Printing Errors        err      errMessage
Exiting                exi      exiMask
Table Lookup           tbl      tblContacts
Naming convention for interfaces

Data standard: the Unicode Standard
Before the development of the Unicode standard, character data was limited to sets of 256
characters. This limitation came from the one-byte storage space used by a single character;
one byte can represent only 256 different bit combinations. The Unicode standard expands
the number of possible values for character data. By doubling the amount of storage space
used for a single character, the Unicode standard increases the number of possible
character values from 256 to 65,536. With this increased range, the Unicode standard
includes letters, numbers, and symbols used in languages around the world, including all
of the values from the previously existing character sets.
IPGRI will use this encoding for the data inserted in the Institutional Database.

Data Sources
Data, articles and other publications could come from a unique source represented in a
universal format and published on many supports: HTML, paper, etc. This requires rules
whose complexity is inversely proportional to the flexibility obtained. Many publications,
like the annual report, PGR, newsletters, etc., could be represented in a unique database
and published using XML or other descriptive languages in different formats. In this way
an Annual Report issue could have a unique origin and be published on the Internet, as
PDF, etc.
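The single-source idea above can be sketched with the standard XML tools: one record, several renderings. The element names and rendering functions below are invented for illustration; a real pipeline would use a full schema and stylesheets.

```python
import xml.etree.ElementTree as ET

# Sketch: one XML source record rendered to more than one output format.
record = ET.Element("article")
ET.SubElement(record, "title").text = "Annual Report 2001"
ET.SubElement(record, "body").text = "Activities of the year."

def to_html(article):
    """Render the record as an HTML fragment (one of many supports)."""
    return "<h1>%s</h1><p>%s</p>" % (
        article.findtext("title"), article.findtext("body"))

def to_text(article):
    """Render the same record as plain text, e.g. for print layout."""
    return "%s\n\n%s" % (article.findtext("title"), article.findtext("body"))

print(to_html(record))
print(to_text(record))
```

Keeping the content in one descriptive source means a correction is made once and every published format picks it up.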
APPENDIX A: DATABASE DESIGN
Database design is a well-defined standard procedure that arises from different user
needs. The main purpose of a database is to store homogeneous information about a well-
defined subject, with the aim of sharing these data among different users.
Database design is fundamental to obtaining correct specifications and a final database
that matches the initial design. It produces documents that describe various aspects of
the use, the processes and the cases in which the database is used. These aspects are handy
when an IPGRI member would like to know whether there are databases that collect some kind
of data.
As a software product, the database should emerge from a number of standard processes
that produce the documents showing how the database was created.
Many organizations have adopted this kind of design for their databases. It is easy to
understand the purpose of a database, and how its stored data can be accessed, when there
are papers that explain its different aspects. A user searching for particular data
can read the Requirement Document. A programmer who needs access to the database
can read the Specification and Implementation Documents. Finally, accounting staff
can consult the Planning Document to obtain details about the total cost of the database
without requiring any additional documents. All the history of the database is contained in
a few sheets. Database maintainers often do not know whether a database is still used or
not, and technical mistakes often arise from the impossibility of determining the origin
of a previously installed database.
To design a good database these standard processes should be as follows:
§ Requirement Process: analyses the current status of the data to be imported in the
database and evidences the needs to be satisfied by the database
§ Specification Process: defines all database features
§ Planning Process: the technologies are defined (DBMS used, web-enabled or
proprietary interface, etc.) and the cost is indicated
§ Implementation Process: the database is created and the interface is built
§ Maintenance Process: the database has to be maintained
The documents produced for these processes are the following:
1. Requirement Document: It covers WHY an implementation of this database
is being attempted. It contains the needs to be satisfied by the new database, the
current status of the data to be stored and who will benefit from the database.
For example: papers contain a lot of data to be shared among members who need
independent access to it. The document produces a detailed list of the necessities to
be satisfied, with all the advantages gained from the final implemented database. It
is fundamental for anyone who wants to know the purposes of the database, and it
helps avoid redundancy and duplication of stored data.
2. Specification Document: It covers HOW this database will be used. It presents
the characteristics of the final database, its features, and how the data is accessed,
without specifying the technologies used during the implementation. Entity
Relationship, UML and ORM models give a detailed description of the database
structure.
3. Planning Document: shows tools used for hosting the data, with the costs of the
creation and maintenance task. This document clarifies which technology will be
used.
4. Creation and Integration Document or Implementation Document: It contains
a brief description of the implementation (DBMS used, database name, complete
path on the server, names of the tools used to access the data, like ASP applications)
and technical information used by the technicians who maintain the database, such
as technical documentation and the user manual.
5. Maintenance Document: describes how the maintenance is performed; if the
maintenance process is executed outside the Institute, this document will be the
contract of maintenance.
The processes and the documents they produce can be summarised as follows:
Requirement Process → Requirement Document
Specification Process → Specification Document
Planning Process → Planning Document
Implementation Process → C&I Document
Maintenance Process → Maintenance Document
APPENDIX B: DATABASE EVALUATION BY METRICS
Every database can be weighed by defining metric parameters. The quality of a database is
defined by different quality rates.
The most important ones are:
§ Correctness: indicates whether the implementation matches the specification.
§ Reliability: concerns fault tolerance, data coherence, etc.
§ Integrity: evaluates the security of the data against non-authorised access.
§ Maintainability: how the database is maintained.
§ Flexibility: concerns expandability, modularity, etc.
§ Testability: the database structure and its location should be approved by
technical staff.
§ Reusability: the database can be used for other purposes; for example, the Contacts
database can be accessed by mailing-list tools.
§ Interoperability: indicates the relationships with other databases.
Testing the quality attributes above requires breaking them down into engineering
criteria that can be evaluated using checklist methods. Every attribute is judged by
giving a weight to all the sub-attributes that constitute the rates above. These checklists
are simply questionnaires compiled by technical staff.
For example, Flexibility has these sub-attributes:
a) Consistency
b) Complexity
c) Generality
d) Modularity
e) Auto-documentation
The relative checklist questions could be:
i. Is the database produced following IPGRI standard techniques?
ii. Is the structure comprehensible?
iii. Is the database usable for other requirements?
iv. Are the tables decomposable?
v. Can a user comprehend the meaning of the database without access to other
documents?
The value of the flexibility rate is defined as:
Vflex = (sum of answer weights) / (sum of maximum answer weights)
Applying this method to all rates, the quality of a database is completely defined.
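The scoring formula is a simple weighted ratio, which can be sketched directly. The checklist scores below are hypothetical; only the formula comes from the text.

```python
# Sketch of the checklist scoring: each question contributes an answer
# weight and a maximum weight; the rate is their ratio.
def rate(answers):
    """answers: list of (score, max_score) pairs, one per question."""
    return sum(s for s, _ in answers) / sum(m for _, m in answers)

# Hypothetical flexibility checklist: five questions scored 0..5.
flexibility = [(4, 5), (5, 5), (3, 5), (4, 5), (2, 5)]
print(round(rate(flexibility), 2))  # 0.72
```

A rate of 1.0 means every question received its maximum weight; comparing rates across attributes highlights where a database needs work.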
APPENDIX C: REDUNDANCY AND NORMALISATION
One of the objectives in designing a relational database is the reduction of duplication in
the stored data. Duplicated data items represent redundancy. That is, duplicated items take
up more storage space than is absolutely necessary. We might put up with this loss in
storage space if it were not for a more significant consequence of redundancy.
If a data item is stored in more than one place then, when we need to change that item we
must do so in every location that it is to be found. The more copies there are the more
difficult this is. If we miss just one copy then the database is in an invalid state (being
incoherent) and there is no easy way to know which of the stored versions is correct.
Normalisation is, in relational database design, the process of organising data to minimise
redundancy. It usually involves dividing a database into two or more tables and defining
relationships between the tables. The objective is to isolate data so that additions, deletions,
and modifications of a field can be made in just one table and then propagated through the
rest of the database via the defined relationships. There are three main normal forms, each
with an increasing level of normalisation:
a. First Normal Form (1NF): Each field in a table contains different information.
For example, in an employee list, each table would contain only one birth date field.
b. Second Normal Form (2NF): No field values can be derived from another field.
For example, if a table already included a birth date field, it could not also include
a birth year field, since this information would be redundant.
c. Third Normal Form (3NF): No duplicate information is permitted. So, for
example, if two tables both require a birth date field, the birth date information
would be separated into a separate table, and the two other tables would then access
the birth date information via a key field in the birth date table. Any change to a
birth date would automatically be reflected in all tables that link to the birth date table.
NORMALISATION
A technique exists by which we can arrange our data so that redundancy is minimised.
Normalisation arranges the data in a succession of normal forms. Each normal form further
reduces the degree of duplication.
The first step is to make sure data is in First Normal Form (1NF). This is quite easy: we
simply make sure there are no repeating groups. Any groups that repeat are placed in
a separate table. The trick here is to look at the key. The key is the attribute that can be
used to uniquely determine the row of the table we are interested in. If there are any
repeating groups then the key does not adequately determine the contents of a row and the
table is not in 1NF. Since an E/R model of the data is necessary, each entity has a primary
key. The key was chosen so that it uniquely identifies each occurrence, and so the entity
should be in 1NF already.
EXAMPLE
A manufacturing company makes products from a variety of components. Each product
has a unique product number, a name and an assembly time. Each component has a unique
component number, a description, a supplier code and a price. Assume we have an entity
definition from our data modelling which looks like this:
Product (ProdCode, Name, Time, ComponentCode, Description, Quantity, Supplier, Cost)
The primary key is the product code ProdCode but look at some typical occurrences of this
entity and we will see some problems. An example tuple of the relation defined above is:
(325,Trolley,0.35,B1378,Wheel,6,S2341,0.22)
While the product code uniquely identifies each product it does not serve to identify each
occurrence of the Product entity, therefore the entity definition is not in 1NF. The problem
is caused by the fact that some of the (non-key) attributes are not dependent upon the
primary key. For example, the description, supplier and cost are dependent upon the
component code, and the quantity is dependent upon the combination of the product code
and the component code. Dependency in this case is about how we can work out one
attribute once given another. For example, ProductCode 325 tells us we are dealing with a
Trolley but not which supplier(s). If we are given the ComponentCode, though, we can
determine the Supplier.
Thus, the Supplier depends upon the ComponentCode. The reverse is not true: if we
know the Supplier we cannot determine either the ComponentCode or the ProductCode.
We can group together those attributes where there are some dependencies by writing a list
of functional dependencies. In order to get the definition into 1NF we need to extract those
groups which repeat and put them into an entity of their own. In this case the last five
attributes form a repeated group for each instance of a product code and must go into a
separate entity. We must take care however to take a copy of the primary key as that will
be needed to form a link between the two new entities. Our two new entity definitions are:
§ Product (ProductCode, Name, Time)
§ Component (ProductCode, ComponentCode, Description, Quantity, Supplier,
Cost)
You may complain that the new entity Component contains a repeating product code.
However, this is now a necessary part of the primary key of Component and represents the
least duplication we can have and still maintain a link between the two entities. Each
product now has its name and assembly time stored only once so that if that changes we
only have to change it in one place. There is a further step that we can take. You should
notice that, in the occurrence entity, the Description, Supplier and Cost get repeated
because they depend on part of the primary key not all of it. We can transform the
Component entity into two new 2NF entities. A 2NF entity is one where all the (non-key)
attributes depend on all of the primary key. We shall extract the offending attributes and
create two new entities with the following definitions:
§ Parts (ProductCode, ComponentCode, Quantity)
§ Component (ComponentCode, Description, Supplier, Cost)
Notice that Quantity is a function of both ProductCode and ComponentCode and that
Description, Supplier and Cost are functions only of the ComponentCode.
The tables will now look like this:
§ Parts (ProdCode CompCode Quantity ) ; Tuple Example : (325, B1378, 6)
§ Component (CompCode Description Supplier Cost ); Tuple Example: (B1378,
Wheel, S2341, 0.22)
There is a third stage that can be applied, although our data already satisfies its conditions
and there is little else we can do to remove redundancy. The important thing to realise
is that our data is now stored in a way that minimises the amount of duplication.
That will help to maintain the integrity of the database and the quality of the data. The
complete definition is shown below. Compare it with the original definition carefully and
note the differences.
The normalised database has now the following entities:
§ Product (ProductCode, Name, Time)
§ Parts (ProductCode, ComponentCode, Quantity)
§ Component (ComponentCode, Description, Supplier, Cost)
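The three normalised entities, populated with the example tuple from earlier, can be rebuilt and joined to show that no information was lost. This sketch again uses SQLite as a stand-in RDBMS.

```python
import sqlite3

# The three normalised entities from the worked example, with a join
# proving the original wide tuple can still be recovered.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product   (ProductCode INTEGER PRIMARY KEY, Name TEXT, Time REAL);
    CREATE TABLE Parts     (ProductCode INTEGER, ComponentCode TEXT, Quantity INTEGER,
                            PRIMARY KEY (ProductCode, ComponentCode));
    CREATE TABLE Component (ComponentCode TEXT PRIMARY KEY, Description TEXT,
                            Supplier TEXT, Cost REAL);

    INSERT INTO Product   VALUES (325, 'Trolley', 0.35);
    INSERT INTO Parts     VALUES (325, 'B1378', 6);
    INSERT INTO Component VALUES ('B1378', 'Wheel', 'S2341', 0.22);
""")
row = conn.execute("""
    SELECT p.ProductCode, p.Name, p.Time,
           c.ComponentCode, c.Description, pa.Quantity, c.Supplier, c.Cost
    FROM Product p
    JOIN Parts pa    ON pa.ProductCode   = p.ProductCode
    JOIN Component c ON c.ComponentCode  = pa.ComponentCode
""").fetchone()
print(row)  # (325, 'Trolley', 0.35, 'B1378', 'Wheel', 6, 'S2341', 0.22)
```

The join reconstructs exactly the original occurrence, but the product name, assembly time and component details are each stored only once.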
NORMALIZE PROCESS TO ELIMINATE REDUNDANCY
The normalisation process helps eliminate redundancy in a database by ensuring
that all fields in a table are atomic. There are several forms of normalisation, but Third
Normal Form (3NF) is generally regarded as providing the best compromise between
performance, extensibility, and data integrity. Briefly, 3NF states that:
§ Each value in a table is to be represented only once
§ Each row in a table should be uniquely identifiable (it should have a unique key)
§ No non-key information that relies upon another key should be stored in the table
Databases in 3NF are characterised by a group of tables storing related data that is joined
together through keys.
For example, a 3NF database for storing customers and their related orders would likely
have two tables: Customer and Order.
The Order table would not contain any information about an order’s related customer.
Instead, it would store the key that identifies the row containing the customer’s information
in the Customer table.
Higher levels of normalisation exist, but higher is not always better. In fact, for some
projects, even 3NF may introduce too much complexity into the database to be worth the
rewards.
APPENDIX D: DATABASE STANDARDS
Here we will look at the standards to be used in the preparation of the documents listed in
Appendix A on Database Design and the products and interfaces that are to be used for the
implementation itself. Database design can be a very complex task, but it starts with an
important iteration with the final users. In fact, the project leader in a Database project
should be chosen as a champion in the area where the Database is going to be used.
For this purpose the first step in a Database Project is to map the requirements to a
Conceptual Model.
The Conceptual model has nothing to do with technology and a lot with trying to capture
what kind of information we want to store to solve our business problem. Several models
have been created, each with its strengths and weaknesses, but none has emerged as the best
in all possible situations. Therefore, it is most probable that more than one model will have
to be used in this area. See the document “Evaluation of modelling Techniques” for details.
Currently, we are oriented toward using the ER (Entity Relationship) and the ORM (Object
Role Modeling) models.
See “Entity Relationship diagrams documentation and presentation” at
http://dec.bournemouth.ac.uk/staff/kcox/ERDs/index.htm and the document “Modeling,
Data Semantics and Natural Language” for another analysis of the Conceptual models and
for details on ORM.
ER is best for quick reference and maintenance and is widely known among developers,
while ORM is best for interaction with the users. ORM also allows modelling the
business rules that apply to the information.
After the Conceptual model is ready it can be mapped to a Logical model.
At this stage we have to choose the type of database system we are going to use, such as
Hierarchical, Network or Relational. Because the industry has been orienting itself
toward the Relational model for a long time, we really do not have much choice
here. The Relational model is mathematically well founded and has given rise to a
number of important standards that allow the cooperation of different products in the same
application. In particular, SQL (Structured Query Language) is a declarative language that
can be used to work on relational databases and, although very powerful, is oriented
toward the final user. For more information on SQL see the document “Introduction to
Structured Query Language”.
At a third stage we will have to map the Logical model to the Physical model. Now we
have to make our choice of a product. We have come to this point after having created the
most important documentation using models that are product independent. To help us
further, we have to adopt products based on standard interfaces, such as ODBC, which will
simplify the migration to new products in the future. In addition, other requirements will
become important at this stage such as:
a) Network support group
b) Know How
c) The ability to replicate data on slow links
d) Security granularity requirements
e) Decision support components
f) Data Warehousing capabilities
The Conceptual model defines Information at the highest level of abstraction while the
Physical model describes the details at the lowest level of abstraction. Due to their
importance for the success of a Database Design project, the chosen models must always
be kept in sync.
Organizational decision makers can take great advantage of the central consolidation of
information stored in different products. This way they can perform high-level analysis on
information that is already available in different areas of the Organization. This is
the target of data warehouses. However, letting different products talk to each other
requires standards.
For this reason, database vendors started to include a Data Repository that stores all
metadata information about the databases along with the databases themselves. Fortunately,
this area has lately seen the consolidation of standards under the unique CWM
(Common Warehouse Metamodel) from the OMG (Object Management Group).
Go to http://www.omg.org/cwm/ for details on OMG.
See the document “Database Metadata Standard” for details. The CWM standardizes a
complete, comprehensive metamodel that enables data mining across database boundaries
at the enterprise level and beyond. Like a UML profile, but in data space instead of
application space, it forms the MDA mapping to database schemas. The product of a
cooperative effort between OMG and the Meta-Data Coalition (MDC), the CWM does for
data modelling what UML does for application modelling.
The models outlined above focus on the information used by the processes but do not give
any tool to describe the processes themselves. UML (Unified Modelling Language) is the
most widely accepted and supported way of defining business rules and processes. UML is
object based, which allows the model to be easily mapped to modern object-oriented
languages like Java. In fact, UML, along with XML (eXtensible Markup Language) and
XMI (XML Metadata Interchange), is used as a base for the CWM standard mentioned
above.
APPENDIX E: GLOSSARY
ANSI (AMERICAN NATIONAL STANDARDS INSTITUTE): An association formed
by the American Government and industry to produce and disseminate widely used
industrial standards.
ATTRIBUTE: a noun describing a value which will be found in each tuple in a relation,
usually represented as a column of a relation. It is a property that can assume values for
entities or relationships. Entities can be assigned several attributes.
CANDIDATE KEY: one or more attributes which will uniquely identify one tuple in a
relation. A candidate key is a potential primary key.
COLUMN: A component of a table that holds a single attribute of the table.
COMPOSITE KEY: A key in a database table made up of several fields. Same as
concatenated key.
CONCEPTUAL VIEW: The schema of a database
DATA: A recording of facts, concepts, or instructions on a storage medium for
communication, retrieval, and processing by automatic means, and presentation as
information that is understandable by human beings.
DATA AGGREGATE: A collection of Data items.
DATA DICTIONARY: It’s contains definitions of Data, the relationship of one category
of data to another, the attributes and keys of groups of data, and so forth. Software tools
for recording these information are used.
DATA ELEMENT: A uniquely named and well-defined category of data that consists of
data items, and that is included in the record of an activity.
DATA ENTRY: The process of entering data into a computerized database or spreadsheet.
Data entry can be performed by an individual typing at a keyboard or by a machine entering
data electronically.
DATA MINING: Term for a class of database applications that look for hidden patterns
in a group of data. For example, data mining software can help retail companies find
customers with common interests. The term is commonly misused to describe software that
presents data in new ways. True data mining software doesn't just change the presentation,
but actually discovers previously unknown relationships among the data.
DATA MART, DATAMART: A database, or collection of databases, designed to help
managers make strategic decisions about their business. Whereas a data warehouse
combines databases across an entire enterprise, data marts are usually smaller and focus on
a particular subject or department. Some data marts, called dependent data marts, are
subsets of larger data warehouses.
DATA MODEL: 1) the logical data structures, including operations and constraints
provided by a DBMS for effective Database processing. 2) The system used for the
representation of Data (e.g., the ERD or relational model). A data model is an abstract
representation of the data used by an organization, such that a meaningful interpretation of
the data may be made by the model's readers. The data model may be at a conceptual,
external or internal level (as defined by ANSI).
DATA SOURCE: The place where the data to be accessed is stored; a generic name for
data whether stored in a conventional data source (such as Oracle) or in a file system
(such as RMS or VSAM). Also, the name given to a data source in a binding.
DATA WAREHOUSE: A copy of transaction data specifically structured for query and
analysis. A collection of data designed to support management decision-making. Data
warehouses contain a wide variety of data that present a coherent picture of business
conditions at a single point in time. Development of a data warehouse includes
development of systems to extract data from operating systems plus installation of a
warehouse database system that provides managers flexible access to the data. The term
data warehousing generally refers to combining many different databases across an entire
enterprise. Contrast with data mart.
DATABASE: 1) a collection of all the data needed by a person or organization to perform
needed functions 2) a collection of related files 3) any collection of data organized to
answer queries 4) (informally) a database management system
DATABASE MANAGER: 1) the person with primary responsibility for the design,
construction, and maintenance of a database. 2) (informally) a database management
system.
DENORMALISATION: Allowing redundancy in a table so that the table can remain flat,
rather than fully normalized.
DB2: IBM's relational database management system. (The names DB3 and DB4 commonly
refer to the dBASE III and dBASE IV desktop databases, which are not IBM products.)
DBMS (DATABASE MANAGEMENT SYSTEM): Also called a database manager, an
integrated collection of programs designed to allow people to design databases, enter
and maintain data, and perform queries. It provides the tools to manage the data and its
structures through DML and DDL.
DDL (DATA DEFINITION LANGUAGE): the language used to define the table
structures, relationships, triggers and procedures needed to build the skeleton
of the database.
DML (DATA MANIPULATION LANGUAGE): the language used to query and modify
the data in a database (e.g., SELECT, INSERT, UPDATE, DELETE).
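As an illustration of the two roles, the following sketch uses SQLite through Python (the table and field names are invented for the example): DDL builds the skeleton of the database, and DML enters and queries the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (the "skeleton") of the database
conn.execute("CREATE TABLE institute (instcode TEXT PRIMARY KEY, name TEXT)")

# DML: enter and query the data
conn.execute("INSERT INTO institute VALUES ('ITA303', 'IPGRI')")
row = conn.execute("SELECT name FROM institute WHERE instcode = 'ITA303'").fetchone()
print(row[0])  # IPGRI
```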
DISTRIBUTED DATABASE: A database in which the resources are stored on more than one
computer system, often at different physical locations.
ENTITY: a real-world object, observation, transaction, or person about which data are to
be stored in a database.
ENTITY-RELATIONSHIP (ER OR ERD) DIAGRAM: design tool used primarily for
relational databases in which entities are modeled as geometric shapes and the relationships
between them are shown as labeled arcs. It’s a model of an organization’s data in which
the objective has been to remove all repeated values by creating more tables.
FIELD: term used by Access as a synonym for attribute.
FILE: 1) the separately named unit of storage for all data and programs on most computers.
For example, a relation or a whole database may be stored in one file. 2) term used as a
synonym for relation in some (particularly older) database managers, like dBase.
INCOHERENT DATA: a value of an attribute that does not reflect the real state of the
data. An incorrect address for a contact, or two conflicting copies of the same value, are
examples of incoherent data.
INDEX: 1) a method used to reorder tuples or to display them in a specific order 2) a data
structure used to give rapid, random access to relations. Indexes are most often used with
large relations.
JOIN: An operation that takes two relations as operands and produces a new relation by
concatenating the tuples and matching the corresponding columns when a stated condition
holds between the two. It uses data from more than one relation (table). The relations must
have at least one attribute (called the join or linking attribute) in common.
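A minimal join, sketched with SQLite in Python (the country/contact tables and the cty join attribute are illustrative): the two relations share the join attribute, and the join produces a new relation from the matching tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (cty TEXT PRIMARY KEY, name TEXT);
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, cty TEXT);
INSERT INTO country VALUES ('ITA', 'Italy'), ('FRA', 'France');
INSERT INTO contact VALUES (1, 'Rossi', 'ITA'), (2, 'Dupont', 'FRA');
""")

# The join attribute 'cty' is common to both relations; the join
# concatenates the matching tuples into a new relation.
rows = conn.execute("""
    SELECT contact.name, country.name
    FROM contact JOIN country ON contact.cty = country.cty
    ORDER BY contact.id
""").fetchall()
print(rows)  # [('Rossi', 'Italy'), ('Dupont', 'France')]
```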
KEY: an attribute or combination of attributes. A combination of their values will be used
to select tuples from a relation.
MANY-TO-MANY RELATIONSHIP: One or more tuples in one relation may be related
to one or more tuples in a second relation by a common value of a join attribute. This
implies that each value of the join attribute may appear any number of times in either
relation or in both.
NORMAL FORM: 1) a condition of relations and databases intended to reduce data
redundancy and avoid update anomalies 2) the result of normalizing a database. There are
three main normal forms: First, Second, and Third. First Normal Form says that each field
must hold a single, atomic value, with no repeating groups. Second Normal Form says that
every non-key field must depend on the whole primary key, not just part of it. Third
Normal Form says that no non-key field may depend on another non-key field, so that no
information is duplicated within two or more tables. Normalized tables are linked using key
fields.
NORMALIZE: The process of removing redundancy in data by separating the data into
multiple tables, decomposing complex data structures into natural structures.
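The decomposition can be sketched in plain Python (the sample data is invented): the institute name, repeated once per contact in the flat structure, is separated into its own table and linked back by a key.

```python
# A flat (denormalised) table repeats the institute name for every contact:
flat = [
    ("Rossi",   "ITA303", "IPGRI"),
    ("Bianchi", "ITA303", "IPGRI"),
    ("Dupont",  "FRA001", "INRA"),
]

# Normalising separates the repeated data into its own table,
# linked back to the contacts by the key 'instcode':
institutes = {instcode: name for _, instcode, name in flat}
contacts = [(person, instcode) for person, instcode, _ in flat]

print(institutes)  # {'ITA303': 'IPGRI', 'FRA001': 'INRA'}
print(contacts)    # [('Rossi', 'ITA303'), ('Bianchi', 'ITA303'), ('Dupont', 'FRA001')]
```

Each institute name is now stored once; updating it in one place cannot leave conflicting copies behind.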
ODBC (OPEN DATABASE CONNECTIVITY): A standard interface between a
database and an application that is trying to access the data in that database. ODBC builds
on SQL standards defined by international (ISO) and national (ANSI) standards bodies;
the version of the SQL standard current at the time of writing is SQL-92.
ONE-TO-MANY RELATIONSHIP: exactly one tuple in one relation is related by a
common join attribute to many tuples in another relation. This implies that each value of
the join attribute is unique in the first relation but not necessarily unique in the second.
ONE-TO-ONE RELATIONSHIP: exactly one tuple in one relation is related by a
common join attribute to exactly one tuple in another relation. This implies that each value
of the join attribute appears no more than once in each of the relations.
PERSISTENT QUERY: a query which is stored for reuse
PRIMARY KEY: a key such that the value of the key attribute(s) will uniquely identify
any tuple in the relation. A relation must not have more than one primary key.
QUERY: literally, a question. 1) a command, written in a query language, for the database
to present a specified subset of the data in the database. 2) the subset of data produced as
output in response to a query
QUERY LANGUAGE: a computer language which can be used to express queries.
QUERY RESOLUTION: the process of collecting the data needed to answer a query.
RECORD: term used as a synonym for tuple in some (particularly older) database
management systems, like dBase.
RECURSIVE QUERY: a query in which the output of the query is then used as input for
the same query.
RDBMS(RELATIONAL DATABASE MANAGEMENT SYSTEM): see Database
Management System and Relational Database.
RECORD: In database management systems, a complete set of information. Records are
composed of fields, each of which contains one item of information. A set of records
constitutes a file. For example, a personnel file might contain records that have three fields:
a name field, an address field, and a phone number field. In relational database management
systems, records are called tuples.
REDUNDANCY: The practice of storing more than one occurrence of the same data. In
the case where data can be updated, redundancy poses serious problems. In the case
where data is not updated, redundancy is often a valuable and necessary design tool. The
duplication of data in the database to improve the ease and speed of access can raise the
risk that a change made in one place leaves conflicting values elsewhere.
REFERENTIAL INTEGRITY: A feature provided by relational database management
systems (RDBMS's) that prevents users or applications from entering inconsistent data. An
integrity mechanism ensuring vital data in a database, such as the unique identifier for a
given piece of data, remains accurate and usable as the database changes. Referential
integrity involves managing corresponding data values between tables when the foreign
key of a table contains the same values as the primary key of another table. For example,
suppose Table B has a foreign key that points to a field in Table A. Referential integrity
would prevent you from adding a record to Table B that cannot be linked to Table A. In
addition, the referential integrity rules might also specify that whenever you delete a record
from Table A, any records in Table B that are linked to the deleted record will also be
deleted. This is called a cascading delete. Finally, the referential integrity rules could specify
that whenever you modify the value of a linked field in Table A, all records in Table B that
are linked to it will also be modified accordingly. This is called a cascading update.
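A sketch of these rules using SQLite from Python (SQLite supports ON DELETE CASCADE but only enforces foreign keys when the pragma is enabled; the two tables mirror the Table A / Table B example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (id INTEGER PRIMARY KEY,
                a_id INTEGER REFERENCES a(id) ON DELETE CASCADE);
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (10, 1);
""")

# A record in B that cannot be linked to A is rejected:
try:
    conn.execute("INSERT INTO b VALUES (11, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Deleting the parent record in A cascades to the linked records in B:
conn.execute("DELETE FROM a WHERE id = 1")
print(conn.execute("SELECT COUNT(*) FROM b").fetchone()[0])  # 0
```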
RELATION: the basic collection of data in a relational database. Usually represented as
a rectangular array of data, in which each row (tuple) is a collection of data about one entity
RELATIONAL DATABASE: A type of database management system (DBMS) that stores
data in the form of related tables. Relational databases are powerful because they require
few assumptions about how data is related or how it will be extracted from the database.
As a result, the same database can be viewed in many different ways. An important feature
of relational systems is that a single database can be spread across several tables. This differs
from flat-file databases, in which each database is self-contained in a single table.
REPLICATION: Duplication of table schema and data or stored procedure definitions
and calls from a source database to a destination database, usually on separate servers.
ROW: term used by Access as a synonym for tuple
RUNNING A QUERY: Term for query resolution
SCHEMA: 1) a description of a database. It specifies (among other things) the relations,
their attributes, and the domains of the attributes. In some database systems, the join
attributes are also specified as part of the schema. 2) the description of one relation
SECONDARY KEY: a key which is not the primary key for a relation.
SELECT: a query in which only some of the tuples in the source relation appear in the
output
SEQUEL: see SQL
STORED PROCEDURE: In database management systems (DBMSs), an operation that
is stored with the database server. Typically, stored procedures are written in SQL. Stored
procedures execute faster than ordinary SQL requests because they have been compiled
and optimized by the server. By keeping the requests on the SQL server, they don't have to
be coded into the user's front end, thereby allowing the program to load and execute faster.
Stored procedures are an important element in load balancing.
SQL: pronounced 'sequel' or spelled out letter by letter, stands for Structured Query
Language, the most common text-based database query language. SQL is both a DDL and
a DML: its DDL comprises the CREATE and ALTER statements and other commands,
while its DML comprises the SELECT, UPDATE, DELETE and INSERT statements and
so on. There are different standards and dialects of SQL: ANSI SQL, T-SQL, etc.
TABLE: A relation that consists of a set of columns with a heading and a set of rows (i.e.,
tuples). It’s a noun used as a synonym for relation in relational theory.
TECHNICAL REENGINEERING: an organizational restructuring based on a
fundamental re-examination of why a database exists.
TRANSACTION: 1) the fundamental unit of change in many (transaction-oriented)
databases. A single transaction may involve changes in several relations, all of which must
be made simultaneously in order for the database to be internally consistent and correct. 2)
the real-life event which is modeled by the changes to the database.
TRIGGER: A detectable event that causes another action to happen. For instance,
changing a discount rate in a grocery store's inventory database may cause an alert to be
emailed to a manager.
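The grocery-store example can be approximated with an SQLite trigger (writing an alert row rather than sending an email; all table and trigger names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (item TEXT, discount REAL);
CREATE TABLE alert (message TEXT);

-- The trigger fires on a detectable event (an UPDATE of the discount
-- column) and causes another action (inserting an alert row):
CREATE TRIGGER discount_changed
AFTER UPDATE OF discount ON inventory
BEGIN
    INSERT INTO alert VALUES ('discount changed for ' || OLD.item);
END;

INSERT INTO inventory VALUES ('apples', 0.0);
UPDATE inventory SET discount = 0.10 WHERE item = 'apples';
""")

print(conn.execute("SELECT message FROM alert").fetchone()[0])
# discount changed for apples
```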
TUPLE: within a relation, a collection of all the facts related to one entity. Usually
represented as a row of data. In relational database systems, a record. See record.
VALUE: the computer representation of a fact about an entity.
COMMENTS
Database IPGRI_DB:
• Does “Centres” table contain characteristics of Institutes?
• Is the CROPCODE equal to SPECIE in AMS?
• IND_PROF and IND_TAGS not included
• Delete fields of “ipgriregion” from institute and contact
• INST: Name_nat can be considered as “txtInstituteName”
• Does PktxtCountryISOCodeCode correspond to country.cty?
Database IPGRIAddressBook:
• The “Flags” table contains the field names of ipgri_db
“contacts” table
• USERLOG could be treated as “user” table
Database IPGRI-Eur-PopulusClones:
• What are “Clone” and “Accession”?
• “Clones” table skipped
• Is a country defined by the ISOCODE field, the CTY field, or by its name?
• Is an institute defined by the ISOCODE or the INSTCODE field?
• Should the “Main” table be renamed to “Accession”?
• Main: for “INSTITUTION WHERE MAINTAINED”, where is
the source of institutions?
• “Main” table is not completely processed
• “Maintenance” table processed
Database IPGRIWEB:
• ANNUALREPORT skipped
• ANNUALREPORT_TEXT skipped
• conflict_PUBSURVEY skipped
• conflict_TRAINSURVEY skipped
• conflict_WEBSURVEY skipped
• COUNTRY contains the institutional Country table translated
into different languages: useful. The ISO3 field is equal to
Country.cty.
• Do the CROPS_TYPE table and ipgri_db.material contain the
same kind of data?
• Is the CROP table a subset of CROPS_TYPE?
• CROPS table should be renamed to “CropNames”
• EVENT table: forced the primary key to match
ipgri_db.events
• GENEFLOW used in the website: skipped
• Should IPGRI users have only one table for access to all
IPGRI web applications?
• About languages: many tables contain different translations of
a particular name, e.g. a crop can be named in English, Spanish
or French. Could we create an external database of all
international names?
• IPGRIWEB.OFFICE_LOCATION.COUNTRY links to the
COUNTRY name rather than to country.cty.
• Should the OWNER table be merged with the Acronym table?
• PGR skipped
• PGR_ARTICLE skipped
• PUBBLICATION contains articles published on the website
• PUBBLICATION skipped
• PUBBLICATION_OWNER (linked to PUBBLICATION)
is skipped
• PUBSURVEY skipped
• RESOURCE skipped
• SERIE skipped
• STAFF: should id_ownergroup and id_ownergroup2 be linked
to GROUPS in TIP?
• Does STAFF contain only and all IPGRI members?
• STAFF must be revisited
• THEME skipped
• TOPIC skipped
• TRAINING skipped: purpose not clear
• TRUSTEES skipped: purpose not clear
• URL_OWNER skipped
• URL_REGION: are regions the same as ipgriregions?
• URL_TOPIC skipped
• USETYPE skipped
• WEBPAGE, WEBPAGE_AGROVOC,
WEBPAGE_COUNTRY, WEBPAGE_CROP,
WEBPAGE_EVENT, WEBPAGE_INSTITUTIONAL,
WEBPAGE_NETWORK, WEBPAGE_REGION,
WEBPAGE_RESOURCE, WEBPAGE_THEME,
WEBPAGE_TRAINING, WEBSURVEY skipped
IPGRNewsletter skipped
Presupuesto skipped
Proyectos database:
• Pbcatcol skipped: purpose not clear
• Pbcatedt skipped: purpose not clear
• Pbcatfmt skipped: purpose not clear
• Pbcattbl skipped: purpose not clear
• pbcatvld skipped: purpose not clear

Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 

Institutional database analysis

VIEWS .............................................................................................................................. 9
BACKUP AND RECOVERY ............................................................................................ 9
EXPORTING OF RECORDS BETWEEN TWO USERS ................................................... 9
PRIMARY, SECONDARY AND FOREIGN KEYS ........................................................... 9
INDEXING ....................................................................................................................... 9
PERSISTENT QUERIES ................................................................................................... 9
DISTRIBUTED DATABASES AND REPLICATION ....................................................... 9
THE PROJECT ............................................................................................................... 10
PROJECT DETAILS ....................................................................................................... 11
PROJECT TEAM (page reference missing)
DATA ARCHITECTURE ................................................................................................ 12
INTRODUCTION AND BACKGROUND ....................................................................... 12
A SIMPLE DATA ARCHITECTURE .............................................................................. 12
DATA MODELING OVERVIEW .................................................................................... 13
METADATA ................................................................................................................... 13
DATABASE MANAGEMENT SYSTEM (DBMS) OVERVIEW ..................................... 13
DATA ACCESS MIDDLEWARE OVERVIEW ............................................................... 13
DATA ACCESS IMPLEMENTATION OVERVIEW ....................................................... 13
DATA SECURITY OVERVIEW ..................................................................................... 13
THE DAWN OF A NEW ARCHITECTURE ................................................................... 14
ADVANTAGES .............................................................................................................. 14
DISADVANTAGES ........................................................................................................ 14
GENERAL RECOMMENDATIONS ............................................................................... 15
WHY ARE INSTITUTIONAL DATABASES NOT UP TO USERS' EXPECTATIONS? ... 16
INSTITUTIONAL DATABASE ...................................................................................... 17
DEFINITION ................................................................................................................... 17
IMPLEMENTATION ...................................................................................................... 17
NAMING CONVENTION ............................................................................................... 17
DBMS naming convention .............................................................................................. 17
Field naming convention ................................................................................................. 18
Default fields .................................................................................................................. 18
Primary and foreign key naming convention ................................................................... 19
Data standard: the Unicode Standard ............................................................................... 21
Data sources ................................................................................................................... 21
APPENDIX A: DATABASE DESIGN ............................................................................ 23
APPENDIX B: DATABASE EVALUATION BY METRICS .......................................... 25
APPENDIX C: REDUNDANCY AND NORMALISATION ........................................... 27
NORMALISATION ........................................................................................................ 27
EXAMPLE ...................................................................................................................... 28
APPENDIX D: DATABASE STANDARDS ................................................................... 31
APPENDIX E: GLOSSARY ............................................................................................ 33
COMMENTS .................................................................................................................. 39
INTRODUCTION

IPGRI is an international research institute with a mandate to advance the conservation and use of genetic diversity for the well-being of present and future generations. IPGRI aims to meet three major objectives:
§ Countries, particularly developing countries, can better assess and meet their own plant genetic resources needs
§ International collaboration in the conservation and use of genetic resources is strengthened
§ Knowledge and technologies relevant to the improved conservation and use of plant genetic resources are developed and disseminated

The information collected by scientists and everyone who works at IPGRI is fundamental to IPGRI's mission. These data are stored on different kinds of supports: databases, documents, papers, and backup tools.

The primary objective of the Database Inventory Project is to provide a detailed analysis of the databases considered Institutionally important. This aim is reached through a deep investigation of the context in which the databases are used.

There are various areas of Information and Knowledge Management in IPGRI which need to be strengthened. In particular, there is great demand for Intranet access to administrative information, such as Personnel, Budgeting, Financing, etc. The lack of Institutional Database Management is strongly felt as a weak point for the Institute, which should instead lead the way in this area. In addition, the Intranet should acquire the capability to search the Institutional documents, as agreed in the last several meetings.

Currently, databases are managed by various responsible persons in different groups, with poor technical support as well as poor integration. Some of them, such as the Contacts Database, exist in various forms in different sites. Others are not scalable or shared among staff members, or the tools used to interact with them are obsolete. The obvious result is that a combined search would currently require a huge amount of manual work. In addition, if the databases are ever to become a truly valuable information asset in IPGRI, mechanisms must be put in place for managing and controlling the quality of the content stored.
INFORMATION AS AN ORGANIZATION ASSET

Efficient Organizations nowadays base their processes on quick and flexible access to the proper Information. Information can become so critical and expensive to produce that it must be made available to other groups in the Organization "Anytime - Anywhere". The process required to make Information available to others is expensive; it is not justifiable to store every possible piece of Information for Organizational access. Therefore, in planning its Databases an Organization should look at its objectives and the processes required to achieve them, and quantify the value of the Information created at each stage. Once the Organization has decided what Information has a value that justifies its availability to other staff, that Information becomes an asset and should be treated like any other asset in the Organization.

Can IPGRI recognize itself in this model? Information is the basis of our work; in IPGRI we would not get very far without it. IPGRI is a very dispersed and distributed Institution. Several projects make use of information that can be reutilised in other projects, even at the same time. However, the Institute has seen such information sets grow without the use of Database tools. For example, Word processors and Spreadsheets have been, and are, used to store tables of different natures. This is all fine, as long as a careful evaluation of the value of the Information that must be shared is performed on a periodic basis.

As a final statement: all Information that becomes an Institutional Asset will have to follow a life cycle, whose process is defined in "Appendix A: Database Design", and be based on the standards described in "Appendix D: Database Standards".
CURRENT STATUS AND REMARKABLE PROBLEMS

In IPGRI, many databases have been developed without any approved standard. Poor documentation exists, but some patterns can be detected in the current status. These considerations suggested performing a deep study of those databases that IPGRI considers institutionally important, achieving a consolidated view of them gathered in an inventory. An archive of this information has to be created and maintained for the time being. From this necessity the Institutional Databases Project arose. For a definition of the terms used below, please refer to Appendix E.

These are some of the problems that we found in IPGRI Institutional Databases.

BAD STRUCTURES AND SCHEMES

Many IPGRI Databases are built on bad structures; database theory has not been applied. The lack of the following important Database properties is evident:
I. Correctness
II. Reliability
III. Maintainability
IV. Flexibility
V. Testability
VI. Reusability
VII. Interoperability
Refer to Appendix B for the definitions of these terms.

REDUNDANCY

IPGRI Databases are often duplicated in several versions; their maintenance is very heavy and produces incoherent records, because updates must be made to each version. Examples: the Contacts and Publications databases. A good solution to this problem is normalisation (see Appendix C).

BAD DATABASE MANAGEMENT SYSTEMS

All databases should be developed using dedicated DBMSs (Database Management Systems) such as MS SQL Server, MySQL, MS Access, Oracle, etc. Many databases are implemented not with these tools but with different kinds of applications, like Word Processors and Spreadsheet tools. It becomes hard to share and manipulate this Information.

PURPOSES NOT ALWAYS DEFINED

Every database should satisfy a well-defined purpose. We found many databases without a general objective accepted by IPGRI as a whole. Reengineering is needed for many databases, like the Europe LOA database.
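The redundancy problem, and the normalisation cure described in Appendix C, can be illustrated with a small sketch. The table and field names below are hypothetical, not taken from the actual IPGRI Contacts database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalised: the organisation data is repeated on every contact row,
# so changing it means updating many rows -- and missing one produces
# exactly the incoherent records described above.
cur.execute("""CREATE TABLE contacts_flat (
    name TEXT, org_name TEXT, org_city TEXT)""")
cur.executemany("INSERT INTO contacts_flat VALUES (?, ?, ?)", [
    ("A. Rossi", "IPGRI", "Rome"),
    ("B. Bianchi", "IPGRI", "Rome"),   # redundant copy of the org data
])

# Normalised: the organisation is stored once and referenced by key.
cur.execute("""CREATE TABLE organisation (
    org_id INTEGER PRIMARY KEY, org_name TEXT, org_city TEXT)""")
cur.execute("""CREATE TABLE contact (
    contact_id INTEGER PRIMARY KEY, name TEXT,
    org_id INTEGER REFERENCES organisation(org_id))""")
cur.execute("INSERT INTO organisation VALUES (1, 'IPGRI', 'Rome')")
cur.executemany("INSERT INTO contact VALUES (?, ?, ?)", [
    (1, "A. Rossi", 1), (2, "B. Bianchi", 1)])

# A single UPDATE now fixes the organisation data everywhere at once.
cur.execute("UPDATE organisation SET org_city = 'Maccarese' WHERE org_id = 1")
rows = cur.execute("""SELECT c.name, o.org_city FROM contact c
                      JOIN organisation o ON c.org_id = o.org_id""").fetchall()
print(rows)  # both contacts now show the updated city
```

The same update against `contacts_flat` would have to touch every duplicated row, in every duplicated copy of the database.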
DATA DICTIONARY

There isn't any data dictionary standard. A data dictionary is a fundamental document that allows rapid assessment of many quality metrics, such as interoperability, reliability, maintainability, flexibility, testability and reusability (see Appendix B).

DOCUMENTATION

Most of the databases don't have any documentation, neither user manuals nor technical papers. Data Source Documents are of primary importance for database administrators, because they contain all the information about the settings used for a particular database.

REFERENCES

Producing correct Databases requires a source of references. These could be created once and used always, as documents of standards and directives. We didn't find any Institutional guidelines about database design.

STANDARD NAMING SCHEMES

IPGRI doesn't use a naming convention for the structures and interfaces of its databases.

RELATIONSHIPS AMONG DATABASES

We noticed the need to find the proper links among the different databases. However, the existence of multiple Databases on the same topic (redundancy) and the absence of an up-to-date database inventory make this task more complex.

RELATIONSHIPS AMONG TABLES: ABSENCE OF CASCADING FEATURES

Relationships among the tables of a database are necessary for referential integrity; most of the databases don't use them, risking loss of meaning of the stored data. Referential integrity is a feature provided by relational database management systems (RDBMSs) that prevents users or applications from entering inconsistent data. Most RDBMSs have various referential integrity rules that can be applied when a relationship is created between two tables. See the Glossary (Appendix E) for the referential integrity properties.

LOOKUP TABLES

A table in a database that contains recurring values for a specific field should be used as the unique source of the stored data. The updating of these kinds of tables has to be centrally controlled. This technique is a user convenience that also promotes referential integrity.

STANDARD DATABASES

The use of data considered a standard source of information by other international Institutes is fundamental to produce a good interface between IPGRI and other
organizations. In fact, the use of structures recognized by other institutes aids the sharing process and makes IPGRI a good institute to deal with. Agrovoc is an example.

VIEWS

Database Views are not used except in certain cases. Views permit the extraction of subsets of data from a database. For example: the Germplasm Database stores information about the Institutes that collect a selected taxon; a view could expose all Germplasm data without the Institute information, which may not be useful for a particular application. In data warehouses, the Data Mart is an evolution of the view concept.

BACKUP AND RECOVERY

Many databases don't implement any backup policy to preserve data from crashes and other events.

EXPORTING OF RECORDS BETWEEN TWO USERS

The exporting of records from a database, with the intention of sending data between different users, occurs without a prior agreement between the involved parties. In fact, we found that many hours are spent converting from one database format to another.

PRIMARY, SECONDARY AND FOREIGN KEYS

Each table of every database should use correct keys to uniquely identify tuples in the data. The Team found many tables without suitable keys. For example: the current version of the table tip in the Travel Information Plan database uses an ID field as primary key; the correct key would be the travel code, which uniquely identifies a TIP.

INDEXING

Indexes are used to order tables or to display them in a specific order, through a data structure that gives rapid, random access to relations. Indexes are most often used with large relations and can greatly speed up database queries. We didn't find any policy on database indexing.

PERSISTENT QUERIES

Persistent queries should be applied to the queries most used by applications. They should be implemented directly in the DBMS tools in order to speed up data access. SQL Server Stored Procedures and Queries stored in Access are examples of these kinds of queries. This tool is not often used in IPGRI databases.

DISTRIBUTED DATABASES AND REPLICATION

Institutional Databases are not always distributed or replicated. All Institutional Databases that are remotely accessed should be replicated.
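The cascading and lookup-table points above can be sketched concretely. The schema below is hypothetical (a country lookup table referenced by an institute table), using SQLite only as a convenient stand-in for whichever RDBMS a database actually runs on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request

# Lookup table holding the recurring values, centrally controlled.
conn.execute("""CREATE TABLE country (
    country_code TEXT PRIMARY KEY, country_name TEXT)""")

# Child table: the foreign key plus cascading rules provides referential
# integrity -- inconsistent codes are rejected, key renames propagate.
conn.execute("""CREATE TABLE institute (
    institute_id INTEGER PRIMARY KEY,
    name TEXT,
    country_code TEXT REFERENCES country(country_code)
        ON UPDATE CASCADE ON DELETE RESTRICT)""")

conn.execute("INSERT INTO country VALUES ('IT', 'Italy')")
conn.execute("INSERT INTO institute VALUES (1, 'IPGRI HQ', 'IT')")

# 1) The RDBMS rejects a row pointing at a non-existent country.
try:
    conn.execute("INSERT INTO institute VALUES (2, 'Ghost', 'XX')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# 2) ON UPDATE CASCADE keeps child rows coherent when the key changes.
conn.execute("UPDATE country SET country_code = 'ITA' WHERE country_code = 'IT'")
print(conn.execute("SELECT country_code FROM institute").fetchone())
```

Without the foreign key, the 'Ghost' row would be silently accepted and the rename would strand every institute row pointing at the old code -- exactly the inconsistency described above.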
THE PROJECT

For each Institutional Database the Team will provide a recommendation document covering the following topics:
1) Analysis of the data architecture, interfaces, and data entry procedures and tools, with an Entity Relationship diagram and an analysis of all strengths and limitations. The main objective is finding holes in the data input that would allow inconsistent data to be produced.
2) A Data Dictionary covering the entire set of data represented.
3) A list of redundancies in the data architecture and data content, obtained by comparison among the Databases.
4) Improvements to the data entry process to support multi-site, multi-user updates.
5) A list of suggested improvements, including development tool standards.
6) Values of the main quality metrics, obtained with a simple questionnaire.
7) The skills that were required for the design of the data structure and the interface.
8) A map of redundancies among Databases and a list of suggested database merges, with a list of steps to be taken.

A final presentation with summarized results will be given to Management. A Collaboration web site has been created with SharePoint for quick interaction during the analysis phase and the final delivery of the reports. All users can discuss the published documents: databases, documents and spreadsheets can be uploaded and downloaded, and new forums can be created around them to discuss different topics. Interaction with IPGRI staff to collect survey information and the files needed to analyze the databases, their development and their data entry processes is fundamental in this project.

Various databases, such as Contacts, share some common problems, such as the need to update records so that the data can be viewed by all other parties without requiring a manual merge process. In these instances, it may be necessary to implement a different Database architecture that allows users in different sites to share a Distributed Database supporting selective updates of data with automatic replication. This Distributed Database architecture will require partitioning of the data tables for controlled updates.

We will take a detailed look at the skills that were required for the development of the existing Databases in the regions and at HQ. Along with suggestions from the various parties, these will constitute the basis for a recommendation on development tool standards.

The findings will provide the basis for Management to understand the extent of usage of the Institutional Databases in IPGRI and the reliability of the data content. The presentation and all the documents produced will represent the basis for subsequent activities in this area.

PROJECT DETAILS

The suggested procedures to obtain the above output are as follows:
A. Identify an initial set of Databases that should be analysed, and staff members who should be ready to provide all the information needed.
B. Send a message to all IPGRI staff advising about the activity, giving the initial list of Databases and Database contacts, and asking for suggestions on what additional Databases/staff members should be included in the activity.
C. Interact with all Database contacts to collect a sample of each database along with user, maintenance and development documentation. In addition, a list of questions will be sent which will make it possible to quantify the quality of the data content, any projected activity, or any other additions/fixes that would improve the Database.
D. Creation of the Collaboration web site.
E. Actual analysis takes place.
F. Final presentation.
DATA ARCHITECTURE

The mission of Data Architecture is to establish and maintain an adaptable infrastructure designed to facilitate the access, definition, management, security, and integrity of data across the Institute.

INTRODUCTION AND BACKGROUND

Data and information are extremely valuable assets of the Institute. Data Architecture establishes an infrastructure for providing access to high-quality, consistent data wherever and whenever it is needed. This infrastructure is a prerequisite for fulfilling the requirement that data be easily accessible and understandable by authorized end users and IPGRI applications. Data, and access to data, are focal points for many areas of the Technical Architecture. Data Architecture influences how data is stored and accessed, including online input and retrieval, outside application access, backup and recovery, and data warehouse access. An established Data Architecture is the foundation for many other components of the IPGRI technical architecture. Using a good data architecture ensures that data is:
1) Defined consistently across the Institute
2) Reusable and shareable
3) Accurate and up-to-date
4) Secure
5) Centrally managed

A SIMPLE DATA ARCHITECTURE

The Data Architecture consists of the following technical topics, including the recommended best practices, implementation guidelines, and standards, as they apply:
1) Data Modeling
2) Metadata
3) Database Management System (DBMS)
4) Data Access Middleware
5) Data Access Implementation
6) Data Security

DATA MODELING OVERVIEW

How data is modeled and designed inside an application can significantly impact the way the application runs and how other applications can access that data. This topic covers a basic overview of data modeling.

METADATA

The way to describe or define data is through metadata. Metadata is "information about data". Metadata is stored in a repository containing detailed descriptions of each data element. A generic implementation is a data dictionary, with a full description of the database fields. By using the formats described in the metadata repository, the same data management principles apply whether the data resides in a single location or in multiple databases across IPGRI.

DATABASE MANAGEMENT SYSTEM (DBMS) OVERVIEW

The Database Management System (DBMS) topic addresses the Data Architecture recommendations for projects selecting, designing, and implementing database management systems. In order to meet existing and future database needs, relational database technology is recommended, particularly for online transactional business applications. An emerging technology in the database world is object database technology.

DATA ACCESS MIDDLEWARE OVERVIEW

This topic addresses the Data Architecture recommendations for the implementation of data access middleware. Data access middleware is the communications layer between data access programs and databases.

DATA ACCESS IMPLEMENTATION OVERVIEW

Implementing Data Access is a key topic; it is a fundamental component of every application. This topic discusses recommendations for implementing data access within an application and to outside applications.

DATA SECURITY OVERVIEW

Data security is an important piece of the Data Architecture and of the application security model. This topic provides an overview of data security and discusses best practices for protecting data.
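The metadata repository idea need not be built by hand: every DBMS already exposes its own catalogue, from which a first-cut data dictionary can be generated. A minimal sketch (the `accession` schema is illustrative only; SQLite stands in for whichever DBMS holds the real data):

```python
import sqlite3

# Build a tiny example schema (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accession (
    accession_id INTEGER PRIMARY KEY,
    taxon_name   TEXT NOT NULL,
    collect_date TEXT)""")

def data_dictionary(conn):
    """Return {table: [(field, type, nullable, is_pk), ...]} read from the
    DBMS's own metadata (here the SQLite catalogue; other DBMSs expose the
    same information through INFORMATION_SCHEMA views)."""
    dictionary = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        dictionary[table] = [(c[1], c[2], not c[3], bool(c[5])) for c in cols]
    return dictionary

print(data_dictionary(conn))
```

Generating the dictionary from the catalogue keeps it synchronized with the schema, which is the hard part of maintaining metadata by hand.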
THE DAWN OF A NEW ARCHITECTURE

We describe here what is suggested as a new Database Architecture, able to solve most of the problems listed in the "Current Status" section from the standpoint of concurrent access and update to the Institutional Databases. In the past, to consolidate data coming from different locations, a considerable amount of time was spent merging the records manually. This is no longer required if we set up an Architecture whereby each location replicates its changes to the other major sites.

ADVANTAGES

1) Staff will be able to export data from the SQL Server database present in their regional office. The export can be performed using Microsoft Access, Excel or any other ODBC-compliant tool, depending on the needs. Therefore, people will be able to run statistics, create graphs or perform any kind of processing using their preferred tool.
2) The interface to the Database will be the same for all sites. In practical terms, each site will become a mirror of the others. Staff travelling will be able to access any of the 6 web sites from the Internet.

DISADVANTAGES

Additional administration will be required from a group with the know-how to manage SQL Server. This can be accomplished to a good extent using remote control tools, such as VNC. In addition, once the migration to Windows 2000 takes place at the remote sites, administration will be possible using the remote control features built into the Operating System.

[Diagram: replication topology linking HQ with the APO, SSA, Americas and CWANA regional sites]
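The replication idea can be reduced to a toy model: each site keeps its local copy plus a log of timestamped changes, and replication means exchanging logs and applying the newest change per record. This sketch illustrates only the concept; it is not the conflict-resolution algorithm that a real DBMS such as SQL Server uses, and the record keys shown are hypothetical:

```python
def apply_changes(table, changes):
    """Apply a change log to a site's table; the newest change per key wins."""
    for ts, key, value in sorted(changes):      # replay in time order
        current_ts, _ = table.get(key, (-1, None))
        if ts >= current_ts:
            table[key] = (ts, value)

hq, apo = {}, {}

# Each site has made independent edits, logged as (timestamp, key, value).
hq_log  = [(1, "contact:42", "Rome office"), (3, "contact:42", "Maccarese office")]
apo_log = [(2, "contact:42", "Serdang office"), (2, "contact:99", "New record")]

# Replication: every site applies its own log and the other site's log.
for table in (hq, apo):
    apply_changes(table, hq_log)
    apply_changes(table, apo_log)

print(hq == apo)            # both sites hold the same consolidated data
print(hq["contact:42"][1])  # the newest change wins
```

The manual-merge work disappears because the logs, not the people, carry the changes between sites; what remains is choosing a conflict rule (here, latest wins) and partitioning the tables so that updates stay controlled.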
GENERAL RECOMMENDATIONS
The Database Project Team has defined general recommendations for the databases:
1) A purpose document should exist to describe the reasons for the database, who created it, where it is stored and how it is accessed and maintained
2) Only DBMS tools should be used to create the Institutional Databases
3) The creation of each Institutional Database should follow these standard processes, each of which produces certain related documents:
a. Conceptual Design: defines the interaction between users and the database to be created, using text and graphics. The documents produced are:
i. Requirements Document
ii. Specification Document
iii. Planning Document
b. Logical Design: indicates the data flows between the actors involved during the interactions. The document produced is:
iv. Entity Relationship Graph
c. Physical Design: includes the physical creation of the database. The documents produced are:
v. Implementation Document
vi. User and Technical Manuals
vii. Maintenance Document
(For a brief description of the above documents see Appendix A)
4) The export of records from a database with the intention of sharing data between different users should occur with the prior agreement of the parties involved. This rule is meant to simplify the importing procedures at the receiving site.
5) Standardization of the data interface and data dictionary is required, through a standard object naming scheme and naming convention.
6) Each database's major properties should be saved in an inventory. This inventory should be implemented as a web-enabled database.
7) The use of international databases of standards (like the FAO Agrovoc database) is encouraged. In this manner, there will be a basis for international cooperation.
8) Planning ahead: when a database is designed, future development cases should be considered. Example: the Contacts database should be designed for mailing lists as well.
9) It is suggested that a special team be created to support the creation and maintenance of Institutional Databases.
10) Although there is no precise standard, there are some well-defined rules that can be used for extending the Data Dictionary. See the document "How to model People and Organization" for a sample of this. In addition, initiatives are ongoing to set standards in this area, following the increasing popularity of XML. We will look at these initiatives and try to find out whether they can be of any help in this area.
It can never be stressed enough how important it is to keep up-to-date documentation such as ER diagrams, very useful for showing relationships between tables, and a data dictionary that describes what each field is used for and any aliases that may exist. Documenting SQL statements is a must as well. In this manner, the database will be a powerful resource for all IPGRI staff.

WHY ARE INSTITUTIONAL DATABASES NOT UP TO USERS' EXPECTATIONS?
It has been verified that various databases are in use at different sites for the same purpose. One clear example is the Contacts database. The main reasons for this situation are the following:
a) Several IPGRI sites are badly served from the communications point of view. It is difficult for most of the sites to work with sufficient efficiency on a centralized Database, even if the interface is provided through a web browser using the Internet as a transport.
b) The lack of Enterprise-level applications providing reliability and scalability. This has resulted in considerable work being required to centrally consolidate the data present at each site.
c) The lack of commitment at HQ on the maintenance of the Institutional Databases has given rise to independent versions of the databases.
INSTITUTIONAL DATABASE
The Database Project Team has defined rules to be applied to the Institutional Database:

DEFINITION
The Institutional Database is the collection of data defined as important for the IPGRI Institute.

IMPLEMENTATION
IPGRI uses a relational database management system (RDBMS) as the collector of institutional databases. The Institutional Database is implemented as Tables in the adopted RDBMS. Each Table is defined by Data Definition Language (DDL) SQL or RDBMS wizards and must contain primary keys. A relationship diagram has to be published and be available to internal users.

NAMING CONVENTION
DBMS Naming Convention
The structure of the Institutional Database will follow a naming convention. Each institutional database should have a full data dictionary as documentation. Each interface of the Institutional Database will have a default layout and be programmed using a general naming convention for variables and controls. From the Leszynski Database Naming Convention, the following database naming convention is assumed:

Data Type     Prefix  Example
Tables        tbl     tblContacts
Views         vws     vwsEurope
Queries       qry     qryLookUp
Forms         frm     frmContacts
Reports       rpt     rptMain
Macros        mcr     mcrMySubs
Modules       mod     modFunction
Stored Proc.  sp_     sp_records
Triggers      trg     trgOnClick
Indexes       ind     indMYField
Primary Keys  ID      IDContacts
Database naming convention

Only one exception is permitted: the table name can be
a. an aggregate name
b. in plural form without any prefix
For example:
a. Repository or Warehouse are aggregate names
b. tblContacts and tblCountries can be renamed to Contacts and Countries

Field naming convention
Each field of the tables (except for primary keys and foreign keys) will have only the first letter capitalised and will follow this naming convention:
<type><Singular Table Name><Singular Name>
where
§ Type: one value from the Database Types naming convention table (see below)
§ Singular Table Name: the table name in singular form
§ Singular Name: any name with the first letter in upper case, without any symbols or spaces (like underscores, dollars, etc.). Multiple names will be joined without spaces: a "Telephone and Fax" field will become txtContactsTelFax
An example of a Contacts field: the Surname field could be strContactSurname (note that the table name in this field name is singular). In this manner it can easily be decomposed: strContactSurname is a Contacts field and contains a text value that is the surname. Except for the prefixes, all parts have only the first letter in upper case, as noted above; abbreviations are the exception and are always in capitals. Only letters must be used, in the ranges [a…z] and [A…Z]; spaces, dots, minus signs and other ASCII symbols cannot be used for database structures, except for the underscore symbol (_). Keep in mind, though, that SQL Server and other DBMSs treat the underscore as a wildcard, and this could cause problems when accessing the data. Sometimes a field name can become too long, for example pktxtCollectingMissionInstOriginalInstColumn: this is the trade-off between transparency, easy rules and tiresome names.

Default fields
Some field names are equivalent or synonyms: "remarks" and "comments", for example. A good rule is to use only one name for the same data. It is also a good rule to build a field name starting from the context.
For example: a budget has a "Code" and a brief "Description" in a table "Europe" of the "LOA" database. The implementation of these field names is:
§ txtLOABudgetCode
§ txtLOABudgetDescription
As you can see, they are built from the "Budget" prefix. An alphabetic ordering would show that there are two fields about the Budget of an LOA: Code and Description. If we inverted the order we would have txtLOACodeBudget and txtLOADescriptionBudget, which are not as clear as the first set.
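The decomposition described above (strContactSurname → prefix str, table Contact, attribute Surname) can be automated. A minimal sketch (the prefix list is abbreviated and the helper name is hypothetical):

```python
import re

# Abbreviated prefix list from the Database Types naming convention
# table; extend as needed for the full convention.
TYPE_PREFIXES = ("str", "txt", "int", "lng", "dat", "dtm", "mem", "cur")

def decompose(field_name):
    """Split a field name such as 'strContactSurname' into its type
    prefix and the remaining CamelCase parts."""
    for prefix in TYPE_PREFIXES:
        if field_name.startswith(prefix):
            rest = field_name[len(prefix):]
            # Split the CamelCase remainder into individual words.
            parts = re.findall(r"[A-Z][a-z0-9]*", rest)
            return prefix, parts
    return None, []

prefix, parts = decompose("strContactSurname")
print(prefix, parts)  # str ['Contact', 'Surname']
```

A check like this could be run over a schema to verify that field names actually follow the convention before they enter the data dictionary.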
Here is a brief list of common fields and the strongly suggested synonym to use in the data structure:
§ Notes, Remarks, Comments: Remarks
§ Info, Descriptions: Description
§ [field name]ID, ID[field name]: ID[table name]
§ Starting period: [prefixes]DateFrom
§ Ending period: [prefixes]DateTo
§ Telephone: [prefixes]Tel
§ Email: [prefixes]Email
§ Detail, Details: mem[singular table name]Details
§ Update, Updated, InputDate: dat[singular table name]Update. InputDate may differ from Update only when the date of the first input of a record is relevant.
§ URL, Website, http, webaddress: txt[singular table name]URL

Primary and Foreign key naming convention
The general primary key is defined as:
ID<Table Name>
It is used when the tuple (also called a record in relational databases) has no unique attribute value (also called a field value in relational databases) that can be used as the primary key (see Entity/Relationship theory). The type name is not necessary because this kind of field is always a counter handled by the RDBMS.
For example: IDMyTables is the primary key of the MyTables table. Note that the table name is in plural form. When there are attributes usable as primary keys they can be implemented as
pk<type><Singular Table Name><Singular Name>
For example: Contacts can have a PIN (Personal Identification Number) as primary key. It will become pkintContactPIN. In this manner, a brief analysis of the Contacts database structure will show that the primary key is the PIN number (it is evident that PIN is an abbreviation, being in capitals).
The foreign keys will have the name of the related primary key. At first sight, this rule might lead to confusing foreign keys with primary keys, but the table name included in each field name excludes this error.
For example, the Contacts table could contain IDCountries as a foreign key (IDCountries being the primary key of the Countries table). The following field types are adopted:
Object Naming Convention for Institutional Database Interface programming
Use the tables below in programming when you want to reference a database object. This variable naming convention table is adopted for Institutional Database Interface programming:

Data Type  Prefix  SQL Server Type    MS Access Type  Example
Boolean    bln     bit                bool            blnAccepted
Byte       byt     binary             yes/no          bytPixelValue
Counter    idx     uniqueidentifier   counter         idxPrimaryKeys
Currency   cur     money              currency        curMoney
Date       dat     datetime           date/time       datMyDate
DateTime   dtm     datetime           date/time       dtmFirstTime
Double     dbl     double             numeric         dblTotalDistance
Float      flt     float              float           fltValue
Image      img     image              OLE Object      imgPhoto
Integer    int     smallint           numeric         intCount
Long       lng     int                numeric         lngFreeSpace
Memo       mem     nvarchar           memo            memComments
Object     obj     varbinary          OLE Object      objListBox
Smallint   sml     smallint           numeric         smlVariable
String     str     nvarchar           text            strAddress
Database Types naming convention

Objects      Prefix
Connection   conn
Database     db
Field        fld
Group        grp
Index        idx
Property     prop
QueryDef     sql
Recordset    rs
Relation     rel
TableDef     td
User         usr
Password     pwd
Workspace    ws
Objects Naming convention
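A minimal sketch of the table and key naming conventions in practice. It is run here with Python's built-in sqlite3 purely for illustration (the Institutional Databases themselves use SQL Server or Access, and the column choices are hypothetical):

```python
import sqlite3

# Sketch: tables and keys named per the convention (tbl prefix,
# ID<Table Name> primary keys, foreign keys named after the related
# primary key).  Columns are illustrative examples.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblCountries (
    IDCountries       INTEGER PRIMARY KEY,   -- counter-style primary key
    strCountryName    TEXT NOT NULL
);
CREATE TABLE tblContacts (
    IDContacts        INTEGER PRIMARY KEY,
    strContactSurname TEXT NOT NULL,
    IDCountries       INTEGER REFERENCES tblCountries(IDCountries)
);
""")
con.execute("INSERT INTO tblCountries VALUES (1, 'Italy')")
con.execute("INSERT INTO tblContacts VALUES (1, 'Rossi', 1)")

# Because the foreign key carries the name of the related primary key,
# the join condition writes itself.
row = con.execute("""
    SELECT strContactSurname, strCountryName
    FROM tblContacts JOIN tblCountries USING (IDCountries)
""").fetchone()
print(row)  # ('Rossi', 'Italy')
```

Note how the shared IDCountries name lets the join use USING rather than an explicit ON clause: one practical payoff of the foreign-key naming rule described above.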
The naming convention tables shown below are also adopted for Institutional Database Interface programming.

Data standard: the Unicode Standard
Before the development of the Unicode standard, character data was limited to sets of 256 characters. This limitation came from the one-byte storage space used by a single character; one byte can represent only 256 different bit combinations. The Unicode standard expands the number of possible values for character data. By doubling the amount of storage space used for a single character, the Unicode standard increases the number of possible character values from 256 to 65,536. With this increased range, the Unicode standard includes letters, numbers, and symbols used in languages around the world, including all of the values from the previously existing character sets. IPGRI will use this standard for the data inserted in the Institutional Database.

Data Sources
Data, articles and other publications could come from a unique source represented in a universal format and published on many supports: HTML, paper, etc. This requires some rules, with a complexity that is inversely proportional to the flexibility. Many publications, like the annual report, PGR, newsletters, etc., could be represented in a single database and published using XML or other descriptive languages and supports, in different formats.
In this way, an Annual Report issue could have a unique origin and be published on the Internet, in PDF, etc.

Data Type     Prefix  Example
Boolean       bln     blnAccepted
Byte          byt     bytPixelValue
Date or Time  dat     dtmFirstTime
Double        dbl     dblTotalDistance
Integer       int     intCount
Long          lng     lngFreeSpace
Object        obj     objListBox
Single        sng     sngLength
String        str     strAddress
Variant       vrn     vrnObject
Error         err     errMessage
Variable naming convention

Scope                 Prefix  Example
Browsing              bws     bwsMain
Deleting              del     delMask
Editing               edit    editInterface
Adding New Record     new     newInterface
Confirming Questions  qst     qstIMask
Printing Errors       err     errMessage
Exiting               exi     exiMask
Table Lookup          tbl     tblContacts
Naming convention for interfaces
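The storage doubling described in the Unicode paragraph above can be checked directly. A small Python illustration (the sample string is arbitrary):

```python
# One-byte legacy encodings vs. two-byte Unicode storage per character.
# Latin-1 stands in for the pre-Unicode 256-character sets; UTF-16
# uses two bytes per character for the values discussed above.
text = "IPGRI"

latin1_bytes = text.encode("latin-1")    # one byte per character
utf16_bytes = text.encode("utf-16-le")   # two bytes per character

print(len(latin1_bytes), len(utf16_bytes))  # 5 10
```

Doubling the per-character storage is what raises the number of representable values from 2^8 = 256 to 2^16 = 65,536.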
APPENDIX A: DATABASE DESIGN
Database design is a well-defined standard procedure that arises from different user needs. The main purpose of a database is to store homogeneous information about a well-defined subject, with the aim of sharing these data among different users. Database design is fundamental to obtaining correct specifications and a final database that matches the initial design. It produces documents that describe various aspects of the use, the processes and the cases in which the database is used. These aspects are handy when an IPGRI member would like to know whether there are databases that collect some kind of data. As a software product, the database should emerge from standard processes that produce different documents showing how the database was created. Many organizations have adopted this kind of design for their databases. It is easy to understand the sense of a database, and how the stored data can be accessed, when there are documents that explain its different aspects. A user searching for particular data can read the Requirement Document. A programmer who needs to access the database can read the Specification and Implementation Documents. Finally, accounting staff can consult the Planning Document to obtain details about the total cost of the database without requiring any additional documents. All the history of the database is included in a few sheets. Database maintainers often do not know whether a database is still used or not, and technical mistakes often occur because it is impossible to determine the origin of a previously installed database.
To design a good database these standard processes should be followed:
§ Requirement Process: analyses the current status of the data to be imported into the database and identifies the needs to be satisfied by the database
§ Specification Process: defines all database features
§ Planning Process: the technologies are defined (DBMS used, web-enabled or proprietary interface, etc.) and the cost is indicated
§ Implementation Process: the database is created and the interface is built
§ Maintenance Process: the database has to be maintained
The documents produced by these processes are the following:
1. Requirement Document: it covers WHY an implementation of this Database is being attempted. It contains the needs to be satisfied by the new database, the current status of the data to be stored and who will benefit from the database. For example: papers contain a lot of data to be shared among members that need independent access to it. It produces a detailed list of the necessities to be satisfied,
with all the advantages gained from the final implemented database. This document is fundamental for all people who want to know the purposes of the database, and it can help avoid redundancy and duplication of the data to be stored.
2. Specification Document: it covers HOW an implementation of this Database will be used. It shows the characteristics of the final database, its features and how the data is accessed, without specifying the technologies used during the implementation. Entity Relationship, UML and ORM diagrams give a detailed description of the database structure.
3. Planning Document: shows the tools used for hosting the data, with the costs of the creation and maintenance tasks. This document clarifies which technology will be used.
4. Creation and Integration Document or Implementation Document: it contains a brief description of the implementation (DBMS used, database name, complete path on the server, names of tools used to access data, like ASP applications, and technical information used by the technicians who maintain the database, like technical documentation and the user manual).
5. Maintenance Document: describes how the maintenance is performed; if the maintenance process is executed outside the Institute, this document will be the maintenance contract.

[Diagram: each process produces its corresponding document — Requirement, Specification, Planning, C&I (Implementation) and Maintenance Documents]
APPENDIX B: DATABASE EVALUATION BY METRICS
All databases can be weighed by defining metric parameters. The quality of a database is defined by different quality rates. The most important ones are:
§ Correctness: indicates whether the implementation matches the specification.
§ Reliability: concerns fault tolerance, data coherence, etc.
§ Integrity: evaluates the security of the data against unauthorized attack.
§ Maintainability: how the database is maintained.
§ Flexibility: concerns expandability, modularity, etc.
§ Testability: the database structure and its placement should be approved by technical staff.
§ Reusability: the database could be used for other purposes. For example, the Contacts Database should be accessible by mailing-list tools.
§ Interoperability: indicates the relationships with other databases.
Testing the quality attributes mentioned requires breaking them down into engineering criteria that can be evaluated using checklist methods. Every attribute is judged by giving a weight to all the sub-attributes that constitute the above rates. These checklists are simply questionnaires prepared by technical staff.
For example, Flexibility has these sub-attributes:
a) Consistency
b) Complexity
c) Generality
d) Modularity
e) Auto-documentation
The relative checklist questions could be:
i. Is the database produced following IPGRI standard techniques?
ii. Is the structure comprehensible?
iii. Is the database usable for other requirements?
iv. Are the tables decomposable?
v. Can a user comprehend the meaning of the database without access to other documents?
The value of the flexibility rate is defined as:

Vflex = (sum of answer weights) / (sum of maximum answer weights)

Applying this method to all rates, the quality of a database is correctly defined.
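The checklist scoring above can be sketched in a few lines. The questions and weights below are hypothetical examples, not IPGRI's actual checklist:

```python
# Sketch of the checklist-based rate:
#   Vflex = (sum of answer weights) / (sum of maximum answer weights)
# Questions and weights are hypothetical examples.
checklist = [
    # (question, answer weight, maximum weight)
    ("Is the database produced following standard techniques?", 3, 5),
    ("Is the structure comprehensible?",                        4, 5),
    ("Is the database usable for other requirements?",          2, 5),
]

def rate(checklist):
    """Return a quality rate as the ratio of earned to maximum weights."""
    earned = sum(answer for _, answer, _ in checklist)
    maximum = sum(max_weight for _, _, max_weight in checklist)
    return earned / maximum

print(rate(checklist))  # (3 + 4 + 2) / (5 + 5 + 5) = 0.6
```

Running the same computation for each attribute (Correctness, Reliability, and so on) yields the full set of quality rates for a database.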
APPENDIX C: REDUNDANCY AND NORMALISATION
One of the objectives in designing a relational database is the reduction of duplication in the stored data. Duplicated data items represent redundancy. That is, duplicated items take up more storage space than is absolutely necessary. We might put up with this loss in storage space if it were not for a more significant consequence of redundancy: if a data item is stored in more than one place then, when we need to change that item, we must do so in every location in which it is found. The more copies there are, the more difficult this is. If we miss just one copy then the database is in an invalid state (being incoherent) and there is no easy way to know which of the stored versions is correct.
Normalisation is, in relational database design, the process of organizing data to minimize redundancy. It usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships. There are three main normal forms, each with an increasing level of normalization:
a. First Normal Form (1NF): each field in a table contains different information. For example, in an employee list, each table would contain only one birth date field.
b. Second Normal Form (2NF): no field values can be derived from another field. For example, if a table already included a birth date field, it could not also include a birth year field, since this information would be redundant.
c. Third Normal Form (3NF): no duplicate information is permitted.
So, for example, if two tables both require a birth date field, the birth date information would be separated into its own table, and the two other tables would then access the birth date information via an index field in the birth date table. Any change to a birth date would automatically be reflected in all tables that link to the birth date table.

NORMALISATION
A technique exists by which we can arrange our data so that redundancy is minimised. Normalisation arranges the data in a succession of normal forms. Each normal form further reduces the degree of duplication. The first step is to make sure the data is in First Normal Form (1NF). This is quite easy, as we simply make sure there are no repeating groups. Any groups which repeat are placed in a separate table. The trick here is to look at the key. The key is the attribute which can be used to uniquely determine the row of the table that we are interested in. If there are any repeating groups then the key does not adequately determine the contents of a row and the table is not in 1NF. Now, since an E/R model of the data is available, each entity has a primary key. The key was chosen so that it uniquely identifies each occurrence, and so the entity should be in 1NF already.
EXAMPLE
A manufacturing company makes products from a variety of components. Each product has a unique product number, a name and an assembly time. Each component has a unique component number, a description, a supplier code and a price. Assume we have an entity definition from our data modelling which looks like this:
Product (ProdCode, Name, Time, ComponentCode, Description, Quantity, Supplier, Cost)
The primary key is the product code ProdCode, but look at some typical occurrences of this entity and we will see some problems. An example tuple of the relation defined above is:
(325, Trolley, 0.35, B1378, Wheel, 6, S2341, 0.22)
While the product code uniquely identifies each product, it does not serve to identify each occurrence of the Product entity; therefore the entity definition is not in 1NF. The problem is caused by the fact that some of the (non-key) attributes are not dependent upon the primary key. For example, the description, supplier and cost are dependent upon the component code, and the quantity is dependent upon the combination of the product code and the component code. Dependency in this case is about how we can work out one attribute once given another. For example, ProductCode 325 tells us we are dealing with a Trolley but not which supplier(s). If we are given the ComponentCode, though, we can determine the Supplier. Thus, the Supplier depends upon the ComponentCode. The reverse is not true: if we know the Supplier we cannot determine either the ComponentCode or the ProductCode. We can group together those attributes where there are some dependencies by writing a list of functional dependencies. In order to get the definition into 1NF we need to extract those groups which repeat and put them into an entity of their own. In this case the last five attributes form a repeated group for each instance of a product code and must go into a separate entity.
We must take care, however, to take a copy of the primary key, as that will be needed to form a link between the two new entities. Our two new entity definitions are:
§ Product (ProductCode, Name, Time)
§ Component (ProductCode, ComponentCode, Description, Quantity, Supplier, Cost)
You may complain that the new entity Component contains a repeating product code. However, this is now a necessary part of the primary key of Component and represents the least duplication we can have while still maintaining a link between the two entities. Each product now has its name and assembly time stored only once, so that if either changes we only have to change it in one place. There is a further step that we can take. You should notice that, in the occurrence entity, the Description, Supplier and Cost get repeated because they depend on part of the primary key, not all of it. We can transform the Component entity into two new 2NF entities. A 2NF entity is one where all the (non-key)
attributes depend on all of the primary key. We shall extract the offending attributes and create two new entities with the following definitions:
§ Parts (ProductCode, ComponentCode, Quantity)
§ Component (ComponentCode, Description, Supplier, Cost)
Notice that Quantity is a function of both ProductCode and ComponentCode, and that Description, Supplier and Cost are functions only of the ComponentCode. The tables will now look like this:
§ Parts (ProdCode, CompCode, Quantity); example tuple: (325, B1378, 6)
§ Component (CompCode, Description, Supplier, Cost); example tuple: (B1378, Wheel, S2341, 0.22)
There is a third stage that can be applied, although our data now satisfies its conditions and there is little else we can do to remove redundancy. The important thing to realise is that our data is now stored in a way that minimises the amount of duplication. That will help to maintain the integrity of the database and the quality of the data. The complete definition and the occurrence tables are shown on the next page. Compare them with the original definition and table carefully and note the differences. The normalised database now has the following entities:
§ Product (ProductCode, Name, Time)
§ Parts (ProductCode, ComponentCode, Quantity)
§ Component (ComponentCode, Description, Supplier, Cost)

NORMALIZE PROCESS TO ELIMINATE REDUNDANCY
The normalization process helps eliminate redundancy of data in a database by ensuring that all fields in a table are atomic. There are several forms of normalization, but the Third Normal Form (3NF) is generally regarded as providing the best compromise between performance, extensibility and data integrity. Briefly, 3NF states that:
§ Each value in a table is to be represented only once
§ Each row in a table should be uniquely identifiable.
(It should have a unique key)
§ No non-key information that relies upon another key should be stored in the table
Databases in 3NF are characterized by a group of tables storing related data that are joined together through keys.
For example, a 3NF database for storing customers and their related orders would likely have two tables: Customer and Order. The Order table would not contain any information about an order's related customer. Instead, it would store the key that identifies the row containing the customer's information in the Customer table. Higher levels of normalization exist, but more normalization is not always better. In fact, for some projects, even 3NF may introduce too much complexity into the database to be worth the rewards.
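The decomposition worked through in the example above can be checked concretely. The sketch below (using Python's built-in sqlite3 purely for illustration) loads the three normalised entities with the sample occurrence and reconstructs the original unnormalised tuple with a join:

```python
import sqlite3

# The three normalised entities from the example, loaded with the
# sample occurrence (325, Trolley, 0.35, B1378, Wheel, 6, S2341, 0.22).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product   (ProductCode INTEGER PRIMARY KEY, Name TEXT, Time REAL);
CREATE TABLE Component (ComponentCode TEXT PRIMARY KEY, Description TEXT,
                        Supplier TEXT, Cost REAL);
CREATE TABLE Parts     (ProductCode INTEGER, ComponentCode TEXT,
                        Quantity INTEGER,
                        PRIMARY KEY (ProductCode, ComponentCode));
INSERT INTO Product   VALUES (325, 'Trolley', 0.35);
INSERT INTO Component VALUES ('B1378', 'Wheel', 'S2341', 0.22);
INSERT INTO Parts     VALUES (325, 'B1378', 6);
""")

# Joining the three tables reconstructs the original tuple, showing
# that no information was lost in the decomposition.
row = con.execute("""
    SELECT p.ProductCode, p.Name, p.Time, c.ComponentCode,
           c.Description, pa.Quantity, c.Supplier, c.Cost
    FROM Product p
    JOIN Parts pa    ON pa.ProductCode = p.ProductCode
    JOIN Component c ON c.ComponentCode = pa.ComponentCode
""").fetchone()
print(row)
```

Each fact (a product's name, a component's supplier, a quantity per product–component pair) now lives in exactly one row, yet the join recovers the original flat record on demand.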
APPENDIX D: DATABASE STANDARDS
Here we will look at the standards to be used in the preparation of the documents listed in Appendix A on Database Design, and at the products and interfaces that are to be used for the implementation itself. Database design can be a very complex task, but it starts with an important interaction with the final users. In fact, the project leader in a Database project should be chosen as a champion in the area where the Database is going to be used. For this purpose, the first step in a Database Project is to map the requirements to a Conceptual Model. The Conceptual Model has nothing to do with technology and a lot to do with trying to capture what kind of information we want to store to solve our business problem. Several models have been created, each with its strengths and weaknesses, but none has emerged as the best in all possible situations. Therefore, it is most probable that more than one model will have to be used in this area. See the document "Evaluation of modelling Techniques" for details. Currently, we are oriented toward using the ER (Entity Relationship) and ORM (Object Role Modeling) models. See "Entity Relationship diagrams documentation and presentation" at http://dec.bournemouth.ac.uk/staff/kcox/ERDs/index.htm and the document "Modeling, Data Semantics and Natural Language" for another analysis of Conceptual models and for details on ORM. ER is best for quick reference and maintenance, and it is widely known among developers, while ORM is best for interaction with the users. ORM also allows modelling the business rules that apply to the information. After the Conceptual model is ready it can be mapped to a Logical model. At this stage, we have to choose the type of Database system we are going to use, such as Hierarchical, Networked or Relational.
Because the industry has long been orienting itself toward the Relational model, we really do not have much choice here. The Relational model is mathematically well founded and has given rise to a number of important standards that allow the cooperation of different products in the same application. In particular, SQL (Structured Query Language) is a declarative language that can be used to work on Relational Databases and, although very powerful, it is oriented toward the final user. For more information on SQL see the document "Introduction to Structured Query Language". At a third stage we have to map the Logical model to the Physical model. Now we have to make our choice of a product. We have come to this point after having created the most important documentation using models that are product independent. To help us further, we should adopt products based on standard interfaces, such as ODBC, which will simplify migration to new products in the future. In addition, other requirements become important at this stage, such as:
a) Network support group
b) Know-how
c) The ability to replicate data on slow links
d) Security granularity requirements
e) Decision support components
f) Data Warehousing capabilities
The Conceptual model defines Information at the highest level of abstraction, while the Physical model describes the details at the lowest level of abstraction. Due to their importance for the success of a Database Design project, the chosen models must always be kept in sync. Organizational decision makers can take great advantage of the central consolidation of Information stored in different products. This way they can perform high-level analysis on Information which is already available in different areas of the Organization. This is the target of Data Warehouses. However, letting different products talk to each other requires standards. For this reason, Database vendors started to include a Data Repository that stores all metadata information about the databases along with the Databases themselves. Fortunately, we have lately witnessed in this area the consolidation of standards under the unique CWM (Common Warehouse Metamodel) from the OMG (Object Management Group). Go to http://www.omg.org/cwm/ for details on OMG. See the document "Database Metadata Standard" for details. The CWM standardizes a complete, comprehensive metamodel that enables data mining across database boundaries at an enterprise, and goes well beyond. Like a UML profile, but in data space instead of application space, it forms the MDA mapping to database schemas. The product of a cooperative effort between the OMG and the Meta-Data Coalition (MDC), the CWM does for data modelling what UML does for application modelling. The models outlined above focus on the information used by the processes but do not give any tool to describe the processes themselves. UML (Unified Modelling Language) is the most widely accepted and supported way of defining Business rules and processes.
UML is object based, which allows the model to be easily mapped to modern object-oriented languages like Java. In fact, UML, along with XML (eXtensible Markup Language) and XMI (XML Metadata Interchange), is used as a base for the CWM standard mentioned above.
APPENDIX E: GLOSSARY
ANSI (AMERICAN NATIONAL STANDARDS INSTITUTE): An association formed by the American Government and industry to produce and disseminate widely used industrial standards.
ATTRIBUTE: A noun describing a value which will be found in each tuple in a relation. Usually represented as a column of a relation. It is a property that can assume values for entities or relationships. Entities can be assigned several attributes.
CANDIDATE KEY: One or more attributes which will uniquely identify one tuple in a relation. A candidate key is a potential primary key.
COLUMN: A component of a table that holds a single attribute of the table.
COMPOSITE KEY: A key in a database table made up of several fields. Same as concatenated key.
CONCEPTUAL VIEW: The schema of a database.
DATA: A recording of facts, concepts, or instructions on a storage medium for communication, retrieval, and processing by automatic means, and presentation as information that is understandable by human beings.
DATA AGGREGATE: A collection of data items.
DATA DICTIONARY: Contains definitions of data, the relationship of one category of data to another, the attributes and keys of groups of data, and so forth. Software tools are used to record this information.
DATA ELEMENT: A uniquely named and well-defined category of data that consists of data items, and that is included in the record of an activity.
DATA ENTRY: The process of entering data into a computerized database or spreadsheet. Data entry can be performed by an individual typing at a keyboard or by a machine entering data electronically.
DATA MINING: Term for a class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways.
True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data.
DATA MART, DATAMART: A database, or collection of databases, designed to help managers make strategic decisions about their business. Whereas a data warehouse combines databases across an entire enterprise, data marts are usually smaller and focus on a particular subject or department. Some data marts, called dependent data marts, are subsets of larger data warehouses.
DATA MODEL: 1) The logical data structures, including operations and constraints, provided by a DBMS for effective database processing. 2) The system used for the representation of data (e.g., the ERD or relational model). A data model is an abstract representation of the data used by an organization, such that a meaningful interpretation of the data may be made by the model's readers. The data model may be at a conceptual, external, or internal level (as defined by ANSI).
DATA SOURCE: The place where the data to be accessed is stored. A generic name for data, whether stored in a conventional data source (such as Oracle) or in a file system (such as RMS or VSAM). The name given to a data source in the binding.
DATA WAREHOUSE: A copy of transaction data specifically structured for query and analysis. A collection of data designed to support management decision-making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. Development of a data warehouse includes development of systems to extract data from operating systems, plus installation of a warehouse database system that provides managers flexible access to the data. The term data warehousing generally refers to combining many different databases across an entire enterprise. Contrast with data mart.
DATABASE: 1) A collection of all the data needed by a person or organization to perform needed functions. 2) A collection of related files. 3) Any collection of data organized to answer queries. 4) (Informally) a database management system.
DATABASE MANAGER: 1) The person with primary responsibility for the design, construction, and maintenance of a database.
2) (Informally) a database management system.
DENORMALISATION: Allowing redundancy in a table so that the table can remain flat, rather than normalized.
DB2: The IBM relational database system.
DBMS (DATABASE MANAGEMENT SYSTEM): Also called a database manager, it is an integrated collection of programs designed to allow people to design databases, enter and maintain data, and perform queries. It contains the tools to manage the data and the structures through DML and DDL.
DDL (DATA DEFINITION LANGUAGE): The language used to define the table structures, relationships, triggers, and procedures needed to build the skeleton of the database.
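A minimal DDL fragment of the kind the DDL entry describes can be sketched as follows, using SQLite syntax through Python's stdlib sqlite3 module; the table and column names are hypothetical examples, not taken from any IPGRI schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- DDL: define the skeleton of the database
    CREATE TABLE institute (
        inst_code TEXT PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        inst_code  TEXT REFERENCES institute(inst_code),
        full_name  TEXT NOT NULL
    );
    -- DDL also covers changes to existing structures
    ALTER TABLE contact ADD COLUMN email TEXT;
""")
# Inspect the resulting structure (column name is field 1 of table_info rows)
cols = [row[1] for row in conn.execute("PRAGMA table_info(contact)")]
print(cols)  # ['contact_id', 'inst_code', 'full_name', 'email']
```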
DML (DATA MANIPULATION LANGUAGE): The language used to query and modify the data in a database.
DISTRIBUTED DATABASE: A database in which the resources are stored on more than one computer system, often at different physical locations.
ENTITY: A real-world object, observation, transaction, or person about which data are to be stored in a database.
ENTITY-RELATIONSHIP (ER OR ERD) DIAGRAM: A design tool used primarily for relational databases, in which entities are modeled as geometric shapes and the relationships between them are shown as labeled arcs. It is a model of an organization's data in which the objective has been to remove all repeated values by creating more tables.
FIELD: Term used by Access as a synonym for attribute.
FILE: 1) The separately named unit of storage for all data and programs on most computers. For example, a relation or a whole database may be stored in one file. 2) Term used as a synonym for relation in some (particularly older) database managers, like dBase.
INCOHERENT DATA: A value of an attribute that does not reflect the real state of the data. An incorrect address for a contact, or two conflicting copies of similar data, are examples of incoherent data.
INDEX: 1) A method used to reorder tuples or to display them in a specific order. 2) A data structure used to give rapid, random access to relations. Indexes are most often used with large relations.
JOIN: An operation that takes two relations as operands and produces a new relation by concatenating the tuples and matching the corresponding columns when a stated condition holds between the two. It uses data from more than one relation (table). The relations must have at least one attribute (called the join or linking attribute) in common.
KEY: An attribute or combination of attributes. A combination of their values will be used to select tuples from a relation.
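The JOIN entry above can be illustrated with a short runnable sketch (Python stdlib sqlite3); the country and institute tables and their values are hypothetical, with cty as the join attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (cty TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE institute (inst_code TEXT PRIMARY KEY,
                            cty TEXT REFERENCES country(cty));
    INSERT INTO country VALUES ('ITA', 'Italy'), ('FRA', 'France');
    INSERT INTO institute VALUES ('IPGRI', 'ITA');
""")
# DML join: concatenate tuples where the join attribute (cty) matches
rows = conn.execute("""
    SELECT i.inst_code, c.name
    FROM institute AS i JOIN country AS c ON i.cty = c.cty
""").fetchall()
print(rows)  # [('IPGRI', 'Italy')]
```

France drops out of the result because no institute tuple matches it on the join attribute.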
MANY-TO-MANY RELATIONSHIP: One or more tuples in one relation may be related to one or more tuples in a second relation by a common value of a join attribute. This implies that each value of the join attribute may appear any number of times in either relation or in both.
NORMAL FORM: 1) A condition of relations and databases intended to reduce data redundancy and improve performance. 2) The method of normalizing a database. There are three main normal forms: First, Second, and Third. First Normal Form requires that each field hold only a single, atomic value, with no repeating groups. Second Normal Form requires that every non-key field depend on the whole primary key. Third Normal Form requires that no non-key field depend on another non-key field, so the same information is not duplicated within two or more tables. Normalized tables are linked using key fields.
NORMALIZE: The process of removing redundancy in data by separating the data into multiple tables, decomposing complex data structures into natural structures.
ODBC (OPEN DATABASE CONNECTIVITY): A standard interface between a database and an application that is trying to access the data in that database. ODBC is defined by international (ISO) and national (ANSI) standards, and is based on the SQL-92 version of the SQL standard.
ONE-TO-MANY RELATIONSHIP: Exactly one tuple in one relation is related by a common join attribute to many tuples in another relation. This implies that each value of the join attribute is unique in the first relation but not necessarily unique in the second.
ONE-TO-ONE RELATIONSHIP: Exactly one tuple in one relation is related by a common join attribute to exactly one tuple in another relation. This implies that each value of the join attribute appears no more than once in each of the relations.
PERSISTENT QUERY: A query which is stored for reuse.
PRIMARY KEY: A key such that the value of the key attribute(s) will uniquely identify any tuple in the relation. A relation must not have more than one primary key.
QUERY: Literally, a question. 1) A command, written in a query language, for the database to present a specified subset of the data in the database. 2) The subset of data produced as output in response to a query.
QUERY LANGUAGE: A computer language which can be used to express queries.
QUERY RESOLUTION: The process of collecting the data needed to answer a query.
RECORD: Term used as a synonym for tuple in some (particularly older) database management systems, like dBase.
RECURSIVE QUERY: A query in which the output of the query is then used as input for the same query.
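The one-to-many pattern defined above can be sketched concretely (Python stdlib sqlite3); the crop and accession tables are hypothetical examples, with crop_code as the join attribute that is unique on the "one" side and repeated on the "many" side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crop (crop_code TEXT PRIMARY KEY, crop_name TEXT);
    CREATE TABLE accession (
        acc_id    INTEGER PRIMARY KEY,
        crop_code TEXT REFERENCES crop(crop_code)  -- join attribute
    );
    INSERT INTO crop VALUES ('MUS', 'Banana');
    INSERT INTO accession (crop_code) VALUES ('MUS'), ('MUS'), ('MUS');
""")
# One crop tuple relates to many accession tuples via the join attribute
n = conn.execute(
    "SELECT COUNT(*) FROM accession WHERE crop_code = 'MUS'").fetchone()[0]
print(n)  # 3
```

The PRIMARY KEY on crop.crop_code is what guarantees uniqueness on the "one" side; no such constraint exists on accession.crop_code, so it may repeat freely.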
RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM): See Database Management System and Relational Database.
RECORD: In database management systems, a complete set of information. Records are composed of fields, each of which contains one item of information. A set of records constitutes a file. For example, a personnel file might contain records that have three fields: a name field, an address field, and a phone number field. In relational database management systems, records are called tuples.
REDUNDANCY: The practice of storing more than one occurrence of the same data. Where data can be updated, redundancy poses serious problems. Where data is not updated, redundancy is often a valuable and necessary design tool. Duplicating data in the database to improve the ease and speed of access raises the risk that changes may produce conflicting values.
REFERENTIAL INTEGRITY: A feature provided by relational database management systems (RDBMSs) that prevents users or applications from entering inconsistent data. An integrity mechanism ensuring that vital data in a database, such as the unique identifier for a given piece of data, remains accurate and usable as the database changes. Referential integrity involves managing corresponding data values between tables when the foreign key of a table contains the same values as the primary key of another table. For example, suppose Table B has a foreign key that points to a field in Table A. Referential integrity would prevent you from adding a record to Table B that cannot be linked to Table A. In addition, the referential integrity rules might also specify that whenever you delete a record from Table A, any records in Table B that are linked to the deleted record will also be deleted. This is called a cascading delete. Finally, the referential integrity rules could specify that whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it will also be modified accordingly. This is called a cascading update.
RELATION: The basic collection of data in a relational database. Usually represented as a rectangular array of data, in which each row (tuple) is a collection of data about one entity.
RELATIONAL DATABASE: A type of database management system (DBMS) that stores data in the form of related tables. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways. An important feature of relational systems is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table.
REPLICATION: Duplication of table schemas and data, or stored procedure definitions and calls, from a source database to a destination database, usually on separate servers.
ROW: Term used by Access as a synonym for tuple.
RUNNING A QUERY: Term for query resolution.
SCHEMA: 1) A description of a database. It specifies (among other things) the relations, their attributes, and the domains of the attributes. In some database systems, the join attributes are also specified as part of the schema. 2) The description of one relation.
SECONDARY KEY: A key which is not the primary key for a relation.
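The cascading delete described under REFERENTIAL INTEGRITY can be demonstrated with a short sketch (Python stdlib sqlite3); the tables a and b mirror the Table A / Table B example, and note that SQLite enforces foreign keys only when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (
        id   INTEGER PRIMARY KEY,
        a_id INTEGER REFERENCES a(id) ON DELETE CASCADE
    );
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (10, 1), (11, 1);
""")
conn.execute("DELETE FROM a WHERE id = 1")  # cascades to the two rows in b
remaining = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
print(remaining)  # 0
```

With the pragma on, attempting to insert a row into b whose a_id has no match in a would instead raise an integrity error, which is the "prevents inconsistent data" half of the definition.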
SELECT: A query in which only some of the tuples in the source relation appear in the output.
SEQUEL: See SQL.
STORED PROCEDURE: In database management systems (DBMSs), an operation that is stored with the database server. Typically, stored procedures are written in SQL. Stored procedures execute faster than ordinary SQL requests because they have been compiled and optimized by the server. By keeping the requests on the SQL server, they don't have to be coded into the user's front end, thereby allowing the program to load and execute faster. Stored procedures are an important element in load balancing.
SQL: Pronounced 'sequel', stands for Structured Query Language, the most common text-based database query language. SQL comprises both a DDL and a DML. The DDL is represented by the CREATE and ALTER statements and other commands; the DML is represented by the SELECT, UPDATE, DELETE, and INSERT statements, and so on. There are different dialects of SQL: ANSI SQL, T-SQL, etc.
TABLE: A relation that consists of a set of columns with a heading and a set of rows (i.e., tuples). It is a noun used as a synonym for relation in relational theory.
TECHNICAL REENGINEERING: An organizational restructuring based on a fundamental re-examination of why a database exists.
TRANSACTION: 1) The fundamental unit of change in many (transaction-oriented) databases. A single transaction may involve changes in several relations, all of which must be made simultaneously in order for the database to be internally consistent and correct. 2) The real-life event which is modeled by the changes to the database.
TRIGGER: A detectable event that causes another action to happen. For instance, changing a discount rate in a grocery store's inventory database may cause an alert to be emailed to a manager.
TUPLE: Within a relation, a collection of all the facts related to one entity. Usually represented as a row of data. In relational database systems, a record. See record.
VALUE: The computer representation of a fact about an entity.
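The all-or-nothing behaviour described under TRANSACTION can be sketched as follows (Python stdlib sqlite3); the account table and the simulated failure are hypothetical, chosen only to show that both changes roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()
try:
    with conn:  # all changes inside commit together, or roll back together
        conn.execute(
            "UPDATE account SET balance = balance - 60 WHERE name = 'a'")
        conn.execute(
            "UPDATE account SET balance = balance + 60 WHERE name = 'b'")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass  # the context manager rolled both updates back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'a': 100, 'b': 0}
```

Had the block completed without the exception, both balances would have been committed simultaneously, keeping the database internally consistent as the definition requires.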
COMMENTS
Database IPGRI_DB:
• Does the “Centres” table contain the characteristics of Institutes?
• Is CROPCODE equal to SPECIE in AMS?
• IND_PROF and IND_TAGS not included
• Delete the “ipgriregion” fields from institute and contact
• INST: Name_nat can be considered as “txtInstituteName”
• Is PktxtCountryISOCodeCode the same as country.cty?
Database IPGRIAddressBook:
• The “Flags” table contains the field names of the ipgri_db “contacts” table
• USERLOG could be treated as a “user” table
Database IPGRI-Eur-PopulusClones:
• What are Clone and Accession?
• “Clones” table skipped
• Is a country defined by the ISOCODE field, the CTY field, or by name?
• Is an institute defined by the ISOCODE or the INSTCODE field?
• Should the “Main” table be renamed to “Accession”?
• Main: for “INSTITUTION WHERE MAINTAINED”, where is the source of institutions?
• “Main” table is not completely processed
• “Maintenance” table processed
Database IPGRIWEB:
• ANNUALREPORT skipped
• ANNUALREPORT_TEXT skipped
• conflict_PUBSURVEY skipped
• conflict_TRAINSURVEY skipped
• conflict_WEBSURVEY skipped
• COUNTRY contains the institutional Country table translated into different languages: useful. The ISO3 field is equal to Country.cty.
• Do the CROPS_TYPE table and ipgri_db.material contain the same kind of data?
• Is the CROP table a subset of CROPS_TYPE?
• The CROPS table must be renamed to “CropNames”
• EVENT table: forced the primary key to match ipgri_db.events
• GENEFLOW used in website: skipped
• Should IPGRI users have only one table for access to all IPGRI web applications?
• About language: many tables have different translations of a particular name, e.g. a crop can be named in English, Spanish, or French. Could we create an external database of all international names?
• IPGRIWEB.OFFICE_LOCATION.COUNTRY links to the COUNTRY name rather than country.cty.
• Should the OWNER table be merged with the Acronym table?
• PGR skipped
• PGR_ARTICLE skipped
• PUBBLICATION contains articles published on the website
• PUBBLICATION skipped
• PUBBLICATION_OWNER (linked to PUBBLICATION) skipped
• PUBSURVEY skipped
• RESOURCE skipped
• SERIE skipped
• STAFF: should id_ownergroup and id_ownergroup2 be linked to GROUPS in TIP?
• Does STAFF contain only and all IPGRI members?
• STAFF must be revisited
• THEME skipped
• TOPIC skipped
• TRAINING skipped: purpose not clear
• TRUSTEES skipped: purpose not clear
• URL_OWNER skipped
• URL_REGION: are the regions ipgriregions?
• URL_TOPIC skipped
• USETYPE skipped
• WEBPAGE, WEBPAGE_AGROVOC, WEBPAGE_COUNTRY, WEBPAGE_CROP, WEBPAGE_EVENT, WEBPAGE_INSTITUTIONAL, WEBPAGE_NETWORK, WEBPAGE_REGION, WEBPAGE_RESOURCE, WEBPAGE_THEME, WEBPAGE_TRAINING, WEBSURVEY skipped
IPGRNewsletter skipped
Presupuesto skipped
Proyectos database:
• Pbcatcol skipped: purpose not clear
• Pbcatedt skipped: purpose not clear
• Pbcatfmt skipped: purpose not clear
• Pbcattbl skipped: purpose not clear
• pbcatvld skipped: purpose not clear