1. A Multi-Criteria Evaluation of Environmental Databases Using the Hasse Diagram Technique
(Research Paper)
By:
K. Balamurugan
MFCS
M.Tech-CSE-1st year
17/03/2016 Pondicherry University 1
2. Abstract
• We apply the Hasse Diagram Technique (HDT), which originates in discrete mathematics; the software tool used is named ProRank.
• It is a multi-criteria evaluation method that can be used as a tool to rank objects and is hence also applicable to decision making.
• The HDT reveals the best and the worst databases and the conflicts among them that arise from differing information content.
• We evaluate 15 Internet databases with respect to the existence of data on 24 chemicals. The information in database x is coded as 0 = not available or 1 = available.
• Subsets of the databases are evaluated: single databases, European versus US databases, and databases that comprise 2001-10,000 chemicals.
• Only one database, the ChemExper Catalog, contains all 24 chemicals. The comparison of European and US databases revealed no marked difference in the quantity of the selected information base, thus refuting the widespread notion that US databases cover more chemicals than European databases do.
• To sum up, the data availability for the chosen test set of chemicals is far from satisfactory.
3. 1. Introduction
• The increasing complexity of environmental problems, the growing number of topics involved and keen competition between conflicting interests make decisions and decision support difficult.
• Most environmental problems raise multi-criteria questions. Hence, decision-support tools are required that can handle multi-criteria problems.
• One such multi-criteria decision-support tool is the Hasse Diagram Technique (HDT), which is based on discrete mathematics.
• The commercial software for HDT, ProRank, was applied in the present study. ProRank takes a rather new approach based on partially ordered sets: it avoids the loss of information incurred by merging characterizing properties into a single index and thus preserves important elements of the evaluation and decision-making processes.
4. 1. Introduction (continued)
• The demand for decision-support tools is particularly strong in the field of water pollution. Close cooperation between scientists and decision makers is necessary, and scientists will be a key element in decision making.
• This raises the question of where the job of the scientist ends and the task of the decision maker begins.
• Several multi-criteria decision-support methods and tools exist for environmental applications.
5. 1. Introduction (continued)
• One approach is a fuzzy knowledge-based decision-support system that provides information on the environmental impact of anthropogenic activities by examining their effects on groundwater quality.
• Another is the Hasse Diagram Technique (HDT), which normally provides more than one favourable solution (partial order).
• The White Paper calls for the collection of data on chemicals for their risk assessment, leading, where necessary, to risk reduction.
• The gap in knowledge about intrinsic properties of existing substances should be closed to ensure that information equivalent to that on new substances is available. The available information on existing chemicals, as well as on pharmaceuticals, should be thoroughly examined and the best use made of it in order to waive testing wherever appropriate.
• However, publicly available knowledge of existing chemicals contains significant gaps. For example, the contents of IUCLID (International Uniform Chemical Information Database) were evaluated in a study in 1999. Considerable data gaps were found in environmental fate and pathways and in ecotoxicity parameters. The sparse data situation in environmental and chemical databases was also confirmed by an evaluation approach.
• That study evaluated the contents of databases, whereas the present paper studies the data availability for different kinds of chemicals in several subsets of databases.
6. 2. Data availability for environmental chemicals
• Data availability for environmental chemicals is strongly related to how the data are structured and archived in environmental and chemical databases.
• Several approaches are used to assess the quality of databases in environmental sciences and chemistry. Commercial databases in toxicology were evaluated by applying environmental and toxicological evaluation criteria.
• Commercial online databases and CD-ROMs were examined with chemical and environmental evaluation parameters.
• Data availability is an important prerequisite for scrutinizing chemical substances (existing chemicals as well as pharmaceuticals) for their environmental behaviour and effects.
• We intend to determine whether publicly available databases comprise information on environmental chemicals and, in a further step, to evaluate what kind of information is available.
7. 3. Methodology
3.1. Background of the Hasse Diagram Technique (HDT)
• The Hasse Diagram Technique (HDT) is an approach based on partially ordered sets that preserves important elements of the evaluation and decision-making processes.
• The basis of the HDT is the assumption that a ranking can be performed while avoiding the use of an ordering index.
• To evaluate objects, they must be compared. The comparison is made by examining characteristic properties (attributes, descriptors) of these objects.
• If the evaluation is aimed at assessing criteria, then the attributes are thought of as measures of how well a criterion is fulfilled.
• For an object x, the attributes are denoted q(1,x), q(2,x), …, q(m,x) and are often written as a tuple q(x).
8. 3. Methodology
3.1. Background of the Hasse Diagram Technique (HDT) (continued)
• The properties are gathered into a set without reference to the actual values realized by the objects.
• This set of properties is called the information base (IB). Often subsets of the IB are needed.
• Consider two objects x and y. We say x ≥ y (with respect to the m properties of interest) if q(i,x) ≥ q(i,y) for all i = 1, 2, …, m. If, in addition, there is at least one i* for which q(i*,x) > q(i*,y), then x > y. Because of the demand "for all", this definition is called the "generality principle".
• If q(i,x) ≥ q(i,y) for all i = 1, …, m, or q(i,x) ≤ q(i,y) for all i = 1, …, m, then the objects x and y are comparable. The mere fact that x is comparable with y (without information about the orientation) is often denoted as x ⊥ y.
9. 3. Methodology
3.1. Background of the Hasse Diagram Technique (HDT) (continued)
• However, one often finds q(i,x) < q(i,y) for the indices i in one index set and q(i,x) > q(i,y) for the indices i in another index set.
• In such a case, the objects x and y are incomparable, and one writes x || y. Although incomparabilities are not wanted in a final decision, they reveal interesting conflicts among the objects.
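The comparison rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not ProRank's implementation: the function name `compare` and the returned symbols are assumptions, and x is taken to lie above y when every attribute of x is at least as large as the corresponding attribute of y.

```python
from typing import Sequence


def compare(qx: Sequence[float], qy: Sequence[float]) -> str:
    """Return the relation between two attribute tuples q(x) and q(y).

    "="  : all attributes are equal (equivalent objects)
    ">"  : q(i,x) >= q(i,y) for all i, with at least one strict inequality
    "<"  : q(i,x) <= q(i,y) for all i, with at least one strict inequality
    "||" : some attributes favour x and others favour y (incomparable)
    """
    if len(qx) != len(qy):
        raise ValueError("both tuples must have the same number of attributes")
    x_ge_y = all(a >= b for a, b in zip(qx, qy))
    x_le_y = all(a <= b for a, b in zip(qx, qy))
    if x_ge_y and x_le_y:
        return "="
    if x_ge_y:
        return ">"
    if x_le_y:
        return "<"
    return "||"
```

With 0/1 data, `compare((1, 1, 0), (1, 0, 0))` yields `">"`, whereas `compare((1, 0), (0, 1))` yields `"||"` because each object is better in one attribute.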
The main framework of HDT can be characterized as follows:
1. Selecting a set E of elements of interest that are to be compared. The set E is called the ground set. This term expresses that the ground set, together with at least one binary relation among the elements of E, acquires a structure that can often be represented as a digraph, as in the case discussed here.
2. Selecting a set of properties by which the comparison is performed, called the information base (IB).
10. 3. Methodology
3.1. Background of the Hasse Diagram Technique (HDT) (continued)
3. Finding a common orientation for all properties, according to the criteria to which they are assigned.
4. Analysing which of the following three relations is valid:
• Equivalence: the corresponding equivalence relation R is the equality of the two tuples q(x) and q(y). R yields the quotient set E/R.
• Almost all operations in HDT are based on the quotient set E/R. For example, the visualization in WHASSE is based on representatives taken from the equivalence classes, whereas in the ProRank software the vertices are associated with the equivalence classes themselves.
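As a minimal sketch of how the quotient set E/R could be formed, objects with identical tuples can be grouped together. The function name and the toy tuples below are assumptions made for illustration; they are not the study's data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def equivalence_classes(data: Dict[str, Sequence[int]]) -> List[List[str]]:
    """Group objects whose attribute tuples are identical (the relation R)."""
    groups = defaultdict(list)
    for name, q in data.items():
        groups[tuple(q)].append(name)
    # Each list is one element of the quotient set E/R.
    return sorted(sorted(members) for members in groups.values())


# Toy data (hypothetical values): x1 and x2 share the same tuple.
toy = {"x1": (1, 0, 1), "x2": (1, 0, 1), "x3": (0, 0, 1)}
classes = equivalence_classes(toy)
```

Here `classes` contains two equivalence classes, `["x1", "x2"]` and `["x3"]`, so a diagram drawn on E/R would show two vertices instead of three.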
11. 3. Methodology
3.1. Background of the Hasse Diagram Technique (HDT) (continued)
• The relation defined above among all objects is indeed an order relation, because it fulfils the axioms of order, namely:
• reflexivity (each object can be compared with itself);
• antisymmetry (if x is preferred to y, then the reverse is true only if the two objects are equal or equivalent);
• transitivity (if x is better than y, and y is better than z, then x is better than z).
• A set E equipped with an order relation ≤ is said to be an ordered set, or partially ordered set, or briefly "poset", and is denoted as (E, ≤).
• Because the ≤ comparison depends on the selection of the information base (and on the data representation: classified or not, rounded, and so on), we also write (E, IB) to express this important influence of the IB on any
rankings.
• Concerning the evaluation of the ecotoxicity of environmental chemicals by lethal concentrations (LC50 values), for example, the orientation is the following:
• large LC50 values: "good", relatively non-hazardous;
• small LC50 values: "bad", relatively hazardous (a smaller LC50 means that a lower concentration is already lethal).
12. 3. Methodology
3.1. Background of the Hasse Diagram Technique (HDT) (continued)
• In our applications, the circles near the top of the Hasse Diagram indicate the "better" objects according to the criteria/attributes used to rank them.
• Objects not "covered" by other objects are called maximal objects. Objects that do not cover other objects are called minimal objects.
• Equivalent objects are different objects that have the same data with respect to a given set of attributes. Only one representative of the equivalent objects is shown in the Hasse Diagram; in the ProRank software it is named Kn.
• The total number of comparabilities V and of incomparabilities U, and their local analogues, that is to say the number of comparabilities V(x) and incomparabilities U(x) of a certain element x, are useful quantities for documenting the Hasse Diagram and for estimating ranking uncertainties.
• Whereas V can be considered a degree of correlatedness among the attributes, the quantity U generally provides information about the extent of conflicts among the objects.
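The quantities V and U and the maximal and minimal objects can be computed directly from the data tuples. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the toy data are invented, and equivalent objects are simply counted as comparable, which may differ from how ProRank treats them.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def hasse_summary(data: Dict[str, Sequence[int]]) -> Tuple[int, int, List[str], List[str]]:
    """Return (V, U, maximal objects, minimal objects) for a data matrix."""
    names = sorted(data)

    def ge(x: str, y: str) -> bool:
        # x >= y: every attribute of x is at least as large as that of y.
        return all(a >= b for a, b in zip(data[x], data[y]))

    v = u = 0
    for x, y in combinations(names, 2):
        if ge(x, y) or ge(y, x):
            v += 1  # comparable pair
        else:
            u += 1  # incomparable pair (conflict)

    # Maximal: no other object lies strictly above; minimal: none strictly below.
    maximal = [x for x in names
               if not any(y != x and ge(y, x) and not ge(x, y) for y in names)]
    minimal = [x for x in names
               if not any(y != x and ge(x, y) and not ge(y, x) for y in names)]
    return v, u, maximal, minimal


# Toy data (hypothetical): four objects, three 0/1 attributes.
toy = {"A": (1, 1, 1), "B": (1, 1, 0), "C": (1, 0, 0), "D": (0, 0, 1)}
summary = hasse_summary(toy)
```

For N objects, V + U always equals N(N-1)/2. This agrees with the counts reported in Section 5.3: the five European databases give 9 comparabilities and 1 incomparability, 10 pairs in total.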
13. 3. Methodology
3.2. Characterizing numbers of the Hasse Diagram
• In order to interpret a Hasse Diagram, some further terms and notations have to be introduced:
• N: the number of objects.
• IB: "information base", the set of attributes characterizing the objects within the evaluation study.
• m: the number of attributes of the IB.
• Rk(x): the rank of object x within a total order.
• Chain: a set of mutually comparable objects.
• Antichain: a set of mutually incomparable objects.
• Articulation point: a vertex of the transitive hull of the digraph whose elimination would increase the number of hierarchies.
14. 3. Methodology
3.2. Characterizing numbers of the Hasse Diagram (continued)
• If articulation points exist, the Hasse Diagram can almost be separated into hierarchies. Identifying articulation points therefore helps to discover specific data structures within the data-matrix.
• Levels: a first screening and a partitioning of the set E according to increasing values of the attributes. The number of levels is defined by the longest chain within the Hasse Diagram. From the point of view of order theory, the assignment of objects to levels is not unique; it becomes unique if additional rules are introduced (for example, a conservative rule under which objects are assigned to the highest possible level).
• The set of levels L, together with the ≤ relation, forms a new poset (L, ≤), which is a chain, that is to say a total order. The empirical poset (E, ≤) and the poset (L, ≤) are related by an order-preserving map.
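One way to read the conservative rule is sketched below: the number of levels equals the length of the longest chain, and each object is placed as high as the objects strictly above it allow. The function name and the toy data are assumptions for illustration; ProRank's actual level assignment may differ in detail.

```python
from functools import lru_cache
from typing import Dict, Sequence


def assign_levels(data: Dict[str, Sequence[int]]) -> Dict[str, int]:
    """Assign each object to the highest possible level (level 1 = bottom)."""
    names = list(data)

    def strictly_above(x: str):
        # Objects y with q(y) >= q(x) in every attribute and q(y) != q(x).
        return [y for y in names
                if tuple(data[y]) != tuple(data[x])
                and all(b >= a for a, b in zip(data[x], data[y]))]

    @lru_cache(maxsize=None)
    def chain_above(x: str) -> int:
        # Number of objects in the longest chain lying strictly above x.
        ups = strictly_above(x)
        return 0 if not ups else 1 + max(chain_above(y) for y in ups)

    n_levels = 1 + max(chain_above(x) for x in names)  # longest chain length
    return {x: n_levels - chain_above(x) for x in names}


# Toy data (hypothetical): D is minimal but only A lies above it.
toy = {"A": (1, 1, 1), "B": (1, 1, 0), "C": (1, 0, 0), "D": (0, 0, 1)}
levels = assign_levels(toy)
```

In this toy example the longest chain is C < B < A, so there are three levels. The minimal object D is placed on level 2 rather than level 1, because only one object lies above it; this is the effect of assigning objects to the highest possible level.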
15. 4. Set-up of a data-matrix to be
evaluated by Hasse Diagram Technique
17. 4. Set-up of a data-matrix to be evaluated by Hasse Diagram Technique (continued)
• For the evaluation of databases with chemicals, a data-matrix was set up consisting of 15 Internet databases (objects) and 24 chemicals (attributes): 12 pharmaceuticals and 12 High Production Volume Chemicals (HPVCs).
• The databases are listed in Table 1, together with their abbreviation, Internet address (URL = Uniform Resource Locator) and number of chemicals.
• The queries were made by CAS number. If a chemical could not be found in this step, a search was made by trade name, as listed in Table 2.
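The 0/1 coding of such a data-matrix can be sketched as follows. The database labels, the query results and the function name `build_data_matrix` are illustrative assumptions and not the study's data; only the chemical abbreviations ISO, PHE and CMC appear in the slides.

```python
from typing import Dict, Sequence, Set, Tuple


def build_data_matrix(found: Dict[str, Set[str]],
                      chemicals: Sequence[str]) -> Dict[str, Tuple[int, ...]]:
    """Code each database row as 1 (chemical found) or 0 (not found)."""
    return {db: tuple(1 if chem in hits else 0 for chem in chemicals)
            for db, hits in found.items()}


chemicals = ("ISO", "PHE", "CMC")  # three of the 24 test chemicals
# Hypothetical query results: which chemicals each database returned.
found = {
    "DB_1": {"ISO", "PHE", "CMC"},
    "DB_2": {"PHE"},
    "DB_3": set(),
}
matrix = build_data_matrix(found, chemicals)
```

Each row of `matrix` is the tuple q(x) of one database, so the rows can be compared directly under the order relation of Section 3.1.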
18. 4. Set-up of a data-matrix to be evaluated by Hasse Diagram Technique (continued)
• Four different types of numerical databases can be distinguished:
• Single databases, which cover only one data collection (BID, CIV, GES, HSD, ICS, NCL, OEK).
• Multi-database databases, which encompass several databases under the same name and search interface (ECO, ENV, EFD, ESI, EXT).
• Monograph databases, which cover extensive reviews on very few chemicals (EHC, OIH).
• Catalog database (CEX).
19. 5. Applying the multi-criteria evaluation approach
5.1. Evaluation of the 15 databases
• The Hasse Diagram Technique, using the ProRank program, was applied to the complete 15 × 24 data-matrix, and the result is given in Fig. 1. The diagram is structured into seven levels, numbered from the bottom (minimal objects) to the top (maximal object).
• Only 13 objects are shown individually in the diagram, because equivalent objects (databases) are merged and indicated by the letter K: K1 means that the database CIV is equivalent to ICS, and K2 means that EHC and EXT are equivalent.
• The catalog database CEX (ChemExper Catalog of Chemical Suppliers, Physical Characteristics) is the only maximal object in this evaluation approach. Such an object is also called the greatest object. The CEX database is connected downward with all other databases; hence, in our approach, it covers more of the test chemicals than any other database.
• The minimal objects are OIH (OECD Integrated HPV Database), BID (Biocatalysis/Biodegradation Database) and the equivalent objects EHC (Environmental Health Criteria Monographs) and EXT (EXTOXNET).
21. 5. Applying the multi-criteria evaluation approach
5.1. Evaluation of the 15 databases (continued)
• Applying the so-called sensitivity matrix (W-matrix), we determined that the high production volume chemicals CMC (Chlormequat chloride) and ISO (Isoproturon) have the greatest impact on the Hasse Diagram.
• This means that their absence or presence (coded by 0/1) is most important in this data analysis.
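The slides do not define the W-matrix, so the sketch below uses a simplified stand-in rather than ProRank's definition: for each attribute, it counts how many pairs of objects change their order relation when that attribute is dropped. All names and the toy data are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def relation(qx: Sequence[int], qy: Sequence[int]) -> str:
    """Order relation between two tuples: '=', '>', '<' or '||'."""
    ge = all(a >= b for a, b in zip(qx, qy))
    le = all(a <= b for a, b in zip(qx, qy))
    if ge and le:
        return "="
    if ge:
        return ">"
    if le:
        return "<"
    return "||"


def attribute_sensitivity(data: Dict[str, Sequence[int]], m: int) -> List[int]:
    """For each of the m attributes, count the pairs whose relation changes
    when that attribute is removed (a simplified proxy, not the W-matrix)."""
    names = sorted(data)
    scores = []
    for drop in range(m):
        changed = 0
        for x, y in combinations(names, 2):
            qx, qy = tuple(data[x]), tuple(data[y])
            reduced_x = qx[:drop] + qx[drop + 1:]
            reduced_y = qy[:drop] + qy[drop + 1:]
            if relation(reduced_x, reduced_y) != relation(qx, qy):
                changed += 1
        scores.append(changed)
    return scores


# Toy data (hypothetical): three objects, three 0/1 attributes.
toy = {"A": (1, 1, 0), "B": (1, 0, 1), "C": (1, 0, 0)}
scores = attribute_sensitivity(toy, m=3)
```

In this toy example the first attribute has the same value for every object, so removing it changes nothing, whereas removing either of the other two attributes changes two of the three pairwise relations.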
22. 5. Applying the multi-criteria evaluation approach
5.2. Evaluation of single databases
• Seven databases were so-called single databases, meaning that they consisted of only one data collection: BID, CIV, GES, HSD, ICS, NCL, OEK.
• In the ProRank program a subset of objects (databases) can easily be generated; in this case we evaluated a 7 × 24 data-matrix.
• Two different types of Hasse Diagrams are given in Fig. 2: the standard diagram (left) and the bar diagram (right).
• In the bar diagram, the bars represent the attributes (chemicals), so the ≤ relation is easy to interpret.
• The maximal objects are HSD and GES. However, neither covers all 24 chemicals, and the two databases are not comparable to each other.
• The HSD database includes 20 chemicals, whereas GES includes only 15 chemicals. Since the 0/1 code can also be interpreted as the characteristic function of the set of chemicals covered by a given database, the order relation corresponds to set inclusion: although HSD covers more chemicals, it does not include all chemicals covered by GES.
• For example, regarding the high production volume chemical ISO (Isoproturon), the GES database provides data, whereas HSD does not.
• The BID database is the minimal object.
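Reading the 0/1 code as a characteristic set function means that comparing two databases amounts to comparing sets of covered chemicals by inclusion. The sketch below illustrates this with Python sets. Only the ISO entries for GES and HSD follow the slides; the placeholders CHEM_A and CHEM_B, the BID row and the function names are hypothetical.

```python
from typing import Sequence, Set


def covered_set(row: Sequence[int], chemicals: Sequence[str]) -> Set[str]:
    """Turn a 0/1 row into the set of chemicals it marks as available."""
    return {chem for chem, flag in zip(chemicals, row) if flag == 1}


def compare_by_inclusion(a: Set[str], b: Set[str]) -> str:
    """Order relation expressed through set inclusion."""
    if a == b:
        return "="
    if a < b:   # a is a proper subset of b
        return "<"
    if a > b:   # a is a proper superset of b
        return ">"
    return "||"  # neither set contains the other


chemicals = ("ISO", "CHEM_A", "CHEM_B")  # CHEM_A and CHEM_B are placeholders
ges = covered_set((1, 1, 0), chemicals)  # GES covers ISO (as in the slides)
hsd = covered_set((0, 1, 1), chemicals)  # HSD does not cover ISO
bid = covered_set((0, 1, 0), chemicals)  # hypothetical row for a minimal object
```

With these rows, `compare_by_inclusion(ges, hsd)` gives `"||"`: each database covers a chemical the other lacks, so neither set includes the other, regardless of which one is larger.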
24. 5. Applying the multi-criteria evaluation approach
5.3. Evaluation of databases with respect to their origin
• We identified three classes of databases according to their origin:
• EU databases: CIV, ESI, GES, NCL, OEK.
• US databases: BID, ECO, EFD, ENV, EXT, HSD.
• International databases: CEX, EHC, ICS, OIH.
• We compared the data availability of the five EU databases and the six US databases (Fig. 3). There were no equivalent objects in either of the two test sets.
• In the Hasse Diagram of the European databases, a total of nine comparabilities are found, whereas only one incomparability is shown, namely CIV || NCL.
• This incomparability is caused by the HPV chemical ISO (Isoproturon) and by the pharmaceutical PHE (Phenazone). The CIV database provides data for Phenazone (code = 1) whereas NCL does not; the reverse is the case for Isoproturon.
• The maximal object of the European databases, ESI, provides information on 23 out of the 24 chemicals.
• The minimal object of the European databases, OEK, provides information on only seven out of 24 chemicals.
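The attributes responsible for an incomparability such as CIV || NCL can be listed by collecting the chemicals on which each database is ahead. In the sketch below the ISO and PHE codes follow the slides; the function name is an assumption, and the third column (CAR) is included only to show an attribute on which both rows agree, so its value is illustrative.

```python
from typing import List, Sequence, Tuple


def conflicting_attributes(qx: Sequence[int], qy: Sequence[int],
                           labels: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (attributes where x > y, attributes where x < y)."""
    x_ahead = [lab for lab, a, b in zip(labels, qx, qy) if a > b]
    y_ahead = [lab for lab, a, b in zip(labels, qx, qy) if a < b]
    return x_ahead, y_ahead


labels = ("ISO", "PHE", "CAR")
civ = (0, 1, 0)  # CIV: no data on Isoproturon, data on Phenazone
ncl = (1, 0, 0)  # NCL: data on Isoproturon, no data on Phenazone
civ_ahead, ncl_ahead = conflicting_attributes(civ, ncl, labels)
```

Two objects are incomparable exactly when both returned lists are non-empty. Here `civ_ahead` is `["PHE"]` and `ncl_ahead` is `["ISO"]`, which reproduces the conflict described above.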
26. 5. Applying the multi-criteria evaluation approach
5.3. Evaluation of databases with respect to their origin (continued)
• The Hasse Diagram of the six US databases shows 10 comparabilities and five incomparabilities.
• The maximal objects of the US databases are HSD, with 20 out of 24 chemicals, and EFD, with 22 out of 24 chemicals.
• The minimal object BID covers only two out of 24 chemicals, and the other minimal object, EXT, has information on five out of 24 chemicals.
• It should be mentioned that most of the US databases are multiple data collections (multi-database databases), whereas all EU databases (with the exception of ESI) are single databases.
• With respect to the evaluation approach, single databases do not give worse results than multi-database databases.
27. 5. Applying the multi-criteria evaluation approach
5.4. Evaluation of databases with respect to the number of chemicals they contain
• The chosen test set of 15 databases was divided into the following three clusters with respect to the quantity of information:
• Databases containing fewer than 2000 chemicals: BID, EXT, EHC, ICS.
• Databases containing 2001-10,000 chemicals: CIV, ECO, GES, HSD, NCL, OEK, OIH.
• Databases containing more than 10,000 chemicals: CEX, EFD, ENV, ESI.
28. 5. Applying the multi-criteria evaluation approach
5.4. Evaluation of databases with respect to the number of chemicals they contain (continued)
• The largest group comprises the databases covering 2001 to 10,000 chemicals; for this group of seven databases, a Hasse Diagram was constructed.
29. 5. Applying the multi-criteria evaluation approach
5.4. Evaluation of databases with respect to the number of chemicals they contain (continued)
• The Hasse bar diagram is structured into four levels and shows 11 comparabilities against 10 incomparabilities. As discussed above, HSD and GES are incomparable with each other.
• The incomparability of the pair ECO and NCL is induced by two subsets of chemicals: the databases differ with respect to Isoproturon (ECO coded 0, NCL coded 1) and with respect to the chemicals {CAR, CLO, DAP, EES, PHE}, for which ECO has the code 1 and NCL the code 0.
• When the objects CIV and ECO are eliminated, two hierarchies appear. The set {CIV, ECO} is therefore called an articulation set (a generalization of the articulation point).
• The subset {HSD, OIH} has a peculiar data structure by which it is separated from the other databases {GES, NCL, OEK}.
• The two maximal objects are HSD and GES. HSD comprises 4800 chemicals, whereas GES comprises 8000 chemicals.
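The idea of an articulation set can be illustrated by counting the hierarchies (connected components of the comparability graph) before and after removing a set of objects. The sketch below uses an invented toy poset with generic object names, not the study's data-matrix, and the function names are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence


def comparable(qx: Sequence[int], qy: Sequence[int]) -> bool:
    """True if one tuple is at least as large as the other in every attribute."""
    return (all(a >= b for a, b in zip(qx, qy))
            or all(a <= b for a, b in zip(qx, qy)))


def count_hierarchies(data: Dict[str, Sequence[int]],
                      removed: FrozenSet[str] = frozenset()) -> int:
    """Count connected components of the comparability graph without `removed`."""
    names = [n for n in data if n not in removed]
    neighbours = {n: set() for n in names}
    for x, y in combinations(names, 2):
        if comparable(data[x], data[y]):
            neighbours[x].add(y)
            neighbours[y].add(x)
    seen, components = set(), 0
    for start in names:
        if start in seen:
            continue
        components += 1
        stack = [start]  # depth-first search over one component
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return components


# Toy poset (hypothetical): c1 and c2 link the groups {p1, p2} and {r1, r2, r3}.
toy = {
    "p1": (1, 0, 0, 0, 0, 0), "p2": (1, 1, 0, 0, 0, 0),
    "r1": (0, 0, 1, 0, 0, 0), "r2": (0, 0, 1, 1, 0, 0), "r3": (0, 0, 1, 1, 1, 0),
    "c1": (1, 0, 1, 0, 0, 0), "c2": (1, 0, 1, 0, 0, 1),
}
```

In this toy poset, `count_hierarchies(toy)` is 1, removing only `c1` still leaves 1 hierarchy, and removing `frozenset({"c1", "c2"})` yields 2. The pair therefore acts as an articulation set even though neither object alone is an articulation point.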
30. 6. Conclusions
• We analysed the quality of databases with respect to the complete data-matrix comprising 15 databases and 24 chemicals.
• Subsets of the set of databases (objects) were investigated in three independent steps: by the type of database, by the origin of the database and by the number of chemicals contained in each database.
• The Hasse Diagrams generated by the software package ProRank show the most important and the least important databases.
• Furthermore, the comparabilities and incomparabilities are demonstrated and interpreted by examples. The bar diagrams give a concrete insight into the partial order method, and the importance of an articulation set is explained.