# Using Fuzzy Logic in Educational Measurement


[Figure 1: Semantic ambiguities between levels of competence (overlapping labels: Very Incompetent, Moderately Incompetent, More or less Competent, Moderately Competent, Very Competent)]

before she is selected from the population. Once she is selected, the probability is gone. Cox (1994) adds that probability is an uncertainty associated with time. Once a predicted event takes place, probability disappears. He illustrates his point with the example 'there is a 50% chance of rain tomorrow'. If we wait until tomorrow, it may rain or it may not; the uncertainty associated with probability then disappears. Smithson considered a simple case of a binary outcome setup (say A and B). He argued that if we say the probability of an event A occurring is P(A) = 1/2, this could mean that we know A and B are equally likely, or it could mean that we are utterly ignorant of the likelihood of A or B. In addition, probability is incapable of capturing any ambiguity or vagueness about the event. In the rain example, there still remains some ambiguity about whether the rain is a mist, light, moderate or heavy. These are fuzzy uncertainties, which can be dealt with by FL.

FL can deal with the characteristics/properties of individual cases. If we analyse any concept (e.g. speed, height, competence etc.) we usually notice that it is made up of a number of sub-states that stretch from a clear non-existence of a characteristic to a clear existence of the characteristic. Along this continuum we may identify various semantic labels that represent various areas of the continuum (e.g. very incompetent, moderately incompetent, more or less competent, moderately competent and very competent). In general, there are areas of ambiguity/overlap between the various semantic labels, as shown in Figure 1. These overlaps occur naturally and they reflect a flexibility in the language.
FL describes properties that have continuously varying values by associating partitions of these values with a semantic label (Cox, 1994). One of the main strengths of FL is that it allows the semantic partitions to overlap, as shown above. This is a significant improvement on traditional probability, which identifies a group as either having or not having a particular characteristic. Fuzziness is a measure of how well a value/measure (e.g. 6 feet) conforms to a semantic ideal (e.g. tall). Hence if a list of criteria for measuring competence in a particular area is agreed, fuzziness becomes the measure of how well a particular value associated with these criteria reflects a semantic ideal (e.g. very competent). FL refers to the degree of membership, or number of votes, that a particular
value in a fuzzy set has attracted, to determine the degree of compatibility between this value and the concept underlying the set. To illustrate this idea, consider Figure 2.

[Figure 2: Perceptions of two categories of speed (membership curves for Slow and Fast, degree of membership from 0 to 1, over speeds from 30 to 70 mph)]

Suppose we ask a group of people to give the minimum speed of a car before it is considered Fast. The answers could range from 35 miles per hour to 70 miles per hour. Thus if we consider a car 'A' running at a speed of 45 miles per hour, we find that it has a degree of membership of {0.20}, which means that it has a low level of compatibility with the label Fast. We may draw a second membership curve based on the instruction 'give the maximum speed that makes a car run slow'. This time the answers may range between 30 miles an hour and 60 miles an hour. Now we may find that car 'A' has achieved a degree of membership of {0.65}, which means that it has a moderate level of compatibility with the label Slow. In probability theory the car is either Fast or Slow, and we are simply not completely sure which. Also in probability theory, if the probability (or chance) that car 'A' is Fast is 20%, then the probability that it is not Fast must be 80%. In fuzzy logic the membership degree defines to what extent the car is considered Fast and Slow, and the related memberships do not have to total 1 (Turban, 1992).

### Origin of the Difference Between Probability Theory and Fuzzy Logic

In dealing with many aspects of our daily life we recognise that many phenomena, situations and issues are imprecise. Yet this does not prevent us from solving many of the problems that face us using this imprecision. In fact, by recognising the imprecise nature of certain phenomena we improve our understanding of the situations we deal with.
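The Fast/Slow membership curves can be sketched in a few lines of code. This is a minimal illustration assuming simple linear ramps over the surveyed ranges (35 to 70 mph for Fast, 30 to 60 mph for Slow); the paper's curves are built from respondents' answers, so the degrees it reports for car 'A' (0.20 and 0.65) differ from what these straight-line approximations give. The point being illustrated survives the approximation: a single speed belongs to both sets to some degree, and the two memberships need not total 1.

```python
def fast_membership(speed):
    """Degree to which a speed counts as Fast.

    Hypothetical linear ramp: surveyed answers ranged from 35 to 70 mph,
    so membership rises from 0 at 35 mph to 1 at 70 mph.
    """
    return min(1.0, max(0.0, (speed - 35) / (70 - 35)))

def slow_membership(speed):
    """Degree to which a speed counts as Slow.

    Hypothetical linear ramp falling from 1 at 30 mph to 0 at 60 mph.
    """
    return min(1.0, max(0.0, (60 - speed) / (60 - 30)))

speed = 45  # car 'A'
fast = fast_membership(speed)
slow = slow_membership(speed)
# Unlike probabilities, the two degrees are independent and need not sum to 1
print(f"Fast: {fast:.2f}, Slow: {slow:.2f}, sum: {fast + slow:.2f}")
```

Note that a probabilistic reading would force `fast + slow == 1`; the fuzzy reading imposes no such constraint.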
We use words such as high, low, moderate, adequate, extremely, large, tall, adult, mature, competent etc. to deal with problems ranging from law, financial management and engines to psychology and education. Yet such expressions are incompatible with traditional quantitative modelling and information system design, which generally require an either/or response to a question. However, it is only a small step to argue
that if we can 'reason' using such imprecise information, so should our machines (Cox, 1994). In opposition to our world of greyness we find that much of our science, maths, logic and, consequently, culture is based on a black or white interpretation of our world (Kosko, 1994; Cox, 1994; Hisdal, 1986). Every statement is true or false; every law either applies or does not apply. The origin of such a perception has been traced back to Greek philosophy, and in particular to Aristotle's binary logic 'A or not A'. This basic contradiction between reality and science has been of concern to eminent scientists such as Einstein (Kosko, 1994), who stated: 'So far as the laws of mathematics refer to reality, they are not certain. And so far as they are certain, they do not refer to reality.' The basic difference could be stated as follows. Formal logic and computer programming statements are all true or all false; in other words, they correspond to either 1 or 0. Yet statements which directly refer to the world are very rarely that clear-cut. Their truth generally lies between total truth and total falsehood (i.e. between 1 and 0). Note that although rating scales are used on a regular basis, they generally 'constrain' a respondent to choose one characteristic/quality among others. They do not consider the grey area between the various characteristics. In addition, and as mentioned above, the responses are analysed in terms of population trends; they do not tell us about the individuals as such. Smithson (1988) argued that many researchers have relied almost exclusively on statistical models and methods for the quantitative analysis of human behaviour. He adds:

> Because they invoke stochastic determinism, such models are incapable of incorporating human intentionality, purposive choice, or agency along with constraints and influences on behaviour.
He argues that fuzzy logic and possibility theory (as opposed to the probability theory used in conventional statistics) offer an alternative framework which is compatible with psychological explanations that permit choice under partial and uncertain constraints. He criticises the General Linear Model (which includes the most commonly used statistical methods, such as ANOVA, regression models, factor analysis etc.) for yielding a stochastically deterministic view of human behaviour. Smithson (1988) says that in this model, behaviour which is not in accordance with a one-to-one prediction is described as random behaviour. However, he stresses that the aim behind highlighting the weaknesses of statistical models is not to replace them but rather 'to permit the articulation and investigation of interpretations that cannot be handled by the statistical perspective'. Fuzzy logic is generally associated with Lotfi Zadeh, a professor at the University of California, who wrote the seminal paper 'Fuzzy sets' in 1965. This paper built on traditional set theory to resolve difficulties associated with the rigid Aristotelian 'all or nothing' position.

### Fuzzy World, Fuzzy Logic

Consider the following situation: an experienced assessor is given a number of assignments completed by a student, each dealing with one subcomponent
(covering either a skill or knowledge) of a particular area of competence. This assessor is asked to make a decision about whether the student has mastered the subcomponent or not. If the subcomponent has been mastered, the assessor should award one mark, or pass; if not, a zero, or fail, is allocated, as shown in Figure 3.

[Figure 3: Bipolar assessment (a scale with only the two values 0 and 1)]

However, it is very likely that the performance in the assignment is far from being either a clear pass or a clear fail. So it seems that the assessor will have to do some rounding up to be able to fit the binary system (1 or 0) to the candidate. In fact the 1 or 0 alternatives are only two of a multitude of possibilities (highlighting various levels of fuzziness or greyness in comparison to the two extreme alternatives) along the continuum between the two polarised positions (0 and 1). It is quite possible that a candidate's position is 1/2, which means it is situated half way between 0 and 1 (i.e. between non-master and master). This situation is referred to by some fuzzy scientists as reflecting 100% fuzzy entropy (Kosko, 1994). Fuzzy entropy measures the degree of fuzziness of a fuzzy set (such as the set of competent people). In other words, in this case it is absolutely unclear to which side of the binary system the candidate should be allocated. The candidate's performance deserves neither a pass nor a fail; or the opposite might be said: he deserves both a pass and a fail. This situation is unacceptable in conventional Aristotelian logic, since it is perceived as a contradiction in terms. In FL, however, this situation is perfectly acceptable. In fact, in FL the two poles become the two extreme instances among many other possibilities. A candidate may get a 1/4, 1/2 or 3/4 etc.
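Kosko's fuzzy entropy, mentioned above, can be sketched numerically. A common formulation (which this sketch assumes; the text itself gives no formula) is the ratio of a fuzzy set's overlap with its own complement to their union, M(A AND not-A) / M(A OR not-A). A membership of 1/2 then yields an entropy of 1 (the 100% fuzzy case of the half-way candidate), while a crisp 0 or 1 yields 0:

```python
def fuzzy_entropy(memberships):
    """Kosko-style fuzzy entropy for a fuzzy set given as a list of
    membership degrees: M(A AND not-A) / M(A OR not-A), where the
    complement of degree m is 1 - m, AND is min, OR is max, and M
    sums the degrees over the set's elements."""
    overlap = sum(min(m, 1 - m) for m in memberships)   # A AND not-A
    underlap = sum(max(m, 1 - m) for m in memberships)  # A OR not-A
    return overlap / underlap

print(fuzzy_entropy([0.5]))   # 1.0: 100% fuzzy, half way between non-master and master
print(fuzzy_entropy([1.0]))   # 0.0: a crisp pass, no fuzziness at all
print(fuzzy_entropy([0.75]))  # 0.25 / 0.75: partially fuzzy
```

The single-element cases mirror the assessor's dilemma: the closer a candidate's position sits to 1/2, the closer the entropy climbs to 1 and the less meaningful a forced binary allocation becomes.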
An assessor using the binary system of 1 or 0 is missing very important information about the true competence status of the candidate, compared to another assessor using the whole range of possible values in the competence continuum. The more open the possibilities in the continuum, the more precise the decision becomes. Hence it may be argued that when dealing with non clear-cut situations, precision is more on the side of fuzzy logic than of a binary perspective. Consider the scale in Figure 4.

[Figure 4: FL scale for a candidate's performance representation (a continuum from 0, definitely a fail, through 0.5, neither pass nor fail, to 1, definitely a pass, marked in steps of 0.1)]

An assessor given this scale may decide that the candidate's 'performance status' is neither a pass nor a fail, but rather may be classified at the 0.8 position of the continuum. This means that although the candidate's status is more towards
a pass, there are still some elements in his performance which are not totally satisfactory. Another assessor might also show his degree of uncertainty about the result by choosing more than one point on the scale. For instance, an assessor might feel that a particular candidate is somewhere between 0.7 and 1, and therefore ticks all these numbers. Fuzzy logic has developed methodologies which take account of these variations and therefore give a more accurate picture of the reality. It is clear that if the assessment decision involves many aspects of competence, or the joint view of many assessors, the binary approach would lead to a cumulative loss of information. In the following section an account will be given of the rise of interest in portfolio assessment. The subsequent part will look at a particular FL procedure and show how it can be applied in portfolio assessment.

### Vocational Qualification and Portfolio Assessment

In the UK portfolio assessment is becoming more and more popular as a valid means to test candidates' achievement/competence. This tendency became particularly pronounced in vocational education after the introduction of National Vocational Qualifications (NVQs). These were introduced to resolve a supposed crisis in vocational education: the system had been described as overly complex and chaotic, with an urgent need for reform. In particular, the standards of occupational competence were in need of clear specification, so that there would be no doubt about the requirements in terms of skills, and the corresponding assessment procedures, to achieve a particular award.
Following two governmental reports (MSC, 1981; 1986) the problem was to be resolved by achieving two targets: (1) to develop valid, reliable and easily accessible nationally recognised qualifications, and (2) to rationalise the links and progression within and between occupational areas so as to avoid duplication of effort by candidates who might take different qualifications from different Awarding Bodies. The task assigned for this purpose was to be shared between the Department of Employment and a newly created National Council for Vocational Qualifications (NCVQ) (in Scotland, this task was assigned to the Scottish Vocational Education Council). The Department of Employment's role was to help redefine standards of competence pertaining to the various occupational areas. The statements of competence were to be specified in outcome terms, that is, 'what must a candidate be able to do to be deemed competent in a particular occupational area'. These standards were to show a clear path of progression between a lower and a higher level of a vocational qualification. The main sources for this information were to be industry representatives, who form an industry Lead Body (LB) for each industrial area and determine the competence standards. NCVQ, which is perceived as the main government agency for implementing changes in vocational education, was established in 1986. Its remit is to accredit (give the seal of approval to) newly developed National Vocational Qualifications (NVQs) based on the standards determined by the LBs. NCVQ needs to be satisfied that the assessment procedure designed by the bodies awarding the qualification (the awarding bodies), together with the infrastructure which supports the system of assessment, meet its criteria. NCVQ recognises two categories of portfolio evidence that may lead to accreditation: (1) past evidence (e.g. products, previous employers' reports and qualifications), the assessment of which became known as APL (Assessment of Prior Learning) or APA (Assessment of Prior Achievement); and (2) current evidence in the form of a portfolio of evidence (e.g. assignments, products etc.). Although portfolio evidence in the sense of current evidence could be found at any level of the educational system, the advent of NVQs has extended the term to 'any material (past and current) which is relevant and portable' (Fourali, 1994a). Hence, because NVQs have clearly defined standards of competence and a broader definition of portfolio evidence than other, more 'academic' qualifications, they become a prime candidate as a test bed for FL. Apart from the difficulties of 'covering all criteria' for the purpose of assessment, there was the added difficulty of rewarding candidates who had just missed the required criteria as opposed to those who were far off the target. These assessment issues may be helped by recognising the fuzzy nature of assessment. In any case, and in spite of these difficulties, criterion-referenced assessment (or some version of it) is still perceived as the way forward at a time when 'fitness for purpose' is perceived as the guiding criterion for the quality of educational 'products'.

### Using Fuzzy Logic in Portfolio Assessment: An Illustration

In portfolio assessment, the evidence presented by a candidate is matched to the prescribed standards of competence defined in achievement outcome terms.
Subsequently an assessor will determine whether the portfolio evidence is adequate to allow the candidate to obtain a certificate for a unit of competence (i.e. a work duty within the qualification which is sought) or for the whole qualification. However, different portfolio assessors may have different views on the adequacy of the evidence provided by a candidate. This means that they will have different perceptions of the candidate's level of competence, based on different competence 'standards' or even intuitions. This is true even if there is a prototype portfolio which may be consulted, as it is almost impossible for written advice to cover all possible alternatives. In any case, fuzziness will remain irrespective of how comprehensive the advice is. Hence, if we ask an assessor to deliberate on the competence status of a candidate based on a portfolio of evidence, he/she is not always clear where exactly the candidate fits in a competence continuum (ranging from '0', for definitely incompetent, to '10', for definitely competent; see Figure 5). The assessor's natural tendency would probably be to think of a range of possibilities where the candidate's competence status could be located. Moreover, he/she will
have to identify this range of possibilities in terms of a list of criteria, such as those shown in Table 1, before deciding if the candidate is competent overall.

Table 1: Criteria for portfolio assessment

| Section of Portfolio | Authenticity | Currency (Practices & Equipment) | Retention | Relevance | Sufficiency (quantity) | Variety (contexts) |
|---|---|---|---|---|---|---|
| Account of Experience | | | | | | |
| Witness Testimony | | | | | | |
| Products (including photos, audios etc.) | | | | | | |
| Certificates and Awards | | | | | | |

The assessor's decision regarding the position of the candidate on the competence continuum will depend on how satisfactory the evidence is when evaluated in the light of each criterion. However, as he/she is not always sure exactly where to locate the candidate (there is always the benefit of the doubt!), FL will allow and encourage him/her to identify the range of possible values that may be acceptable.

[Figure 5: Fuzzy rating scale (a continuum from 0, definitely incompetent, through 5, neither/nor, to 10, definitely competent)]

Figure 6 is an example of the possible values which may be assigned to a candidate's portfolio for each of the sections of portfolio identified above (first column) in terms of the 'Authenticity' perspective. The instruction given to the assessor could be in the form: 'Please represent the degree of authenticity of the candidate's evidence by ticking the corresponding box on the satisfaction scale. Then decide on the minimum position and the maximum position you will accept on the scale as representing competence, and extend the rating accordingly.' Thus, an assessor might think that the candidate's evidence for criterion 2 (see Figure 6) is fairly but not totally satisfactory. Hence he/she may start by allocating a rating of 8 on the scale. Then he/she realises that a lower rating of 7 is also acceptable; however, he/she would not accept a rating higher than 8. Hence a rating of 7 is added.
FL enables the assessor to make use of all the information available in Figure 6 and thereby calculate an 'optimal' index.

[Figure 6: Fuzzy rating ascribed to each section of portfolio for the 'Authenticity' quality (a satisfaction scale from 0, extremely unsatisfactory, through 5, neither/nor, to 10, extremely satisfactory, for each of the four criteria)]

Because this index takes into consideration the various uncertainties, the decisions reached on this basis would tend to be more reliable. Taking the rating shown in Figure 6, one simple way of calculating the optimal index is as follows (see the section entitled 'Improving the Procedure' for the justification of the method):

- 9 votes (ticks) have been cast.
- These votes amount to: (1 x 4) + (2 x 5) + (3 x 6) + (2 x 7) + (1 x 8) = 54.
- This averages out at 54/9 = 6.

This result is the optimal index for the Authenticity quality. The average of the optimal indices (i.e. the sum of the indices divided by the number of indices) will be the competence status of the individual. Thus, if the obtained optimal indices for each of the six qualities are as follows:

- Authenticity: 8
- Currency: 9
- Retention: 6
- Relevance: 7
- Sufficiency: 8
- Variety: 4

the competence status is: (8 + 9 + 6 + 7 + 8 + 4) / 6 = 7.

If a candidate obtains the above index representing his/her competence status, this will mean that the assessor is fairly but not totally satisfied that the candidate has presented all the evidence needed to demonstrate his/her competence. This index (i.e.
7) is only two steps away from the middle of the scale (i.e. 5), which represents the total uncertainty/fuzziness that the assessor holds with regard to the competence status of the candidate. Thus, if such a method is adopted, Examining and Awarding Bodies might need to set a minimum competence status index of no less than, say, '8' to guarantee a pass, so as to ensure a certain degree of reliability in the decision, which gives more credibility to the resulting qualifications. The index might also give us some idea of the degree of leniency or severity of an assessor. Thus, whilst one assessor might give a candidate a pass as soon as the overall index is 6, another assessor might think it unwise to do so until he/she has a minimum of 8. In addition, leniency/severity may also be verified by comparing the indices obtained by two assessors based on the evidence offered by the same candidate. This information should help in the standardisation of assessors' decisions. Our example also shows that it is possible to represent, in a very flexible manner, the assessors' views about any aspect of competence. As it is generally very difficult for an assessor to keep track of all his/her reservations and professional 'gut feelings' in a manner which is as rational as possible, fuzzy methodology offers a very valuable solution. It is clear that a normal procedure restricted to an either/or decision could pass a candidate whose overall competence status might be very close to the absolute fuzziness level. This situation may arise when an assessor overlooks previous reservations about the performance evidence because the candidate happens to have shown some very good results in some particular aspects of the assessed area of competence. The opposite may also happen if a candidate is failed because of some reservations which prevent the assessor from evaluating the overall performance more objectively.
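The worked calculation above can be reproduced in a few lines. This is a minimal sketch: the function name `optimal_index` and the vote dictionary are illustrative choices, but the figures are exactly those given in the text (one tick at 4, two at 5, three at 6, two at 7, one at 8 for Authenticity, and the six criterion indices used for the competence status):

```python
def optimal_index(votes):
    """Weighted average of the scale points ticked by the assessor.

    `votes` maps a scale point (0-10) to the number of ticks it received.
    """
    total_ticks = sum(votes.values())
    return sum(point * n for point, n in votes.items()) / total_ticks

# Ratings from Figure 6 for the Authenticity quality: 9 ticks in all
authenticity = optimal_index({4: 1, 5: 2, 6: 3, 7: 2, 8: 1})
print(authenticity)  # 6.0, i.e. 54 votes / 9 ticks

# Competence status = plain mean of the six criterion indices
indices = {"Authenticity": 8, "Currency": 9, "Retention": 6,
           "Relevance": 7, "Sufficiency": 8, "Variety": 4}
competence_status = sum(indices.values()) / len(indices)
print(competence_status)  # 7.0
```

As the text notes, a spreadsheet serves the same purpose; the advantage of either is that indices can be recomputed as each new rating is entered.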
This procedure can be very useful when assessment situations involve group consensus. The procedure can be used to take into account all the group members' ratings regarding various portfolios; the obtained optimal index will be the best compromise of all the group members' views. The required calculations are very simple and may be carried out by hand. However, the use of a spreadsheet may facilitate the calculations, as this will enable information to be entered (and indices calculated) as the views are expressed. Note that this procedure assumes that the weights for the six criteria are the same. If this is not the case, then these weights should be determined using FL to obtain more agreement between the various assessors. In addition, assessors may agree that an index below a certain point for a particular criterion, for any section of portfolio, may automatically disqualify the candidate from achieving a module. Grading issues may also be resolved by using more than one scale. For instance, once a candidate has met the criteria for a satisfactory basic competence status (e.g. to obtain a pass), the assessor may consider him/her for a credit or distinction by adding two extra scales to the assessment process, as shown in Figure 7. Candidates would only be considered for the subsequent scales after they have met the requirements for the previous scales.
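The two refinements just described (unequal criterion weights, and a disqualifying floor for any single index) can be sketched as follows. This is an illustrative extension rather than the paper's own procedure: the parameter names `weights` and `floor` are hypothetical, and the text only says that weights 'should be determined using FL' without specifying how, so here they are simply supplied as numbers:

```python
def competence_status(indices, weights=None, floor=None):
    """Combine criterion indices into an overall competence status.

    `weights` assigns a relative weight to each criterion (equal weights
    if omitted). `floor` is an index below which the candidate is
    automatically disqualified, returning None.
    """
    if floor is not None and any(v < floor for v in indices.values()):
        return None  # automatic disqualification on a single weak criterion
    if weights is None:
        weights = {name: 1.0 for name in indices}
    total_weight = sum(weights[name] for name in indices)
    return sum(indices[name] * weights[name] for name in indices) / total_weight

indices = {"Authenticity": 8, "Currency": 9, "Retention": 6,
           "Relevance": 7, "Sufficiency": 8, "Variety": 4}

print(competence_status(indices))            # equal weights: 7.0, as in the text
print(competence_status(indices, floor=5))   # Variety (4) is below 5: None
# Doubling the weight of Authenticity shifts the status towards its index of 8
print(competence_status(indices, weights={"Authenticity": 2, "Currency": 1,
      "Retention": 1, "Relevance": 1, "Sufficiency": 1, "Variety": 1}))
```

The same function applies unchanged to group consensus: average each criterion's index over the assessors first, then combine.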
[Figure 7: Fuzzy scales for grading decisions (Scale 1: definitely incompetent to definitely competent; Scale 2: definitely no credit to definitely deserves credit; Scale 3: definitely no distinction to definitely deserves distinction)]

This procedure is used when strict assessment criteria have been agreed. However, it is likely that those criteria are not always clear. Such situations may arise when the area of competence itself is not very clear (e.g. when assessors are asked to assess creativity). In this case FL is also appropriate, since it allows assessors to locate the candidate's position in the competence continuum (e.g. a creativity continuum) more flexibly. FL is particularly relevant when researchers are faced with phenomena that involve a smooth, progressive change. Let us consider an area which has up to now drawn the main benefits from the procedure: automated systems. Consider a non-fuzzy fan motor whose speed is a function of input temperature (Viot, 1993). The current supplied to the fan motor is controlled by four temperature sets: cold, cool, warm and hot. Each of these triggers a different speed of the fan. The problem arises when input temperatures move between set boundaries; this leads to corresponding abrupt changes (i.e. sudden changes in the speed of the fan). In a fuzzy-regulated system, however, as the input temperature changes, a corresponding smooth change in the fan speed takes place regardless of inputs crossing set boundaries. FL has great potential for improving diagnostic/formative assessment. It gives a clearer idea of a candidate's position in a competence continuum. In a criterion-referenced context, candidate performance could be linked to a training programme pitched at an optimum level. This level is determined by referring to the views of assessors about candidate abilities related to a particular area of competence.
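The fan-motor contrast above can be illustrated with a small fuzzy controller. This is a sketch under assumed numbers (the temperature boundaries and rpm values are invented, not taken from Viot, 1993): each of the four temperature sets is modelled as a triangular membership function, and the output speed is the membership-weighted blend of the speeds the rules would trigger, so it varies smoothly as the temperature crosses set boundaries instead of jumping:

```python
def tri(x, a, b, c):
    """Triangular membership function: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical temperature sets (degrees C) and the fan speed each triggers (rpm)
SETS = {
    "cold": (lambda t: tri(t, -10, 0, 15), 0),
    "cool": (lambda t: tri(t, 5, 15, 25), 600),
    "warm": (lambda t: tri(t, 15, 25, 35), 1200),
    "hot":  (lambda t: tri(t, 25, 35, 50), 1800),
}

def fan_speed(temp):
    """Weighted (centroid-style) blend of the rule outputs.

    Because neighbouring sets overlap, the output changes smoothly as the
    input moves between set boundaries, unlike a crisp threshold controller.
    """
    fired = [(membership(temp), rpm) for membership, rpm in SETS.values()]
    total = sum(degree for degree, _ in fired)
    return sum(degree * rpm for degree, rpm in fired) / total if total else 0.0

for t in (10, 15, 20, 25, 30):
    print(t, round(fan_speed(t)))
```

At 20 degrees the input is half cool and half warm, so the fan runs at a speed between the two rule outputs; a crisp four-band controller would instead jump from one band's speed to the next at the boundary.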
Thus, in order to assess text comprehension, both content complexity and cognitive processing requirements could be considered in eliciting expert views as to the adequacy of particular test items for a particular level in a particular area. Subsequently, fuzzy indices could be calculated to give a more accurate picture of the candidate's position in the competence continuum and allow a more customised training programme to be drawn up. In a Vygotskian sense, the