The document reports a study conducted to validate the test papers used at Saint Paul School of Business and Law and to relate their validity to student performance. Half of the test papers from the previous term were analyzed by experts using a checklist. The validity of the test papers showed a modest positive correlation with student performance. Based on the results, guidelines for standardized test construction were formulated to improve the quality of assessment at the institution; the guidelines differentiate requirements for theory-based versus skill-based subjects. The study aims to establish best practices and standards for test development and administration at the school.
The document discusses critiquing and revising an assessment rubric for an 8th grade literacy portfolio. It finds issues with the reliability and validity of the original rubric. Regarding reliability, the rubric uses vague terms that could be interpreted differently. For validity, the rubric does not accurately reflect students' learning achievements. Suggested revisions include making the standards and scoring criteria more clear and specific. The portfolio assessment itself is found to have benefits like providing ongoing formative feedback, but the associated rubric requires improvement to properly measure student progress.
The document discusses the concepts of validity and reliability in educational assessment. It defines validity as the accuracy of inferences made based on assessment results, particularly whether an assessment truly measures the intended learning outcomes. There are three main types of validity evidence: content, criterion, and construct-related evidence. An assessment is valid if it represents the targeted learning content and yields results that correlate with other measures of the same skills. Threats to validity include unclear instructions, inappropriate test items, and other technical flaws. Maintaining validity requires careful test construction and alignment between assessments, curriculum, and instruction.
The effectiveness of evaluation style in improving lingual knowledge for ajlo..., by Alexander Decker
1. The study aimed to investigate how evaluation style can improve linguistic knowledge for female students at Ajloun College.
2. The sample consisted of 45 students divided into 3 groups: one evaluated by the teacher, one evaluated by classmates, and one self-evaluated.
3. The results showed statistically significant differences between the groups, with the group evaluated by classmates performing best, followed by the self-evaluated group, suggesting evaluation style can impact linguistic knowledge.
Brown & Hudson (1998): The Alternatives in Language Assessment, by Cate Atehortua
This document discusses different types of language assessments that teachers can use, categorized into three broad groups: selected-response, constructed-response, and personal-response assessments. Selected-response assessments include multiple choice, true-false, and matching questions that test receptive skills like reading and listening. Constructed-response assessments require students to produce short answers and include fill-in-the-blank, short answer, and performance tasks. Personal-response assessments involve more subjective methods like conferences, portfolios, self-assessment, and peer assessment. The document explores the advantages and disadvantages of each type and how teachers can choose assessments based on validity, reliability, feedback, and using multiple sources of data.
Measurement and evaluation in education, by Cheryl Asia
The document discusses measurement and evaluation in education. It defines measurement as quantifying individuals' attributes and skills, and evaluation as making judgements about something using criteria and standards. Evaluation involves systematically determining how well educational objectives are achieved by learners. Key points include:
- Evaluation assesses educational objectives, programs, teachers, and learners.
- Evaluation implies a systematic, continuous, and comprehensive process using various techniques.
- Evaluation assumes instructional objectives have been previously identified.
- Tests, quizzes, and other instruments are used to obtain information for evaluation purposes.
Here are my slides for my report on Educational Measurement and Evaluation, prepared for the Advanced Measurements and Evaluation subject at the Polytechnic University of the Philippines Graduate School.
Negative marking on multiple-choice tests deducts points for incorrect answers to discourage guessing; students tend to dislike it, and it is not used for descriptive exams. Measurement quantifies individual achievement by assigning values or symbols to observable phenomena, while evaluation determines learning outcomes through qualitative judgement. Various types of tests, measures of central tendency and variability, grading philosophies, and key terminology are also discussed.
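The negative-marking scheme mentioned above is usually implemented as formula scoring: on items with k answer options, each wrong answer costs 1/(k−1) of a point, so the expected gain from blind guessing is zero. A minimal sketch (function name and example numbers are illustrative, not from the summarized document):

```python
def formula_score(correct, wrong, options_per_item=4):
    """Formula score: right answers minus a guessing penalty.

    With k options per item, each wrong answer deducts 1/(k-1) points,
    so pure random guessing has an expected score of zero.
    Omitted items are neither rewarded nor penalized.
    """
    return correct - wrong / (options_per_item - 1)

# A 40-item, four-option test: 28 right, 8 wrong, 4 omitted.
score = formula_score(28, 8)  # 28 - 8/3, about 25.33
```

Note that a student who guesses randomly on 40 four-option items expects 10 right and 30 wrong, which nets exactly zero under this scheme.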
The document discusses various types of educational ability tests, including survey battery tests, diagnostic tests, readiness tests, and cognitive ability tests. It provides examples of specific tests for each type and outlines their purpose, structure, reliability, and validity. Key points covered include using these tests to determine student and school learning, identify learning problems or giftedness, and assess readiness for higher grade levels or college.
Achievement tests measure what students have learned after a period of instruction. There are two main types - standardized tests which have uniform procedures and scoring, and teacher-made tests which assess learning in a particular classroom. Standardized tests provide norms and impartial information, while teacher-made tests help evaluate teaching effectiveness but have less accuracy and refinement. Both types of achievement tests are important for measuring student learning outcomes.
This document defines key concepts in educational measurement and evaluation including measurement, evaluation, testing, and the functions and principles of evaluation. It discusses different types of tests and measurements including criterion-referenced and norm-referenced tests. Various measures of central tendency (mean, median, mode) and variability (range, standard deviation) are explained. Different point measures like quartiles, deciles, and percentiles are also defined. Formulas for calculating measures like the mean, median, and mode from frequency distribution tables are provided.
Criterion- and norm-referenced evaluation, by Jinto Philip
The document summarizes a departmental seminar on criterion and norm-referenced evaluation. The seminar was presented by Mr. Jinto Philip on August 31, 2010 from 2-3 PM in the Arts Theatre. It discussed the key differences between criterion-referenced tests, which evaluate performance against an absolute standard, and norm-referenced tests, which compare performance to other examinees. Some key differences highlighted were that criterion-referenced tests measure specific skills from a curriculum, while norm-referenced tests measure broad skill areas. Criterion-referenced tests aim to determine if students have achieved skills, while norm-referenced tests rank students against others.
This document discusses measurement, evaluation, and tests in physical education. It defines these terms and explains their interrelationship. Measurement involves using tests to collect quantitative data about traits, while evaluation judges the worth of those measurements. Tests are tools used for measurement. The document also discusses the need for and modern trends in measurement and evaluation, such as increased accountability, emphasis on health-related fitness, and more sophisticated instruments. It explains how measurement and evaluation are important for setting objectives, assessing achievement, research, and responding to current issues in physical education.
THESIS: Making the Results and Discussions portion, by elio dominglos
- The attitudes of farmers toward organic farming varied across different aspects of organic farming. Farmers had a very positive attitude toward the use of standards, proper harvesting and marketing, and advantages/disadvantages of organic farming. However, their attitudes were not as positive toward organic farming commodities, effectiveness, and importance.
- Statistical analysis confirmed that the farmers' attitudes differed significantly. This difference can be attributed to varying levels of patience, talent, honesty, and goals among individual farmers.
- It is concluded that the farmers have a mixed view of organic farming, being more convinced by some aspects than others. They are recommended to attend educational sessions to improve their knowledge and attitudes regarding all aspects of organic farming.
Developing an authentic assessment model in elementary school science teaching, by Alexander Decker
1) The document describes the development of an authentic assessment model for elementary school science teaching in Indonesia.
2) It aims to develop a model that can reveal student creativity and potential while developing character and fulfilling good assessment requirements.
3) The developed model includes performance assessments with tasks and rubrics, character assessments with criteria, and a scientific attitude questionnaire.
Standardized tests are widely used assessments that provide consistent evaluation of student performance compared to national norms. They are designed to be administered uniformly under controlled conditions to allow for comparable scoring. While criticized, standardized tests currently remain an important part of assessment in most school districts. They can measure basic knowledge and skills, but have limitations in evaluating higher-order thinking, creativity, or special needs. Both advantages like ease of administration and disadvantages like narrow focus are discussed.
A standardized test is any test where all test takers answer the same questions in a consistent manner that is scored uniformly. There are two main types - norm-referenced tests compare performance to others, while criterion-referenced tests assess performance against a set of objectives. Standardized tests can measure achievement, aptitude, or be used for college admissions. Scores are reported using raw scores, percentiles, or stanines.
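The score-reporting conventions named above can be illustrated concretely: a percentile rank is the percentage of the norm group scoring below a given score, and a stanine maps a z-score onto a 1-to-9 scale with mean 5 and each band 0.5 standard deviations wide. A small sketch under those common definitions (the function names and score list are illustrative):

```python
from statistics import mean, pstdev

def percentile_ranks(scores):
    """Percentile rank: percent of the group scoring strictly below each
    score (one common convention; some tables count half of ties)."""
    n = len(scores)
    return [100 * sum(s < x for s in scores) / n for x in scores]

def stanines(scores):
    """Stanine: z-score mapped to a 1-9 scale (mean 5, SD 2),
    clamped at the extremes."""
    m, sd = mean(scores), pstdev(scores)
    return [max(1, min(9, int((x - m) / sd * 2 + 5.5))) for x in scores]

raw = [42, 55, 61, 61, 68, 74, 80, 86, 91, 97]
ranks = percentile_ranks(raw)  # 42 ranks at 0, 97 at 90
nines = stanines(raw)          # values fall between 1 and 9
```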
Topic: Planning for Assessment
Student Name: Sarang
Class: B.Ed. Hons Elementary Part (II)
Project Name: "Young Teachers' Professional Development (TPD)"
Project Founder: Prof. Dr. Amjad Ali Arain
Faculty of Education, University of Sindh, Pakistan
The document discusses the historical development of language assessment in Malaysia and its changing trends. It traces the stages of development: pre-Independence, post-Razak Report, post-Rahman Talib Report, post-Cabinet Report, and the current reforms under the Malaysia Education Blueprint. Key changes include establishing a common examination system, introducing school-based assessment, and shifting the focus to higher-order thinking skills. Factors contributing to the changing trends include education reforms, recommendations from government reports, and poor performance on international assessments. The role of assessment in education is increasingly seen as integrated with instruction rather than merely auditing learning.
Classroom testing: Using tests to promote learning, by Richard P. Phelps
Among the most effective educational interventions are those with testing components. Testing can be used effectively to promote learning, but that means using it more often in spaced, shorter bursts. Optimally, teachers should test their students on material at the moment they begin to forget it; the more discrete the subject matter (e.g., mathematics), the shorter the time interval between tests.
Assessing assessment literacy of science teachers in public secondary schools..., by Alexander Decker
This document summarizes a study that assessed the assessment literacy of science teachers in public secondary schools in Ekiti State, Nigeria. The study found that the majority of teachers had low knowledge of assessment techniques. However, female teachers and those with more experience had higher assessment literacy than male teachers and those with less experience. The study recommended regular seminars and workshops to improve science teachers' knowledge and use of various assessment practices.
This document discusses principles of language assessment, including practicality, validity, reliability, authenticity, and washback/backwash. It provides definitions and examples for each principle. Practicality considers cost, time, administration, and scoring of a test. Validity includes content, criterion, construct, consequential, and face validity. Reliability examines consistency of scores and includes student-related reliability, rater reliability, test administration reliability, and test reliability. Authenticity aims to correspond test tasks to real-world language use. Washback/backwash refers to how a test impacts teaching and learning.
Discussion question for meeting two: language assessment, by ManasApintamon
This document discusses the criteria for effective language tests: practicality, reliability, validity, authenticity, and washback. It provides examples of how tests can violate these criteria. For reliability, it discusses factors like rater reliability and test administration reliability. It distinguishes reliability from validity. For validity, it discusses how tests can violate content validity, criterion validity, and construct validity. It states authenticity is important so language tests reflect real-world language use. It defines washback as the effect of tests on teaching and learning, and says washback can be positive or negative depending on the test. The document instructs groups to analyze examples of tests violating reliability, validity or authenticity criteria.
The document discusses test, measurement, assessment, evaluation and testing in education. It defines each term and provides examples. A test determines a student's abilities and knowledge in a given area through methods like multiple-choice questions or spelling tests. Measurement is the process of obtaining numerical descriptions of how much a learner has learned, using tools like tests and rating scales. Assessment involves gathering and organizing various types of data on a student's knowledge, skills and beliefs to inform decision making. Evaluation is the process of making judgments about the worth or value of a student's performance by establishing objectives and comparing data to those objectives. The key difference is that a test is one form of assessment, while assessment and evaluation involve broader analysis of a student's overall learning.
This document discusses strategies for assessment in physical science education. It begins by defining classroom assessment and its purposes of informing teaching, improving learning, and monitoring student progress. It describes using a variety of tools for assessment, including questions, observations, and student self-assessments. The document then discusses continuous and comprehensive evaluation (CCE), which assesses all aspects of a student's development over time. It also discusses using grading systems instead of marks to classify student performance and the benefits and limitations of different grading approaches. The document provides details on constructing achievement tests, including planning, designing, and writing test items.
This document discusses criterion-referenced language testing and compares it to norm-referenced testing. It defines norm-referenced tests (NRTs) as tests that compare students' performances to one another, while criterion-referenced tests (CRTs) provide absolute measures of competence without comparison to other students. CRTs were developed in response to problems with NRTs such as teaching/testing mismatches, lack of instructional sensitivity, and lack of curricular relevance. While NRTs and CRTs share aspects of test construction, CRTs focus more on teaching/testing matches and instructional sensitivity. The document also discusses issues in defining language proficiency and communicative competence, and challenges in developing and analyzing CRTs.
The document discusses various concepts related to evaluation including measurement, assessment, evaluation techniques and tools.
It defines measurement as assigning numerical values to traits using specific rules. Assessment involves making inferences about characteristics based on collected information. Evaluation is a systematic process to determine if objectives are achieved and to what degree.
Some key techniques discussed are quantitative methods like oral, written and practical tests, and qualitative methods like observation and self-reporting. Tools include tests which can be teacher-made, standardized, individual or group-based. Different types of test items like objective and essay questions are also explained.
The document discusses the negative impact of Pakistan's Higher Secondary School Certificate exam on English language teaching and learning. It summarizes research showing high-stakes exams can negatively influence what and how teachers teach and students learn, known as washback effect. The author describes their experience teaching English in Pakistan, where students prioritized exam preparation over attending regular classes. They argue the exam system has led to widespread negative washback in Pakistan at both individual and societal levels.
01 Introduction to Language Learning Assessment (Introducción a la evaluación del aprendizaje de idiomas), by Y Casart
This document discusses key concepts in language assessment including measurement, testing, assessment, and evaluation. It defines these terms and distinguishes between them. It also categorizes different types of tests according to their purpose (e.g. placement, diagnostic), framework (e.g. criterion-referenced, norm-referenced), scoring (e.g. objective, subjective), impact (e.g. high-stakes, low-stakes), and skills involved (e.g. integrative, discrete-point). The relationships between measurement, tests and evaluation are explored.
Effect of scoring patterns on scorer reliability in economics essay tests, by Alexander Decker
1) The study investigated the effect of different scoring patterns on scorer reliability when grading economics essay tests.
2) The findings showed that scoring items across all student responses (one question at a time for all students) produced more consistent scores between scorers compared to other patterns like scoring all items for one student.
3) It is recommended that scoring items across all responses be adopted to improve reliability when grading economics essay tests.
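Scorer reliability of the kind this study examines is often indexed by the correlation between two markers' scores on the same scripts; a Pearson correlation near 1 indicates the scorers rank and space the essays similarly. A minimal sketch (the marks below are invented for illustration, not the study's data):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two scorers' marks on the same set
    of essay scripts -- one simple index of scorer reliability."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Two scorers marking the same ten economics essays (illustrative marks).
scorer_a = [12, 15, 9, 18, 14, 11, 16, 13, 10, 17]
scorer_b = [13, 14, 10, 17, 15, 10, 17, 12, 11, 16]
r = pearson_r(scorer_a, scorer_b)  # high positive agreement
```

Pearson r captures consistency of ranking rather than absolute agreement; if one scorer is uniformly harsher, r can still be high, which is why moderation procedures complement such indices.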
The document discusses issues related to implementing school-based assessment programs. It begins by noting the potential benefits of school-based assessment in validity and flexibility but also the need to ensure reliability, quality control, and quality assurance. It then examines five key issues for reliable school-based assessment: providing teachers with training and guidance, developing clear assessment criteria, establishing record keeping and moderation procedures, creating networks for teacher collaboration, and monitoring implementation. The document concludes by emphasizing the importance of ensuring adequate resources, expertise, and oversight when establishing a school-based assessment system.
Achievement tests measure what students have learned after a period of instruction. There are two main types - standardized tests which have uniform procedures and scoring, and teacher-made tests which assess learning in a particular classroom. Standardized tests provide norms and impartial information, while teacher-made tests help evaluate teaching effectiveness but have less accuracy and refinement. Both types of achievement tests are important for measuring student learning outcomes.
This document defines key concepts in educational measurement and evaluation including measurement, evaluation, testing, and the functions and principles of evaluation. It discusses different types of tests and measurements including criterion-referenced and norm-referenced tests. Various measures of central tendency (mean, median, mode) and variability (range, standard deviation) are explained. Different point measures like quartiles, deciles, and percentiles are also defined. Formulas for calculating measures like the mean, median, and mode from frequency distribution tables are provided.
Criterion and norm referenced evaluationJinto Philip
The document summarizes a departmental seminar on criterion and norm-referenced evaluation. The seminar was presented by Mr. Jinto Philip on August 31, 2010 from 2-3 PM in the Arts Theatre. It discussed the key differences between criterion-referenced tests, which evaluate performance against an absolute standard, and norm-referenced tests, which compare performance to other examinees. Some key differences highlighted were that criterion-referenced tests measure specific skills from a curriculum, while norm-referenced tests measure broad skill areas. Criterion-referenced tests aim to determine if students have achieved skills, while norm-referenced tests rank students against others.
This document discusses measurement, evaluation, and tests in physical education. It defines these terms and explains their interrelationship. Measurement involves using tests to collect quantitative data about traits, while evaluation judges the worth of those measurements. Tests are tools used for measurement. The document also discusses the need for and modern trends in measurement and evaluation, such as increased accountability, emphasis on health-related fitness, and more sophisticated instruments. It explains how measurement and evaluation are important for setting objectives, assessing achievement, research, and responding to current issues in physical education.
THESIS- Making the Results and Discussions portionelio dominglos
- The attitudes of farmers toward organic farming varied across different aspects of organic farming. Farmers had a very positive attitude toward the use of standards, proper harvesting and marketing, and advantages/disadvantages of organic farming. However, their attitudes were not as positive toward organic farming commodities, effectiveness, and importance.
- Statistical analysis confirmed that the farmers' attitudes differed significantly. This difference can be attributed to varying levels of patience, talent, honesty, and goals among individual farmers.
- It is concluded that the farmers have a mixed view of organic farming, being more convinced by some aspects than others. They are recommended to attend educational sessions to improve their knowledge and attitudes regarding all aspects of organic farming.
Developing an authentic assessment model in elementary school science teachingAlexander Decker
1) The document describes the development of an authentic assessment model for elementary school science teaching in Indonesia.
2) It aims to develop a model that can reveal student creativity and potential while developing character and fulfilling good assessment requirements.
3) The developed model includes performance assessments with tasks and rubrics, character assessments with criteria, and a scientific attitude questionnaire.
Standardized tests are widely used assessments that provide consistent evaluation of student performance compared to national norms. They are designed to be administered uniformly under controlled conditions to allow for comparable scoring. While criticized, standardized tests currently remain an important part of assessment in most school districts. They can measure basic knowledge and skills, but have limitations in evaluating higher-order thinking, creativity, or special needs. Both advantages like ease of administration and disadvantages like narrow focus are discussed.
A standardized test is any test where all test takers answer the same questions in a consistent manner that is scored uniformly. There are two main types - norm-referenced tests compare performance to others, while criterion-referenced tests assess performance against a set of objectives. Standardized tests can measure achievement, aptitude, or be used for college admissions. Scores are reported using raw scores, percentiles, or stanines.
Topic: Planning for Assessment
Student Name: Sarang
Class: B.Ed. Hons Elementary Part (II)
Project Name: “Young Teachers' Professional Development (TPD)"
"Project Founder: Prof. Dr. Amjad Ali Arain
Faculty of Education, University of Sindh, Pakistan
The document discusses the historical development of language assessment in Malaysia and changing trends. It describes 4 stages of development: pre-Independence, post-Razak Report, post-RahmanTalib Report, post-Cabinet Report, and current reforms under the Malaysia Education Blueprint. Key changes include establishing a common exam system, introducing school-based assessment, and shifting the focus to higher-order thinking skills. Contributing factors to changing trends include education reforms, recommendations from government reports, and poor performance on international assessments. The role of assessment in education is increasingly seen as integrated with instruction rather than just auditing learning.
Classroom testing: Using tests to promote learningRichard P Phelps
Among the most effective educational interventions are those with testing components. Testing can be used effectively to promote learning, but that means using it more often in spaced, shorter bursts. Optimally, teachers should test their students on material at the moment they begin to forget it--the more discrete the subject matter (e.g., mathematics) the shorter the time interval between tests.
Assessing assessment literacy of science teachers in public secondary schools...Alexander Decker
This document summarizes a study that assessed the assessment literacy of science teachers in public secondary schools in Ekiti State, Nigeria. The study found that the majority of teachers had low knowledge of assessment techniques. However, female teachers and those with more experience had higher assessment literacy than male teachers and those with less experience. The study recommended regular seminars and workshops to improve science teachers' knowledge and use of various assessment practices.
This document discusses principles of language assessment, including practicality, validity, reliability, authenticity, and washback/backwash. It provides definitions and examples for each principle. Practicality considers cost, time, administration, and scoring of a test. Validity includes content, criterion, construct, consequential, and face validity. Reliability examines consistency of scores and includes student-related reliability, rater reliability, test administration reliability, and test reliability. Authenticity aims to correspond test tasks to real-world language use. Washback/backwash refers to how a test impacts teaching and learning.
Discussion question for meeting two language assessmentManasApintamon
This document discusses the criteria for effective language tests: practicality, reliability, validity, authenticity, and washback. It provides examples of how tests can violate these criteria. For reliability, it discusses factors like rater reliability and test administration reliability. It distinguishes reliability from validity. For validity, it discusses how tests can violate content validity, criterion validity, and construct validity. It states authenticity is important so language tests reflect real-world language use. It defines washback as the effect of tests on teaching and learning, and says washback can be positive or negative depending on the test. The document instructs groups to analyze examples of tests violating reliability, validity or authenticity criteria.
The document discusses test, measurement, assessment, evaluation and testing in education. It defines each term and provides examples. A test is used to determine a student's abilities and knowledge in a given area, through methods like multiple choice questions or spelling tests. Measurement is the process of obtaining numerical descriptions of how much a learner has learned, using tools like tests and rating scales. Assessment involves gathering and organizing various types of data on a student's knowledge, skills and beliefs to inform decision making. Evaluation is the process of making judgments about the worth or value of a student's performance by establishing objectives and comparing data to those objectives. The key difference is that a test is one form of assessment, while assessment and evaluation involve broader analysis of a student's overall learning.
This document discusses strategies for assessment in physical science education. It begins by defining classroom assessment and its purposes of informing teaching, improving learning, and monitoring student progress. It describes using a variety of tools for assessment, including questions, observations, and student self-assessments. The document then discusses continuous and comprehensive evaluation (CCE), which assesses all aspects of a student's development over time. It also discusses using grading systems instead of marks to classify student performance and the benefits and limitations of different grading approaches. The document provides details on constructing achievement tests, including planning, designing, and writing test items.
This document discusses criterion-referenced language testing and compares it to norm-referenced testing. It defines NRTs as tests that compare students' performances to others, while CRTs provide absolute measures of competence without comparing to other students. CRTs were developed in response to problems with NRTs like teaching/testing mismatches, lack of instructional sensitivity, and lack of curricular relevance. While NRTs and CRTs share aspects of test construction, CRTs focus more on teaching/testing matches and instructional sensitivity. The document also discusses issues in defining language proficiency and communicative competence, and challenges in developing and analyzing CRTs.
The document discusses various concepts related to evaluation including measurement, assessment, evaluation techniques and tools.
It defines measurement as assigning numerical values to traits using specific rules. Assessment involves making inferences about characteristics based on collected information. Evaluation is a systematic process to determine if objectives are achieved and to what degree.
Some key techniques discussed are quantitative methods like oral, written and practical tests, and qualitative methods like observation and self-reporting. Tools include tests which can be teacher-made, standardized, individual or group-based. Different types of test items like objective and essay questions are also explained.
The document discusses the negative impact of Pakistan's Higher Secondary School Certificate exam on English language teaching and learning. It summarizes research showing high-stakes exams can negatively influence what and how teachers teach and students learn, known as washback effect. The author describes their experience teaching English in Pakistan, where students prioritized exam preparation over attending regular classes. They argue the exam system has led to widespread negative washback in Pakistan at both individual and societal levels.
01 Introduction to the assessment of language learning (Y Casart)
This document discusses key concepts in language assessment including measurement, testing, assessment, and evaluation. It defines these terms and distinguishes between them. It also categorizes different types of tests according to their purpose (e.g. placement, diagnostic), framework (e.g. criterion-referenced, norm-referenced), scoring (e.g. objective, subjective), impact (e.g. high-stakes, low-stakes), and skills involved (e.g. integrative, discrete-point). The relationships between measurement, tests and evaluation are explored.
Effect of scoring patterns on scorer reliability in economics essay tests (Alexander Decker)
1) The study investigated the effect of different scoring patterns on scorer reliability when grading economics essay tests.
2) The findings showed that scoring items across all student responses (one question at a time for all students) produced more consistent scores between scorers compared to other patterns like scoring all items for one student.
3) It is recommended that scoring items across all responses be adopted to improve reliability when grading economics essay tests.
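The consistency between scorers that the study measured can be quantified as a correlation between their marks. The sketch below is illustrative only; the scores are hypothetical, not data from the study:

```python
# Hypothetical sketch: estimating scorer (inter-rater) reliability as the
# Pearson correlation between two scorers' marks on the same set of essays.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

scorer_a = [12, 15, 9, 18, 14, 11]   # hypothetical marks by scorer A
scorer_b = [13, 14, 10, 17, 15, 10]  # hypothetical marks by scorer B on the same essays

print(f"inter-scorer r = {pearson(scorer_a, scorer_b):.2f}")
```

A value near 1.0 indicates the two scorers rank and space the essays consistently; the study's finding was that item-by-item scoring across all students pushes this consistency higher.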
The document discusses issues related to implementing school-based assessment programs. It begins by noting the potential benefits of school-based assessment in validity and flexibility but also the need to ensure reliability, quality control, and quality assurance. It then examines five key issues for reliable school-based assessment: providing teachers with training and guidance, developing clear assessment criteria, establishing record keeping and moderation procedures, creating networks for teacher collaboration, and monitoring implementation. The document concludes by emphasizing the importance of ensuring adequate resources, expertise, and oversight when establishing a school-based assessment system.
The document discusses various models of curriculum evaluation. It describes several purposes of curriculum evaluation including providing feedback to learners, determining how well objectives are achieved, and improving the curriculum. Curriculum evaluation can take place at the classroom level, where teachers collect data from tools like tests and observations, and at the school or system level using surveys, focus groups, and standardized tests. The document outlines several prominent models of curriculum evaluation, including Provus' Discrepancy Model, Tyler's objectives-based model, Stufflebeam's CIPP model, Stake's Congruency Model, and Eisner's Educational Connoisseurship qualitative model.
Using Traditional Assessment in the Foreign Language Teaching Advantages and ... (YogeshIJTSRD)
The article seeks to compare traditional assessment procedures such as multiple choice tests with performance or alternative assessments. The descriptive analysis method was used to express the effectiveness of traditional assessment and its advantages and limitations as well. The assessment types, statements by researchers, and controversial questions are analyzed and compared. The article describes some essential issues of using traditional and authentic assessment types. Furthermore, the article suggests some ways and techniques of using other types of assessment that can be effective in the foreign language teaching process. The author concludes that the types of assessment should be selected according to the content of the program and should be content related. Urazbaeva Dilbar Turdibaevna "Using Traditional Assessment in the Foreign Language Teaching: Advantages and Limitations" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-3 , April 2021, URL: https://www.ijtsrd.com/papers/ijtsrd39858.pdf Paper URL: https://www.ijtsrd.com/humanities-and-the-arts/education/39858/using-traditional-assessment-in-the-foreign-language-teaching-advantages-and-limitations/urazbaeva-dilbar-turdibaevna
Assessment Of Students Argumentative Writing A Rubric Development (Sophia Diaz)
The document discusses the development of a rubric to assess students' argumentative writing skills. It presents the results of a study that aimed to empirically develop an analytical rubric to evaluate students' achievement of learning outcomes for writing argumentative essays. The study involved 145 students who wrote argumentative essays as part of their final exam. The researchers developed criteria to assess various elements of the essays and tested the rubric's ability to reliably score the students' work and provide feedback on their writing skills. The results suggested the rubric was effective at evaluating students' achievement of learning outcomes and supported the use of genre-specific rubrics to improve student performance through constructive feedback.
A review of classroom observation techniques used in postsecondary settings (Erin Taylor)
This document reviews different techniques for observing college classrooms. It discusses two main uses of observation - for faculty professional development and for teaching assessment/evaluation. It describes key characteristics that distinguish observation protocols, such as whether they evaluate quality or just describe teaching, and their level of detail. It provides examples of commonly used protocols, including the Reformed Teaching Observation Protocol (RTOP) and UTeach Observation Protocol (UTOP), both of which are criterion-referenced and evaluative in nature. The document concludes by discussing areas for future research on classroom observation techniques in postsecondary education.
Descriptive evaluation of the primary schools an overview (Alexander Decker)
This document discusses descriptive evaluation in primary schools. Descriptive evaluation focuses on evaluating students' progress qualitatively by describing strengths and areas for improvement, rather than solely using scores. The document outlines the objectives of descriptive evaluation as improving teaching and learning quality and emphasizing educational goals over test scores. It also describes common tools for descriptive evaluation, such as teacher observations and project-based assessments. While descriptive evaluation can increase student motivation and participation, it also faces challenges in fully achieving goals like improving student attitudes toward learning.
11.descriptive evaluation of the primary schools an overview (Alexander Decker)
This document provides an overview of descriptive evaluation in primary schools. It discusses descriptive evaluation methods as an alternative to traditional evaluation that focuses on quality of learning rather than just scores or quantities. The document outlines several studies that found descriptive evaluation improved student participation and learning compared to traditional methods. It then describes the objectives, tools, and standards of descriptive evaluation, as well as its benefits and potential drawbacks. Overall, the document advocates for descriptive evaluation as a way to improve teaching and learning quality, reduce student stress, and change views of the purpose of education.
Johnston, pattie enhancing validity of critical tasks (William Kritsonis)
NATIONAL FORUM JOURNALS are a group of national and international refereed, blind-reviewed academic journals. NFJ publishes articles on academic intellectual diversity, multicultural issues, management, business, administration, issues focusing on colleges, universities, and schools, all aspects of schooling, special education, counseling and addiction, international issues of education, organizational behavior, theory and development, and much more. DR. WILLIAM ALLAN KRITSONIS is Editor-in-Chief (Since 1982). See: www.nationalforum.com
This document discusses standards for education systems including curriculum standards, pedagogical approaches, professional standards, and assessment and evaluation standards. It addresses defining what students should know and be able to do. Standards are level-specific descriptions of expected achievements but do not specify how knowledge and skills should be taught. The document also discusses relationships between standards, lessons, assessments, and organizing curriculums in schools.
Lecture on the different types of inferential statistics and when to use them. Demonstration of encoding data in SPSS and computing statistics. Hands-on practice of encoding a small data set and computing statistics in small groups.
This document discusses developing effective assessment instruments. It covers criterion-referenced versus norm-referenced tests, using portfolios for assessment, evaluating congruence between objectives and assessments, and Dick and Carey's five-step model for creating assessments. Key aspects include ensuring assessments measure the behaviors and criteria in course objectives, considering learner characteristics, and making assessments as realistic to the performance context as possible.
What philosophical assumptions drive the teacher/teaching standards movement ... (Ferry Tanoto)
What philosophical assumptions drive the teacher/teaching standards movement today? Are standards dangerous?
Week 4 - Reading highlights
Falk, B., 2002 and Tuinamuana, K., 2011
This document summarizes a research paper on assessment criteria in EFL writing skills. The paper examines university tutors' and students' understanding of assessment criteria used to evaluate writing. The analysis found that most tutors did not inform students about assessment criteria or discuss them, affecting students' ability to achieve higher grades. This was likely due to tutors' lack of knowledge about the importance of involving students in the criteria. The findings suggest re-thinking how assessment criteria are used to potentially improve teaching and learning, especially for EFL students in Libya.
1. Assessment refers to the methods used by educators to evaluate students' academic readiness, learning progress, skills, and needs. It is an ongoing process involving collecting, analyzing, and interpreting data.
2. Bloom's taxonomy classifies cognitive objectives into different levels including knowledge, comprehension, application, analysis, synthesis, and evaluation.
3. Common assessment methods mentioned include written responses, product ratings, performance tests, oral questioning, observation, and self-reports. Objective tests are suitable for lower levels while performance tests involve demonstrating a skill.
Utilizing Rubrics in Audio/Visual Production (Corey Anderson)
During the 2016-2017 school year, it became apparent to me that my students at Watkins Overton High School in Memphis, TN, might enjoy a greater sense of academic achievement if they had a better understanding of what was required to receive a rating of Proficient or Advanced when their artifacts are assessed. In the Audio/Visual Production field, these artifacts are almost always something the student must create. I am specifically interested in improving their commercials and public service announcements. Although high school students have a lot of competing interests, providing rubrics for assignments would give them a way to focus their energy when completing projects and a way to assess the quality of their own work before submitting it for assessment. Their attention to detail and quality has further implications for post-secondary success. Rallying behind the mantra, Destination 2025! In the year 2025, our school district's goal is to have 80% of graduates college and career ready, 90% graduating on time, and 100% of college and career ready graduates enrolling in post-secondary opportunities (Shelby County Schools, n.d.). What tools can I actively use to help my students get the advantage in life and become champions at work? The purpose of this paper is to determine whether developing and utilizing rubrics with my high school A/V Production students can help improve the quality of their films for public service announcements and commercials. These are my Next Steps.
Validity refers to the appropriateness and usefulness of assessment interpretations and results, while reliability refers to the consistency of measurements. There are various types of validity evidence including content, criterion, and construct validity. Reliability can be estimated through methods like test-retest, equivalent forms, and internal consistency. Ensuring both validity and reliability of assessments is important for making fair and meaningful evaluations of students.
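As a hedged sketch of one internal-consistency estimate mentioned above, Cronbach's alpha can be computed from an item-score matrix; the scores below are hypothetical examples, not data from any document discussed here:

```python
# Minimal sketch of Cronbach's alpha, an internal-consistency reliability
# estimate. Rows are students, columns are items; scores are hypothetical.
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):
    k = len(scores[0])                                             # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])             # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 3],
    [4, 5, 4, 5],
    [1, 2, 2, 2],
    [3, 3, 4, 4],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Values closer to 1.0 indicate that the items hang together and the total score is a consistent measurement; test-retest and equivalent-forms estimates instead require two administrations.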
1. The document discusses standards-based assessment and standardized testing. It outlines the key elements of standardized tests and explores both the popularity and criticisms of standardized testing.
2. Concerns about standardized testing include test bias, an overemphasis on test performance leading to test-driven learning, and ethical issues regarding their role in gatekeeping.
3. The document advocates for using multiple measures of assessment, including more formative assessments, to reduce the negative impact of standardized testing and make assessment less biased.
This document discusses tools for assessing cognitive outcomes of service-learning programs. It begins by explaining the importance of assessing service-learning and then provides a review of available assessment tools. The tools are organized into three categories: research scales, written essays/protocols, and interviews/qualitative approaches. Several tools are described in detail, including the Cognitive Learning Scale, Problem-Solving Analysis Protocol, and Problem-Solving Interview Protocol. The conclusion emphasizes that systematic assessment can improve service-learning programs and better demonstrate their impact on student learning.
The document discusses various forms of alternative assessment including portfolios, authentic assessment, and backward design. It provides details on portfolio assessment, describing it as a continuous, multidimensional process that allows for student reflection. Portfolios can be either process-oriented to show development or product-oriented to demonstrate mastery. The advantages of alternative assessments are that they provide opportunities for student self-assessment and application of concepts to real-world expectations. However, implementing alternative assessments also presents challenges.
Journal raganit santos
International Journal of Education and Research, ISSN 2201-6333 (Print), ISSN 2201-6740 (Online), www.ijern.com, Vol. 3 No. 2, February 2015
ATTRIBUTES OF TEST PAPERS AND LEVEL OF PERFORMANCE OF STUDENTS: BASIS FOR THE FORMULATION OF THE GUIDELINES ON TEST PAPER CONSTRUCTION
Amadeo P. Cristobal, Jr., Ed.D.
ABSTRACT
Aiming to establish standards at Saint Paul School of Business and Law, this study gave attention to the system of test construction. It entailed the validation of the test papers used for the preliminary term examinations of school year 2014-2015, which had been submitted to the Office of Academics and Research. From the total of 115 test papers, 50% were randomly selected and subjected to validation by four (4) experts. The level of validation was then related to the level of performance of students, and this served as the basis for the formulation of the guidelines on test construction.
The level of validation of the test papers yielded an overall average weighted mean of 3.65, verbally interpreted as quite valid. The level of performance of students is reflected in a mean of 66.26, a median of 67.00, and a mode of 60.00. The level of validity of the test papers has a moderately small positive correlation (0.050) with the level of performance of students of Saint Paul School of Business and Law. It is therefore highly recommended that the formulated guidelines on test paper construction be implemented.
Article II, Quality Assurance Framework, Section 7 of CHED Memorandum Order (CMO) No. 46, series of 2012, Policy-Standard to Enhance Quality Assurance (QA) in Philippine Higher Education Through an Outcomes-Based and Typology-Based Quality Assurance, states “that Quality Assurance (QA) for CHED does not mean merely specifying the standards or specifications against which to measure or control quality; rather, Quality Assurance is about ensuring that there are mechanisms, procedures and processes in place to ensure that the desired quality, however defined and measured, is delivered.” These mechanisms, procedures and processes must be reflected in all elements of the entire organization; in higher education, this translates to its trifocal roles: instruction, research, and extension.
Instruction, an important role of higher education institutions (HEIs), must be given enough attention for education to be effective and efficient. Its ultimate measures are indicated by the performance of HEIs in the different national board examinations and by the employment outcomes of their graduates. These should be actualized in the various day-to-day activities in the classroom; classroom instruction and classroom management must be attended to.
The assessment process must be considered and given due attention, as herein lies the kind of outcomes higher education aspires to have. Assessment is defined as the process of judging the performance of students, and it serves many functions, such as
measuring students' achievement, motivating students' learning, predicting students' success, diagnosing difficulty, and evaluating instruction (Raganit, 2010; Santos, 2007; Calderon, 1993; Asaad, 2004). It can take the form of summative, formative, placement or diagnostic assessment (Santos, 2007).
The principles of good assessment and evaluation must not be neglected. According to Navarro (2012) and Asaad (2004), assessment and evaluation should start with the vision, mission and goals of the school. They should be based on clear instructional objectives; they must also be comprehensive, continuous, diagnostic and functional; and they must be conducted through the cooperative efforts of the instructors, the students, school administrators, parents and the other stakeholders in the educational community.
Assessment in the classroom is usually conducted through a series of tests in various forms (Rico, 2011; Santos, 2011; Navarro, 2011; Raganit, 2010; Barosse, 2005), such as subjective tests, objective tests, written works, portfolio assessment and the use of rubrics. These can be presented as a quiz, recitation, long exam, or other format. There are many types of tests to consider and choose from (Calderon, 1993; Buendicho, 2011; Corpuz, 2009): recall or supply types (simple recall, fill-in, identification, labelling, enumeration); recognition or selection types (alternative response, multiple choice, multiple response, matching type); and rearrangement of items (pictures, words).
In this paper, the test papers used during the preliminary period will be subjected to a validation process. Though there are different types of validation, such as face validation, content validation, criterion validation and construct validation, for practical purposes only content validation will be utilized to propose an institutionalized format for test papers.
One important aspect of the validation is the preparation of the test papers for final examinations. The validity of the test papers must be ensured to obtain an accurate result of performance evaluation. Validity, one of the qualities of a good examination, is defined as the degree to which an instrument measures what it intends to measure (Rico, 2011; Buendicho, 2011; Navarro, 2012; Raganit, 2010; Miller, 2005; Padua, 1997). Validation is done by submitting test papers to at least three (3) experts to look into the appearance, directions, readability, correctness of grammar, spacing, and suitability of words (Rico, 2011). According to Gabuyo (2012), Asaad (2004), and Del Socorro (2011), the following factors affect the validity of test papers: appropriateness of test items; directions; construction of test items; arrangement of items; difficulty of items; reading vocabulary and sentence structures; length of the test; and patterns of answers. Also, according to Santos (2007), the adequacy, objectivity, testing conditions, administrative procedures, usability, administrability, scorability, economy, utility and interpretability are to be checked to establish a good test.
The construction of a test is not an easy task. The teacher should develop the skill of constructing tests while considering the qualities of a valid and good test. The steps in test construction are to identify the objectives, decide on the type of objective test, formulate the table of specifications, draft the test items, and try out and validate the test (Rico, 2011; Gabuyo, 2012). Moreover, Buendicho (2011) added the following steps: determine the purpose of the test, select appropriate assessment tools, prepare them, assemble and appraise them, and use the results.
Likewise, the following were suggested by most of the experts mentioned above: use a table of specifications; write more items than needed; write items well in advance of the testing date; write items so that they call for the performance described in the behavioural objectives; specify the tasks to be performed clearly; write items at the appropriate reading level; provide no clues to the answer; and recheck items for relevance when revised.
Considering the dissertations conducted by Rivera (2007) and Abimbola (2012):
Rivera found that the items developed were directly linked to specific content domains or
objectives. After utilizing the norm-referenced test design, there was a direct alignment
between the agriculture and natural resources core curriculum and the items developed.
Item writing is a skill that, with practice, one can learn to master, but it was very difficult to
find agriculture teachers with the skills to write good items, and it is equally difficult to find
test specialists with expertise in the specific content. The team of item writers, made up of
teachers and extension agents, might not have been the best group to design and write
questions; they were knowledgeable in content but lacked the skills in generating well-
constructed test items. Though this dissertation focused on teaching agriculture, the
procedures conducted in validation are the same as in this study.
The latter recommends that examination bodies should make efforts to improve the
validity of their examinations; hire appropriate external personnel to help in item
construction; screen the items before and after administration to establish gender equity;
take interest in how teachers teach the recommended syllabi; and publish this information
as books, training cassettes and videotapes, and also mount workshops that students and
teachers will attend.
This proposal is anchored on the scientific management theory of Frederick Taylor
(1911) and the systems theory of Chester Barnard (1938), as cited by Robbins (2012).
The former states that, in order to develop effectiveness and efficiency, the organization
should develop a science in each element of the organization. Likewise, education as a
system is composed of sub-systems, and all parts of the system must be established
within the bounds of science to function well. One of these sub-systems is how the
performance of the students is assessed.
The paradigm shows three boxes. The criterion variable presented in the first box is
the attributes of the test papers, with the following sub-variables: appearance; grammar
and structure; variety of test types; accuracy of directions; and hierarchy of taxonomy of
objectives. The variable level of performance of students is presented in the second box.
These two boxes are connected by a line, which means that a relationship between the
two variables is predicted. The third box below is the proposed guidelines in the
construction of test papers for the Saint Paul School of Business and Law, which serves
as the offshoot of the validation process, represented by an arrow.
It is the overall objective of this proposal to validate the test papers used during the
preliminary examinations of the first semester, AY 2014-2015, relate the results to the
level of performance of students, and propose guidelines to standardize the test paper
format of the Saint Paul School of Business and Law. Through this, the institution will
have the mechanics and science in the preparation of test papers as an indicator of
quality and excellence.
This proposal sought an answer to the general statement of the problem: What
relationship exists between the level of the test papers' attributes and the level of
performance of students of the Saint Paul School of Business and Law?
Specifically, the following questions were answered:
1. What is the level of validity of the attributes of test papers for preliminary
examinations in terms of:
1.1 appearance;
1.2 grammar and structure;
1.3 characteristics of test;
1.4 hierarchy of taxonomy; and
1.5 content level?
2. What is the level of performance of the students in the different courses based
on the results of the preliminary examinations?
3. What is the significant relationship of the test papers' attributes to the level of
performance of students?
4. What policy guidelines in test construction for theory-based and skill-
development-based courses are to be formulated based on the findings of the
study?
Method
Research Method
This proposal utilized a quantitative-descriptive type of study. This is a scientific
method which refers to a general set of orderly, disciplined procedures to acquire
information, usually based on data treated statistically. It also aims to gather more
information about characteristics within a particular field of study and purports to provide a
picture of a situation as it naturally happens (Cristobal, 2013).
The test-papers used for the preliminary examinations across courses were
described to arrive at the common characteristics.
Participants of the Study
The instructors of Saint Paul School of Business and Law comprised the population
of the study, with a total of 65 instructors, primarily based on the submitted preliminary
examinations. The total number of test papers submitted is one hundred fourteen (114)
sets from different courses. Fifty percent (50%) of these instructors, full-time and part-time,
were randomly chosen, and about fifty percent (50%) of the test papers they used were
subjected to documentary analysis, a technique used to analyse available primary or
secondary sources of data.
The final documents were selected using a stratified sampling procedure: the first
stratum was the selection of the instructors, the second was the test papers, and the last
was the classification of the subjects.
Of the instructors who submitted test papers, fifty percent, about 33 of them, were
randomly chosen to serve as the participants of the study: their names were written on
strips of paper and placed inside a bowl, from which fifty percent (50%) of the population
were randomly drawn. Next, fifty percent (50%) of the test papers constructed and utilized
for the preliminary examinations by each selected participant, totalling 57 sets from
different courses, were further randomly selected.
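The two-stage draw described above can be sketched in a few lines of Python. This is a minimal illustration only: the roster names and per-instructor paper counts below are hypothetical, not the study's actual lists.

```python
import random

random.seed(2014)  # fixed seed so the draw is reproducible

# Hypothetical roster: 65 instructors, each with 2 submitted test papers.
instructors = {f"Instructor {i:02d}": [f"I{i:02d}-paper{j}" for j in range(1, 3)]
               for i in range(1, 66)}

# Stage 1: randomly draw 50% of the instructors (about 33, as in the study).
chosen_instructors = random.sample(sorted(instructors), k=33)

# Stage 2: for each chosen instructor, randomly draw 50% of his or her papers.
chosen_papers = []
for name in chosen_instructors:
    papers = instructors[name]
    k = max(1, len(papers) // 2)
    chosen_papers.extend(random.sample(papers, k=k))

print(len(chosen_instructors), len(chosen_papers))
```

Drawing instructors first and then papers within each instructor mirrors the two strata described in the text.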
Sources of Data
The main sources of data were the test papers actually utilized during the
preliminary examinations conducted on July 8-12, 2014.
Instrument of the Study
A checklist was used as the main instrument for content validation. After reading
the related literature, the researcher arrived at a proposed checklist.
This checklist is composed of the variables of the study, the attributes of a test
paper. Various indicators per variable were included to fully analyse the documents and
arrive at a thorough analysis of the said attributes. Each indicator was measured by its
level of appropriateness through a Likert scale.
EV – Extremely Valid (the indicator is 81% to 100% valid)
QV – Quite Valid (the indicator is 61% to 80% valid)
FV – Fairly Valid (the indicator is 41% to 60% valid)
SV – Slightly Valid (the indicator is 21% to 40% valid)
NV – Not at all Valid (the indicator is 0% to 20% valid)
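When the evaluators' 1-to-5 ratings are averaged, the resulting weighted means are mapped back onto these verbal labels. The paper does not state the exact mean cut-offs it used, so the sketch below assumes equal-width bands of 0.80, which reproduce the interpretations reported in Tables 1 to 5 (e.g. 4.23 is EV, 4.18 is QV, 2.33 is SV):

```python
def interpret(mean_rating: float) -> str:
    """Map a 1-to-5 weighted mean onto the checklist's verbal scale.

    Equal-width bands of 0.80 are an assumption; they are consistent
    with the verbal interpretations reported in Tables 1-5.
    """
    if mean_rating >= 4.21:
        return "EV"   # Extremely Valid
    if mean_rating >= 3.41:
        return "QV"   # Quite Valid
    if mean_rating >= 2.61:
        return "FV"   # Fairly Valid
    if mean_rating >= 1.81:
        return "SV"   # Slightly Valid
    return "NV"       # Not at all Valid

print(interpret(4.33), interpret(4.04), interpret(2.33))  # EV QV SV
```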
This checklist was submitted to three (3) experts in the field of education for
validation purposes. Some of the comments noted were the inclusion of parallelism and
use of tenses under the sub-variable grammar and structure, and the change of the Likert
scale labels from "very good" to "needs improvement" into "extremely valid" to "not at all
valid". These comments were incorporated in the final checklist.
Procedures
After conducting the stratified sampling of participants and of the documents to
validate, a letter was submitted to the Assistant Vice-President of the Office of Academic
and Research to gather the test papers which had been submitted.
A letter was also sent to the four (4) selected qualified validators with expertise in
test paper construction, two (2) internal validators from SPSBL and two (2) external
validators, to conduct the content validation using the validated checklist.
The checklists were consolidated and subjected to appropriate statistical formulae
for analysis and interpretation.
The results of the content validation and the statistical analyses were the bases for
the formulation of the guidelines for standardized test papers for the SPSBL.
On the Development of the Guidelines for Test Construction
The guidelines for test construction were developed using the Organizational
Behaviour Modification model developed by Drucker (Robbins, 2000). This model is
composed of six (6) primary steps:
Step 1 – Identify performance-related behaviour events. This is where the
researcher identified the level of validity of the test papers used by the SPSBL instructors
during the preliminary term.
Step 2 – Measure: baseline the frequency of responses. Tabulate the level of
validity of the test papers and measure the gap. According to Yauch (cited by Navarro,
2000), excellence can be achieved in an organization if efforts are directed at the different
aspects of school operations. This will only happen if the school managers and instructors
are equipped with the competencies needed to assure quality. In this study, competency
in test construction must be possessed by the school managers and instructors, and
quality assurance is expected when the competencies have a perfect value of 5.0.
Step 3 – After measuring the gap in the variables of the attributes of test papers,
each variable was analysed with the indicators that need to be enhanced or improved.
Step 4 – Develop an intervention strategy. Based on the analysis of the attributes of
the test papers, the researcher developed guidelines on test construction, dividing the
subjects into two (2) groups: theory-based subjects and skill-development-based subjects.
The guidelines were developed based on intensive reading of the literature related to test
construction.
Results
Table 1 -5 present the weighted mean of the level of validity of the attributes of the
test papers in terms of the different sub-variables such as appearance, grammar and
structure, characteristics of test, hierarchy of taxonomy and higher ordered thinking skills.
This is to answer the specific statement of the problem, “What is the level of validity of the
attributes of test papers for preliminary examinations in terms of the different sub-variables
mentioned above.
Font style, with the highest average weighted mean of 4.33, and size of fonts are
rated EV, extremely valid, while attractiveness, with the lowest average weighted mean of
3.79, together with spacing and uniformity of elements, are rated QV, quite valid. The
variable appearance has an overall weighted mean of 4.04, which is equivalent to QV,
quite valid. The ratings of the evaluators have a standard deviation of .35425 and a
variance of .125.
Table 1
Weighted Mean of the Level of Validity of the Attributes of Test Papers in Terms of
Appearance
N=57
Indicators E1 E2 E3 E4 AWM VI
Fonts Style 3.90 4.26 4.75 4.42 4.33 EV
Size of Fonts 3.86 4.07 4.57 4.42 4.23 EV
Spacing 3.63 4.39 4.50 3.30 3.96 QV
Attractiveness 3.75 3.54 4.40 3.45 3.79 QV
Uniformity of Elements 3.79 3.63 4.58 3.63 3.91 QV
AWM 3.79 3.98 4.56 3.84 4.04 QV
SD = .35425 Var. = .125
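The summary figures can be reproduced with the Python standard library, assuming (as the numbers suggest) that the reported standard deviation and variance were taken across the four evaluators' column averages using the sample (n − 1) formulas:

```python
from statistics import mean, stdev, variance

# Column averages (one per evaluator) for the sub-variable "appearance",
# taken from the AWM row of Table 1.
evaluator_means = [3.79, 3.98, 4.56, 3.84]

awm = mean(evaluator_means)      # overall average weighted mean
sd = stdev(evaluator_means)      # sample standard deviation (n - 1)
var = variance(evaluator_means)  # sample variance

print(round(awm, 2), round(sd, 5), round(var, 3))  # 4.04 0.35425 0.125
```

That the output matches the paper's reported .35425 and .125 supports the assumption that sample statistics over the evaluator averages were used.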
Table 2 shows the ratings of the evaluators on the sub-variable, grammar and
structure.
Three indicators, capitalization and pluralisation, each with an average weighted
mean of 4.25, and spelling, with 4.23, have a rating of EV, extremely valid, while the
remaining indicators, subject-verb agreement, use of tenses, parallelism, use of
punctuations and sentence construction, are rated QV, quite valid. This sub-variable,
grammar and structure, has an overall average weighted mean of 4.18, verbally
interpreted quite valid.
The ratings of the evaluators show a standard deviation of .41524 and a variance of
.172.
Table 2
Weighted Mean of the Level of Validity of the Attributes of Test Papers in Terms of
Grammar and Structure
N=57
Indicators E1 E2 E3 E4 AWM VI
Capitalization 4.01 4.14 4.93 3.93 4.25 EV
Pluralisation 4.06 4.02 4.93 4.00 4.25 EV
Subject-Verb Agreement 4.02 4.00 4.72 3.98 4.18 QV
Use of Tenses 4.07 3.90 4.75 3.96 4.17 QV
Parallelism 3.96 4.04 4.79 3.84 4.16 QV
Spelling 4.07 4.05 4.91 3.89 4.23 EV
Use of Punctuations 4.02 4.02 4.89 3.68 4.00 QV
Sentence Construction 3.89 4.02 4.49 3.95 4.09 QV
Average Weighted Mean 4.01 4.02 4.80 3.90 4.18 QV
Sd = .41524 Var.=.172
Table 3 shows the weighted mean of the level of validity of the attributes of test
papers in terms of characteristics of test.
Table 3
Weighted Mean of the Level of Validity of the Attributes of Test Papers in Terms of
Characteristics of Test
Indicators E1 E2 E3 E4 AWM VI
Variety of Test Types 2.96 2.82 4.49 2.54 3.20 FV
Arrangement of Items 3.16 2.80 4.60 2.73 3.32 FV
Points System 1.54 1.03 4.61 2.12 2.33 SV
Accuracy of Directions 3.05 2.72 4.05 2.49 3.08 FV
Mechanisms 3.02 1.32 4.35 3.07 2.94 FV
Average Weighted Mean 2.78 2.14 4.42 2.59 2.98 FV
Sd = .99520 Var. = .990
Four out of five indicators, variety of test types, arrangement of items, accuracy of
directions and mechanisms, have average weighted means ranging from 2.94 to 3.32,
with arrangement of items the highest; these are verbally interpreted fairly valid. The last
indicator, points system, has an average weighted mean of 2.33, interpreted SV, slightly
valid. The sub-variable, characteristics of test, has an overall average weighted mean of
2.98 and is verbally interpreted FV, fairly valid. The ratings have a standard deviation of
.99520 and a variance of .990.
The sub-variable, hierarchy of taxonomy, has three indicators: arrangement,
balance of difficulty and higher-ordered thinking skills. These have average weighted
means ranging from 3.47 to 3.54, which are verbally interpreted quite valid. This sub-
variable has an overall weighted mean of 3.50, verbally interpreted QV, quite valid.
Table 4
Weighted Mean of the Level of Validity of the Attributes of Test Papers in Terms of
Hierarchy of Taxonomy
N=57
Indicators E1 E2 E3 E4 AWM VI
Arrangement 3.33 2.88 4.72 3.07 3.50 QV
Balance of Difficulty 3.35 2.95 4.68 2.89 3.47 QV
HOTS 3.42 2.82 4.74 3.19 3.54 QV
Average Weighted Mean 3.37 2.88 4.71 3.05 3.50 QV
Sd = .83024 Var.=.689
The ratings given have a standard deviation of .83024 and a variance of .689.
The sub-variable, content level, has five (5) indicators. Four of them, scope, depth,
clarity and relevance, have average weighted means ranging from 3.46 to 3.72, with
relevance having the highest average weighted mean; they are verbally interpreted QV,
quite valid. The last indicator, adequacy, has an average weighted mean of 3.33 and is
verbally interpreted FV, fairly valid. This sub-variable has an overall average weighted
mean of 3.54, interpreted QV, quite valid.
The evaluators' ratings have a standard deviation of .76879 and a variance of .591.
Table 5
Weighted Mean of the Level of Validity of the Attributes of Test Papers in Terms of Content
Level
N=57
Indicators E1 E2 E3 E4 AWM VI
Scope 3.42 3.28 4.72 2.42 3.46 QV
Depth 3.35 3.28 4.67 2.96 3.57 QV
Adequacy 3.28 3.21 4.16 2.68 3.33 FV
Clarity 3.49 3.21 4.70 3.02 3.61 QV
Relevance 3.49 3.21 4.65 3.51 3.72 QV
Average Weighted Mean 3.40 3.28 4.58 2.92 3.54 QV
Sd = .76879 Var. = .591
Table 6 shows the answer to the stated question, "What is the level of performance
of the students in the different courses based on the results of the preliminary
examinations?".
Table 6
Measures of Central Tendency of the Level of Performance
Cases   Mean    Median   Mode
57      66.26   67.00    60.00
The level of performance of students in the preliminary examinations has a mean of
66.26, a median of 67.00, and a mode of 60.00.
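These measures of central tendency can be computed directly with Python's statistics module. The scores below are hypothetical, since the study's 57 individual scores are not listed, only their summary statistics:

```python
from statistics import mean, median, mode

# Hypothetical raw scores for illustration (not the study's actual data).
scores = [60, 60, 63, 67, 70, 72, 75]

print(round(mean(scores), 2), median(scores), mode(scores))  # 66.71 67 60
```

Note that the mode (60) sitting below the mean and median is the same pattern reported in Table 6.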
Table 7 shows the statistical data to answer the posted problem, "What is the
significant relationship of the test papers' attributes to the level of performance of
students?".
Table 7
Significant Relationship of the Test Papers' Attributes to the Level of Performance of
Students
Test Papers' Attributes   r coefficient   Relationship                            Significance   Decision
Evaluator 1               0.263           Moderately small positive correlation   0.480          Significant
Evaluator 2               -0.141          Very small negative correlation         0.296          Not significant
Evaluator 3               -0.047          Very small negative correlation         0.727          Not significant
Evaluator 4               0.076           Very small positive correlation         0.574          Not significant
Overall                   0.050           Moderately small positive correlation   0.355          Not significant
Only one evaluator's ratings on the level of validity of the test papers show a
moderately small positive correlation (r = 0.263) with the level of performance of students,
and that relationship is significant. Two evaluators' ratings show very small negative
correlations (r = -0.047 and r = -0.141), and the remaining evaluator's ratings show a very
small positive correlation (r = 0.076); these relationships are not significant. As a whole,
the relationship between the level of validity of the test papers and the level of
performance of students for the preliminary term is a moderately small positive correlation
(r = 0.050), and this relationship is not significant, since the significance value of 0.355
exceeds the .05 level.
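The r coefficients in Table 7 are Pearson product-moment correlations. The coefficient itself can be computed in pure Python, as sketched below; the significance values would additionally require a t-test, and the toy data here are hypothetical rather than the study's actual ratings:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient (pure Python).

    Reproduces only the r statistic; the significance (p) values in
    Table 7 would additionally require a t-test on r with n - 2 df.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy check with perfectly linear data (hypothetical):
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```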
Discussion
The level of validity of the test papers' attributes is assessed using five sub-
variables: appearance, grammar and structure, characteristics of test, hierarchy of
taxonomy, and content level.
Based on the ratings, the indicators of the appearance of test papers, font style and
size of fonts, are 80% to 100% properly observed, while the other indicators, spacing,
attractiveness and uniformity, are 60% to 80% observed. The attribute appearance has an
average weighted mean of 4.04, which is quite valid. The differences in the ratings of the
evaluators are very low, which means that their evaluations are almost unanimous. This
means that the different criteria cited were 60% to 80% satisfied; there is a deficiency of
not more than 40%. The instructors must be careful in the construction of their test papers
with respect to the various elements of appearance. They should choose the best font
style, size of fonts and spacing. This will make the test papers more attractive and
encouraging to the students, especially when all the test papers are uniform in all
elements. This is in consonance with what was said by Gabuyo (2012), Asaad (2004) and
Del Socorro (2011), that appearance affects the validity of test papers, and when test
papers are not valid they hinder the good performance of students.
The second sub-variable, grammar and structure, is an important ingredient in
making a test valid. When the written words and texts are not properly written, the
concepts and statements will not be properly communicated, and of course the test takers
will be confused. The following indicators, capitalization, pluralisation and spelling, are
80% to 100% correctly done, while subject-verb agreement, use of tenses, parallelism,
punctuation and sentence construction are 60% to 80% correctly observed. As a whole,
grammar and structure is 60% to 80% properly used. This observation was almost
unanimous among the evaluators, as reflected by the very low standard deviation and
variance. As emphasized by the experts, reading vocabulary and sentence structure are
very important in communication, and the same holds for test papers. This shows that not
all instructors are good at constructing test papers in terms of grammar and structure.
They should be equipped with these competencies in order to construct tests correctly
and to help students take the tests easily, concentrating only on the concepts learned
rather than on what their instructors intend to say in incorrectly expressed items.
The third sub-variable, characteristics of test, has four (4) out of five (5) indicators
rated fairly valid: variety of test types, arrangement of items, accuracy of directions and
mechanisms; the points system was rated slightly valid. This sub-variable has an average
weighted mean interpreted as fairly valid. The ratings of the evaluators are almost similar,
as reflected by the low variance value. This means that the instructors constructed their
test papers only 41% to 60% correctly in terms of their characteristics. The instructors
must have thorough knowledge of this element, for it is very important in guiding test
takers in passing the test. As mentioned in the introduction, these indicators must be
given careful attention; according to Del Socorro (2011), the following factors affect the
validity of test papers: appropriateness of test items; directions; construction of test items;
arrangement of items; difficulty of items; reading vocabulary and sentence structures;
length of the test; and patterns of answers.
The sub-variable hierarchy of taxonomy, with its indicators arrangement, balance of
difficulty and higher-ordered thinking skills, has a rating of quite valid. This means that the
instructors constructed 61% to 80% of the test properly in terms of the hierarchy of
taxonomy. It also shows that the instructors did not consistently arrange the items
following the principle of simple to complex or the theory of higher-ordered thinking skills.
Items should be arranged from the easiest, progressing to the most difficult, purposely to
motivate students to continue taking the test until the last item. Most education experts
require the use of a table of specifications to scientifically assign the number of items per
taxonomy level, depending on the number of hours consumed in discussing the different
topics or concepts. As cited by Rico (2011) and Gabuyo (2012), the following steps in test
construction must be followed: identify the objectives, decide on the type of objective test,
formulate the table of specifications, draft the test items, and try out and validate.
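The table-of-specifications principle (items assigned in proportion to the hours spent per topic) can be sketched as follows. Topic names and hours are hypothetical, and the largest-remainder step only keeps the total item count exact after rounding:

```python
def allocate_items(hours, total_items):
    """Allocate test items per topic in proportion to instructional hours."""
    total_hours = sum(hours.values())
    # Ideal (fractional) share per topic, then round down.
    ideal = {t: total_items * h / total_hours for t, h in hours.items()}
    alloc = {t: int(v) for t, v in ideal.items()}
    # Hand any remaining items to the largest fractional remainders,
    # so the allocation always sums to total_items.
    leftover = total_items - sum(alloc.values())
    for t in sorted(ideal, key=lambda t: ideal[t] - int(ideal[t]),
                    reverse=True)[:leftover]:
        alloc[t] += 1
    return alloc

print(allocate_items({"Topic A": 6, "Topic B": 3, "Topic C": 1}, 50))
# {'Topic A': 30, 'Topic B': 15, 'Topic C': 5}
```

A topic discussed for six hours thus receives twice the items of a topic discussed for three, matching the proportionality rule the experts describe.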
The content of the test is another important sub-variable in test construction. The
indicators scope, depth, clarity and relevance are rated quite valid, while the indicator
adequacy is fairly valid. Summing it all up, the sub-variable has an average weighted
mean equivalent to quite valid, which means that the instructors had 60% to 80%
accuracy in covering the concepts and sub-concepts discussed during the preliminary
period. Although the syllabi were not used to validate the test items, it can be inferred
from the ratings that the items in the test papers did not thoroughly consider the scope
and depth of the concepts discussed.
The low validation ratings given by the evaluators show that the test papers
formulated and used by the instructors of Saint Paul School of Business and Law during
the preliminary term of the first semester, 2014-2015 need a lot of improvement and
revision. The results of this study agree with those of Rivera (2007) and Abimbola (2012):
even though their subject was agriculture, the same methodology and principles of test
construction were used. They found that item writing is a skill that, with practice, one can
learn to master, but that it was very difficult to find agriculture teachers with the skills to
write good items, and equally difficult to find test specialists with expertise in the specific
content. The team of item writers, made up of teachers and extension agents, might not
have been the best group to design and write questions; they were knowledgeable in
content but lacked the skills in generating well-constructed test items. In contrast, they
found that the items developed were directly linked to specific content domains or
objectives.
Lastly, as commonly cited by education experts, the following should be used as a
guide in the construction of test papers: use the table of specifications; write more items
than needed; write items well in advance of the testing date; write items so that they call
for the performance described in the behavioural objectives; specify the tasks to be
performed clearly; write items at the appropriate reading level; provide no clues to the
answers; and recheck revised items for relevance.
For specific problem no. 2, based on the mean, median and mode of 66.26, 67.00
and 60.00, it can be inferred that the performance of the students who took the
examinations using these test papers during the preliminary term was low, considering
that the passing grade is 75%. Although there are many factors behind a high level of
performance, it cannot be denied that the manner of constructing test papers contributes
significantly to the level of performance: when the test papers are valid enough,
measuring what they intend to measure, higher performance results. Likewise, the results
of this study show that most of the test items are inappropriate and wrongly formulated.
The computed Pearson product-moment correlation shows a moderately small
positive correlation. This means that when the test papers used in the preliminary
examinations have high validity, this has only a small impact on the level of performance
of the students, and when the tests have low validity, the impact on performance is
likewise small. This result negates the findings of Rivera (2007) and Abimbola (2012),
who found that the items developed were directly linked to specific content domains or
objectives, even though item writing is a skill that one can learn to master with practice, it
was very difficult to find agriculture teachers with the skills to write good items, and it was
equally difficult to find test specialists with expertise in the specific content; the team of
item writers, made up of teachers and extension agents, might not have been the best
group to design and write questions. Though their study is on agriculture, the principles of
test construction are exactly the same.
This study therefore concludes that, although the level of validity of the test papers'
attributes does not significantly relate to the level of performance of students of Saint Paul
School of Business and Law, it cannot be denied that test paper construction is very
important for a high level of performance. Furthermore, it is recommended that training on
test construction and English proficiency, covering the rules of capitalization, pluralization,
use of tenses, parallelism and use of punctuation, among others, be conducted, and that
the proposed guidelines for test construction be observed for standardization purposes.
SPSBL GUIDELINES FOR TEST CONSTRUCTION
1. Appearance
1.1 Margins – .5 inch on all sides
1.2 Font type and size – Arial; 12 points
1.3 Spacing – double spaced
1.4 Size of paper – 8.5 x 13
1.5 Printing – RISO printed or originally printed
1.6 Pagination – center of the bottom margin
2. Grammar and Structure – Observe and apply the rules for capitalization,
pluralization, subject-verb agreement, use of tenses, parallelism and punctuation,
and spell words correctly.
3. Characteristics of Test
3.1 Test Types
For theory-based subjects – use at least 3 objective test types (80%)
(identification, completion, labelling, enumeration, alternative response (true-false),
matching, multiple choice, rearrangement) and a restricted essay type (20%).
For skill-development-based subjects – cognition (25%) and skill-domain
(problem solving) types (75%).
3.2 Test types must be arranged according to level of difficulty.
3.3 Points per item or per test type must be defined: easy items, 1 point; difficult
items, more points (total points for the final examination = 50 points).
3.4 Directions should be clear on what tasks to do, where to write, and other
significant requirements.
4. Hierarchy of Taxonomy of Objectives
4.1 Arrangement of Items – items must be arranged from simple to complex, from
the easiest item to the most difficult.
4.2 Difficulty of Items – items should focus on the development of analytical,
critical, and decision-making skills.
4.3 Higher-Ordered Thinking Skills – use a table of specifications.
5. Content Level – satisfy the standardized syllabi
5.1 Scope – confine items to the content developed in the period, based on the
approved syllabus.
5.2 Depth – include the details of the concept in the test.
5.3 Adequacy – consider the time spent developing each concept, for it is the
basis of how many items/points to give.
5.4 Clarity – use clear, concise and specific statements in simple sentences,
easily understood by test takers.
5.5 Relevance – include practical scenarios or incidents happening locally,
nationally or internationally.
6. Others
6.1 The draft of the test papers must be made 10 days before the scheduled final
examinations.
6.2 The drafted test papers will be submitted to the deans/academic heads for
checking of content and format.
6.3 Final copies of the test papers must be signed at the back for printing purposes.
6.4 Training on the following must be scheduled:
6.4.1 test construction
6.4.2 grammar and structure
References
A. Published Materials
Asaad, Abubakar S. and Hailaya, Wilham M. (2004). Measurements and evaluation:
concepts and principles. Quezon City: Rex Bookstore, Inc.
Barosse, Emily (2005). Classroom assessment: key concepts and applications, 5th
edition. NY: McGraw-Hill.
Buendicho, Flordeliza (2010). Assessment of learning I. Philippines: Rex Bookstore, Inc.
Calderon, Jose F. and Gonzales, Expectacion C. (1993). Tests and measurements. Pasig
City: Capitol Publishing House.
Costa, Arthur L. (2004). Assessment strategies for self-directed learning. California: Sage
Publications, Ltd.
Cristobal, Amadeo Jr. P. and Cristobal, Maura Consolacion D. (2013). Research made
easier: a step by step process. Quezon City: C & E Publishing Incorporated.
Del Socorro, Felicidad R. et al. (2011). Quezon City: Great Books Publishing.
Gabuyo, Yonardo A. (2012). Assessment of learning I: textbook and reviewer. Quezon
City: Rex Printing Company, Inc.
Kachru, Upendra (2007). Production and operations management: text and cases. New
Delhi: Excel Books, p. 5.
Miller, I. and David, I. (2005). Educational tests and measurements. US: Pearson Prentice
Hall.
Navarro, Rosita L. and Santos, Rosita G. (2012). Assessment of learning outcomes
(Assessment 1), 2nd edition. Aurora Blvd: Lorimar Publishing, Inc.
Padua, Roberto N. and Santos, Rosita G. (1997). Quezon City: Katha Publishing Co., Inc.
Raganit, Arnulfo R. et al. (2010). Assessment of student learning I (cognitive learning).
Quezon City: C & E Publishing, Inc.
Rico, Alberto (2011). Assessment of student's learning: a practical approach. Philippines:
Anvil Publishing, Inc.
Robbins, Stephen R. (2000). Organizational behaviour. San Diego, USA: Prentice Hall
International, Inc.
Robbins, Stephen R. and Coulter, Mary (2012). Management, global edition, 11th edition.
Boston: Pearson Education Limited.
Santos, Rosita (2007). Advanced methods in educational assessment and evaluation of
learning 2. Aurora Blvd: Lorimar Publishing House.
Santos, Rosita D. (2007). Assessment of learning I. Aurora Blvd: Lorimar Publishing Inc.
B. Unpublished Materials
Abimbola, Isaac O. (2012). Advances in the development and validation of instruments
for assessing students' science knowledge. University of Ilorin: Department of
Curriculum Studies and Educational Technology.
Rivera, Jennifer Ellain (2007). Cornell University: unpublished dissertation.
International Journal of Education and Research Vol. 3 No. 2 February 2015
APPENDIX A
SAINT PAUL SCHOOL OF BUSINESS AND LAW
Palo, Leyte
PROBLEM: ATTRIBUTES OF TEST PAPERS AND LEVEL OF PERFORMANCE:
BASIS FOR THE FORMULATION OF THE GUIDELINES ON TEST PAPER
CONSTRUCTION
DIRECTIONS: Indicate the degree of appropriateness of the different attributes of the test
papers. Check the appropriate column.
EV – Extremely Valid (the indicator is 81% to 100% valid)
QV – Quite Valid (the indicator is 61% to 80% valid)
FV – Fairly Valid (the indicator is 41% to 60% valid)
SV – Slightly Valid (the indicator is 21% to 40% valid)
NV – Not at all Valid (the indicator is 0% to 20% valid)
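The five-point scale above assigns each percentage band a category label. As a minimal illustrative sketch (not part of the study's instrument), the mapping can be written as a single function; the function name `validity_category` is an assumption introduced here for clarity.

```python
def validity_category(percent: float) -> str:
    """Return the Appendix A scale label for a 0-100 validity percentage.

    Bands follow the instrument: EV 81-100, QV 61-80, FV 41-60,
    SV 21-40, NV 0-20.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent >= 81:
        return "EV"  # Extremely Valid
    if percent >= 61:
        return "QV"  # Quite Valid
    if percent >= 41:
        return "FV"  # Fairly Valid
    if percent >= 21:
        return "SV"  # Slightly Valid
    return "NV"      # Not at all Valid

print(validity_category(85))  # EV
print(validity_category(50))  # FV
```

Note that each band's lower bound is inclusive, so a rating of exactly 61% falls in QV, not FV.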
ATTRIBUTES OF TEST PAPERS EV QV FV SV NV
Appearance
1. Font style
2. Font size
3. Spacing
4. Attractiveness
5. Uniformity of elements
Grammar and Structure
1. Capitalization
2. Pluralization
3. Subject-verb agreement
4. Use of Tenses
5. Parallelism
6. Spelling
7. Use of Punctuation
8. Sentence Construction
Characteristics of Test
1. Variety of Test Types
2. Arrangement of test types
3. Points system
4. Accuracy of Directions
5. Mechanisms (Clarity of Directions)
Hierarchy of Taxonomy
1. Arrangement of items
2. Balance of Difficulty
3. HOTS (Higher Order Thinking Skills)
Content Level
1. Scope
2. Depth
3. Adequacy
4. Clarity
5. Relevance