The document discusses methods for establishing the validity and reliability of assessment tools, including correlation coefficients, levels of measurement, reliability measures like test-retest reliability and internal consistency, and validity measures like content validity, criterion validity, and construct validity. It also covers item analysis methods like calculating item difficulty and discrimination indices. The overall aim is to evaluate how well an assessment tool measures what it intends to measure and produces consistent results.
Traditionally, examination was treated as the end point of learning. Our conception of learning is changing, however, and assessment is being moved to the front end: assessment is now also treated as learning. This presentation deals with assessment, feedback, and assurance of learning.
Overview of Assessment
Assessment is an integral part of instruction, as it determines whether or not the goals of education are being met.
Three criteria of assessment:
Validity
Reliability
Practicality
(Farhady, 2012)
Assessment
Assessment information is needed by administrators, teachers, staff developers, students, and parents to assist in determining appropriate program placements and instructional activities as well as in monitoring student progress. (O'Malley, 1994)
Assessment Purposes of ELL Students
Screening and identification
Placement
Reclassification or exit
Monitoring Student Progress
Program Evaluation
Accountability
(O'Malley, 1994)
Is it possible to explain why students' outputs are as they are by assessing the processes they went through to arrive at the final product?
Yes: through process-oriented, performance-based assessment.
3. Objectives
1. Determine the uses of the different ways of establishing an assessment tool's validity and reliability.
2. Become familiar with the different methods of establishing an assessment tool's validity and reliability.
3. Assess how good an assessment tool is by determining its indices of validity, reliability, item discrimination, and item difficulty.
6. Degree of Relationship
0.80 – 1.00  Very high relationship
0.60 – 0.79  High relationship
0.40 – 0.59  Substantial/marked relationship
0.20 – 0.39  Low relationship
0.00 – 0.19  Negligible relationship
7. Testing for Significance
Nominal: Phi coefficient
Ordinal: Spearman rho
Interval & ratio: Pearson r
Interval with nominal: Point-biserial
Decision rule:
If p value ≤ .05: significant relationship
If p value > .05: no significant relationship
8. Variance
R²
Square the correlation coefficient.
Interpretation: the proportion of the variability in Y that is accounted for by the variability in X.
9. Reliability
Consistency of scores obtained by the same person when retested with the identical test or with an equivalent form of the test
10. Test-Retest Reliability
Repeating the identical test on a second occasion
Establishes temporal stability
Appropriate when the trait measured is stable, e.g., motor coordination, finger dexterity, aptitude, capacity to learn
Correlate the scores from the first and second administrations; the higher the correlation, the more reliable the test
11. Alternate Form/Parallel Form
The same person is tested with one form on the first occasion and with an equivalent form on the second
Establishes equivalence, temporal stability, and consistency of response
Used for personality and mental ability tests
Correlate scores on the first form with scores on the second form
12. Split-Half
Two scores are obtained for each person by dividing the test into equivalent halves
Establishes internal consistency; homogeneity of items
Used for personality and mental ability tests
The test should have many items
Correlate the scores on the odd- and even-numbered items
Convert the obtained half-test correlation into a full-test reliability estimate using the Spearman-Brown formula
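The split-half procedure above can be sketched as follows. This is a minimal illustration with an invented 0/1 score matrix; the Spearman-Brown step estimates full-test reliability from the half-test correlation as 2r / (1 + r).

```python
# Sketch of split-half reliability with the Spearman-Brown correction.
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two score lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

# Rows = examinees, columns = items (1 = correct, 0 = wrong); invented data
scores = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

# Split into odd- and even-numbered items and total each half
odd_totals  = [sum(row[0::2]) for row in scores]
even_totals = [sum(row[1::2]) for row in scores]

r_half = pearson(odd_totals, even_totals)

# Spearman-Brown: reliability of the full-length test from the half-test r
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown estimate = {r_full:.2f}")
```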
13. Kuder-Richardson (KR 20/KR 21)
Used when scoring binary (e.g., true/false) items
Measures consistency of responses to all items
Used when there is a correct answer (right or wrong)
Apply the KR 20 or KR 21 formula
14. Coefficient Alpha
The reliability that would result if all values for each item were standardized (z-transformed)
Measures consistency of responses to all items; homogeneity of items
Used for personality tests with items scored on multiple points
Apply the Cronbach's alpha formula
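Cronbach's alpha can be sketched the same way. This illustration uses invented Likert-style ratings; the formula is (k / (k − 1)) × (1 − Σ item variances / variance of totals), the polytomous generalization of KR 20.

```python
# Sketch of Cronbach's alpha for multi-point (e.g., Likert) items.
def cronbach_alpha(scores):
    """scores: examinee rows x item columns (e.g., 1-5 Likert ratings)."""
    k = len(scores[0])   # number of items
    n = len(scores)      # number of examinees

    def var(xs):
        # Population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    totals = [sum(row) for row in scores]
    return (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

# Invented ratings: 5 respondents x 4 items on a 1-5 scale
scores = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```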
15. Inter-Item Reliability
Measures consistency of responses to all items; homogeneity of items
Used for personality tests with items scored on multiple points
Each item is correlated with every other item in the test
16. Scorer Reliability
A sample of test papers is independently scored by two examiners
Decreases examiner or scorer variance
Used for clinical instruments employed in intensive individual testing, e.g., projective tests
The two sets of scores from the two raters are correlated with each other
17. Validity
Degree to which the test actually measures what it purports to measure
18. Content Validity
Systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured
Most appropriate for achievement tests and teacher-made tests
Items are based on instructional objectives, course syllabi, and textbooks
Established through consultation with experts and by making a table of specifications
19. Criterion-Prediction Validity
Prediction from the test to a criterion situation over a time interval
Used in hiring job applicants, selecting students for admission to college, and assigning military personnel to occupational training programs
Test scores are correlated with other criterion measures, e.g., mechanical aptitude with later job performance as a machinist
20. Concurrent Validity
Tests are administered to a group on whom criterion data are already available
Used for diagnosing existing status, e.g., correlating students' college entrance exam scores with their average grade in their senior year
Correlate the test score with the other existing measure
21. Construct Validity
The extent to which the test may be said to measure a theoretical construct or trait
Used for personality tests and measures that are multidimensional
Ways of establishing it:
Correlate the new test with a similar earlier test that measures approximately the same general behavior
Factor analysis
Comparison of the upper and lower groups
Point-biserial correlation (pass/fail with total test score)
Correlate each subtest with the entire test
22. Convergent Validity
The test should correlate significantly with variables it is theoretically related to
Commonly used for personality measures
Established through the multitrait-multimethod matrix
23. Divergent Validity
The test should not correlate significantly with variables from which it should differ
Commonly used for personality measures
Established through the multitrait-multimethod matrix
28. Item Analysis
Item difficulty: the percentage of respondents who answered an item correctly
Item discrimination: the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure
29. Difficulty Index
Difficulty index | Remark
.76 or higher | Easy item
.25 to .75 | Average item
.24 or lower | Difficult item
30. Discrimination Index
.40 and above: Very good item
.30 to .39: Good item
.20 to .29: Reasonably good item
.10 to .19: Marginal item
Below .10: Poor item
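The two item-analysis indices above can be sketched in code. This is an illustration with invented results from ten examinees: difficulty is the proportion answering the item correctly, and discrimination here uses the common upper-lower group method (D = p_upper − p_lower), with groups formed by total test score.

```python
# Sketch: item difficulty (p) and upper-lower discrimination (D) for one item.
def difficulty(item_scores):
    """Proportion of examinees answering the item correctly."""
    return sum(item_scores) / len(item_scores)

def discrimination(item_scores, total_scores, fraction=0.5):
    """D = p(upper group) - p(lower group), groups formed by total score."""
    n = len(total_scores)
    k = max(1, int(n * fraction))
    # Rank examinees from highest to lowest total score
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = [item_scores[i] for i in order[:k]]
    lower = [item_scores[i] for i in order[-k:]]
    return sum(upper) / len(upper) - sum(lower) / len(lower)

item   = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]           # 0/1 on one item
totals = [48, 45, 44, 40, 38, 35, 30, 28, 25, 20]  # whole-test scores

p = difficulty(item)            # .60 -> average item (.25 to .75)
d = discrimination(item, totals)  # .40 -> very good item (.40 and above)
print(f"difficulty = {p:.2f}, discrimination = {d:.2f}")
```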