Faith & Reason
“Faith is not opposed to reason, but is sometimes opposed to feelings and appearances.” – Tim Keller
How do faith and reason coexist for the Christian disciple? Do faith and reason oppose each other, work together, or arrive at the same goal by completely unrelated paths?
In Ephesians ch. 4, Paul writes:
Ephesians 4:11-15 New King James Version (NKJV)
11 And He Himself gave some to be apostles, some prophets, some evangelists, and some pastors and teachers, 12 for the equipping of the saints for the work of ministry, for the edifying of the body of Christ, 13 till we all come to the unity of the faith and of the knowledge of the Son of God, to a perfect man, to the measure of the stature of the fullness of Christ; 14 that we should no longer be children, tossed to and fro and carried about with every wind of doctrine, by the trickery of men, in the cunning craftiness of deceitful plotting, 15 but, speaking the truth in love, may grow up in all things into Him who is the head—Christ—
Faith and knowledge/reason will always feed off one another as we grow in Christ.
Throughout the rest of this semester we will be discussing our faith and how we think through issues related to and influenced by our faith.
Christian Reflections – Reflection paper, 3-4 pages (1,050-1,400 words), APA format; include references.
To what extent is religious faith objective (i.e., based on reasons or evidence that should be obvious to others) and/or subjective (i.e., based on personal reasons that are not necessarily compelling to others)?
1) In what ways and to what extent do you believe that faith:
· Is derived from what we consider to be true and reasonable?
· Goes beyond what reason and evidence dictate?
· Goes against what is reasonable?
2) What is the role of feelings and emotions in religious faith?
· Does faith depend upon them?
· To what extent should they be embraced or controlled?
Promoting Reliability
Both McMillan and Darr (see below) provide suggestions on how to promote reliability in classroom assessments. Doing the things mentioned below can help control both external and internal sources of error, which in turn helps bolster the reliability of test scores.
McMillan’s (2006, p. 51) suggestions on how to help bolster or promote reliability in classroom assessments:
· Motivate students to put forth their best efforts on assessments
· Use a sufficient number of items or tasks; a minimum of 5 items is needed to assess a single trait or skill
· Construct items, scoring criteria, and tasks that clearly differentiate students on what is being assessed, and make the criteria public
· Make sure scoring procedures for constructed-response items are consistently applied to all students
· Use independent raters or observers to score a sample of student responses, and check consistency with your evaluations
· Build in as much objectivity into scoring as possible while still maintaining the integrity of what is being assessed
· Eliminate or reduce external sources of error
Darr’s (2005, p. 60) factors affecting reliability (and reliability coefficients):
· Number of tasks – more tasks will generally lead to higher reliability (illustrated in the sketch below)
· Task difficulty – questions that are too hard or too easy for students will not increase reliability
· Spread of scores – the larger the spread, the higher the reliability
· Marking procedures – carefully worded rubrics make it easier to decide on achievement levels
· Conditions of assessment – assessing students when they are tired or after an exciting event is less likely to produce reliable results
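The first factor can be quantified with the Spearman-Brown prophecy formula, a standard psychometric result that is not part of Darr's article; the sketch below uses it, with made-up numbers, purely to illustrate why adding similar tasks tends to raise reliability.

```python
# Illustration only (not from Darr, 2005): the Spearman-Brown prophecy formula,
# a standard psychometric result, shows why adding similar tasks tends to raise
# reliability. All numbers are made up.

def spearman_brown(current_reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying test length by `length_factor`."""
    r, k = current_reliability, length_factor
    return (k * r) / (1 + (k - 1) * r)

if __name__ == "__main__":
    r_5_items = 0.60                      # hypothetical reliability of a 5-item quiz
    for k in (1, 2, 3):                   # the same quiz at 1x, 2x, and 3x its length
        n_items = 5 * k
        print(f"{n_items:2d} items -> predicted reliability {spearman_brown(r_5_items, k):.2f}")
    # prints roughly: 5 items -> 0.60, 10 items -> 0.75, 15 items -> 0.82
```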
Reliability as a Continuum
I think it is helpful to think of reliability as a continuum: absolute, perfect reliability of a score can never be achieved, especially when attempting to measure something that cannot be directly observed, like student knowledge. A person’s score on a test will always contain some degree of error, because it is impossible to control all sources of internal and external error.
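As a rough illustration of this point (not from the readings), the short simulation below treats each observed score as a hypothetical true score plus random error; even with well-behaved, made-up numbers, the correlation between two administrations never reaches 1.

```python
# Illustrative simulation (not from the readings): every observed score is treated
# as a hypothetical true score plus random error, so the correlation between two
# administrations of the "same" test stays below 1 even under ideal conditions.
import numpy as np

rng = np.random.default_rng(0)
n_students = 200

true_score = rng.normal(70, 10, n_students)             # unobservable "true" knowledge
occasion_1 = true_score + rng.normal(0, 5, n_students)  # observed score, first testing
occasion_2 = true_score + rng.normal(0, 5, n_students)  # observed score, second testing

# High, but never perfect, because each administration carries its own error.
print(f"simulated test-retest correlation: {np.corrcoef(occasion_1, occasion_2)[0, 1]:.2f}")
```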
Sources of Reliability Evidence
The four basic sources of reliability evidence (stability across time, equivalence across forms, internal consistency, and scorer/rater consistency) capture ways we can go about assessing reliability within the context of an assessment or decision. However, not all sources are appropriate for every situation in which you want to assess the reliability of an assessment or decision; thus, not all ways of gathering reliability evidence are appropriate for every situation.
How one goes about collecting reliability evidence for these four sources often differs based on whether the test is a classroom assessment or a large-scale assessment. As Sarah Godlove Evans discusses in her blog, evidence of reliability is collected formally, using statistical analyses, for large-scale assessments but more informally for classroom-based, teacher-created assessments: “Reliability is a trait achieved through statistical analysis in a process called equating. Equating is one of the many behind-the-scenes functions performed by psychometricians, folks trained in the statistical measurement of knowledge” (Evans, 2013).
In general, informal, classroom-based, teacher-created assessments do not directly engage with the concept of reliability, as these types of assessments do not require advanced statistical analysis; however, they do informally engage with the concept.
Gathering Sources of Reliability Evidence
For each source of evidence below: what it tells us, when it is appropriate, how we can assess it formally in large-scale assessments using statistical methods, and how we can assess it informally in classroom assessment.
Stability across time (test-retest)
What does it tell us? Answers the questions: Are scores on a test stable across time? Are the results approximately the same when students take the same test on different occasions? It provides an indication of how consistent the scores are over time, within students, on the same test.
When is it appropriate? Only when one expects scores from “tests” administered at two different times to the same group of students to remain approximately the same.
Formal assessment (large-scale): The same test is administered at two different times to the same group of students, and the scores from the two administrations are correlated; the resulting correlation is called a test-retest reliability coefficient. A high positive correlation supports stability across time.
Informal assessment (classroom): Not often assessed, because stability across time is only a good estimate of reliability when what is being assessed is not expected to change during the time span between the two tests. In a situation where a student has studied between the first time she took the test and the second time she takes it, stability across time would not be an appropriate way to gather reliability evidence for these scores: one would not expect her score to be the same both times, since what was being assessed was expected to change due to her studying between the two testing periods. Example: I can administer the test on one day to half of the students and the next day to the other half of the students; reliability evidence should show consistent scores across the student population, no matter what day the test was taken. (A worked sketch of the test-retest coefficient follows this entry.)
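A minimal sketch, with invented scores, of the formal procedure just described: correlate the scores from two administrations of the same test to obtain a test-retest reliability coefficient.

```python
# Hypothetical test-retest computation: the same test given twice to the same
# eight students; the two score sets are correlated. All scores are invented.
import numpy as np

scores_time_1 = np.array([78, 85, 62, 90, 71, 88, 66, 74])  # first administration
scores_time_2 = np.array([80, 83, 65, 92, 70, 85, 68, 76])  # second administration

test_retest_r = np.corrcoef(scores_time_1, scores_time_2)[0, 1]
print(f"test-retest reliability coefficient: {test_retest_r:.2f}")  # high positive value
```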
Stability across test forms (parallel forms)
What does it tell us? Answers the questions: Do scores on two forms (re-ordered items or different items) of a test measure the same thing? Are results approximately the same when different but equivalent tests are taken at the same time or at different times? It provides an indication of how comparable or consistent the two forms of the test are.
When is it appropriate? Only when one expects scores from equivalent tests administered at two different times to the same group of students to remain approximately the same.
Formal assessment (large-scale): A test and an equivalent version of the test are administered to the same group of students, and the scores from the two administrations are correlated; the resulting correlation is called a parallel-forms reliability coefficient. A high positive correlation supports stability across test forms.
Informal assessment (classroom): Of conceptual use, but typically an actual reliability coefficient is not computed. When: creating a different but equivalent version of a test for students who need to make up the original test, or creating two equivalent versions of a test to discourage cheating off other students’ tests. Example: I will establish reliability by using alternative forms, Form 1 and Form 2, of my unit test to measure the same skill or concept. One group of students takes Form 1 first and then Form 2; the second group takes Form 2 first and then Form 1. Although it would be rather difficult and time-consuming, I could then correlate the scores on both forms and produce a reliability coefficient, which would show me the strength of the relationship between the two scores. (A sketch of this computation follows this entry.)
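A similar sketch, again with invented scores, for the parallel-forms procedure described above: each student's Form 1 score is paired with his or her Form 2 score and the two columns are correlated.

```python
# Hypothetical parallel-forms computation: every student has a Form 1 score and a
# Form 2 score (half took Form 1 first, half took Form 2 first); the two columns
# are correlated. All scores are invented.
import numpy as np

form_1_scores = np.array([84, 76, 91, 67, 88, 73, 95, 70])
form_2_scores = np.array([81, 78, 89, 70, 85, 75, 93, 72])

parallel_forms_r = np.corrcoef(form_1_scores, form_2_scores)[0, 1]
print(f"parallel-forms reliability coefficient: {parallel_forms_r:.2f}")
```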
Internal consistency
What does it tell us? Answers the question: Do different items on a test measure the same construct? It provides an indication of how consistently the items or tasks within a test promote the same result.
When is it appropriate? Only appropriate to use if the items on a test are measuring the same thing.
Formal assessment (large-scale): The results on different items of a test given to the same group of students are compared to see how well they relate. Three common internal-consistency reliability coefficients are the split-half coefficient, Cronbach’s alpha, and the Kuder-Richardson formula 20 or 21.
Informal assessment (classroom): Used conceptually; typically an actual reliability coefficient is not computed. When: wanting to make sure that performance on items in a test that measure the same construct is consistent in terms of results. Example: I will have several items measuring the same trait; my logic is that scores on these items should be correlated. This will assist me in estimating how well the items within my test are functioning in a consistent manner, and it is particularly useful because I only have to administer my test one time to gain this type of reliability evidence. (A sketch of one coefficient, Cronbach’s alpha, follows this entry.)
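A small sketch, using an invented item-score matrix, of one of the coefficients named above, Cronbach's alpha (for dichotomous 0/1 items this value is equivalent to KR-20).

```python
# Hypothetical internal-consistency computation: Cronbach's alpha from an invented
# matrix of item scores (rows = students, columns = items scored 0/1). For 0/1
# items this value is the same as KR-20.
import numpy as np

item_scores = np.array([
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1],
])  # 6 students x 5 items, all values invented

k = item_scores.shape[1]                              # number of items
item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of students' total scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")
```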
Inter-rater consistency
What does it tell us? Answers the questions: When using different raters, does it matter who does the scoring? Is one rater’s score similar or consistent to another rater’s score? Results from different raters/scorers can be compared to ascertain the level of agreement; this method is used to show how consistently two or more assessors are scoring the same items or tasks.
When is it appropriate? When more than one person is scoring/grading a test.
Formal assessment (large-scale): A test taken by students is scored/graded by two or more raters. The scores from the raters are compared using percent agreement on items, or the total scores for each rater are correlated. High percent agreement or a high positive correlation supports stability across raters.
Informal assessment (classroom): Used conceptually sometimes, but typically an actual reliability coefficient is not computed. When: comparing one teacher’s scoring/grading of a test to another teacher’s scoring/grading; a good idea when the scoring is more subjective. Example: I can ask another teacher within my department to review a small sample of my students’ answers to the essay and performance components of my unit test. I have already established the criteria they are to be evaluated on, which are specified in the rubric I created, and I will then see if the scores I came up with are in agreement with those my colleague assigned to the students’ answers. (A small sketch of this comparison follows this entry.)
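A brief sketch, with invented rubric scores, of the two comparisons described above: exact percent agreement between two raters and the correlation between their scores.

```python
# Hypothetical inter-rater check: two teachers score the same ten essay answers on
# a 0-4 rubric. We report exact percent agreement and the correlation between the
# two sets of scores. All scores are invented.
import numpy as np

rater_1 = np.array([3, 4, 2, 4, 1, 3, 2, 4, 3, 2])
rater_2 = np.array([3, 4, 2, 3, 1, 3, 2, 4, 4, 2])

percent_agreement = np.mean(rater_1 == rater_2) * 100
inter_rater_r = np.corrcoef(rater_1, rater_2)[0, 1]

print(f"exact agreement: {percent_agreement:.0f}%")     # 80% for these made-up scores
print(f"inter-rater correlation: {inter_rater_r:.2f}")
```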
References
Darr, C. (2005). A hitchhiker’s guide to reliability. SET: Research Information for Teachers, 2005(3), 59-60.
Evans, S. G. (2013, November 6). Five characteristics of quality educational assessments – part two [Web log post]. Retrieved from https://www.nwea.org/blog/2013/five-characteristics-quality-educational-assessments-part-two/
McMillan, J. H. (2008). Assessment essentials for standards-based education. Thousand Oaks, CA: Corwin Press.
Assessing the Validity of Inferences Made from Assessment
Results
Sources of Validity Evidence
• Validity evidence can be gathered during the development of
the assessment or after the
assessment has been developed.
• Some of the methods used to gather validity evidence can
support more than one type of
source (e.g., test content, internal structure).
• Large scale assessment and local classroom assessment
developers often use different
methods to gather validity evidence.
o Large scale assessment developers use more formal, objective,
systematic, and
statistical methods to establish validity.
o Teachers use more informal and subjective methods, which often do not involve the use of statistics.
Evidence based on Test Content
• Questions one is striving to answer when gathering validity
evidence based on test
content or construct:
o Does the content of items that make up the assessment fully
represent the concept
or construct the assessment is trying to measure?
o Does the assessment accurately represent the major aspects of
the concept or
construct and not include material that is irrelevant to it?
o To what extent do the assessment items represent a larger
domain of the concept
or construct being measured?
• The greater the extent to which an assessment represents all
facets of a given concept or
construct, the better the validity support based on the test
content or construct. There is
no specific statistical test associated with this source of
evidence.
• Methods used to gather validity evidence based on test content
or construct
o Large Scale Assessments
§ Have experts in the concept or construct being measured
create the
assessment items and the assessment itself.
§ Have experts in the concept or construct examine the
assessment and
review it to see how well it measures the concepts or construct.
These
experts would think about the following during the review
process:
§ The extent to which the content of the assessment
represents the content or construct’s domain or
universe.
§ How well the items, tasks, or subparts of the
assessment fit the definition of the construct and/or
the purpose of the assessment.
§ Is the content or construct underrepresented, or are
there content or construct-irrelevant aspects of the
assessment that may result in unfair advantages for
one or more subgroups (e.g., Caucasians, African
Americans)?
§ What are the relevance, importance, clarity, and lack of bias in the assessment’s items or tasks?
o Local Classroom Assessment
§ Develop assessment blueprints which indicate what will be assessed as
assessed as
well as the nature of the learning (e.g., knowledge, application,
etc.) that
should be represented on the assessment.
§ Build a complete set of learning objectives or targets, showing the number of items and/or the percentage of items/questions on the assessment devoted to each.
§ Discuss with others (e.g., teachers, administrators, content
experts, etc.)
what constitutes essential understandings and principles.
§ Ask another teacher to review your assessment for clarity and
purpose.
§ Review assessments before using them to make judgments
about whether
the assessment, when considered as a whole, reflects what it
purports to
measure.
§ Review assessments to make judgments about whether items
on the
assessment accurately reflects the manner in which the concepts
were
taught.
§ Have procedures in place to ensure that the nature of the
scoring criteria
reflect important objectives. For example, if students are
learning a
science skill in which a series of steps needs to be performed,
the
assessment task should require them to show their work.
§ When scoring answers, give credit for partially correct answers when possible.
§ Examine assessment items to see if they are favoring groups of students
of students
more likely to have useful background knowledge—for instance,
boys or
girls.
Evidence based on Response Processes
• Questions one is striving to answer when gathering validity
evidence based on response
processes:
o Do the assessment takers understand the items on the
assessment to mean what
the assessment developer intends them to mean?
o To what extent do the actions and thought processes of the
assessment takers
demonstrate that they understand the concept or construct in the
same way it is
defined by the assessment developer?
• The greater the extent to which the assessment developer can
be certain that the actions
and thought processes of assessment takers demonstrate that
they understand the concept
or construct in the same way he/she (assessment developer) has
defined it, the greater the
validity support via evidence based on response processes. There is no
specific statistical test
associated with this source of evidence.
• Methods used to gather validity evidence based on response
processes
o Large Scale assessments
§ Analyses of individuals’ responses to items or tasks via interviews with respondents
§ Studies of the similarities and differences in responses
supplied by
various subgroups of respondents
§ Studies of the ways that raters, observers, interviewers, and
judges collect
and interpret data
§ Longitudinal studies of changes in responses to items or tasks
o Local classroom assessment
§ Use different methods to measure the same learning objective.
For
example, one could use teacher observation and quiz
performance. The
closer the results of the two match the greater the validity
support.
§ If a test should be assessing specific cognitive levels of
learning (e.g.,
application), make sure the items reflect this cognitive level.
For example,
essay items would be more appropriate than fill-in-the-blank
items for
measuring application of knowledge.
§ Reflect on whether or not students could answer items on the assessment without really knowing the content. For example, are students performing well because of good test-taking skills and not necessarily because they know the content?
§ Ask students before or after taking the assessment how they interpret the items, to make sure their interpretation is in line with how you expected them to interpret the items. This procedure is called a think-aloud and gives one insight into the thought processes of the student.
Evidence based on Internal Structure
• Question one is striving to answer when gathering validity
evidence based on internal
structure:
o To what extent are items in a particular assessment measuring
the same thing?
• If there is more than one item (ideally 6 to 8 items) on a test measuring the same thing and students’ individual answers to these items are highly related/correlated, the greater the validity support via evidence of internal structure. There are specific statistical tests associated with this source of evidence, but these statistical tests are typically used by large-scale assessment developers and not with local classroom assessments.
• Methods used to gather validity evidence based on internal
structure
o Large Scale assessments
§ Factor- or cluster analytical studies
§ Analyses of item interrelationships, using item analysis procedures (e.g., item difficulty, etc.)
§ Differential item functioning (DIF) studies
o Local classroom assessments
§ Include in one assessment multiple items (ideally 6 to 8) assessing the same thing; don’t rely on a single item. For example, on an assessment use several items to measure the same skill, concept, principle, or application. Using 6 to 8 items provides enough consistency to conclude from the results that students do or do not understand the concept measured by that set of items. One would expect a student who understands the concept to perform well on all the items measuring that concept, and as consistency among the item results increases, so too does the validity support. Consistency of responses provides good evidence for the validity of the inference that a student does or does not know the concept. (A small sketch of this check follows below.)
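A hypothetical sketch of this consistency check: several invented items written to measure one concept, with each item's scores correlated against the total of the remaining items. The item-rest correlation used here is a common item-analysis statistic offered only as an illustration, not a method prescribed in the text above.

```python
# Hypothetical internal-structure check: six invented items written to measure one
# concept (rows = students, columns = items, scored 0/1). Each item is correlated
# with the total of the remaining items; consistently positive values support the
# inference that the items measure the same thing.
import numpy as np

responses = np.array([
    [1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
])  # 8 students x 6 items, all values invented

for i in range(responses.shape[1]):
    rest_total = np.delete(responses, i, axis=1).sum(axis=1)  # total on the other items
    r = np.corrcoef(responses[:, i], rest_total)[0, 1]
    print(f"item {i + 1}: item-rest correlation = {r:.2f}")
```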
Evidence based on Relations to Other Variables
• Questions one is striving to answer when gathering validity
evidence based on
relationships to other variables.
o To what extent are the results of an assessment related to the
results of a different
assessment that measures the “same” thing?
o To what extent are the results of an assessment related to the
result of a different
assessment that measures a different thing?
• If observed relationships match predicted relationships, then
the evidence supports the
validity of the interpretation. There are specific statistical tests
associated with this
source of evidence, but these statistical tests are typically used by large-scale assessment developers and not with local classroom assessments.
• Methods used to gather validity evidence based on
relationships to other variables
o Large Scale assessments
§ Correlational studies of
• the strengths and direction of the relationships between the
measure and external “criterion” variables
• the extent to which scores obtained with the measure predict
external “criterion” variables at a later date
§ Group separation studies, based on decision theory,
• That examines the extent to which a score obtained with an instrument accurately predicts outcome variables.
§ Convergent validity studies
• That examines the strength and direction of the relationships
between the measure and the other variables that the measure
should, theoretically, have high correlations with.
§ Discriminant validity studies
• That examines the strength and direction of the relationships
between the measure and the other variables that the measure
should, theoretically, have low correlations with.
§ Experimental studies
• That test hypotheses about the effects of intervention on
scores
obtained with an instrument
§ Known-group comparison studies
• That test hypotheses about expected differences in average
scores
across various groups of respondents.
§ Longitudinal studies
• That test hypotheses about expected differences
o Local classroom assessments
§ Compare one group of students who are expected to obtain high scores with students of another group who are expected to obtain low scores. One would expect the scores to match the prior expectations (e.g., students expected to score low would actually score low), and the more the results match the expectations, the greater the validity support.
§ Compare scores obtained before instruction with scores obtained after instruction. The expectation would be that scores would increase, so increases from pre to post would provide validity support.
§ Compare two measures of the same thing and look for similarities or discrepancies between scores. For example, you could compare homework and quiz performance: the fewer the discrepancies, or the more the similarities, the greater the validity support. (A small sketch of these last two checks follows below.)
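A minimal sketch, with invented numbers, of the last two classroom checks: comparing pre-instruction and post-instruction scores, and correlating two measures of the same thing (homework and quiz performance).

```python
# Hypothetical sketch of two classroom checks described above, with invented numbers:
# (1) scores should rise from before instruction to after instruction, and
# (2) two measures of the same thing (homework and quiz) should agree.
import numpy as np

pre_instruction  = np.array([45, 52, 38, 60, 41, 55, 48, 50])
post_instruction = np.array([70, 74, 61, 85, 66, 80, 72, 75])
print(f"mean gain from pre to post: {np.mean(post_instruction - pre_instruction):.1f} points")

homework = np.array([82, 75, 90, 68, 88, 79, 71, 95])  # homework performance
quiz     = np.array([80, 72, 93, 65, 85, 81, 70, 92])  # quiz on the same content
print(f"homework-quiz correlation: {np.corrcoef(homework, quiz)[0, 1]:.2f}")
```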
Evidence based on Consequences of Testing
• Questions one is striving to answer when gathering validity
evidence on consequences of
testing:
o What are possible intended and unintended results of having
students take the
assessment?
o Are the intended and unintended results of the assessment
used to make decisions
about the levels of student learning?
• Evidence based on consequences of testing is established by
considering both positive
and negative consequences of engaging in testing and using the
test scores for decision
making purposes. For example, a consequence of high-stakes testing could be the increased use of drill-and-practice instructional practices, and important subjects that are not tested being ignored.
• Methods used to gather validity evidence based on
consequences of testing
o Large Scale assessments
§ Longitudinal Studies of the extent to which expected or
anticipated
benefits of testing are realized.
§ Longitudinal Studies of the extent to which unexpected or anticipated negative consequences of testing occur.
o Local Classroom assessments
§ Before giving the assessment, articulate the intended consequences you expect, such as assessment results that will help you address, through instruction, areas in which students’ skills are lacking. Also articulate possible unintended consequences, such as low scores on an assessment lowering a student’s perceived self-efficacy. Another unintended consequence of heavy use of objective items could be to encourage students to learn for recognition, whereas essay items motivate students to learn in a way that stresses the organization of information.
§ After giving the assessment, compare predicted, intended consequences with actual consequences.
Evidence based on Instruction
• Question one is striving to answer when gathering validity evidence based on instruction:
o How well does the content taught, or the content students have had the opportunity to learn, align with what is actually assessed on an assessment?
• This is only a consideration for local classroom assessments and not large-scale assessments.
o Methods used to gather validity evidence based on instruction
§ The teacher should reflect on how he/she taught the information by asking questions like: Were my instructional methods appropriate?
Did I spend enough time on each concept? What is the match
between
what I taught and what was assessed? Have students had
adequate
opportunities to learn what I am assessing? Were the concepts
on the
assessment actually taught, and taught well enough, so that
students can
perform well and demonstrate their understanding? Answering
these
questions should help determine whether or not performance on
the
assessment is due to learning or if it is due to other factors such
as the
format of the assessment, gender, social desirability, and other
possible
influences on assessment performance besides instruction.
Journal entry assignment:
For this assignment, "think of a fictitious new test which can be either a large-scale assessment or a local classroom assessment. Briefly describe in your journal entry what the test is designed to measure. Next, address how you would go about gathering validity evidence for this test and be as specific as possible."
This test could be anything, such as one you might use in your classroom or a larger assessment, and you don't have to go into a lot of detail. Also, address ways that the reliability could have been increased.
Below are some examples of acceptable test descriptions from other students (you cannot use these; they are only examples to show you):
• "My fictitious test was developed to measure 2nd graders
understanding of capitalization
and punctuation within text."
• "I have designed a math test for third grade students to
determine if students can identify
the place value and value of each digit for six digit numerals
(Virginia SOL 3.1(a))."
I have also provided a summary of different ways to gather
validity evidence in the
(attached) Word document entitled "Sources of Validity
Evidence Overview.final”
which you can use as a guide when discussing how you would
go about gathering
validity evidence.
NOTE: There is no word limit, but it can be between one and a half and two pages.