This document discusses assessment literacy for language teachers: the knowledge and skills teachers need regarding assessment, evaluation, and testing, and an understanding that should be shared with other stakeholders in testing. It outlines the dangers of over-relying on numerical test scores and argues that teachers need to understand different assessment methods and choose those appropriate for their classroom goals and situations. It also discusses criterion-referenced assessment and the growing emphasis on performance-based evaluation in language teaching.
2. The Role of Assessment in the
Language Classroom
ELT Career & Professional
Development
May 27, 2012
Kahoko Matsumoto
Tokai University
3. Contents
1. Testing from a Japanese Perspective
2. What Language Teachers Should Know
(Assessment Literacy)
3. Why Criterion-referenced Assessment Now?
4. Increasing Importance of Performance Assessment
5. What Language Teachers Should Know (Revisited)
4. 1. Testing from a Japanese Perspective
<The changes in the view on competence>
1. Divisible Competence (up to the 1950s)
Audio-lingual/Grammar-translation period. Competence was
measured separately (discrete-point tests).
2. Unitary Competence (1960s-1980s)
Early days of the communicative approach. The existence of
“general (universal) competence” was assumed.
(objective, data-based integrative tests → standardized tests)
3. Multidimensional Competence (1990s onward)
More attention to performance (output) and its assessment.
There are various degrees of relationship among different
abilities. (skills-integrated tests, project/process-based
assessment, alternative assessment, etc.)
6. <General theories of competence>
Canale & Swain (1980):
Grammatical, Discourse, Sociolinguistic, Strategic
Bachman (1990):
Organizational (grammatical/textual) and
Pragmatic (illocutionary/sociolinguistic)
* How much of these functional, real-life communicative
abilities can be measured by tests?
* How much of listening, reading, writing and speaking
ability can be measured by grammar and vocabulary
test items?
* What do the notorious “translation” items measure?
7. <The danger of quantified test results>
[Figure: a score scale (20, 30, 50, 60, 80, 90) marked with three
10-point gaps: 20-30, 50-60 and 80-90]
Do these 10 points represent the same ability difference?
➾ No! And a difference of a couple of points is always within the
standard error of measurement.
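The standard error of measurement invoked above can be estimated from a test's score spread and its reliability. A minimal Python sketch; the SD and reliability values are illustrative assumptions, not figures from any real test:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability): the typical size of the random
    error attached to any single observed score."""
    return sd * math.sqrt(1 - reliability)

# Illustrative values: a test with score SD 15 and a quite high
# reliability of .90 still carries a SEM of roughly 4.7 points,
# so score differences of a few points are within measurement noise.
sem = standard_error_of_measurement(sd=15, reliability=0.90)
print(round(sem, 1))  # → 4.7
```

A perfectly reliable test (reliability = 1.0) would have SEM 0; real tests never do, which is why small score gaps should not be over-interpreted.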
8. <Validity and reliability>
[Figure: “Listening Ability” containing two partly overlapping item
sets, “Items in Listening Test A” and “Items in Listening Test B”]
Both tests can be highly reliable, but they measure quite different
constructs (factors) of listening.
→ In classroom achievement/diagnostic tests, validity (whether the
test measures what has been taught) is more important!
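Reliability in this sense is commonly estimated with an internal-consistency index such as Cronbach's alpha. A small sketch of the standard formula; the student-by-item scores below are entirely made up:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns
    (one inner list per item, one position per student)."""
    k = len(items)              # number of items
    n = len(items[0])           # number of students

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var = sum(variance(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical scores of 5 students on 3 listening items (0-5 scale):
items = [[3, 4, 5, 2, 4],
         [2, 4, 5, 1, 3],
         [3, 5, 4, 2, 4]]
print(round(cronbach_alpha(items), 2))  # → 0.94
```

A high alpha like this only says the items hang together; it says nothing about whether they cover the listening construct actually taught, which is the validity question.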
9. 2. What Language Teachers Should Know
(Assessment Literacy)
1) Taylor (2009) claims that the term assessment literacy
encompasses what language teachers need to know about
assessment matters: the level of skill, knowledge and
understanding of assessment principles necessary to maintain
good, effective teaching practice. Teachers also have to make
an effort to have this understanding shared by other test
stakeholder groups, including government officials, policy
planners, the media and the general public.
10.
2) Inbar-Lourie (2008) argues that Language Assessment
Literacy (LAL) involves language-specific competencies in
addition to general assessment literacy skills. Her LAL
competencies for educational purposes can be broken down
into three strands: ‘why,’ ‘what’ and ‘how.’
11.
‘Why’ refers to the rationale behind the testing. It analyzes
assessment culture and the concepts of validity and ethics that
go along with it. The author stresses the vital role of language
assessment in making crucial decisions in other areas of the
learning spectrum.
‘What’ deals with current theories of assessment and their
validity and reliability. In language assessment, this relates to
contemporary debates on issues such as the norms of English as
an International Language.
‘How’ looks at test construction and development, and at the
role of assessment in a language curriculum.
12. <Why: purpose of a test>
1) to screen or stream students
→ norm-referenced proficiency tests
(reliability > validity)
2) to diagnose students’ progress and provide informed support
→ criterion-referenced diagnostic/achievement tests
(validity > reliability)
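The contrast between the two purposes can be sketched with hypothetical scores: a norm-referenced reading ranks each student against the group, while a criterion-referenced reading checks each student against a fixed mastery cutoff. All names, scores and the cutoff below are illustrative:

```python
scores = {"Aya": 72, "Ben": 55, "Chika": 88, "Daichi": 64}

# Norm-referenced reading: position relative to the group.
def percentile_rank(score, all_scores):
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Criterion-referenced reading: mastery against a fixed cutoff
# (70 here is an arbitrary value a teacher would instead tie to
# explicit course objectives or can-do statements).
CUTOFF = 70

for name, score in scores.items():
    rank = percentile_rank(score, scores.values())
    verdict = "meets criterion" if score >= CUTOFF else "needs support"
    print(f"{name}: {rank:.0f}th percentile, {verdict}")
```

Note that the norm-referenced verdict changes if the group changes, while the criterion-referenced verdict does not; that stability is what makes the latter useful for diagnosis and informed support.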
13.
<What: goals of learning/assessment>
●Different kinds of validity:
- Content validity
- Concurrent validity
- Criterion validity
- Construct validity
- Predictive validity
●Nature of assessment
- summative vs. formative
- internal vs. external
- self-reflective assessment vs. rater assessment
*avoiding subjective/objective dichotomy
14.
<How: development of a solid rubric>
1) Content/ability to cover→validity
2) Assessment criteria
3) Nature of task
4) Students’ ability range→difficulty level
5) Weighting among test items (global vs. local
abilities)
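One way to make the weighting in 5) concrete is to compute the total as a weighted sum over rubric criteria. The criterion names, weights and scales below are illustrative, not a prescribed rubric:

```python
# Illustrative rubric: criterion -> (weight, maximum raw score).
rubric = {
    "content coverage":  (0.4, 5),
    "organization":      (0.3, 5),
    "language accuracy": (0.3, 5),
}

def weighted_score(raw_scores, rubric, scale=100):
    """Combine raw criterion scores into one total on `scale`,
    using the weights declared in the rubric."""
    # Weights must sum to 1 so the scale is meaningful.
    assert abs(sum(w for w, _ in rubric.values()) - 1.0) < 1e-9
    total = 0.0
    for criterion, (weight, max_raw) in rubric.items():
        total += weight * raw_scores[criterion] / max_raw
    return total * scale

print(round(weighted_score({"content coverage": 4,
                            "organization": 3,
                            "language accuracy": 5}, rubric), 1))  # → 80.0
```

Declaring the weights in one place also makes the trade-off between global and local abilities explicit and easy to justify to students.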
15. <How: choosing appropriate types of test>
1) test of receptive skills
- multiple-choice test
- cloze test
- essay test, etc.
2) performance test
- recitation/translation
- short response to a stimulus/question
- summary creation
- expression of an opinion
- free writing/speaking in response to a prompt
* individual vs. interactive tasks
3) integrated skills test
- controlled
- project/process-based assessment
16. 3. Why Criterion-referenced Assessment Now?
1) Established descriptive (explanatory) criteria help increase
accountability and transparency, with added validity
ex. Can-do Statements (CDSs)
= functional, qualitative statements of what learners
can really do in various communicative situations
2) Increasing use of the Common European Framework of
Reference for Languages: Learning, Teaching and
Assessment (CEFR)
* verifiable by both quantitative and qualitative measures
21. <Statistical vs. qualitative>
Statistical Approach (IRT, etc.):
- Validity is sometimes questionable.
- Test scores don’t always represent real-life abilities.
Qualitative Approach:
- The consistency among objectives, teaching and assessment
is secured.
- Results are sometimes hard to verify.
* We can benefit from both approaches if we know which to use
for different types of assessment.
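IRT, named above as an example of the statistical approach, models the probability of a correct response from learner ability and item difficulty. A sketch of the simplest one-parameter (Rasch) form; the ability and difficulty values are illustrative:

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """One-parameter IRT (Rasch) model: probability of a correct
    answer given learner ability and item difficulty (in logits)."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# A learner whose ability exactly matches the item's difficulty
# has a 50% chance of answering correctly:
print(rasch_probability(ability=0.0, difficulty=0.0))  # → 0.5
# A learner two logits above the item's difficulty does much better:
print(round(rasch_probability(ability=2.0, difficulty=0.0), 2))  # → 0.88
```

Because ability and difficulty sit on the same scale, such models support score equating across test forms, which is exactly what qualitative, criterion-based judgments are weaker at.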
24. 4. Increasing Importance of Performance Assessment
1) Increased importance placed on English as a
lingua franca and also on worldwide internet
communication by writing (Warschauer, 2000;
Warschauer & Ware, 2006)
2) Rapid globalization and increased use of
online verbal communication tools (video-
conferencing, free learning websites,
webinars, etc.)
27. 5. What Language Teachers Should Know (Revisited)
30. References
Bachman, L. F. (1990). Fundamental Considerations in Language Testing.
Oxford: Oxford University Press.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches
to second language teaching and testing. Applied Linguistics, 1, 1-47.
Cummins, J. (1979). Cognitive/academic language proficiency, linguistic
interdependence, the optimum age question and some other matters. Working
Papers on Bilingualism, 19, 121-129.
Demauro, G. (1992). Examination of the relationships among TSE, TWE, and
TOEFL scores. Language Testing, 9(2), 149-161.
Inbar-Lourie, O. (2008). Constructing a language assessment knowledge base:
A focus on language assessment courses. Language Testing, 25(3), 385-402.
Matsumoto, K. (2011). Studies on the correlations of listening ability and
productive abilities. In JACET Testing SIG (Eds.), Studies on L2 Listening:
Teaching and Assessing, 69-84.
Shojima, K. (2007). Scholastic achievement structure of National Center Test
2006 by self-organizing map. Japanese Journal for Research on Testing, 3,
161-178. (In Japanese with English abstract.)
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied
Linguistics, 29, 21-36.
Warschauer, M. (2000). The changing global economy and the future of English
teaching. TESOL Quarterly, 32, 511-535.
Warschauer, M., & Ware, P. (2006). Automated writing evaluation: Defining the
classroom research agenda. Language Teaching Research, 10(2), 219-233.
31. Thank you for listening!
For questions and comments:
mkahoko@tsc.u-tokai.ac.jp