Annotated Bibliography


Annotated bibliography to accompany presentation outline.

The Missing Link: Student Learning Outcomes and Language Proficiency Assessment
TESOL 2010, Boston, March 26, 2010
Kevin B. Staff

Alvarez, I. (1987). A rationale for discrete-point proficiency/placement testing in the Southwestern College bilingual office administration program. Unpublished master’s thesis, SDSU.
The first of several master’s theses by SDSU students cited. After a review of the literature, featuring the work of Henning, Oller, and Spolsky, Alvarez demonstrates that a discrete-point multiple-choice test can be an adequate assessment instrument in lieu of the time- and labor-intensive process of obtaining and evaluating writing samples for placement purposes in one particular program.

ASCCC (2009). Coding the student progress pathway through basic skills English, ESL, mathematics and reading courses in California community colleges. Sacramento: Academic Senate for California Community Colleges.
As part of the Basic Skills Initiative for California Community Colleges, a set of rubrics has been developed and discussed in committees and one-day conferences among teachers of ESL, English, mathematics, and reading. These serve as metrics describing a standardized set of expected outcomes for basic skills courses that can be used to determine equivalencies across the various campuses of the California Community College system. In the case of ESL, both a credit and a non-credit rubric, with many similarities to each other, are now in place to describe six levels and outcomes for their corresponding courses that will bring a student’s language proficiency up to transfer level for “freshman English.”

Ashwell, T. (2000). Patterns of teacher response to student writing in a multiple-draft composition classroom: Is content feedback followed by form feedback the best method? Journal of second language writing 9.3.227-257.
Ashwell finds that students seem to rely more on form feedback, i.e. error correction, than on content feedback.
No significant differences were found when one form of feedback was provided before the other.

Bachman, L. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Probably the ultimate introductory textbook to testing methods. Bachman covers the uses to which language tests might be put, a model of communicative language ability that builds on the well-known Canale & Swain competencies, item and task selection, statistical methods, validity and reliability, and “persistent problems” that, nearly twenty years after the book’s publication, are still just as persistent.

Ballard, B. & Clanchy, J. (1991). Assessment by misconception: Cultural influences and intellectual traditions. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (Pp. 19-35). Norwood, NJ: Ablex Publishing Corp.
An interesting analysis of the problems of second language writing in academic contexts in terms of three factors: (1) Language itself, (2) The structuring of ideas, or rhetoric, (3) Attitudes toward knowledge, or epistemologies. The authors speculate that the last of these can be divided into three kinds of approaches to knowledge: (1) Reproductive, strongly identified with education in Asian cultures, (2) Analytical, i.e. critical thinking, (3) Speculative. The speculative approach is strongly emphasized in education in Australia, though Asian students tend initially to find the approach, with its deliberate searching for new possibilities, pointlessly argumentative.

Blumner, J. (1999). Authority and initiation: Preparing students for discipline-specific language conventions. In W. Barnett & J. Blumner (Eds.), Writing centers and writing across the curriculum programs (Pp. 33-44). Westport, CT: Greenwood Press.
A discussion of WAC (Writing Across the Curriculum) programs, which seek to teach students how to produce “appropriate discourse”. The author concludes that much of the knowledge necessary to write in specific disciplines comes from reading and in fact requires knowledge of content rather than simply language itself. These kinds of advanced writing skills are much more important for graduate students than for undergraduates, who are required primarily to “relay information rather than create knowledge”, though undergraduate study would seem an excellent time to raise awareness of discipline-specific conventions.

Brown, J.D. & Hudson, T. (1998). The alternatives in language assessment. TESOL Quarterly 32.4.653-675.
A short, concise identification of the assessment options available to language teachers and programs. The article points out the significance of “washback”, the positive effect assessment can have on program objectives and instruction. The authors also point out the importance of using a variety of measures in assessment.
California Community Colleges Chancellor’s Office (2000). California pathways: The second language student in public high schools, colleges, and universities. Glendale: CATESOL Publications.
A landmark document that has become influential in the formulation of educational policy, particularly at the community college level. It includes language proficiency descriptors for the four skills, based on the ACTFL scale. The latter has become the reference guide for the CB-21 coding of ESL courses, a common description of equivalencies for courses below the “freshman English” level. Also includes one of the earliest discussions of the “Generation 1.5” phenomenon.

Carlson, S. (1991). Program evaluation procedures: Reporting the program publicly within the political context. In Hamp-Lyons (Pp. 293-320).
The first of several articles on the political and public policy aspects of language proficiency assessment. The author prescribes several considerations to limit controversy and maximize the perceived fairness of evaluation: (1) Assessment instruments that test writing ability in specific genres and types of writing, (2) Advance notice and preparation in these genres and in the types of tasks that will form the basis of evaluation. Carlson advises that in some cases, “teaching to the test” is not necessarily a bad thing. Recurring questions that users of writing assessment instruments periodically have to address include (1) How can a student who receives good grades fail the writing test? (2) Why might there be a discrepancy between a writing test and a writing sample from another situation? (3) Why do readers of an assessment instrument need training when they ought to be able to recognize “good writing?” (4) How can in-class timed
writings be reliable and valid instruments if scores assigned by readers are discrepant? (5) How can papers contain errors and still receive high scores? (6) Do superficial characteristics in the writing unduly influence scores? (7) Why is a “top paper” in one program not so in another?

CASAS (2003). CASAS skill level descriptors for ESL & ABE. San Diego: CASAS.
CASAS is a non-profit organization that provides a comprehensive evaluation system, also helpful in the development of instruction. It is used extensively in non-credit adult education ESL programs. The two CASAS documents, ESL and Adult Basic Education (ABE), show the influence of the ACTFL scale, and incorporate skills from SCANS (Secretary’s Commission on Acquiring Needed Skills). The reading/writing descriptors emphasize skills needed to function in everyday life, with little reference to academic skills other than the very general “can read and interpret most non-simplified materials.” This approach to written language demonstrates a very early “split” between basic life skills that involve reading/writing and the needs of students in secondary or post-secondary academic programs.

Cumming, A. et al (2001). Scoring TOEFL essays and TOEFL 2000 prototype writing tasks: An investigation into raters’ decision making and development of a preliminary analytic framework. Princeton: Educational Testing Service.
The full version of an article that appeared in the Fall 2009 issue of the TESOL Quarterly. The authors identify a total of 29 strategies and decision-making behaviors employed by 10 experienced ESL/EFL instructors assessing 60 TOEFL essays. The behaviors were categorized under three macro-considerations: self-monitoring focus, task fulfillment (rhetorical and ideational) focus, and language focus. Under each macro-consideration, the strategies were further categorized as either interpretation strategies or judgment strategies.
While the list of behaviors itself is probably too large to be digestible by the average person trying to assess a given piece of writing, it provides the strongest descriptive framework of what goes on in a rater’s mind of any reference in this bibliography. The full report is available at: d21af5e44df4010VgnVCM10000022f95190RCRD&vgnextchannel=d35ed898c84f4010VgnVCM1 0000022f95190RCRD

Cuyamaca College (2009). SLO assessment plan, 2009-2014. El Cajon: Department of Communication Arts.
Documentation of the English as a Second Language section of the college’s English Department, showing progress on the development of SLOs and a timeline for assessing them.

Damrau, A. & Price-Machado, D. (1998). Integrating SCANS skills in the ESL classroom. Workshop presented at Palomar College, 2/27/98.
A demonstration that I attended many years ago, when SCANS was the “hot topic”. The Secretary’s Commission on Acquiring Needed Skills developed a list of skills that could be incorporated into most educational settings, regardless of discipline. In an adult education context, the foundation skills are identified as (1) Basic skills such as reading, writing, and quantitative operations, (2) Thinking skills such as making decisions and reasoning, (3) Personal qualities such as responsibility and honesty, (4) Resource management, which includes allocating time, money, and resources, (5) Interpersonal skills such as working in teams and in a culturally diverse setting, (6) Information management, which includes acquiring facts and interpreting information, (7) Systems management, which includes understanding of social organization and technological systems, (8) Technology, i.e. using computers for simple tasks.
Donigan, L. (2009). Community college rap session: CB-21 codes and ESL rubrics. Collaborative workshop at CATESOL ’09, Pasadena.
A very ambitious yet successful session where a large group of community college ESL instructors evaluated and chose to adapt the proficiency scale from the California Pathways documentation as a set of rubrics for CB-21 coding, the descriptors of ESL course equivalencies below the “freshman English” level.

Elbow, P. (1996). Writing assessment: Do it better; do it less. In E. White et al (Eds.), Assessment of writing: Politics, policies, practices (Pp. 120-134). New York: The Modern Language Association of America.
The author argues, in his inimitable style, that portfolio assessment is the only fair and professional way to evaluate student writing, citing 19 articles and studies critical of holistic assessment. He refers to a holistic score as “nothing but a single point on a yea-boo applause meter.” In the end, however, he acknowledges that a limited amount of holistic scoring in a timed test situation may be needed, though he much prefers multiple trait scoring where practicable.

Ferris, D. & Hedgcock, J. (1998). Teaching ESL composition. Mahwah, NJ: Lawrence Erlbaum Associates.
The authors provide a persuasive argument that reading proficiency is a good, but not perfect, indicator of writing ability. Reading consists largely of constructing meaning through schemata, i.e. using knowledge to build knowledge. Writing is an improvable skill, best learned by doing. Ferris & Hedgcock also devote more attention than most authors in this bibliography to the problem of “authenticity” in portfolio assessment, though on balance they feel that a portfolio approach provides a good learning experience in “process”.

Forstrom, J. (2009). Assessing English literacy civics. CATESOL News 40.3.1-5.
This article provides a good overview of the federally funded grant that connects non-credit ESL classroom-based learning with student success in the community. In California, evaluation is conducted through pre- and post-CASAS testing as well as EL Civics assessments developed locally. There is some reference to and use of the U.S. Department of Education’s SCANS (Secretary’s Commission on Acquiring Needed Skills) in describing desired outcomes, and though EL Civics probably comes as close as anything in an American educational context to a standardized national curriculum, students can be surveyed for their interests, with lessons and assessments developed around the needs of specific educational contexts. This is particularly so when EL Civics is used in conjunction with CBET classes. The focus of instruction in EL Civics is distinctly adult education for practical purposes rather than for acquiring academic skills, and provides an interesting alternate view of what it means to “know a language.”

Forstrom, J. et al (2009). Teaching writing across the levels: Pre-assessment, implementation, and evaluation. Workshop presented at CATESOL ’09, Pasadena.
A very practical and concise approach to getting a handle on teaching and evaluating writing at various levels, including the selection of tasks as learning experiences and evaluation instruments. Though not identified as such in the materials, some of the writing tasks come from the EL Civics curriculum, and entail practicing real-life writing tasks such as reporting an accident.
Gearhart, M. (1994). Toward the instructional utility of large-scale writing assessment: Validation of a new narrative rubric. Project 3.1. Studies in improving classroom and local assessments. Portfolio assessment: Reliability of teachers’ judgments. Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing.
A report on the Writing What You Read (WWYR) rubric developed for assessing the writing of elementary school students. An interesting contrast to the issues of assessing adult ESL learners’ abilities, with good discussion of rubrics in general for the assessment of writing, their purposes and shortcomings, and their three main types: holistic, primary trait, and analytical. The WWYR is analytical, and used to rate the quality of narrative writing specifically. Its categories are: (1) Theme, (2) Character, (3) Setting, (4) Plot, and (5) Communication. The mechanics of punctuation, grammatical accuracy and such are not addressed in the WWYR.

Greenberg, I. (1993). Building on the past, looking toward the future: An ESL teacher reference for writing instruction in adult education. Unpublished master’s thesis, SDSU.
Another gem among the master’s theses in the SDSU Library. Excellent literature review and summary of recurring issues and insights, with a discussion of why writing skill has been so often de-emphasized in adult education ESL. Greenberg advocates the “process-based” approach, with free expression followed by revisions. Lots of good information, but not really applicable to the kind of “classroom genres”, such as responding to essay prompts, that students encounter in an academic context. One interesting insight is the fact that writing instructors often encourage free expression, then grade primarily on surface-level features.

Hamp-Lyons, L. (1996). The challenges of second-language writing assessment. In White et al (Pp. 226-240).
Hamp-Lyons, editor of the single most useful book in this bibliography, Assessing second language writing in academic contexts, contributes a chapter to White’s Assessment of writing: Politics, policies, practices. She cites studies showing that university faculty are in general more tolerant of errors in writing by nonnative speakers of English than of natives, and also points out that rhetorical styles are a strong influence on the judgment of writing quality. This means that an instructor used to working with Japanese students might become more tolerant of errors and unconventional usages common to Japanese students than they would be toward nonnative students from other language backgrounds. The article includes a discussion of TOEFL scores and the TWE (Test of Written English) portion of the TOEFL, which is not always considered in the admissions process.

Hamp-Lyons, L. & Henning, G. (1991). Communicative writing profiles: An investigation of the transferability of a multiple-trait scoring instrument across ESL writing assessment contexts. Language Learning 41.3.337-373.
The article features a rubric called the New Profile Scale (NPS) used to assess 91 essays written for the Test of Written English section of the TOEFL and 79 essays written for the University of Michigan Writing Assessment. The authors found it to be reliable in composite assessment, but also found little psychometric support for assessing certain individual components of the rubric. There is some discussion of “unidimensionality”, the assumption that a composite profile operationally defines a single latent continuum of ability. The rubric evolved from Hamp-Lyons’s work with the British Council, and has nine bands or levels. The seven components were based on observations by readers rather than on an underlying linguistic theory. They are
(1) Communicative Quality, (2) Interestingness, (3) Referencing, (4) Organization, (5) Argumentation, (6) Linguistic Accuracy, and (7) Linguistic Appropriacy.

Higgs, T. (1984). Language teaching and the quest for the holy grail. In T. Higgs (Ed.), Teaching for proficiency, the organizing principle (Pp. 1-9). Lincolnwood, IL: National Textbook Co.
A classic article by the late Ted Higgs, building on the previously published The push toward communication with Ray Clifford while the latter was dean of the Defense Language Institute. Provides a description of the ACTFL and ILR (Interagency Language Roundtable) scales and the kinds of generalized behaviors exhibited at each level of proficiency. ILR Level 2+, called “Superior”, is referred to as an “instructional ceiling”, beyond which the language probably must be “lived” for proficiency to continue to improve. Probably most applicable to oral language, but provides an excellent introduction to the nature of these important general proficiency scales.

Hirsch, E.D. (2010). Creating a curriculum for the American people. American Educator 33.4.6-13.
A well-written critique of the progressive movement, or “anti-curriculum movement”, that took hold in public secondary education in the latter half of the 20th century. The author argues that shared knowledge is essential to language comprehension as well as sense of community, and laments the emphasis of the movement on critical thinking skills rather than facts. For the author, resisting a rigorous academic curriculum in favor of encouraging children to develop their skills using whatever content they find engaging is contrary to a large body of cognitive science research, and has resulted in a reduction in shared knowledge among the populace and a surprising ignorance of what several generations ago would have been regarded as common knowledge.
His proposals for implementing a “common core curriculum” are not unlike the description of EL Civics administration in Forstrom (2009), allowing for local autonomy and a variety of forms of instruction while providing a guiding structure and central core elements common to all citizens.

Horowitz, D. (1991). ESL writing assessments: Contradictions and resolutions. In Hamp-Lyons (Pp. 71-85).
My favorite article by a late acquaintance who passed long before his time was due. Horowitz asks whether a common core of academic writing ability might exist, when writing tasks vary greatly both by discipline and by genre. He poses the laugh-out-loud rhetorical question of whether any writing test can claim validity unless it is written for a particular individual in a particular course in a particular program at a particular time. Inherent contradictions include the tendency of test designers to seek generality, i.e. trying to mitigate differences in background knowledge, while the designers of academic tasks seek specificity, i.e. trying to find evidence of mastery of a body of knowledge. By way of solutions, Horowitz argues that both timed essay exams and out of class writings with editing and revision are needed for assessment, and cites the TOEFL’s TWE section as an admirable attempt to provide two generalized writing tasks that minimize cultural and knowledge bias.

Hyland, K. & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly 41.2.235-253.
The short answer to the question they pose is “No.” The authors cite corpus research of the widely used AWL (Academic Word List) to demonstrate that lexical items occur and behave differently across disciplines. Well… they do occur, don’t they?! The article also seems to carry a
bit of knowing smugness at the fact that systematic analysis of text using modern computational methods often turns widely held presumptions about language behavior on its head.

James, M. (2009). “Far” transfer of learning outcomes from an ESL writing course: Can the gap be bridged? Journal of second language writing 18.32.69-84.
In this study, 30 advanced ESL undergraduates enrolled in a one-semester academic writing class were interviewed on their use of learning outcomes from the class in performing a writing task on a science article they had read. It was found that over half of the students did not purposely or consciously make use of the learning outcomes, and the author poses the question of how transfer can be achieved most effectively. Perhaps a better question would be whether specific outcomes/strategies need to be purposely employed.

Jeffrey, J. (2009). Constructs of writing proficiency in U.S. state and national writing assessments: Exploring variability. Assessing writing 14.1.3-24.
A very comprehensive analysis of prompt/genre demands and assessment scoring criteria in the nationally administered ACT, SAT, and NAEP exams as well as 41 state writing exams for secondary school students. The prompts for the state exams were categorized by genre, with the number of states employing each genre in at least one of the writing tasks for their exams, as: Persuasive (24), Argumentative (18), Narrative (10), Explanatory (10), Informative (3), and Analytic (3). Wisconsin is the only state that provides information on the theoretical underpinnings of the tasks and assessment criteria. The writing tasks on the nationally administered exams are described as reflecting greater consciousness of genre and more coherent conceptualizations of proficiency than are nearly all of the state exams.

Jeffries, M. & Youngjoo, Y. (2008). Relationship between spoken and written discourse of a generation 1.5 student in a college ESL composition class.
The CATESOL Journal 20.1.65-81.
This is a case study of a German speaking “Generation 1.5” ESL student in a college composition class. Like many Generation 1.5ers, the student writes as she speaks and has difficulty producing appropriate academic discourse. Explicit instruction was found to be partially effective. The authors identify three categories of revision suggestions used to guide the student to producing more appropriate discourse: (1) Sentence-combining revisions, (2) Use of formal rather than informal language, and (3) Additions of connectors and explanations, due to the nature of writing as a medium with greater “distance” between the writer and the intended audience. It is cautioned that these types of errors or inappropriate usages are not unique to ESL students. “Focused reading” is recommended as a means of explicit instruction, though in my own experience it is one of many partially effective techniques that some students “get” while others don’t.

Johns, A. (1991). Faculty assessment of ESL student literacy skills: Implications for writing assessment. In Hamp-Lyons (Pp. 167-179).
Here Ann wrestles with the problems of how to instill academic literacy in a group of student writers who seem to lack not only language skills but the background knowledge needed to succeed in an undergraduate political science class. She describes activities to provide students with a “sense of audience”, having them answer questions about the intended audience for a given piece of writing, the prospective readers’ academic background, biases, and knowledge of the world. She shares Horowitz’s concern about how to construct generalized writing tasks useful for academic writing practice, and proposes two genres with wide applicability: (1) Argumentation, taking the form of claim/warrant/data, and (2) Problem/solution.
Johns, A. (1995). Teaching classroom and authentic genres: Initiating students into academic cultures and discourses. In D. Belcher & G. Braine (Eds.), Academic writing in a second language: Essays on research and pedagogy (Pp. 277-291). Norwood, NJ: Ablex Publishing Corp.
The author provides a brief history of recent trends in the teaching of writing, including the “process movement”, manifested in two distinct approaches: (1) Expressivism, or free writing as a means of eliciting a quantity of output, and (2) Cognitivism, based on pre-planning and thoughtful revision. The approach entailed no conscious awareness-raising of genre, and the author perceives a need to go beyond such an approach even at the undergraduate level, through the introduction of “classroom genres” that don’t necessarily resemble real-world writing tasks but nonetheless provide an introduction to genre awareness. An ATP (Academic Task Portfolio) is proposed, consisting of five types of tasks: (1) Data-driven writing, based on an interview with a subject matter expert, (2) Library assignment, where students synthesize insights from various sources, (3) Abstract writing, the summary of an article, (4) An extended essay, written out of class with revisions, (5) An in-class writing, as response to an exam prompt.

Kawaguchi, L. (2009). What does proficiency look like on the ACCJC rubric? Rostrum, September 2009, 6-7.
A good institutional overview of the development of SLOs and their importance in the accreditation process. ACCJC is the Accrediting Commission for Community and Junior Colleges, one of three commissions under the larger entity of Western Association of Schools and Colleges (WASC). ACCJC is responsible for the accrediting of associate degree granting institutions in California, Hawaii, and the former Pacific Trust Territories.
While the federal government’s Department of Education has an interest in the development of SLOs in higher education, there is a lot of local autonomy and very little “enforcement” other than the authority of non-governmental commissions such as WASC to bestow or withhold accreditation based on an educational institution’s progress in developing and assessing course level, department level, and degree level SLOs, and eventually reaching the goal of “Sustainable Continuous Quality Improvement.”

Kermane, B. (2009). The broken window syndrome: Bad spelling, poor grammar? No problem! Questions on evaluating student writing. Paper presented at CATESOL ’09, Pasadena.
Less a research project than a discussion session early one morning at the conference, a small group of attendees compared notes on the challenges of teaching Generation 1.5 students and the pressure to show measurable results with a student population that often just doesn’t seem to “get it.” The question of portfolio vs. in-class timed writing was re-visited, with consensus that the latter is probably a more accurate measure of actual proficiency. The former, however, provides opportunities for learning experiences that might lead to improvements in overall language proficiency and the production of appropriate academic writing. The problem is with assigning a meaningful grade to such projects.

Kovach, C. (1992). Understanding essay prompts: Suggestions for teaching English for academic purposes. Unpublished master’s thesis, SDSU.
In this third SDSU master’s thesis the author, an instructor at San Diego City College, explores in detail the problem of developing appropriate essay prompts for in-class timed writings in content-area classes. Second language students often have trouble with this particular “classroom genre”, particularly when content-area instructors closely scrutinize spelling and grammatical errors. Lack
of a “sense of audience” is a recurring theme, particularly for students schooled in the process approach. Major stumbling blocks in the essay prompts include the use of metaphor and idiomatic expressions unfamiliar to many second language students, linguistically complicated sentences or the use of more than one sentence, and the use of vague instructional verbs.

Larson, J. & Jones, R. (1984). Proficiency testing for the other language modalities. In Higgs (Pp. 113-138).
This chapter is most notable for its dearth of advice on the testing of writing proficiency. The authors begin by drawing a distinction between communicative competence and accuracy of usage, suggesting that the latter is a more appropriate definition for most contexts that entail daily interaction with native speakers of the language. The high intercorrelation of test components among large test populations provides strong evidence for the interrelationship of the four skills, and Oller’s “unitary factor hypothesis” is briefly resurrected. The discussion of writing skill begins by stating that “there is a much greater difference in ability among both first- and second-language users in writing than in any of the other modalities.” Five general types of writing tasks are identified: (1) Correspondence, (2) Providing essential information, (3) Completing forms, (4) Taking notes, and (5) Formal papers. The last, obviously, is the most difficult and the most diverse across genres and disciplines. Larson & Jones suggest that writing, like speaking in the OPI (Oral Proficiency Interview), be tested directly and evaluated according to a proficiency description.

Liesberg, H. (1999). A comparative analysis of English placement tests: Computer adaptive vs. traditional methods. Unpublished master’s thesis, SDSU.
In a study similar to that of Alvarez, the author concludes that the LOEP (Levels of English Proficiency) test, a computer adaptive instrument that adjusts item difficulty to student responses, is an adequate assessment instrument for placement purposes in lieu of eliciting and evaluating writing samples. The study was conducted at Grossmont College.

Liskin-Gasparro, J. (1984). The ACTFL proficiency guidelines: A historical perspective. In Higgs (Pp. 11-42).
The author traces the evolution of the guidelines from their initial development in the 1950s at the U.S. Foreign Service Institute. Some earlier history of teaching and proficiency assessment in government language-teaching programs, including the roots of the audiolingual movement, is also covered, traceable to a pre-WWII intensive language project developed by the ACLS (American Council of Learned Societies) on a Rockefeller Foundation grant. Since 1968, the government’s version of the general proficiency scale has been known as the ILR (Interagency Language Roundtable) definitions. The ACTFL guidelines are the result of a U.S. Department of Education study entitled “A Design for Measuring and Communicating Foreign Language Proficiency.” They are intended as an organizing principle, around which various methods, approaches, materials, and curricula might be reconciled.

Lutz, W. (1996). Legal issues in the practice and politics of assessment in writing. In White et al (Pp. 33-44).
The author addresses the important issue of legal implications in the use of assessment instruments. While courts have shown a self-imposed restraint on second-guessing professional educators in the public sector, challenge is possible under two main bases: (1) Title VI of the Civil Rights Act, and (2) The Equal Protection and the Due Process clauses of the 14th Amendment of the U.S. Constitution.
  10. 10. Macken-Horarik, M. (2002). Something to shoot for: A systemic functional approach to teaching genre in secondary school science. In A. Johns (ed.) Genre in the classroom: Multiple perspectives (Pp. 17-42). Mahwah, NJ: Lawrence Erlbaum Associates. Eight key genres used in the teaching of writing across the curriculum in Australia are identified, using a systemic functional linguistics approach. This differs from the English for specific purposes (ESP) approach in its concern with “elemental genres in society” rather than with discourse communities. The key genres are categorized as: (1) Recount, (2) Informational report, (3) Explanation, (4) Exposition, (5) Discussion, (6) Procedure, (7) Narrative, (8) News story. McDonald, M. (2002). Systematic assessment of learning outcomes: Developing multiple-choice exams. Sudbury, MA: Jones and Bartlett Publishers. The author demonstrates the usefulness of multiple-choice exams for measuring learning outcomes for the training of nurses. The exams measure acquisition of very specific information with clear right/wrong answers. The contrast with language training is clear, and the inadequacy of such exams for ESL purposes, especially if used alone, becomes apparent. The author draws an interesting distinction between formative and summative evaluation, i.e. how the student is progressing vs. what the student knows. Mowry, M. (1996). Thirty years of first and second language composition theory and its relevance in the contemporary composition classroom. Unpublished master’s thesis, SDSU. In the final master’s thesis cited here, the author provides a rich review of the rise and fall and resurrection of various approaches, with the interesting perspective of an English major rather than an ESL or applied linguistics major. Good discussion of the relationship of L1 to L2 writing theories. The author advises students to be aware that “school, work, and community are different domains of literacy.” Nam, M. et al (2008). 
Writing socialization for South Korean graduate students in a North American academic context. The CATESOL Journal 20.1.49-64. A non-empirical review of literature and studies concerning the difficulty of teaching appropriate academic writing to students who lack a background in performing academic writing tasks even in their native culture. Uses a contrastive rhetoric approach to explain some difficulties, such as the tendency of a thesis statement to appear at the end of an article in Korean writing. Includes several hardly surprising insights, such as that language socialization (LS) and legitimate peripheral participation (LPP), i.e., acquiring a sense of appropriateness from observation, are key to socialization into the target academic community. North, Brian (2000). The development of a common framework scale of language proficiency. New York: Peter Lang Publishing. A reference I’d have missed if it weren’t sitting next to my master’s thesis in the SDSU Library. Describes early work on developing the 6-level Common European Framework of Reference for Languages through the auspices of the Council of Europe. More updated information and applications to specific languages, including English through Cambridge ESOL, are available online. The link on my presentation outline goes to the latter. A more general description of the framework is available at:
  11. 11. Palomar College (2007). Course outlines for ESL levels 1-6. San Marcos: Palomar College ESL Department. The internal departmental documentation for each level in the college’s ESL program follows a format similar to that of other institutions, specifying (1) The catalogue description, (2) Prerequisites, (3) Entrance skills, (4) Course content, i.e. skills to be addressed and developed, (5) Course objectives, (6) Method of evaluation, (7) Special materials required of the student, (8) Minimum instructional facilities, (9) Method of instruction, (10) Texts and references, and (11) Exit skills. Numbers 5 & 6 are in the process of being subsumed under the category of Student Learning Outcomes… So, where do Exit Skills (#11) fit into this new way of looking at things in terms of SLOs? Palomar College (2009). Assessment tools. Documentation from Palomar College Learning Outcomes Council summer workshop. The most salient point in the workshop is the need for “triangulation” in SLO (Student Learning Outcomes) assessment, i.e. using a variety of different tasks to assess. Several dichotomies in types of assessment data and assessment methods are defined: (1) Direct/Indirect data, or measurement of an exact value vs. evaluation of a trait, (2) Qualitative/Quantitative data, or descriptive information vs. numerical/statistical values, (3) Formative/Summative assessment, or feedback for development vs. final determination, (4) Criterion-/Norm-referenced assessment, or scoring according to standards vs. ranking among individuals, (5) Embedded/Standardized assessment, or assessment that occurs within regular class activity vs. tests developed for broad public usage and data comparison. Perkins, K. (1983). On the use of composition scoring techniques, objective measures, and objective tests to evaluate ESL writing ability. TESOL Quarterly 17.4.651-71. 
The author identifies four main types of assessment instruments for evaluating writing ability: (1) Holistic, or a single score based on a scale or descriptive rubric, (2) Analytical, or a series of scores usually based on a rubric with several categories, (3) Primary trait scoring, where a piece of writing is evaluated for a single feature with other features not taken into consideration, and (4) Objective, i.e. a multiple choice test. Perkins feels the literature supports the conclusion that objective measures, even though they do not evaluate writing directly, work well much of the time. Pike, J. & Weldele, C. (2009). Generation 1.5 students: Diverse avenues to academic literacy. Paper presented at CATESOL ’09, Pasadena. Perhaps the best of several presentations at the conference on “Generation 1.5” students, the children of immigrants who are functionally bilingual in oral language but lack academic skills, particularly in writing. BSI (the Basic Skills Initiative) was implemented largely with these kinds of students in mind, and necessitates a heavy dependence on content area instructors to recognize the language needs of second language students and adjust instruction accordingly by providing a form of “sheltered immersion”. A laudable idea, but will content area instructors embrace it? Richards, J. (1985). The context of language teaching. Cambridge: Cambridge University Press. A collection of previously published papers by The Old Master. Particularly insightful is Chapter 3, “The secret life of methods”, in which Richards argues that broad issues of curriculum development and evaluation should take precedence over the comparison of particular
  12. 12. methodologies. Chapter 10, “The status of grammar in the language curriculum”, provides support for my view that a test of discrete-point grammatical knowledge should be a component of SLOs measurement. Though skeptical in some of his writings of the usefulness of general proficiency guidelines, at least beyond the lowest levels, his interest in outcomes is not inconsistent with the concerns of the “proficiency movement”. Ruth, L. & Murphy, S. (1988). Designing writing tasks for the assessment of writing. Norwood, NJ: Ablex Publishing Corp. The most comprehensive book in the literature, citing many psychometric studies. Contains the maxim: “If specifying form, leave content open. If specifying topic, liberate form.” The authors specify that any task should: (1) Be interesting to the writer, (2) Be interesting to the evaluator, (3) Furnish data to start the task from, (4) Be meaningful within the writer’s experience, (5) Elicit a specific response and place limits on content or form, (6) Suggest an audience, and (7) Have more than just a title as guidance. Ryan, B. (2004). Advanced composition for ESL students. Durham, NC: Carolina Academic Press. In this textbook for teaching, Ryan designs projects around eight specific tasks or genres: (1) Narratives, (2) Description of processes, (3) Description of people, places, and things, (4) Comparison and contrast, (5) Evaluation, i.e. describing and comparing, (6) Problem/solution, (7) Cause and effect, (8) Research. Scott, C. (2009). Issues in the development of a descriptor framework for classroom-based teacher assessment of English as an additional language. TESOL Quarterly 43.3.530-535. This rather concise article in a special issue of the TQ concerned with classroom-based teacher assessment identifies factors that make the use of a common framework or a single scale for describing the understanding and use of language problematic. 
The four main issues concern: (1) Different learner groups, including children at different stages of cognitive development, learners with different levels of formal education and acculturation, and learners whose native language does not use the Roman alphabet; (2) Proper categorization of descriptors in terms of the 4 skills vs. the genre/field/tenor/mode categories of the systemic functional approach; (3) Organizing the descriptors by level while taking into account the different learner groups; and (4) The cognitive-affective dimension, meaning fatigue (or extreme hesitation) due to language overload. Song, B. & August, B. (2002). Using portfolios to assess the writing of ESL students: A powerful alternative? Journal of second language writing 11.1.49-72. A fairly recent article that I’d be remiss in not citing, but proof that there’s really nothing new under the sun. A re-visit to the arguments in favor of portfolio assessment as an alternative means of assessing writing proficiency. Stevens, D. & Levi, A. (2005). Introduction to rubrics. Sterling, VA: Stylus Publishing. A practical guide to developing and using rubrics in various disciplines. The authors identify the four components of a rubric as: (1) The task description, including a descriptive title for the task, (2) The scale, generally with four possible levels of achievement that correspond to grades A-D, (3) Dimensions, which outline the skills and knowledge involved in task accomplishment,
  13. 13. (4) Specific feedback, in the form of descriptors for each level of performance on the scale. A website to accompany the book: Tedick, D. & Matheson, M. (1995). Holistic scoring in ESL writing assessment: What does an analysis of rhetorical features reveal? In D. Belcher and G. Braine (Pp. 205-230). The authors in general dislike holistic scoring, feeling it too susceptible to cultural values in the evaluator. They identify two main criteria in the evaluation of rhetorical features: (1) Framing, or the way the writer sets the scene for the rest of the exposition, (2) Elements of task compliance, or the somewhat arbitrary way a rubric or set of evaluation criteria are analyzed by the evaluator. A writing sample with good framing might be evaluated highly even if the elements of task compliance are weak, simply because the writing gives a good first impression. Writers from some cultural backgrounds, where acceptable rhetorical style encourages minimal framing with discussion saved for the end, might be at a disadvantage in such an evaluation even though the other elements of task compliance are strong. TESOL (1999). Position statement on the acquisition of academic proficiency in English. Alexandria: Teachers of English to Speakers of Other Languages, Inc. Among TESOL’s periodic pronouncements, perhaps one of the few that has had a discernable impact on the educational community. 
Its main tenets are that: (1) Language acquisition is a long-term process, (2) There is a clear distinction between social language and academic language (It is implied but not stated that acquisition of “fluency” in everyday situations does not equal acquisition of appropriate language for academic success), (3) Students need to attain rigorous standards for the use of culturally appropriate English in both social settings and academic content areas, (4) Students are heterogeneous in background, with variations in learning, (5) There exist identifiable predictors of student success, most specifically including content-based instruction. Turner, C. & Upshur, J. (2002). Rating scales derived from student samples: Effects of the scale maker and the student sample on scale content and student scores. TESOL Quarterly 36.1.49-70. The authors’ main argument is that any rating scale based on general theory will not be appropriate for assessing performance on any specific task. Their study found that specific writing samples used to train for interrater reliability tend to take on a life of their own and have a greater effect on ratings than the scale itself. Vaughn, C. (1991). Holistic assessment: What goes on in the rater’s mind? In Hamp-Lyons (Pp. 111-125). Not unlike Turner and Upshur, Vaughn finds that rater fatigue can have a deleterious effect on ratings, stating that inevitably a large number of papers become “one long discourse” and come to be compared with each other rather than with the rubric or scale criteria. With this in mind, the author feels strongly that holistic scoring by “trained experts” should not replace the judgment of the classroom teacher who works on a regular basis with the students being evaluated. Again and finally, an argument that portfolio assessment should be at least a supplement to formal evaluations. Favorite quote from the article: “Holistic assessment is a lonely act.”
  14. 14. U.S. Department of Education (2006). A test of leadership: Charting the future of U.S. higher education. Jessup, MD: Education Publications Center. A report commissioned by the Secretary of Education, containing various recommendations for the improvement of higher education in the U.S. Clarion call for the student learning outcomes movement. Available online at: Xu, Y. and Lin, Y. (2009). Teacher assessment knowledge and practice: A narrative inquiry of a Chinese college EFL teacher’s experience. TESOL Quarterly 43.3.493-513. This second article from the special TQ issue on classroom-based teacher assessment is a case study of the problems of assessment/decision making when teachers’ own judgment conflicts with external demands such as social realities and power arrangements. It makes the argument that a teacher’s personal judgment is important, however much guidance is provided by rubrics or training. The conflict between organizational expectations and personal judgment is referred to as “sacred stories” vs. “secret stories.” Notable quote: “…the interactive and context-dependent nature of teacher-based assessment suggests that teachers need space and resources to develop their own interpretations and adjustments of rubrics according to their students’ learning, even though a common understanding has been considered a prerequisite for valid assessment.”
Conclusions Drawn From the Bibliography
● General proficiency scales work best at the lower levels. The description of writing proficiency is more problematic than the description of oral proficiency. Nonetheless, it is possible to use a general proficiency scale as a “chassis” on which to build content-specific descriptions applicable to specific contexts.
● The traditional six-level program carries a certain amount of psychological reality for placement and assessment purposes, dividing learners or candidates into distinct groups with many similarities in general proficiency.
● For the assessment of writing, portfolios and timed essays have long battled for supremacy as the most desirable means of determining students’ true ability. My own conclusion is that portfolios are fine for formative assessment and as learning experiences, but timed essays are better summative measures.
● Indirect measures of writing ability, such as multiple choice tests measuring discrete-point grammatical knowledge, work reasonably well, especially for placement purposes. However, when practicable a direct measure (a writing sample) is preferable.
● Writing effectively in different genres is acquired primarily through exposure, i.e. through reading in different genres. The specific teaching of genre awareness is more effective with some individual students than with others.
● Regardless of the amount of standardization training, quality of rubrics, or ethical and legal implications of the decisions made, the assessment of writing always has an element of subjectivity. High inter-rater reliability is often possible nonetheless.
● Whenever practicable, “triangulation” through the use of multiple assessment instruments is desirable, keeping in mind that students’ levels of oral and written proficiency may be very different.