
Beyond instrumentation





  1. 1. Beyond instrumentation: redesigning measures and methods for evaluating the graduate college experience
Patricia L. Hardré & Shannon Hackett
Received: 19 December 2013 / Accepted: 21 September 2014 / Published online: 5 October 2014
© Springer Science+Business Media New York 2014
Abstract This manuscript chronicles the process and products of a redesign for evaluation of the graduate college experience (GCE) which was initiated by a university graduate college, based on its observed need to reconsider and update its measures and methods for assessing graduate students’ experiences. We examined the existing instrumentation and procedures; met with and interviewed staff and stakeholders regarding individual and organizational needs; collected systematic questionnaire data on stakeholder perceptions; and then redesigned, developed, and tested new evaluation instruments, systems, and procedures. The previously paper-based, one-time global exit questionnaire was redesigned into a digitally administered, multi-event assessment series, with content relevant to students’ incremental academic progress. Previously discrete items were expanded into psychometrically coherent variable scales in parallel forms to assess change over time (entry, mid-point, exit, post-graduation). They were also strategically designed to be stable and independent enough that administrators could vary the timing and sequence of administration to fit their ongoing needs. The team conducted two testing cycles, gathering pertinent information on the redesigned assessment and procedures (N=2,835).
The final redesigned evaluation serves as an exemplar of evaluation that enhances assessment quality including psychometric properties and multiple stakeholder validation, more effectively addresses the organization’s incremental evaluation needs, increases timeliness of data collection, improves reach to and participation of distributed students, and enables longitudinal data collection to provide ongoing trajectory-of-change evaluation and a research data stream. Product and process analysis informs strategies for more effectively and dynamically assessing graduate education.
Keywords Graduate experience · Assessment design, development, and testing · Program evaluation · Higher education · Graduate education
Educ Asse Eval Acc (2015) 27:223–251 DOI 10.1007/s11092-014-9201-6
P. L. Hardré (*) · S. Hackett, Department of Educational Psychology, Jeannine Rainbolt College of Education, University of Oklahoma, 820 Van Vleet Oval, ECH 331, Norman, OK 73019-2041, USA e-mail:
  2. 2. This project involved entirely reconceptualizing, extending, and expanding a research university graduate college’s program evaluation measures and methods. The redesign process elicited original need assessment and ongoing feedback from students and faculty across the university’s graduate programs. The team tested and refined instrumentation and system designs iteratively, based on stakeholder perspectives. The new instruments and system were designed to provide direct information needed by the Graduate College and also to provide data-driven feedback to graduate departments and programs to support their continual program improvement. This 2-year-long systematic design and development process replaced a one-page, qualitative exit questionnaire with a multi-event, systematic design; digital, online administration; and psychometrically sound instruments, aligned with current organizational goals and reporting needs. We went beyond instrumentation, to include redesign of the administration media, timing, and other systemic features.
This manuscript will first present a review of the relevant foundational and current research and evaluation literature. Then, it will present an overview of the project methods, over phase I (needs analysis, instrument and systems redesign, and alpha testing) and phase II (iterative revision and beta testing). Following the overview, each phase of the process and instrumentation will be broken down into sequential, detailed procedures and specifications, with results of each analysis and implications leading to the next phase or to final recommendations as appropriate. It will conclude with both evaluation lessons learned and principles supported and the important contributions of this work to academic program assessment and more general evaluation research and practice.
1 Literature review
More than 1.5 million people are enrolled in graduate programs in the USA each year (Gardner and Barnes 2007; Allum et al. 2012) and many times that number worldwide (Council of Graduate Schools 2012). Contrary to popular belief, many major research universities enroll more graduate students than undergraduates (US Department of Education 2005). Yet, relatively little systematic research is conducted that informs more than a very small subset of those who teach, manage, and make policy to support graduate students (Nesheim et al. 2006).
2 Studies of the graduate experience
A number of studies have focused on various elements of the graduate college experience (GCE). Some of these studies have been localized, focused on a single discipline or program (e.g., Benishek and Chessler 2005; Coulter et al. 2004; Gardner and Barnes 2007; Hegarty 2011; Schram and Allendoerfer 2012). Other studies have focused on very specific groups, such as alumni, dropouts, or non-attendees, and only addressed a few key variables such as why they chose to leave or not attend (e.g., Belcher 1996; Delaney 2004; Lipschultz and Hilt 1999). Some studies conducted internationally have combined disciplinary and institutional factors with broader cultural factors, generating deeply contextualized data to inform local needs (e.g., Kanan and Baker 2006).
  3. 3. Others have attempted to reach more broadly but faced low return on the population sampled, raising questions about their representativeness (e.g., Davidson-Shivers et al. 2004; Farley et al. 2011). In each of these cases, different methods and instruments have been used and different constructs and characteristics studied, making it difficult to compare findings. The generally discrete nature of the samples has made it difficult even to synthesize the findings in ways that inform graduate education. In many universities, each college or department devises its own measures, making comparisons even within the institution problematic. The body of research on the GCE could be more effective and productive across universities if there were accessible, consistent, and comparable instrumentation to measure some common characteristics and goals of graduate programs and institutions.
In spite of the lack of comparability across these studies, a few principles are clear, both from the collection of findings and from the more global literature on the psychologies of adult education and human experience. Major changes of context and experience, such as going to graduate school, cause people to go through identity transitions and experience dramatic change in their self-perceptions and how they understand themselves and others (Austin et al. 2009; Chism et al. 2010; Hephner LaBanc 2010), often including very strong self-doubt and anxiety (Gansemer-Topf et al. 2006; Brinkman and Hartsell-Gundy 2012). Graduate education involves redirecting cognitive attention and emotional energy in ways that can impact key relationships and cause family and emotional crisis (Baker and Lattuca 2010). Success in graduate school depends on interpersonal and social relationships, as well as on intellectual mastery (Cicognani et al. 2011).
Being back in acadème after years away can be a tremendous adjustment, which is amplified when the return is to a different discipline, culture, and context, requiring substantial reacculturation and socialization (Fu 2012; Hardré et al. 2010b).
3 Need for graduate-level information and feedback
Various sources cite attrition from graduate programs as high as 50 % or more (Lovitts 2001; Offstein et al. 2004). Given the life changes attributable to returning to graduate education, it is easy to understand that many students might not make those shifts easily without substantial support. Graduate education is a huge investment of time, funding, and expertise, by faculty, departments, and institutions (Stone et al. 2012; Smallwood 2004). Institutions, research units, and policy-making bodies need clear, useful information about graduate education (Gansemer-Topf et al. 2006).
Much research and scholarly attention on the graduate experience has been focused on academic abilities and aptitudes (Golde 2000), and success has been largely attributed to academic preparation (Fu 2012). Popular measures of these characteristics include (1) standardized tests (such as the graduate record examination (GRE), required by most graduate programs nationally) and (2) grade point averages (GPAs) from previous and current coursework. These measures are convenient because they are simple, quantified, and standardized, and thus comparable and generalizable.
However, academics are only part of the story that explains graduate students’ academic success. Interacting with them are numerous other elements of graduate life, such as scholarly and professional development, personal satisfaction, identity, stress
  4. 4. and anxiety, social support, peer relationships and community, and overall well-being (Gansemer-Topf et al. 2006; Offstein et al. 2004). Some studies have addressed socialization into graduate school and into the scholarly culture and values of students’ disciplines and professions, generating sets of factors that influence these processes (e.g., Gardner and Barnes 2007; Weidman et al. 2001). However, it is unclear how the characteristics and circumstances of an increasingly diverse and ever-changing graduate student population interact with both institutional constants and discipline-based cultural nuances to support students’ learning and professional development (see also Hardré 2012a, b; Hardré and Chen 2005, 2006).
This information needs to include insight into the current and authentic nature of the graduate college experience, its impacts on students, other impacts on students’ success within it, and students’ perceptions of their journeys. Perceptions are important in any novel experience and particularly in transitions, as the nature and impacts of transition depend less on the actual, measurable events than on the participants’ individual and collective perceptions of those events (Hardré and Burris 2011; Schlossberg et al. 1995; Bloom et al. 2007). Stress is a core component of the graduate experience, and people handle stressful circumstances very differently (Offstein et al. 2004; Williams-Tolliver 2010). Goals and goal attainment have tremendous impact on how people work and learn (Kenner and Weinerman 2011). Goals and expectations have been studied among higher education faculty, showing significant effects (e.g., Hardré et al. 2011), yet little systematic research has included the goals and expectations that graduate students bring into their educational experience and the reasons why they make choices along the way.
Some theorists and practitioners have called for more concerted institutional efforts at understanding and supporting graduate students’ experiences and success, similar to those traditionally focused on undergraduates (e.g., Gansemer-Topf et al. 2006; Hyun et al. 2006).
4 Need for instrument and system design and fit
Various efforts have been made to produce standardized measures and create national databases of information on graduate students. More than a decade ago, the National Doctoral Program Questionnaire, funded by the Alfred P. Sloan Foundation, was heralded as a grassroots attempt to use data about the graduate experience to improve graduate education nationally (Fagen and Suedkamp Wells 2004). The Higher Education Research Institute (HERI) project and the American Educational Research Association (AERA) graduate questionnaire project strove to generate data via questionnaire instruments for comparing student experiences and faculty perceptions of their work climates (HERI 2012). However, in centralized systems such as these, neither the measurement parameters (instruments, participants, sampling, timing) nor the resulting raw data sets are directly accessible to, or controlled by, potential users (researchers or institutions), which severely limits their utility.
Researchers and administrators in graduate education need instruments that generalize and transfer across institutions and contexts (Hyun et al. 2006). Having adaptive, useful, and efficient tools to investigate the graduate experience in higher education could help address the need for more scholarly research in this critical area for higher education (Gardner and Barnes 2007). Having the right tools and information could
  5. 5. help administrators assess and address issues with attention to specialized local needs (Nesheim et al. 2006). It is clear that a need exists for systematically designed and well-validated tools for assessing a range of dimensions of the graduate experience, to address issues relevant to graduate program development and improvement, as seen through graduate students’ perspectives. Beyond instrumentation, graduate institutions need insight into administrative systems, timing, and related strategies to support optimal assessment.
5 Method
5.1 Context and reflexivity
This project occurred in a public research university in the Southwestern USA. The Graduate College is more than 100 years old and enrolls over 4,000 advanced degree students annually. It confers doctoral and masters degrees in hundreds of academic majors, both traditional programs and continuing education degree programs and certificates. Some programs are very structured with students in cohorts, while others are more fluid and adaptive, allowing students to cover curricula at their own pace and schedule, supported by their academic advisors. The institutional culture gives autonomy to colleges and departments to determine graduate academic program requirements, and the Graduate College oversees curriculum revisions, monitors progress, and maintains accountability. The graduate student body is 70 % US domestic and 30 % international from 42 countries; ages range from 21 to 90, and it is about evenly divided by gender. Full-time students make up 60 % of the graduate population, and the remaining 40 % attend part-time; many graduate students also work outside of school and have families.
The evaluator and assessment designer was a senior graduate faculty member in the university, with specialized training and expertise in this area, who also did external evaluation design and consulting professionally.
The Graduate College Dean invited the faculty member to take on the evaluation redesign project, based on the advice of the university Provost. The evaluator worked on this project without personal financial compensation, but with the understanding that she could use the data gathered for research presentation and publication. The Graduate College did provide one graduate assistantship (0.5 FTE) to assist with the primary project tasks. The evaluator also utilized a team of other graduate assistants on particular components of the project.
5.2 Process and procedures overview
5.2.1 Phase I: needs analysis, redesign, and alpha testing
Invited by the Graduate College Dean to redesign its assessment of the graduate experience, the team reviewed the relevant literature to gain a general scope of coverage and variables of interest. Consistent with evaluation standards, we also involved others with interest in the outcomes (faculty and administrative stakeholders) to define the evaluation (Yarbrough et al. 2011). We conducted focus groups and interviews and administered generative, paper-based instruments with students, faculty, and
  6. 6. administrators. The goal at this early stage was to determine the most appropriate variables and indicators and to include nuanced information for client and program needs.
Based on this information, the team designed and developed the first (alpha) version of the GCE assessment instrument. Given the need to reach a distributed group of technology-active participants with multiple tools, it was decided to use online administration and the first (alpha) instruments were developed with the SurveyMonkey® administrative software. At this stage, three initial versions of the instruments were developed. Over 500 students completed the alpha test instruments, producing data adequate to demonstrate generally good psychometric characteristics and also determine refinements necessary to improve on the GCE assessments.
5.2.2 Phase II: revision and beta testing
Following the analysis of the development and alpha test data, the evaluation team generated a revised version of the GCE instrument. During the alpha testing, the team recognized relevant limitations in the original (SurveyMonkey®) digital administration system. In consultation with the Graduate College administration, it was decided to develop the beta instrument with the more adaptive Qualtrics® digital administration system. In its beta versions, the GCE evaluation contained refined scales and items. It was also extended to include forms for five participant groups, the original three (entrance, mid-point, and exit) plus two additional (non-attendees and alumni). These additional forms extended the range of information the evaluation package provided to the GC client.
Over 2,000 student participants completed the beta instrument. In addition to the student respondents, the evaluation team sent the beta instrument to faculty who instruct and mentor graduate students across all academic colleges for feedback on its fit and relevance to their program evaluation and improvement needs.
This strategy was based on a general interest in faculty perceptions (as key stakeholders), plus the Graduate College’s organizational goal of producing data useful to graduate programs. The beta data yielded additional information for further refining all five forms of the instruments and baseline findings for the GC clients. These data were analyzed in two ways for two types of outcomes: instrument performance and participant response patterns.
6 Phase I: needs analysis, redesign, and alpha testing
6.1 Needs analysis
The purpose of the needs assessment and analysis was to determine how students, faculty, staff, and administrators defined the nature, parameters, and goals of the graduate experience. The results of this process provided information to guide the scope, definitions, and instrument development, as well as the testing plan.
6.1.1 Participant stakeholder groups
Four stakeholder groups were identified to provide input for the redesign and data testing: 13 graduate students, 23 faculty, 10 staff, and 5 administrators of the Graduate
  7. 7. College. A convenience sample was drawn from a list of individuals generated by the Graduate College and evaluation team. All of the identified members of the stakeholder groups participated in focus groups and some in additional follow-up interviews to inform needs and determine the scope and content of the GCE instruments.
Graduate students and graduate college assistants
The sample group to determine the definition of the graduate experience was derived from the pool of graduate students at the university. This sample included graduate assistants working in the graduate college, and members of an interdisciplinary group of graduate student program representatives, along with students they recruited. Diverse graduate students participated at this stage in the process to help frame instrument language appropriate across all groups.
Faculty, staff, and administrators
Faculty, staff, and administrators at the university have unique perspectives on the role of the Graduate College and concepts of the graduate experience. To better understand these issues, the evaluators solicited feedback from graduate program professors and administrators from various colleges.
6.1.2 Procedure
To define and clearly identify components of the graduate experience, the evaluation team used focus groups, interviews, and open-ended questionnaire instruments. Due to their exploratory nature and the designers’ developmental interest in dialogue with stakeholders, these first questionnaires were paper-based. Responses were transcribed and coded in analysis. Participants were recruited through targeted e-mails and mailings using contact lists of current graduate students, faculty, staff, and administrators provided by the Graduate College.
Focus groups
Focus groups (of six to ten participants) discussed issues related to the graduate experience (time ≈60 min). The format was semi-structured with some direct questions available to guide the meeting and address relevant goals.
A sample question was “What events and activities are part of the graduate student’s experience at [univ]?”
Interviews
Each individual interview was conducted in a semi-structured format (time ≈60 min). Each interview concerned either feedback on instrument development or more detailed understanding of issues raised in a previous focus group. Twenty-two questions were created as options to ask the interviewee concerning the graduate experience. A sample question was “Please define for you what constitutes the graduate experience at [univ].”
Open-ended questionnaires
Participants completed a 12-question (≈30-min) questionnaire. A sample question was “What is your perception of the Graduate College?”
  8. 8. 6.1.3 Results of needs analysis
Data from focus groups, interviews, and open-ended questionnaires provided the definition of the scope and terms to develop the GCE alpha questionnaire assessment instruments. The information from these participants generated the scales and language used for the first formal questionnaire items. From the needs analysis and developmental data, the following points were clear:
  • All of the stakeholder groups agreed that a new and more comprehensive assessment of the GCE was needed.
  • There were differences among groups as to what should be included and what should be emphasized in the new instrument.
  • There were, however, enough points of convergence and even consensus to draft and test an instrument that reflected both the client’s interests and the breadth of other stakeholders’ needs.
The single-event, end-of-degree administration via paper questionnaire needed to be redesigned and replaced with methods more attentive to current activities, goals, and needs. Based on these results, the evaluators proceeded with designing and testing a new assessment instrument and system.
7 Redesign of administration timing, media, and methods
Parameters of this redesign needed to be accessible and salient for students. To meet this need, the redesign included various administrations spread over students’ graduate experience (which lasted from 2 to 10 years). A challenge of timing (given the variability in duration across degrees and among full-time and part-time students) was identifying the key points in progress at which students would receive each instrument. Program faculty and department administrators need prompt, timely feedback to support program improvement. This could be achieved in part by the multi-event, incremental assessment design, and further enhanced by creating parallel forms of instruments that offered data on developmental change over time.
Based on client and stakeholder input, the following potential improvements were indicated.
  • Appropriateness of item content for participant users could be improved by the incremental administration redesign so students received questions at times more proximate to their actual experiences. Utility and primary investigator potential for administrative users (the Graduate College, academic programs) could be improved by the incremental administration redesign, so they received data before students graduated, making responsive improvements more timely and meaningful.
  • Administration efficiency and utility for the client could be improved by digital administration that eliminated manual data entry. Administration rescheduling and timeliness of access for users could be improved by online digital administration that they could access from remote and distributed sites.
  9. 9. • Administration potential to share data with academic programs (a stated goal) and ability to track change over time would both be vastly improved by the redesign using both incremental administration and digital instrumentation.
7.1 Procedure
The evaluation team developed item sets to address relevant constructs and outcomes indicated by the developmental data. Multiple team members independently examined the transcripts, then discussed and collaborated to develop the overall instrument scope and content (topical scales and items). Then, the team organized the content for initial administration to students, as appropriate to their degree (masters/doctoral) and progress-toward-degree of study (entrance/mid-point/exit). All administration occurred in an asynchronous online questionnaire administration system, with all participant identification separated from item responses. Testing participants were recruited via e-mail invitation, using lists of eligible students provided by the Graduate College. All study activities were consistent with human subjects’ requirements and approved by the institutional IRB. De-identified responses were analyzed and stored according to IRB standards for data security and confidentiality.
7.2 Participants
Participants were 504 graduate students invited to take the form of the questionnaire appropriate to their point-in-program whether they were at the beginning (130), middle (118), or end (256). Detailed participant demographics are shown in Table 1. Students were demographically representative of the larger graduate student population on campus, with similar distributions of genders, ethnicities, colleges, and degree types (within ±6.1 %). The eventual intent of the instrument design was to assess developmental trajectories of experiences in the same graduate students over time (a within-subjects sample).
However, in order to collect sample data efficiently, in a single year, we used different graduate students as proxy groups for progress-in-program (a between-subjects sample).
7.3 Instruments
A total of 149 items were developed for the first round (alpha) instruments: 21 demographic items (selection and fill-in), 97 Likert-type items, 19 dichotomous (yes/no) items, and 12 open-ended items. For the Likert-type items, after consultation and discussion with the client regarding the tradeoffs in various scale lengths and configurations, an eight-point scale (1=strongly disagree, 8=strongly agree) without a neutral mid-point was used. In addition to the formal quantitative items, open-response fields were provided and participants were encouraged to “explain any responses” or “provide any additional information.”
The items were organized into theoretical and topical clusters and subscales. The sections were (1) Why Graduate School?, (2) Admissions Process, (3) Decision to Attend, (4) Financial Aid, (5) The Graduate Experience, (6) Graduate College Advising
  10. 10. and Staff, (7) Graduate College Events, (8) Graduate College Media and Materials, (9) Program of Study Satisfaction, (10) Social Interaction, and (11) University Resources and Services. Table 2 shows a summary of the scope and organization of the instrumentation, as well as reliabilities and factor structures.
Table 1 Alpha participant demographic characteristics (frequencies: All, Masters, PhD; percentages: Institution, Sample)
| Characteristic | All | Masters | PhD | Institution % | Sample % |
| Degree type | | | | | |
| Masters | 375 | – | – | 72.5 | 75.0 |
| Doctoral | 125 | – | – | 27.5 | 25.0 |
| Gender | | | | | |
| Male | 237 | 164 | 70 | 51.7 | 47.6 |
| Female | 261 | 209 | 51 | 48.3 | 52.4 |
| Ethnicity | | | | | |
| African American/black | 31 | 23 | 8 | 5.0 | 5.7 |
| Asian American/Asian | 44 | 29 | 15 | 5.1 | 8.1 |
| Pacific Islander/native Hawaiian | 2 | 2 | – | 0.2 | 0.4 |
| Hispanic/Latino | 25 | 23 | 2 | 5.2 | 4.6 |
| Native American/American Indian | 29 | 24 | 5 | 4.9 | 5.4 |
| White/Caucasian | 397 | 295 | 98 | 72.7 | 73.3 |
| Other | 14 | 9 | 5 | 6.9 | 3.6 |
| Colleges | | | | | |
| Architecture | 6 | 6 | – | 2.2 | 1.2 |
| Arts and Sciences | 217 | 148 | 69 | 37.0 | 43.0 |
| Atmospheric and Geographic Sciences | 6 | 5 | 1 | 3.5 | 1.2 |
| Business | 37 | 30 | 6 | 8.3 | 7.4 |
| Earth and Energy | 11 | 9 | 2 | 5.2 | 2.2 |
| Education | 60 | 42 | 18 | 18.0 | 11.9 |
| Engineering | 43 | 30 | 13 | 14.1 | 8.6 |
| Fine Arts | 25 | 17 | 7 | 5.6 | 5.0 |
| Journalism and Mass Communication | 10 | 6 | 4 | 1.8 | 2.0 |
| International Studies | 11 | 11 | – | 0.4 | 2.2 |
| Liberal Studies | 46 | 44 | 2 | 2.9 | 9.2 |
| Dual Degree/Interdisciplinary | 31 | 26 | 3 | 0.8 | 6.2 |
Based on the initial redesign feedback, three web-based forms of the questionnaire instruments were created. Each was designed to measure satisfaction with the graduate experience at specific incremental points in students’ progress-toward-degree: entry, mid-program, and exit. Participants were recruited via e-mail and provided with active, generic hyperlinks to the questionnaires, which they could access from any location. Timing for the assessments was at three key time points in their specific programs: at entrance (their first semester), mid-point (first semester of second year for masters; first
  11. 11. semester of third year for doctoral students), and exit (graduating semester). At this stage of development, all students completed the same questionnaire sections, with two exceptions: Admissions Process (entry students only) and Career Preparation (mid-point and exit only).
8 Analysis
Once questionnaires were completed, data were exported to SPSS® for statistical analysis. Means and standard deviations were computed for each Likert-type question. Additional subgroup mean comparison statistics were computed for significant differences (by degree type, masters versus doctoral, and by progress-toward-degree group). Exploratory factor analyses (EFAs) were conducted on theoretical and topical sections with more than five items, to examine structural nuances and help determine the appropriateness of items within sections. Reliabilities for the theoretically coherent subscales were computed using Cronbach’s alpha (target α≥0.80). Additional generative commentary and questions provided qualitative information to utilize in evaluation and revision.
Table 2 Section overview (alpha version)
| Section | Type of scale | No. of items | Alpha | No. of factors |
| Why graduate school | Item cluster | 8 | – | – |
| Admissions process | Subscale | 6 | 0.864 | 1 |
| Decision to attend | Item cluster | 8 | – | 3 |
| Financial aid | Item cluster | 8 | – | – |
| The graduate experience | | | | |
| Graduate experience satisfaction | Subscale | 13 | 0.928 | 2 |
| To me, the graduate experience includes… | Item cluster | 12 | – | 3 |
| Graduate college advising and staff | Subscale | 4 | 0.813 | 1 |
| Graduate college events | Item cluster | 10 | – | – |
| Graduate college media and materials | Item cluster | 5 | – | 1 |
| Program of study satisfaction | | | | |
| Program of study | Subscale | 9 | 0.806 | 2 |
| Academic advisor | Subscale | 7 | 0.975 | 1 |
| Academic program faculty | Subscale | 12 | 0.950 | 1 |
| Career preparation | Subscale | 6 | 0.841 | 1 |
| Social interaction | Subscale | 9 | 0.830 | 2 |
| University resources and services | Subscale | 19 | 0.881 | 4 |
Note: Negatively worded items were reverse-coded for both the reliability and factor analyses.
8.1 Alpha measurement testing results
The alpha testing focused on measurement performance, with the system test implicit at all stages, from development through response patterns. The assessment of validity at this stage was a preliminary look at the appropriateness of instrument scope, item
  12. 12. content, and overall fit. The reliability assessment at this stage examined subscale and section range and coherence, along with the nature and roles of item contributions. Data on section-level relationships and item contributions would support refinement for both content appropriateness and instrument efficiency, without reducing instrument coherence or sacrificing measurement scope. Qualitative responses and commentary provided information on how particular participants and subgroups processed and interpreted the instrument content, which further informed revision and refinement.

8.2 Validity

The first goal of the alpha testing analysis (focused on validity) was to assess the appropriateness, scope, and fit of the instrumentation for addressing the target variables and indicators (Cook and Beckman 2006; Messick 1995), overall and for each subgroup at point-in-program. This included not only the item and section content but also the instructions and administration system. Validity information was contained in the developmental data (from the “Needs analysis” section), based on both expert-client and user-stakeholder perspectives on what should be included. The alpha testing added analysis of authentic user responses, grounding those hypothetical perspectives in actual use. The EFAs were conducted on all sections (criteria of loading at 0.80 with cross-loadings not exceeding 0.30). This analysis would confirm that the language used in the items (taken from the generative contributions of various stakeholders) was communicating what it was intended to, and that the items were relating appropriately to one another as interpreted by end-users. Additionally, the open-ended fields inviting additional explanation and commentary were analyzed for contributions to the scope, content, and appropriateness of the instrument and sections, as well as for system issues.
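As an illustration of the loading criteria stated above, the following minimal sketch flags items whose primary loading reaches 0.80 and whose largest cross-loading does not exceed 0.30. It is not the authors' SPSS procedure: the function name `check_loadings`, the item labels, and the loading values are all hypothetical, and the loadings are assumed to have been estimated already by an EFA run elsewhere.

```python
from typing import Dict, List


def check_loadings(
    loadings: Dict[str, List[float]],
    primary_min: float = 0.80,   # primary-loading criterion quoted in the text
    cross_max: float = 0.30,     # cross-loading ceiling quoted in the text
) -> Dict[str, bool]:
    """Return True for each item that meets both loading criteria."""
    result = {}
    for item, row in loadings.items():
        # Compare magnitudes, since the sign of a loading only indicates direction.
        magnitudes = sorted((abs(value) for value in row), reverse=True)
        primary = magnitudes[0]
        cross = magnitudes[1] if len(magnitudes) > 1 else 0.0
        result[item] = primary >= primary_min and cross <= cross_max
    return result


# Hypothetical two-factor loadings for three items (illustrative values only).
example_loadings = {
    "item_1": [0.85, 0.10],
    "item_2": [0.62, 0.41],
    "item_3": [0.81, 0.32],
}
flags = check_loadings(example_loadings)
print(flags)
```

Items flagged `False` in such a check would be the candidates for the refinement described in the text, not automatic deletions.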
Most of the scales showed adequate and consistent loadings, and those falling short of target criteria provided information needed to refine them. The sample was inadequate to demonstrate discriminatory capacity for all subgroups of interest, but its differential performance in some global groups (such as between masters and doctoral students) showed promise. Overall, the synthesis of validity data showed both current strength and future promise in the redesigned GCE assessment.

8.3 Reliability

The second goal (focused on reliability) was to conduct a preliminary assessment of subscales’ internal coherence and item-level contributions, along with their discriminatory capacity. As evidence of internal reliability, all of the theoretically coherent sections (scales) were assessed for internal coherence using Cronbach’s alpha (criterion of 0.80). Some met the test immediately, and others varied based on nuances in participant responses. These data analyses demonstrated how those scales and sections falling short of the standard could be refined to meet it and thus improve the measure. All instruments demonstrated high stability over multiple administrations.

8.4 Divergence of “should be” versus “is”

Notably more comments were received on the item set defining the Nature of The Graduate Experience. That section’s item stem was phrased as: “For me, the graduate
  13. 13. experience includes….” followed by the list of descriptors supplied by students, faculty, and staff during the needs analysis process. Comments on this section converged to the question of whether that section’s instructions were intended to address what the student’s actual experience did include or an ideal perception of what the graduate experience should include. The original item had been written to address the former, the student’s actual experience, but the frequency of these comments illuminated a pattern of fairly widespread perceptions that there was a difference between the two. That is, they suggested a need to inquire as to how graduate students’ actual experiences differed from their expectations of what they should be. In addition, factor structuring showed a divergence of content focus between perceptions that clustered as preferences and perceptions that clustered as quality indicators as a proxy for satisfaction, indicating a need to further restructure this section.

8.5 Perceived length

A common global comment received was that the whole instrument was very long. We recognized that containing just over 100 substantive items, it was longer than most students commonly completed (particularly in the current climate of short, quick digital questionnaires). However, the internal systems data also confirmed that average time-on-task for users who completed all items was only about 30 min. This was within the task time required that we had predicted (below the maximum time range in our participant consent document). It was also within the time frame considered reasonable for an online administration, with the caveat that some users may perceive it to be much longer.
8.6 Online administration

The cumulative data on system redesign (both method and tooling) indicated that the digital, online administration was more effective and appropriate for reaching more graduate students, including distance learners and part-time students, than the previous (paper-based) method. The specific tool chosen (SurveyMonkey®) had presented challenges in development, requiring a good deal of specialized back-end programming for configuring it to deliver the instrument as designed. In particular, differential options that required skip logic and similar special presentations were tedious to develop. In addition, some critical issues that arose in compatibility with user-end systems required intervention. For a new evaluation package used over time and across platforms, we decided to seek a new administration tool that would add ease for both developers and end-users.

9 Conclusions and measurement revisions

The evidence and information provided by the full range of data produced in the first round of instrument testing demonstrated that the GCE redesign was largely successful to date. This reasonable sample yielded strong evidence for both the validity and reliability of the instruments at this stage. It also provided a good deal of both psychometric data and direct user feedback on how they could be further improved
  14. 14. for the beta testing. Based on all of the information accrued, the evaluators made the following revisions for the next round of testing:
  • Given the users’ qualitative feedback on the “Nature of the Graduate Experience,” we adopted the dual range of their responses, one general and ideal, the other personal and perceptual. In the beta, this item cluster was presented as two parallel clusters: “the graduate experience should include” and “my graduate experience does include.”
  • By the client’s request, two more participant group versions were added: alumni and non-attendees. The first increased the scope and range of assessment of program effects beyond students’ perceived preparation for careers, to include their experiential perceptions after graduation and entry into their professions. It constituted a fourth sequential assessment for students who attended this institution. The second addressed the client’s interest in what caused candidates accepted into graduate programs not to enter them, to support recruitment and retention efforts. It constituted an entirely different instrument for a new participant group.
  • Based on multiple item-level and scale-level analyses, we determined that approximately 17 items could be removed to shorten the assessment without reducing subscale reliabilities. However, we retained those items in the beta versions, in order to test those conclusions with a second round of testing and a larger, more diverse sample.
  • We acknowledged that our revision decisions included significantly increasing the length of the instrumentation and that the users already perceived it to be long. However, we wanted to gain evidence for the full range of possible redesign decisions from the retest data in the beta cycle. We determined that with the independent sample test-retest data, we would be better equipped with ample evidence to make those revision decisions for the final client handoff.
  • Based on the weaknesses found in the initial tool, we selected a different development and administration system for the beta testing, with more sophisticated development functionality and the added benefit of institutional licensure accessibility.

10 Phase II: redesign and beta testing (student questionnaires)

10.1 Procedure

All administration occurred in an asynchronous online questionnaire administration system, with all participant identification separated from item responses. A new (between-subjects) group of testing participants was recruited via e-mail invitation, using lists of eligible students provided by the Graduate College. Participants were offered small individual incentives (tee-shirts for the first 100 completing the instruments), and all participants were entered into a drawing for a larger incentive (a digital device). All study activities were consistent with human subject requirements and approved by the institutional IRB. All participant data were de-identified and kept confidential.
  15. 15. 10.2 Participants

The 2,081 current or potential student participants were invited to take the form of the questionnaire appropriate to their identity and point-in-program, whether they were individuals admitted, but who chose not to attend (22); students at the beginning (661), middle (481), or end of their program (672); or alumni (245). Detailed participant demographics are shown in Table 3. Participants were demographically representative of the larger graduate student population on campus, with similar distributions of genders, ethnicities, and colleges (within ±6.6 %). Two colleges were overrepresented: Liberal Studies (+9.5 %) and Dual Degree/Interdisciplinary (+16.0 %). Masters students were also overrepresented (+13.6 %). Response rate from e-mail lists was 72.6 % (2,081 out of 2,865).

Table 3 Beta participant demographic characteristics

                                          Frequency                   Percentage
                                          All      Masters   PhD      Institution   Sample
Degree type
  Masters                                 1,431    –         –        72.5          86.1
  Doctoral                                230      –         –        27.5          13.9
Gender
  Male                                    863      716       146      51.7          46.9
  Female                                  1,019    904       114      48.3          54.1
Ethnicity
  African American/black                  166      151       15       5.0           8.8
  Asian American/Asian                    146      108       38       5.1           7.8
  Pacific Islander/native Hawaiian        5        5         –        0.2           0.3
  Hispanic/Latino                         110      96        12       5.2           5.8
  Native American/American Indian         85       77        8        4.9           4.5
  White/Caucasian                         1,297    1,118     179      72.7          68.9
  Other                                   74       66        8        6.9           3.9
Colleges
  Architecture                            30       30        –        2.2           1.5
  Arts and Sciences                       652      523       111      37.0          33.3
  Atmospheric and Geographic Sciences     36       26        8        3.5           1.8
  Business                                113      102       9        8.3           5.8
  Earth and Energy                        52       49        3        5.2           2.7
  Education                               225      172       44       18.0          11.5
  Engineering                             146      109       32       14.1          7.5
  Fine Arts                               47       34        13       5.6           2.4
  Journalism and Mass Communication       31       24        7        1.8           1.6
  International Studies                   53       51        –        0.4           2.7
  Liberal Studies                         243      225       6        2.9           12.4
  Dual Degree/Interdisciplinary           328      277       29       0.8           16.8
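The response rate and the over-representation figures quoted above are simple arithmetic on the invitation counts and on the Institution and Sample percentage columns of Table 3. The sketch below reproduces that arithmetic under stated assumptions: the function names are hypothetical, the ±6.6 percentage-point tolerance is read from the text as a threshold rather than confirmed as the authors' decision rule, and only a few rows of Table 3 are included.

```python
def response_rate(completed: int, invited: int) -> float:
    """Percentage of invited individuals who completed a questionnaire."""
    return 100.0 * completed / invited


def representation_gaps(institution_pct, sample_pct, tolerance=6.6):
    """Sample minus institution percentage per group, in percentage points.

    Each value is (gap, flagged), where flagged means the absolute gap
    exceeds the tolerance (an assumed reading of "within ±6.6 %").
    """
    gaps = {}
    for group, inst in institution_pct.items():
        gap = round(sample_pct[group] - inst, 1)
        gaps[group] = (gap, abs(gap) > tolerance)
    return gaps


# Selected rows copied from Table 3 (Institution % and Sample %).
institution = {"Masters": 72.5, "Engineering": 14.1,
               "Liberal Studies": 2.9, "Dual Degree/Interdisciplinary": 0.8}
sample = {"Masters": 86.1, "Engineering": 7.5,
          "Liberal Studies": 12.4, "Dual Degree/Interdisciplinary": 16.8}

rate = round(response_rate(2081, 2865), 1)
gaps = representation_gaps(institution, sample)
print(rate)
for group, (gap, flagged) in gaps.items():
    print(group, gap, "over tolerance" if flagged else "within tolerance")
```

A signed gap keeps over- and under-representation distinguishable, which matters when the results are used to target follow-up recruitment.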
  16. 16. 10.3 Instruments

A total of 268 items were administered for the second round (beta) questionnaires: 17 demographic items (selection and fill-in), 237 Likert-type items, 9 dichotomous (yes/no) items, and 5 open-ended items. Similar to the alpha questionnaires, for theoretically continuous items, an eight-point Likert scale (1=strongly disagree, 8=strongly agree) was used. Open-response fields enabled participants to “explain any responses” or “provide any additional information.” The 11 sections for the alpha questionnaires largely remained, with some new sections added and refined, based specifically on the data and feedback from the alpha testing. After the revisions, a total of three sections were added to create a better understanding of the Graduate College experience.

Five forms of the questionnaire instruments were created: non-attend, entrance, mid-point, exit, and alumni. The expanded design was for graduate students to be assessed at four time points in their programs: at entrance (their first semester), at mid-point (first semester of their second year for masters students, or first semester of their third year for doctoral students), at exit (their graduating semester), and at 2 years post-graduation. The fifth version would be completed only by students who were accepted but chose not to attend, to help the Graduate College gain information about why. All student forms were parallel except for the Admissions (entry only) and Career Preparation (mid-point, exit, and alumni). Further, some items within sections were unique to alumni, relevant to post-graduation experiences. The non-attend version of the questionnaire was much shorter and different in context than the other instruments as appropriate to its purpose and target group. The various sections and subscales are described below, with the results of their reliability and factor analyses as appropriate. The summary of statistical results is also shown in Table 4.
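The subscale reliabilities reported for these instruments are Cronbach's alpha values computed after reverse-coding negatively worded items. The following is a minimal sketch of that calculation on the eight-point scale, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The authors ran their analyses in SPSS, so the function names and the four-respondent data set here are hypothetical illustrations only.

```python
from statistics import variance


def reverse_code(score: int, low: int = 1, high: int = 8) -> int:
    """Reverse-code a response on the eight-point scale (1 <-> 8, 2 <-> 7, ...)."""
    return low + high - score


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of numeric scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical raw responses: three items, the third negatively worded.
raw = [
    [1, 1, 8],
    [4, 4, 5],
    [8, 8, 1],
    [6, 5, 3],
]
NEGATIVE_ITEMS = {2}  # zero-based column index of the reverse-coded item

coded = [
    [reverse_code(v) if j in NEGATIVE_ITEMS else v for j, v in enumerate(row)]
    for row in raw
]
alpha = cronbach_alpha(coded)
print(f"alpha = {alpha:.3f}; meets 0.80 target: {alpha >= 0.80}")
```

Skipping the reverse-coding step would make the third item correlate negatively with the others and depress alpha, which is why the table notes state that it was applied before both the reliability and factor analyses.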
10.4 Subscales and item clusters

The 14 sections were divided into subscales and/or item clusters as follows:

Why graduate school? This section was designed to determine the reasons that students attend graduate school. It presented the item stem “I am pursuing a graduate degree,” and then listed 17 different reasons, each with a Likert-type scale. Sample item is “I am pursuing a graduate degree…to gain a competitive advantage in my field.” The EFA showed four factors.

Admission process. This section presented items about the individual’s admission experience, process, and satisfaction. First, a single item addressed whether or not students used the (then still optional) online system (dichotomous). Second, a subscale addressed participants’ satisfaction with their admissions process (four items; Likert-type; α=0.866). Sample item was “The instructions for completing the application were adequate and easy to understand.” The EFA confirmed a single factor.

Decision to attend. This section assessed reasons why students chose to come to this university. It first asked if this was the student’s first choice school (dichotomous). Then, a summary Likert-type item to endorse was
  17. 17. “I am happy with my decision to attend [univ]¹.” The third component was an item cluster (14 items; Likert-type scale). Item stem was “My decision to attend [univ] was influenced by…” followed by 16 different responses to endorse (e.g., “having similar research interests as professors in the department”).

Financial aid. This section asked students to identify the sources and types of their support for attending and engaging in graduate studies (e.g., graduate assistantships, tuition waivers).

Graduate experience. This section consisted of three parts: satisfaction, what it should be, what it is (all on Likert-type scales). Satisfaction with the graduate experience (12 items; α=0.901). Sample item was “I would recommend [univ] to prospective graduate students.” The EFA confirmed one factor.

¹ These items presented the university’s acronym, replaced here with the generic “[univ]”.

Table 4 Summary of instrument structure and statistical performance (beta version)

Section                                              Type of scale   No. of items   Alpha   No. of factors
Why graduate school                                  Item cluster    18             –       4
Admissions process                                   Subscale        4              0.866   1
Decision to attend                                   Item cluster    17             –       4
Financial aid                                        Item cluster    7              –       –
The graduate experience
  Graduate experience satisfaction                   Subscale        12             0.901   1
  To me, the graduate experience should include…     Item cluster    34             –       4
  To me, the graduate experience did include…        Item cluster    34             –       3
Graduate college advising and staff                  Subscale        6              0.879   1
Graduate college events                              Item cluster    2              –       1
Graduate college media and materials                 Subscale        8              0.924   1
Graduate program self-efficacy
  Success in graduate program                        Subscale        6              0.808   1
  Success in chosen profession                       Subscale        7              0.873   1
Program of study satisfaction
  Program of study                                   Subscale        12             0.865   2
  Academic advisor                                   Subscale        9              0.987   1
  Academic program faculty                           Subscale        12             0.971   1
Career preparation satisfaction
  Career preparation                                 Subscale        10             0.973   1
  Utility/value of degree                            Subscale        5              0.938   1
Professional competence                              Subscale        10             0.957   1
Social interaction                                   Subscale        21             0.855   3
University resources and services                    Subscale        19             0.929   3
Final thoughts                                       Qualitative     2              –       –

Negatively worded items were reverse-coded both for the reliability and factor analyses

Students’
  18. 18. perceptions of what the graduate experience “should include” and the parallel of what it “does include” for that student (34 items each) both presented item stems followed by lists of parallel characteristics, each for the student to endorse. Sample item was “To me, the graduate experience should include…developing close connections with faculty.” The EFA showed that the “should include” scale loaded on four factors, while the “does include” loaded on three.

Graduate college advising and staff. This section first asked students whether they had experienced direct contact with the GC staff, for advising or other assistance, then presented items assessing their understanding of its role and services (five items; Likert-type; α=0.879). Sample item was “I understand the role of the Graduate College.” The EFA confirmed a single factor.

Graduate College events. This section assessed students’ participation in various GC-sponsored activities, to support ongoing program planning. Sample items were “I attended activities during [event]” (dichotomous), and “I often attend Graduate College sponsored events” (Likert-type).

Graduate College media and materials. This section assessed students’ satisfaction with, and perceived benefit from, the GC website and other informational materials (eight items; Likert-type; α=0.924). Sample item was “Viewing information on the Graduate College’s website benefits me.”

Graduate program and career self-efficacy. This section (two subscales) assessed students’ perceptions of self-efficacy (positioning for success) in their graduate programs and professions. Program self-efficacy consisted of six items (Likert-type; α=0.808) and professional self-efficacy of seven items (Likert-type; α=0.873). Sample items were “I am certain that I will do well in this graduate program.” and “I am just not sure if I will do well in this field.” EFA confirmed one factor for each subscale.

Program of study satisfaction and career preparation.
This section (four subscales) assessed students’ satisfaction with various components of their graduate programs: program (focus on content and curriculum) (12 items; α=0.848; 2 factors), program faculty (focus on teaching and advising) (20 items; α=0.966; 2 factors), career preparation (9 items; α=0.973; 1 factor), and career utility and value of degree (5 items; α=0.938; 1 factor) (all Likert-type items). Sample items were program (“I believe that the level of difficulty in my coursework is appropriate”), faculty (“The faculty in my program are fair and unbiased in their treatment of students.”), career preparation (“My program area course content is preparing me to practice effectively in the field.”), and career utility and value of degree (“My graduate degree will open up current and future employment opportunities.”).

Professional competence and identity development. This subscale assessed students’ perceptions of becoming competent professionals (ten items; Likert-type; α=0.957). Sample item was “More and more, I am becoming a scholar in my field.” EFA confirmed a single factor.

Social interaction. This subscale assessed participants’ social interaction and engagement in the graduate community (21 items; Likert-type; α=0.855). Some items differed for alumni, as appropriate. Sample items were current students (“I have many friends in this university”) and alumni (“I am still in contact with friends from my graduate program”).
  19. 19. University resources and services. This section assessed participants’ satisfaction with university campus resources and services (19 items; Likert-type; α=0.929). Sample item was “I am happy with the condition of the building(s) containing my classrooms.”

Final thoughts. Participants were also asked to answer two qualitative questions describing notable positive and challenging experiences in graduate school. Items were “Please describe one of your most meaningful and important graduate experiences at this university to date. Give as much detail as possible. Include the reasons why it was so meaningful and important for you,” and “Please describe one of your most challenging graduate experiences at this university to date. Give as much detail as possible. Include the reasons why it was so challenging for you.”

11 Analysis

The same instrument performance analyses were conducted for the beta test data as for the alpha test, utilizing SPSS® (see Table 4). In addition, the larger beta sample size made it possible to perform more fine-grained subgroup mean comparison statistics with greater statistical power, to confirm that the instruments maintained reliability within subgroups, and to determine if they also demonstrated some discriminatory power for within-group differences (see Table 5). To assess their discriminatory potential, we used two key subgroups, by-degree (masters and doctoral) and progress-toward-degree (entry, mid-point, exit). Student subgroup data demonstrated good consistency of performance across subgroups, with some discrimination of mean differences.

12 Phase II: redesign and beta testing (faculty questionnaires)

12.1 Procedure

Faculty members were also asked to give feedback regarding the various forms and subscales on the student questionnaires.
Five forms of web-based questionnaire instruments were created to parallel the five versions of the student beta questionnaires, presenting faculty members with screenshots of the student instruments and unique response items for faculty. Participants were recruited via e-mail and provided with active, generic hyperlinks to the questionnaires. They responded regarding the value and fit of that information for their program development and improvement.

12.2 Participants

Faculty participants were invited from a list of faculty who teach and advise graduate students. The list was randomly divided into five groups, and each received one of the five forms of the student questionnaires (all sections). Faculty responses (N=199) were divided as follows: 43 non-attend, 33 entrance, 42 mid-point, 44 exit, and 37 alumni. Detailed participant demographics are shown in Table 6.
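The random division of the faculty list into five groups, one per questionnaire form, can be sketched as a seeded shuffle followed by round-robin assignment. This is one plausible way to do it, not a description of the authors' actual procedure, which the text does not specify. The function name `assign_forms`, the seed, and the integer faculty identifiers are hypothetical.

```python
import random

FORMS = ["non-attend", "entrance", "mid-point", "exit", "alumni"]


def assign_forms(faculty_ids, forms=FORMS, seed=0):
    """Randomly split an invitation list into one group per questionnaire form."""
    rng = random.Random(seed)      # fixed seed so the split can be reproduced
    shuffled = list(faculty_ids)
    rng.shuffle(shuffled)
    groups = {form: [] for form in forms}
    for position, faculty_id in enumerate(shuffled):
        # Dealing round-robin keeps group sizes within one of each other.
        groups[forms[position % len(forms)]].append(faculty_id)
    return groups


# Hypothetical invitation list of 23 faculty identifiers.
groups = assign_forms(range(23))
for form, members in groups.items():
    print(form, len(members))
```

Equal invitation groups do not guarantee equal response counts, which is consistent with the uneven completed totals reported per form.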
  20. 20. Table 5 Summary of subgroup means by degree type and progress-toward-degree

                                        All         Masters     PhD       Non-attend   Entrance   Mid-point   Exit      Alumni
                                        (N=1,663)   (N=1,431)   (N=230)   (N=14)       (N=598)    (N=410)     (N=593)   (N=205)
Admissions process                      6.54        6.60        6.12      6.46         6.54       –           –         –
Decision to attend                      7.12        7.15        6.90      5.
The graduate experience                 6.51        6.54        6.37      –            6.63       6.34        6.49      6.58
Graduate College advising and staff     6.–
Graduate College media and materials    5.52        5.51        5.60      –            5.60       5.33        5.58      –
Success in graduate program             6.91        6.92        6.85      –            6.88       6.83        6.98      –
Success in chosen profession            6.36        6.37        6.31      –            6.39       6.33        6.37      6.28
Program of study                        6.07        6.02        6.37      –            6.21       5.91        6.04      –
Academic advisor                        6.16        6.08        6.73      ––
Academic program faculty                6.83        6.83        6.82      –            6.92       6.84        6.83      6.58
Career preparation                      6.74        6.72        6.88      –            6.86       6.63        6.80      6.49
Utility/value of degree                 6.82        6.80        6.95      –            6.97       6.76        6.88      6.38
Professional competence                 6.56        6.52        6.84      –            6.65       6.54        6.58      6.33
Social interaction                      4.95        4.94        5.02      –            4.99       4.84        4.97      5.28
University resources and services       6.27        6.28        6.26      ––

All subscales are measured on an eight-point Likert scale (1=strongly disagree, 8=strongly agree)
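Subgroup summaries of the kind shown in Table 5 amount to grouping subscale scores by a categorical variable and reporting descriptive statistics per group. The sketch below shows that grouping step only, under stated assumptions: the record layout, field names, and scores are hypothetical, and it does not reproduce whatever significance tests the authors ran in SPSS.

```python
from collections import defaultdict
from statistics import mean, stdev


def subgroup_summary(records, group_key, score_key):
    """Group records by one field and report n, mean, and SD of another."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[group_key]].append(record[score_key])
    summary = {}
    for group, scores in buckets.items():
        summary[group] = {
            "n": len(scores),
            "mean": round(mean(scores), 2),
            # The sample SD is undefined for a single observation.
            "sd": round(stdev(scores), 2) if len(scores) > 1 else None,
        }
    return summary


# Hypothetical subscale scores on the eight-point scale.
records = [
    {"degree": "masters", "score": 6.0},
    {"degree": "masters", "score": 7.0},
    {"degree": "masters", "score": 8.0},
    {"degree": "doctoral", "score": 5.0},
    {"degree": "doctoral", "score": 7.0},
]
summary = subgroup_summary(records, "degree", "score")
print(summary)
```

The same function could be called with a point-in-program field as `group_key` to produce the entrance, mid-point, exit, and alumni columns.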
  21. 21. 12.2.1 Instruments

Faculty members reviewed screen captures of each section of the student questionnaires and responded to six items (three Likert and three open-response).

12.2.2 Perceived appropriateness

This section assessed how appropriate (applicable, coherent, and useful) the faculty members found the student assessment sections (three items; Likert-type; α=0.80). Items were “The items in this section are applicable to our graduate department/program;” “The items in this section are cohesive, providing perspective related to the section topic;” and “The results from this section will be useful to know about our graduate students.”

Table 6 Frequency of faculty participant demographic characteristics

                                          All
Gender
  Male                                    115
  Female                                  51
  Other gendered                          1
Ethnicity
  African American/black                  3
  Asian American/Asian                    5
  Pacific Islander/native Hawaiian        –
  Hispanic/Latino                         4
  Native American/American Indian         2
  White/Caucasian                         144
  Other                                   6
Colleges
  Architecture                            9
  Arts and Sciences                       109
  Atmospheric and Geographic Sciences     8
  Business                                10
  Earth and Energy                        6
  Education                               10
  Engineering                             19
  Fine Arts                               5
  Journalism and Mass Communication       7
  International Studies                   –
  Liberal Studies                         1
  Dual Degree/Interdisciplinary           –
Professorial rank
  Assistant professor                     31
  Associate professor                     58
  Full professor                          80
  Other                                   3
  22. 22. 12.2.3 Open-response items

Three additional generative items invited original faculty input: (1) “Are there any additional items that you believe need to be added to this section? If so, please identify which items those are, and why they are needed here;” (2) “Are there any items here that you believe should be removed from this section? If so, please identify which items those are, and why they should be removed;” and (3) “Other comments.”

13 Analysis

Analyses were conducted utilizing SPSS®. Reliabilities for the fit scale were computed as Cronbach’s alpha (target α≥0.80). De-identified questionnaire responses were analyzed and stored according to IRB standards for data security and confidentiality. On the three quantitative fit items, faculty members reported finding the information applicable, cohesive, and useful for their programs (M=6.34, SD=1.42). Overall appropriateness of each questionnaire was as follows: non-attend (M=6.17, SD=1.41), entrance (M=6.15, SD=1.71), mid-point (M=6.21, SD=1.41), exit (M=6.79, SD=1.15), and alumni (M=6.38, SD=1.42). Tables 7 and 8 show the subscale item means and standard deviations of the faculty feedback.

14 Overall measurement performance results

These data together constitute an independent samples’ test-retest of the GCE instrument and system redesign. The beta testing cycle was a confirmatory retest, along with some extension and refinement, of the alpha testing. Its analysis functioned on the same goals, assessing the validity, reliability and fit of the new GCE assessment, through both direct administration and stakeholder feedback. Understanding the item-level contributions, particularly across two testings with independent, authentic user samples, supported final instrument refinement for alignment and efficiency.
We had retained longer versions of the evaluation instruments knowing that the second testing would confirm or disconfirm which items could be removed to retain optimal evaluative effectiveness and efficiency. In addition, the two rounds of testing (alpha and beta) provided independent confirmation of the psychometric properties of these measures. Results from the beta instrument testing were similar to those from the alpha cycle.

The first goal of the beta testing analysis was to assess the appropriateness, scope, and fit of the refined instrument content in addressing the target variables and indicators, overall and for key student subgroups (by degree type and point-in-program). The scales and sections performed with a high degree of consistency across the whole group, while also demonstrating the capacity to discriminate between groups both by degree type (masters/doctoral) and by progress-in-program (entry, mid-point, exit). The scales and sections once again loaded consistently, demonstrating good test-retest stability as evidence of reliable performance and validity in assessing the target constructs.
  23. 23. The second goal was to conduct a confirmatory assessment of subscale reliability, subscale and section range and coherence, and item contributions. Consistent with their performance in the previous cycle, nearly all subscales met the target criteria in both internal consistency and factor loadings. Those that demonstrated less coherence (generally the newly added and reorganized sections) demonstrated statistically how they could be refined to meet the criteria. Across the two testing cycles, the scales and sections also demonstrated a high level of test-retest stability and external consistency.

In addition to their performance with students, the instruments received favorable perceptions of fit from faculty members across colleges and disciplines. Few additions and deletions were recommended, and those suggested were specific to particular fields rather than generally appropriate to the broader graduate faculty. Overall, the revised (beta version) GCE instrument demonstrated excellent measurement performance.

Table 7 Faculty feedback on scale fit reliabilities, means, and standard deviations

Section                                              Fit alpha   Mean   SD
Demographics                                         0.905       6.09   1.65
Why graduate school                                  0.958       6.32   1.79
Admissions process                                   0.960       6.55   1.67
Decision to attend                                   0.942       6.75   1.44
Financial aid                                        0.912       6.73   1.41
The graduate experience
  Graduate experience satisfaction                   0.956       6.81   1.23
  To me, the graduate experience should include…     0.951       6.47   1.50
  To me, the graduate experience does include…       0.956       6.84   1.27
Graduate college advising and staff                  0.923       6.55   1.52
Graduate college events                              0.966       5.70   1.90
Graduate college media and materials                 0.928       6.57   1.19
Graduate program self-efficacy
  Success in graduate program                        0.968       6.41   1.59
  Success in chosen profession                       0.968       6.41   1.59
Program of study satisfaction
  Program of study                                   0.969       6.80   1.29
  Academic advisor                                   0.918       7.06   1.25
  Academic program faculty                           0.918       7.06   1.25
Career preparation satisfaction
  Career preparation                                 0.967       6.72   1.36
  Utility/value of degree                            0.967       6.72   1.36
Professional competence                              0.958       6.50   1.38
Social interaction                                   0.958       6.49   1.38
University resources and services                    0.948       6.47   1.45
Final thoughts                                       0.989       6.64   1.56

All subscales are measured on an eight-point Likert scale (1=strongly disagree, 8=strongly agree). Also, “Success in graduate program” and “Success in chosen profession” were measured as one section; “Academic advisor” and “Academic program faculty” were measured as one section; and “Career preparation” and “Utility/value of degree” were measured as one section
  24. 24. The new administration system (Qualtrics®) required something of a learning curve in development but paid off with a high degree of clarity and usability for both developers and end-users. A few user comments included confusion regarding the interface, but those were easily addressed. As to time-on-task required to complete the beta version, participants took only a few minutes more than the alpha version (37 min on average). One in-system revision indicated prior to implementation was to simplify the programming logic, as the originally complex skip-logic appeared to confound use of the progress bar.

Table 8 Faculty section means

Section                                              Applicability   Cohesiveness   Usefulness
                                                     (N=141)         (N=137)        (N=137)
Demographics                                         6.34            5.99           6.09
Why graduate school                                  6.70            6.62           6.32
Admissions process                                   6.37            6.33           6.27
Decision to attend                                   6.79            6.73           6.72
Financial aid                                        6.70            6.87           6.64
The graduate experience
  Graduate experience satisfaction                   6.94            6.75           6.77
  To me, the graduate experience should include…     6.51            6.48           6.44
  To me, the graduate experience does include…       6.89            6.85           6.79
Graduate college advising and staff                  6.58            6.67           6.48
Graduate college events                              5.73            5.91           5.60
Graduate college media and materials                 6.62            6.67           6.40
Graduate program self-efficacy
  Success in graduate program                        6.49            6.38           6.34
  Success in chosen profession                       6.49            6.38           6.34
Program of study satisfaction
  Program of study                                   6.89            6.66           6.82
  Academic advisor                                   7.19            6.96           7.09
  Academic program faculty                           7.19            6.96           7.09
Career preparation satisfaction
  Career preparation                                 6.82            6.64           6.72
  Utility/value of degree                            6.82            6.64           6.72
Professional competence                              6.56            6.50           6.47
Social interaction                                   6.44            6.57           6.41
University resources and services                    6.51            6.54           6.37
Final thoughts                                       6.65            6.61           6.66

15 Data-driven findings demonstrating evaluation enhancement

While the research-based findings are the topic of separate manuscripts, it is important here to underscore those that constitute evidence of the value-added of this particular
  25. 25. redesign strategy. One powerful product of this project was the instruments themselves, developed from the direct input of faculty, staff, administrators, and students, then tested and refined through authentic use. In addition, among data-driven findings are potentially important patterns that are illuminated by specific elements of the instru- ment and system design. For example, the subgroup differences by degree type and point in progress-toward-degree had not been demonstrated in the previous published literature nor had they every been analyzed or compared in this Graduate College’s evaluation process, because the previous design did not allow for this type of compar- ison. Also important was the general pattern of the mean score drop at mid-point across multiple perceptions, as this trajectory of perceptions had been demonstrated in very focused and small-scale studies, but not in a diverse interdisciplinary group of graduate students across an entire university population. This was, again, because the published studies did not present design of instrumentation and implementation on this scope. Similarly, the development of parallel scales, such as the two forms of the Graduate Experience scale (“should” and “is”) and the two self-efficacy scales (program and career), support direct comparison of differential perceptions in these potentially important nuanced constructs. In the test samples, there were some striking differences in these perceptions. In addition, the redesign to include both graduate college and program-level outcomes, explicitly endorsed by graduate faculty, supported the grad- uate college giving back de-identified data to departments and programs. 
The redesign included moving to online administration, which resulted in dramatically improved participation rates in the graduate college evaluation overall, and the dual-phase testing process included testing two different development and administration systems and identifying weaknesses in one before full implementation. These results underscore the value and importance of the redesign to include the range and types of perceptual instruments, the development and delivery system, and the multi-point (trajectory) administration.

16 Limitations

A limitation of this developmental design and analysis is implicit in the available sample. It was (1) volunteer (required by IRB) rather than comprehensive and (2) from independent samples (resulting in between-subjects rather than within-subjects analysis). These sampling constraints introduce variability beyond that for which the instruments were designed. However, following implementation, the authentic, within-subjects sample will be accessed over the next 5 years. An additional limitation is that the sample came from a single institution; future goals include a multi-institutional test.

17 Conclusions

Based on the instrument and system performances, the evaluators recommended transfer to the client for full implementation, with a list of items the client could choose to delete without reducing the overall quality of the measure. It was important to underscore that assessment efficiency is not the only criterion for item selection or inclusion. Efficiency must be balanced with effectiveness as operationalized by scope and range of each scale or section. The evaluators proposed length reduction using the criteria of maximum efficiency without reducing scale reliabilities (below 0.80) or unduly constraining the scope of assessment to exclude a critical subgroup of students or disciplines typically represented in a research university. Based on these criteria, a maximum of 55 items could be removed. After discussion with the client, only 19 items were removed for initial implementation, to maintain a robust instrument with the greatest range for possible nuanced differences among colleges and disciplines.

These redesigned program evaluation methods and measures offer substantive benefits consistent with the Graduate College’s expressed goals and emergent needs. Product and process outcomes include the following:

1. Updated, reasoned, multi-event administrative process, attentive to organization and programs across the university
2. Psychometrically sound instrumentation that produces objectively verifiable, defensible results
3. Excellent validity evidence on internal context, scope, structure and substance of the instrumentation, with perceived fit and value-added perceptions of faculty across disciplines
4. Excellent reliability evidence including internal coherence as well as external and factor structures, test-retest with students, and consistency across subgroups
5. Self-contained, stand-alone variable subscales and item clusters that enable administrators to utilize part or all sections of the instrument as needed
6. Updated administrative system and media to reach a larger group including off-site and distributed graduate students

The team also emphasized that, although the complete instrument can be administered at once, each subscale and section is designed to stand alone.
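The length-reduction criterion described in this section (remove items only while scale reliability stays at or above 0.80) can be illustrated with a greedy procedure. This is a sketch under stated assumptions, not the evaluators' actual method: it assumes reliability is measured with Cronbach's alpha, the names `cronbach_alpha` and `prune_items` are hypothetical, and the response data are made up. It also covers only the reliability criterion; the second criterion, preserving scope for critical subgroups and disciplines, requires human judgment and is not modeled.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for respondent rows of item scores."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_variance_sum = sum(variance(c) for c in columns)
    total_variance = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def prune_items(rows, floor=0.80, min_items=3):
    """Greedily drop items while alpha stays at or above `floor`.

    At each step, remove the item whose deletion leaves the highest alpha.
    Stop when the best achievable alpha would fall below the floor or when
    only `min_items` items remain. Returns (kept item indices, alpha).
    """
    keep = list(range(len(rows[0])))

    def alpha_for(indices):
        return cronbach_alpha([[row[i] for i in indices] for row in rows])

    while len(keep) > min_items:
        # Alpha of the scale after deleting each candidate item in turn.
        candidates = [
            (alpha_for([i for i in keep if i != drop]), drop) for drop in keep
        ]
        best_alpha, drop = max(candidates)
        if best_alpha < floor:
            break
        keep.remove(drop)
    return keep, alpha_for(keep)


# Hypothetical data: six respondents, four items; the last item is noisy.
example_rows = [
    [6, 6, 7, 3],
    [7, 7, 7, 6],
    [5, 5, 6, 7],
    [8, 8, 8, 4],
    [6, 7, 6, 8],
    [4, 4, 5, 5],
]

kept, alpha = prune_items(example_rows)
print("kept item indices:", kept, "alpha:", round(alpha, 3))
```

A list of removable items produced this way is only a starting point for the kind of client discussion described above, where fewer items were removed than the criteria alone would have permitted.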
It is feasible for an institution or unit to remove some sections that address issues of less immediate priority or administer sections at different times with this design. If the user intended to compare responses from the various sections, administering them at the same time would control for some order and administration effects. Future directions for this project include the extension via longitudinal testing with the dependent sample for which it was originally designed. Those data may also provide additional confirmatory insight on performance of the shorter (revised) version supported by these data.

18 Discussion

The redesign of assessments goes beyond instrumentation. Rethinking assessments is much more than generating a new set of items, or even user instructions. Effective redesign requires re-examining the full range of features, contexts, and conditions, including timing, technology, tools, reframing of longitudinal instrumentation, and so on, to produce a whole-system redesign. Many institutional assessments are moving to digital administration systems, a shift that is more than simple digitization, involving translation as well as transfer (Bandilla et al. 2003; Hardré et al. 2010a). Administrators need to consider design features (Vincente and Reis 2010) as well as system and context elements that may influence user behaviors and consequent data outcomes (Hardré et al. 2012). Tools and systems need to be tested in authentic ways with real user participants (Patton 2012), so that test data not only reflect an accurate product but also illuminate issues of process that may need to be adjusted for final implementation. This systematic and systemic approach to assessment design, development, and testing provides the rigor needed to demonstrate accurate assessment and validate data meaningfulness and use.

References

Allum, J. R., Bell, N. E., & Sowell, R. S. (2012). Graduate enrollment and degrees: 2001 to 2011. Washington: Council of Graduate Schools.
Austin, J., Cameron, T., Glass, M., Kosko, K., Marsh, F., Abdelmagid, R., & Burge, P. (2009). First semester experiences of professionals transitioning to full-time doctoral study. College Student Affairs Journal, 27(2), 194–214.
Baker, V. L., & Lattuca, L. R. (2010). Developmental networks and learning: toward an interdisciplinary perspective on identity development during doctoral study. Studies in Higher Education, 35(7), 807–827.
Bandilla, W., Bosnjak, M., & Altdorfer, P. (2003). Survey administration effects? A comparison of web-based and traditional written self-administered surveys using the ISSP environment model. Social Science Computer Review, 21, 235–243.
Belcher, M. J. (1996). A survey of current & potential graduate students. Research report 96–04. Boise: Boise State University.
Benishek, L. A., & Chessler, M. (2005). Facilitating the identity development of counseling graduate students as researchers. Journal of Humanistic Counseling Education and Development, 44(1), 16–31.
Bloom, J. L., Cuevas, A. E. P., Evans, C. V., & Hall, J. W. (2007). Graduate students’ perceptions of outstanding graduate advisor characteristics. NACADA Journal, 27(2), 28–35.
Brinkman, S. N., & Hartsell-Gundy, A. A. (2012). Building trust to relieve graduate student research anxiety. Public Services Quarterly, 8(1), 26–39.
Chism, M., Thomas, E. L., Knight, D., Miller, J., Cordell, S., Smith, L., & Richardson, D. (2010). Study of graduate student perceptions at the University of West Alabama. Alabama Counseling Association Journal, 36(1), 49–55.
Cicognani, E., Menezes, I., & Nata, G. (2011). University students’ sense of belonging to the home town: the role of residential mobility. Social Indicators Research, 104(1), 33–45.
Cook, D. A., & Beckman, T. J. (2006). Current concepts in validity and reliability for psychometric instruments: theory and application. The American Journal of Medicine, 119, 166.e7–166.e16.
Coulter, F. W., Goin, R. P., & Gerard, J. M. (2004). Assessing graduate students’ needs: the role of graduate student organizations. Educational Research Quarterly, 28(1), 15–26.
Council of Graduate Schools. (2012). Findings from the 2012 CGS international graduate admissions survey. Phase III: final offers of admission and enrollment. Washington: Council of Graduate Schools.
Davidson-Shivers, G., Inpornjivit, K., & Sellers, K. (2004). Using alumni and student databases for evaluation and planning. College Student Journal, 38(4), 510–520.
Delaney, A. M. (2004). Ideas to enhance higher education’s impact on graduates’ lives: alumni recommendations. Tertiary Education and Management, 10(2), 89–105.
Fagen, A. P., & Suedkamp Wells, K. M. (2004). The 2000 national doctoral program survey: an online study of students’ voices. In D. H. Wulff, A. E. Austin, & Associates (Eds.), Paths to the professoriate: strategies for enriching the preparation of future faculty (pp. 74–91). San Francisco: Jossey-Bass.
Farley, K., McKee, M., & Brooks, M. (2011). The effects of student involvement on graduate student satisfaction: a pilot study. Alabama Counseling Association Journal, 37(1), 33–38.
Fu, Y. (2012). The effectiveness of traditional admissions criteria in predicting college and graduate success for American and international students. Doctoral dissertation, University of Arizona.
Gansemer-Topf, A. M., Ross, L. E., & Johnson, R. M. (2006). Graduate and professional student development and student affairs. New Directions for Student Services, 2006(115), 19–30.
Gardner, S. K., & Barnes, B. J. (2007). Graduate student involvement: socialization for the professional role. Journal of College Student Development, 48(4), 369–387.
Golde, C. M. (2000). Should I stay or should I go? Student descriptions of the doctoral attrition process. The Review of Higher Education, 23(2), 199–227.
Hardré, P. L. (2012a). Scalable design principles for TA development: lessons from research, theory, testing and experience. In G. Gorsuch (Ed.), Working theories for teaching assistant and international teaching assistant development (pp. 3–38). Stillwater: NewForums.
Hardré, P. L. (2012b). Teaching assistant development through a fresh lens: a self-determination theory framework. In G. Gorsuch (Ed.), Working theories for teaching assistant and international teaching assistant development (pp. 113–136). Stillwater: NewForums.
Hardré, P. L., & Burris, A. (2011). What contributes to TA development: differential responses to key design features. Instructional Science, 40(1), 93–118.
Hardré, P. L., & Chen, C. H. (2005). A case study analysis of the role of instructional design in the development of teaching expertise. Performance Improvement Quarterly, 18(1), 34–58.
Hardré, P. L., & Chen, C. H. (2006). Teaching assistants learning, students responding: process, products, and perspectives on instructional design. Journal of Graduate Teaching Assistant Development, 10(1), 25–51.
Hardré, P. L., Crowson, H. M., & Xie, K. (2010a). Differential effects of web-based and paper-based administration of questionnaire research instruments in authentic contexts-of-use. Journal of Educational Computing Research, 42(1), 103–133.
Hardré, P. L., Nanny, M., Refai, H., Ling, C., & Slater, J. (2010b). Engineering a dynamic science learning environment for K-12 teachers. Teacher Education Quarterly, 37(2), 157–178.
Hardré, P. L., Beesley, A., Miller, R., & Pace, T. (2011). Faculty motivation for research: across disciplines in research-extensive universities. Journal of the Professoriate, 5(2), 35–69.
Hardré, P. L., Crowson, H. M., & Xie, K. (2012). Examining contexts-of-use for online and paper-based questionnaire instruments. Educational and Psychological Measurement, 72(6), 1015–1038.
Hegarty, N. (2011). Adult learners as graduate students: underlying motivation in completing graduate programs. Journal of Continuing Higher Education, 59(3), 146–151.
Hephner LaBanc, B. (2010). Student affairs graduate assistantships: an empirical study of the perceptions of graduate students’ competence, learning, and professional development. Doctoral dissertation, Northern Illinois University.
Higher Education Research Institute (HERI) (2012). Faculty satisfaction survey. Accessed 15 June 2013
Hyun, J., Quinn, B. C., Madon, T., & Lustig, S. (2006). Needs assessment and utilization of counseling services. Journal of College Student Development, 47(3), 247–266.
Kanan, H. M., & Baker, A. M. (2006). Student satisfaction with an educational administration preparation program: a comparative perspective. Journal of Educational Administration, 44(2), 159–169.
Kenner, C., & Weinerman, J. (2011). Adult learning theory: applications to non-traditional college students. Journal of College Reading and Learning, 41(2), 87–96.
Lipschultz, J. H., & Hilt, M. L. (1999). Graduate program assessment of student satisfaction: a method for merging university and department outcomes. Journal of the Association for Communication Administration, 28(2), 78–86.
Lovitts, B. E. (2001). Leaving the ivory tower: the causes and consequences of departure from doctoral study. Lanham: Rowman & Littlefield.
Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5–8.
Nesheim, B. E., Guentzel, M. J., Gansemer-Topf, A. M., Ross, L. E., & Turrentine, C. G. (2006). If you want to know, ask: assessing the needs and experiences of graduate students. New Directions for Student Services, 2006(115), 5–17.
Offstein, E. H., Larson, M. B., McNeill, A. L., & Mwale, H. M. (2004). Are we doing enough for today’s graduate student? The International Journal of Educational Management, 18(6/7), 396–407.
Patton, M. Q. (2012). Essentials of utilization-focused evaluation. Thousand Oaks: Sage.
Schlossberg, N. K., Waters, E. B., & Goodman, J. (1995). Counseling adults in transition: linking practice with theory (2nd ed.). New York: Springer.
Schram, L. N., & Allendoerfer, M. G. (2012). Graduate student development through the scholarship of teaching and learning. Journal of Scholarship of Teaching and Learning, 12(1), 8–22.
Smallwood, S. (2004). Doctor dropout. Chronicle of Higher Education, 50(19), A10. Retrieved from: http://
Stone, C., van Horn, C., & Zukin, C. (2012). Chasing the American Dream: recent college graduates and the great recession. New Brunswick: John J. Heldrich Center for Workforce Development.
US Department of Education, National Center for Education Statistics. (2005). Integrated post-secondary education data system, Fall 2004. Washington: US Department of Education.
Vincente, P., & Reis, E. (2010). Using questionnaire design to fight nonresponse bias in web surveys. Social Science Computer Review, 28(2), 251–267.
Weidman, J. C., Twale, D. J., & Stein, E. L. (2001). Socialization of graduate and professional students in higher education: a perilous passage? San Francisco: Jossey-Bass.
Williams-Tolliver, S. D. (2010). Understanding the experiences of women, graduate student stress, and lack of marital/social support: a mixed method inquiry. Doctoral dissertation, Capella University.
Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards: a guide for evaluators and evaluation users. Los Angeles: Sage.
