Kirkpatrick’s 4 Levels of Evaluation Level 1 - Reaction Level 2 - Learning Level 3 - Behavioral Results  (A) Observable (skills) (B) Non-Observable (attitudes) Level 4 - Organizational Results
If you want to know … then you must ask at this level:
Did participants like the program? Level 1
Did participants learn the content intended? Level 2
Are they applying skills and behaviors taught? Level 3 (A)
Are they applying non-observable outcomes to the job? Level 3 (B)
Has there been any impact on the organization? Level 4
What do you want to find out with Level 1 questions? Did the program meet the expectations of trainees? What aspects were most helpful? Interesting? Informative? What aspects were least helpful, interesting, or informative? What were participants’ reactions to the program’s design? Pacing? Materials? Precourse work? Instructor? Do people intend to use what they have learned? How? What barriers, if any, do people believe will inhibit their ability to use what they have learned?
Level 2 - Evaluating Learning Measures knowledge, skills, and attitudes Use a control group, if practical Pre-test can serve as a needs assessment Strive for 100% response on questionnaires
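The pre-test/post-test idea above can be sketched in a few lines of Python; all names and scores here are hypothetical, not from the deck.

```python
# Hypothetical pre-test / post-test scores for a Level 2 evaluation.
pre_scores  = {"ana": 55, "ben": 60, "carla": 48, "dev": 70}
post_scores = {"ana": 80, "ben": 85, "carla": 72, "dev": 88}

def mean(values):
    return sum(values) / len(values)

# Average learning gain: post-test mean minus pre-test mean.
gain = mean(post_scores.values()) - mean(pre_scores.values())
print(f"Average gain: {gain:.1f} points")
```

The pre-test doubles as a needs assessment: items most participants miss before training show where instruction is most needed.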
Knowledge Tests Essay or open-ended answer tests Write-in or short-answer tests Binary true-false tests Multiple choice tests
Competency Demonstrations: Testing for Skill  Learners demonstrate competencies while being observed by a trained evaluator. Simulations or demonstrations can be part of the learning activity (role plays, etc.) Need consistency in observers; multiple observers need to be trained
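Observer consistency can be checked with a simple agreement rate. A minimal sketch, with hypothetical pass/fail ratings from two trained observers watching the same demonstrations:

```python
# Hypothetical ratings (pass = 1, fail = 0) from two trained observers
# scoring the same eight competency demonstrations.
observer_a = [1, 1, 0, 1, 0, 1, 1, 0]
observer_b = [1, 0, 0, 1, 0, 1, 1, 1]

# Percent agreement: a simple first check of observer consistency.
matches = sum(a == b for a, b in zip(observer_a, observer_b))
agreement = matches / len(observer_a) * 100
print(f"Observer agreement: {agreement:.0f}%")
```

Low agreement signals that observers need more training or a clearer scoring rubric before their ratings can be trusted.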
Level 3 Focuses on Transfer: Are They Using What They Learned?   Affective outcomes  focus on attitudes, values, and beliefs of learners Cognitive outcomes  are the concepts, principles, and knowledge used on the job Behavioral or skill outcomes  address what learners are able to do that can be observed by others
Decisions in Evaluating Level 3 - Behavior  When  to evaluate How often  to evaluate How  to evaluate  Costs vs. Benefits : When is it worth evaluating at Levels 3 & 4?
Guidelines for Level 3 Evaluations Use a control group if practical Allow time for behavior change to take place Evaluate before and after program Survey / interview at least one or more: Trainees Immediate supervisors Trainees’ direct reports
Behavior Change  without Reinforcement
Example of Reinforced Behavior Measured 3 to 6 Months after Training
Evaluating Behavior:   Questions to Ask  Ask trainees if they are doing anything differently on the job as a result of training If so, ask them to describe If not, ask why Explore effects of management support, organizational barriers, etc
Level 4 Questions: What’s the Organizational Impact? How much did quality improve? How much did productivity increase? How much did we save or prevent? (accidents, turnover, wasted time, etc)  What tangible benefits has the organization seen from the money spent on training?
Evaluating Results - Level 4 Look for evidence, not proof Use a control group, if possible, to isolate the effects of training Measure before and after the program Repeat measurement at appropriate times  Consider cost vs. benefits Not as expensive as Level 3 to collect Organizational data often available
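The control-group, before-and-after design above amounts to a difference-in-differences estimate; a minimal sketch with hypothetical figures (none of these numbers come from the deck):

```python
# Hypothetical average performance metric, measured before and after training.
trained_before, trained_after = 62.0, 81.0   # group that attended the program
control_before, control_after = 60.0, 65.0   # comparable group that did not

# Difference-in-differences: the trained group's change, minus the change
# that happened anyway (the control group's change), isolates the training
# effect from background trends.
training_effect = (trained_after - trained_before) - (control_after - control_before)
print(f"Estimated training effect: {training_effect:.1f} points")
```

Subtracting the control group's change is what turns "performance went up" into evidence (not proof) that training caused the improvement.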
Four Major Categories  of Hard Data  Output increases Units produced, items sold or assembled Students graduated, patients visited, applications processed Quality improvement Scrap, rework, rejects, error rates, complaints   Cost savings Unit costs, overhead, operating, program costs   Time savings  Cycle time, overtime, equipment downtime
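Hard-data categories like these feed directly into a cost-vs-benefit comparison. A sketch with invented dollar figures, purely for illustration:

```python
# Hypothetical annualized benefits, one from each hard-data category.
benefits = {
    "scrap_reduction": 12000,    # quality improvement
    "overtime_saved": 8000,      # time savings
    "extra_units_sold": 15000,   # output increase
}
program_cost = 20000  # total cost of the training program

# Net benefit and simple return on investment.
net_benefit = sum(benefits.values()) - program_cost
roi_percent = net_benefit / program_cost * 100
print(f"Net benefit: ${net_benefit}, ROI: {roi_percent:.0f}%")
```

Because these figures usually already exist in organizational records, assembling them is often cheaper than the interviews and observations Level 3 requires.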
Categories of Soft Data  Work habits Absenteeism, tardiness, communication breakdowns, first aid treatments   Climate Turnover, discrimination charges, job satisfaction, number of grievances New skills Decisions made, problems solved, grievances resolved, intention to use new skills, frequency of use of new skills, importance of new skills
Categories of Soft Data, cont’d  Development  Number of promotions, pay increases, training programs attended, PA ratings Satisfaction Favorable reactions, attitude changes, increased confidence, customer satisfaction Initiative Successful completion of new projects, number of suggestions implemented, new ideas implemented
Sources of Data Organizational Performance Records Participants Supervisors of Participants  Direct Reports of Participants Team / Peer Groups Internal or External evaluators as observers

ADLT 606, Class 11: Kirkpatrick’s Evaluation Model (short version)


Editor's Notes

  • #2 The first level of training program evaluation in Kirkpatrick’s model is the reaction or critique evaluation; at this level you are asking how satisfied individuals are with the program. Why do we want to know if people are satisfied? If people are not satisfied, they will probably not use what they’ve learned. If they are REALLY unhappy about the program, they may have been so turned off that they didn’t learn anything. You’ve probably heard this level dismissed as irrelevant, called a “smile sheet,” “happiness index,” or “whoopie sheet.” I think this is a mistake, no matter what Jerry Harvey says. If you are dissatisfied with the quality of information obtained from a reaction evaluation, check the kinds of questions you are asking and how good THEY are.
  • #3 How are we using a reaction evaluation in this class? The CIQ serves as a class-by-class Level 1 evaluation, although it differs in one respect: it engages you in a process of ongoing reflection that is a little more sophisticated than most Level 1 evaluations. Level 2 asks whether or not participants learned what you intended them to learn. This is your quality assurance index for your teaching session. Typical ways of evaluating at this level are paper-and-pencil tests and observed simulations or skill demonstrations. Level 2 in this class? Your written assignments are used as Level 2 evaluations, as are your teaching demos and presentations. Kirkpatrick breaks behavior into two parts in Level 3: observable behavior change and non-observable behavior change. Why is this important? This question asks: are participants using what they learned on the job? Are you using what you’ve learned in class in your departments and when you teach? How can we tell?
  • #6 These are the four types of knowledge tests most often used in evaluation; we’ll discuss the advantages and disadvantages of each.
    Essay and open-ended answer: Describe the five teaching perspectives according to Pratt and give examples of how each is used in medical education. Advantages: easy to construct; allows freedom in answering and adapts well to why and how questions that require higher-level cognitive skills. Disadvantages: must be read and scored by a knowledgeable person; manually scored; writing ability may affect the score.
    Write-in or short answer: test items are sentences with key words missing. Ex: The theoretical orientation that attempts to account for differences between adults and children as learners is called ………………… Advantages: limited number of correct answers; can be scored by a person with a list of correct answers (little interpretation). Disadvantages: format does not adapt well to how or why questions, and tests cannot be machine scored.
    Binary (including true-false): Humanistic learning theory is concerned with (a) the development of the whole person; (b) creating an environment that will elicit the desired response from a learner. Advantages: easy to score; can be machine or computer scored; instructions easy to understand. Disadvantages: questions limited in scope; the test writer must have high content knowledge and be able to construct unambiguous statements; there is a tendency for people to view these as “trick” questions and read something into them that is not there.
  • #7 Multiple choice tests: vary from binary questions up to as many as five choices. Advantages: easy to score by machine or computer; questions can be more complex than binary questions. When constructed with a penalty for wrong answers, participants are less tempted to guess, because sheer guesses will result in a greater number of incorrect answers. Disadvantages: do not adapt well to complex answers, such as those dealing with rationales; take more time to develop than binary questions, because the test writer needs to be able to develop logical wrong answers.