Paper ID #36855
Assessing authentic problem-solving in heat transfer
Jiamin Zhang
Jiamin Zhang, PhD, is a postdoctoral scholar and lecturer in physics at Auburn University. Her research focuses on
studying authentic problem-solving in undergraduate engineering programs and what factors impact student persistence in
STEM. She earned her PhD in chemical engineering from the University of California, Santa Barbara.
Soheil Fatehiboroujeni (Assistant Professor)
Soheil Fatehiboroujeni received his Ph.D. in mechanical engineering from the University of California, Merced in 2018,
focusing on the nonlinear dynamics of biological filaments. As an engineering educator and postdoctoral researcher at
Cornell University, Sibley School of Mechanical and Aerospace Engineering, Soheil worked in the Active Learning
Initiative (ALI) to promote student-centered learning and the use of computational tools such as MATLAB and ANSYS in
engineering classrooms. In Spring 2022, Soheil joined Colorado State University as an assistant professor of practice in
the Department of Mechanical Engineering. His research is currently focused on the long-term retention of knowledge and
skills in engineering education, design theory and philosophy, and computational mechanics.
Matthew Ford
Matthew J. Ford (he/him) received his B.S. in Mechanical Engineering and Materials Science from the University of
California, Berkeley, and went on to complete his Ph.D. in Mechanical Engineering at Northwestern University. After
completing a postdoc with the Cornell Active Learning Initiative, he joined the School of Engineering and Technology at
UW Tacoma to help establish its new mechanical engineering program. His teaching and research interests include solid
mechanics, engineering design, and inquiry-guided learning. He has supervised undergraduate and master's student
research projects and capstone design teams.
Eric Burkholder (Postdoctoral Scholar)
Eric Burkholder is an assistant professor of physics and of chemical engineering at Auburn University. He received his
PhD in chemical engineering from Caltech and spent three years as a postdoc in Carl Wieman's group at Stanford
University. His research focuses broadly on problem-solving in physics and engineering courses, as well as issues related to
retention and equity in STEM.
© American Society for Engineering Education, 2022
Introduction
Engineers are known as problem-solvers. In their work, they encounter ill-structured problems
that require them to collect additional information, consider external constraints, and reflect on
their solution process1,2,3,4,5,6. Recent graduates cite these skills as the most important technical
skills required of them in their everyday work7. However, there are reports from employers and
researchers that undergraduate students are not prepared for solving these kinds of problems when
they graduate8,9.
One of the reasons for this skills gap is that the majority of problem-solving in traditional
undergraduate engineering programs consists of solving textbook problems. Textbook problems
are designed to exercise a limited set of knowledge and skills, and thus may not reflect the
problem-solving practices that are used in real-world settings. Textbook problems do not allow
room for students to make their own assumptions, decide what information is needed, decide how
to present their findings, etc. Project-based learning, such as in capstone design courses and
senior labs10,11, is one alternative that allows room for more student decision-making, but these
opportunities are often limited in undergraduate curricula.
A principal challenge in teaching problem-solving is that one cannot teach what one cannot
measure, and real-world problem-solving skills are difficult to measure. As discussed above,
textbook problems may not measure real-world problem-solving skills. Design projects and other
long-term projects may be more realistic measures of problem-solving practice, but they are
impractical for large-scale assessment. Indeed, the National Academies are calling for the
development and widespread use of research-based assessments to better measure student
understanding and skills in undergraduate programs12.
There are some existing instruments that aim to measure problem-solving, but they each have
limitations. Some attempt to measure problem-solving using puzzle-like challenges that do not
rely on any scientific or engineering content knowledge13. These assessments thus do not
measure how one solves engineering problems, and it is not clear to what extent problem-solving
on these assessments correlates with problem-solving in real-world practice14,15,16. Other
"critical-thinking" or aptitude assessments are commercially available, but due to their proprietary
nature, the evidence for the reliability and validity of these assessments is not widely available17,18,19.
To date, there are almost no assessments of problem-solving in engineering.
Our aim in this paper is to describe the development and pilot-testing of an assessment of
problem-solving in engineering. We chose heat transfer as the context for our assessment so that
it would be useful to educators in a wide variety of engineering disciplines.
Theory
Problem-solving has long been studied in engineering education research12. Early studies are
based on information-processing models, which posit a step-by-step approach to
problem-solving20. These models consider how knowledge is represented, the role of background
knowledge, and limits on working memory. Other early studies are grounded in constructivism21
or socioconstructivism22,23. One limit of these frameworks is that they are often not grounded in
direct empirical evidence.
Cognitive systems engineering24, grounded in naturalistic decision-making theory25, incorporates
both cognitive and ecological (contextual) elements of problem-solving and is grounded in
empirical evidence. This work often studies how skilled practitioners make critical decisions
when solving specific problems in a real-world setting. Based on this work, researchers have
developed a framework for characterizing expert problem-solving in science and engineering that
describes problem-solving as a set of 29 decisions-to-be-made26. How these decisions are made is
argued to be highly context dependent and to draw upon deep disciplinary knowledge.
In addition, Price et al.27 have developed a general template for assessing these decisions.
The researcher chooses an authentic context and structures the problem so that the solver is asked
to make a relevant subset of the 29 decisions to solve the problem. The basic components of the
framework are 1) provide an authentic problem context, 2) ask a series of questions that require
test-takers to make decisions about problem definition and planning how to solve, 3) provide
more information, 4) ask a series of questions that require decisions about interpreting
information and drawing conclusions, and 5) have test-takers choose and reflect on their solution.
One of the authors has previously developed such an assessment in chemical process design28.
One important feature of these assessments is that students are not graded on a scale that
compares them to one another; rather, their responses are compared to a consensus of experts. This
represents a philosophical shift in educational assessment. As described in Adams and Wieman29,
the development phase of the assessment consists of student and expert pilot-testing as an integral
part of rubric development. We describe this process in greater detail below.
Assessment
When choosing the context for the heat transfer assessment, we wanted to ensure (1) the relevant
physics includes the key concepts in heat transfer (e.g., convection and conduction), (2) any
additional physics should be simple enough to explain, and (3) a professor who does research in
heat transfer, or someone from industry who regularly uses heat transfer to design systems, should
be considered "expert enough" to solve the problem. The physical context we chose is
the countercurrent heat exchange between arteries and veins in the human finger. This
countercurrent heat exchange mechanism reduces the degree to which the returning venous blood
must be warmed by the core, at the expense of a lower average temperature in the
extremities.
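The key physics the context exercises, conduction and convection, can be illustrated with a simple series thermal-resistance estimate of heat loss from a finger-like cylinder. The Python sketch below is our own illustration; every dimension and property value in it is hypothetical and is not taken from the assessment.

```python
import math

# Illustrative steady-state heat-loss estimate for a finger-like cylinder,
# using a series resistance model: conduction through the flesh shell,
# then convection to the surrounding air. All values are assumed.
L = 0.08         # finger length, m (assumed)
r_core = 0.006   # radius of the warm blood-carrying core, m (assumed)
r_skin = 0.008   # outer skin radius, m (assumed)
k_flesh = 0.5    # thermal conductivity of flesh, W/(m*K) (assumed)
h_air = 10.0     # external convection coefficient, W/(m^2*K) (assumed)
T_core, T_air = 35.0, 5.0   # core and ambient temperatures, deg C

# Conduction resistance of a cylindrical shell: ln(r2/r1) / (2*pi*k*L)
R_cond = math.log(r_skin / r_core) / (2 * math.pi * k_flesh * L)
# Convection resistance at the outer surface: 1 / (h * surface area)
R_conv = 1.0 / (h_air * 2 * math.pi * r_skin * L)

q = (T_core - T_air) / (R_cond + R_conv)   # heat loss, W
print(f"R_cond = {R_cond:.2f} K/W, R_conv = {R_conv:.2f} K/W, q = {q:.2f} W")
```

With these (assumed) numbers the external convection resistance dominates, which hints at why the external convection coefficient is hard to estimate and matters to the model.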
We situate the participant of the assessment as a member of an engineering team responsible for
the design and development of a thermal regulation system for the next generation of space suits.
The preliminary step for the engineering team is to develop a quantitative model of the body's
natural thermal regulation mechanisms, particularly in extremities. The model should capture the
effect of countercurrent heat exchange between arteries (warmer blood) and veins (cooler blood)
in a finger. The assessment begins broadly by asking the participant how they would model the
heat transfer in the finger. Then, it asks what assumptions the participant would make to simplify
the problem, what information they need to solve the problem, and what variables they think the
findings will be most sensitive to. These questions prompt the participant to identify key features
of the problem and think about possible models. We then provide the participant with a model
proposed by their colleague for the heat exchange in the finger and ask more detailed questions
about the feasibility of the model to draw the participant's attention to some important features of
the model, such as the geometry. We then ask the participant what information they would need to
further evaluate the model. Next, we provide a revised, more detailed model and ask the
participant to give further feedback. We then ask some more detailed questions about the second
model, again to draw the participant's attention to important features of the model, such as the
boundary conditions and governing equations. The schematics provided at the beginning of the
assessment and for the two models are included in Figure 1. After asking questions about the two
proposed models, we provide a comparison of experimental measurements and simulation results
from the second model and ask the participant whether they think the model matches the
experimental data. Finally, we ask the participant to compare the two proposed models and
discuss how they would model the phenomenon given the two models.
Figure 1: Schematics in the heat transfer assessment (a) Blood vessels in a human hand (from Complete
Anatomy App30), (b) Resistor network model (Model 1), (c) Finite element model (Model 2).
Figure 2 provides a summary of the sequence of provided information and questions in the heat
transfer assessment. The color coding reflects the decisions relevant to the assessment that are
part of the 29 decisions identified by Price et al.26. Each item in the table in the middle of the
figure represents either provided information or a question to be answered, color coded according
to the template item type. Items with multiple color codes indicate that multiple decisions were
probed in one question. For items shaded half-yellow, information was provided in the
context of a question being asked.
Figure 2: Assessment sequence with color coding for decision probed. Each item represents either
provided context/information or a question to be answered, color coded according to the template item
type. Items with multiple color codes indicate that multiple decisions were probed in one question (or
where half-yellow, information was provided in the context of a question being asked).
After developing the assessment, we first asked former teaching assistants in heat transfer courses
to take the assessment to ensure that the questions were being interpreted the way we intended
and that the prompts were adequately capturing the thought process of the solver29. The
assessment was then pilot tested with 12 experts (faculty members who do research in heat
transfer, as well as professional engineers who design heat exchangers) and 12 undergraduate
students. The experts were either acquaintances of members of the research team or volunteers
recruited through email lists of The American Physical Society (APS) and American Society for
Engineering Education (ASEE). The students were undergraduate students who were taking or
had recently taken a heat transfer class in mechanical engineering at the time of the assessment.
We used think-aloud interviews31 for both the experts and students during the pilot-testing. The
participants were given a link to the Qualtrics survey for the assessment and were asked to
complete the assessment and talk about their thought process during the interview. The responses
in Qualtrics, interview transcripts, and audio files were used during the data analysis.
Analysis
We used data from the expert responses to create a scoring rubric for the assessment. Students
were graded based on how well their responses agreed with the experts. We started by analyzing
the three closed-response, multiple-choice questions: assumptions, information needed, and
sensitivity (numbers 3, 4, and 5 in Fig. 2). As an example, for the sensitivity question (item 5 in
the assessment), the participants were asked to choose five out of nine variables that they thought
the findings would be the most sensitive to. Choices with stronger expert consensus (either for or
against) were given more weight in the total score than choices on which experts disagreed or
highlighted different key features. The score for each question was calculated using the following
formula:
score = Σ_items (2 × response − 1) × (%consensus − 50%). (1)
For Item 3 (assumptions) and Item 5 (sensitivity), possible answers to each item in the question
are TRUE and FALSE, which translate to values of 1 and 0, respectively. For Item 4 (information
needed), possible answers to each option in the question are Definitely Want, Might Want, and
Don't Need, with values of 1, 0.5, and 0, respectively. The expert consensus is calculated by
summing the expert response values and dividing by the total number of experts. According to
Equation 1, if the %consensus for an item in the assumptions question is 75% and a student
selected TRUE for that assumption, they will get 0.25 added. If a student selected FALSE for that
assumption, they will get 0.25 subtracted. For items on which experts disagree (i.e., %consensus
is close to 50%), our weighting scheme ensures that these items make only a very small
contribution to the total score.
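The consensus-weighted scoring in Equation 1 can be sketched in a few lines of Python. The function and variable names are ours, and the expert responses below are invented for illustration; the paper's actual analysis pipeline is not described at this level of detail.

```python
# Minimal sketch of the consensus-weighted scoring in Equation 1.
# Function and variable names are ours; the expert responses are invented.
def closed_response_score(expert_responses, student_responses):
    """expert_responses: one list of per-expert values (0/1, or 0/0.5/1 for
    the information question) per item; student_responses: the student's
    value for each item."""
    score = 0.0
    for per_expert, response in zip(expert_responses, student_responses):
        consensus = sum(per_expert) / len(per_expert)  # mean expert value
        score += (2 * response - 1) * (consensus - 0.5)
    return score

# Hypothetical 3-item question answered by 4 experts:
experts = [
    [1, 1, 1, 0],  # 75% of experts chose TRUE
    [0, 0, 0, 0],  # all experts chose FALSE
    [1, 1, 0, 0],  # experts split 50/50 -> negligible weight
]
student = [1, 0, 1]  # agrees with the consensus on the first two items
print(closed_response_score(experts, student))  # 0.25 + 0.5 + 0.0 = 0.75
```

A student choosing TRUE on the 75%-consensus item gains 0.25 and choosing FALSE loses 0.25, matching the worked example in the text.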
To develop the rubric for the open-response questions, we needed to code the expert responses to
identify themes of answers that experts agree on. We used an emergent coding scheme and did
three rounds of coding, with three coders coding separately and then discussing the codes for the
first two rounds, and two coders doing the same for the third round. The first
round of coding focused on counting codes separately for each question. The second round of
coding focused on the decisions experts and students made when responding to each question.
From the first two rounds of coding, we found many overlapping codes for multiple questions and
grouped questions that got similar answers together for counting the codes: Items 7-11 for
feedback on Model 1, Items 13-17 for feedback on Model 2, Items 19 & 22 for feedback and
questions about the experimental data, and Items 20 & 21 for feedback and questions about the
simulation results. We then used the expert responses to come up with a list of key themes which
were mentioned by multiple experts. For example, the experts overwhelmingly mentioned that
Model 1 (resistor network model) was missing the temperature dependence in the axial direction.
The lists of codes for these groups of questions are included in Table 1.
Items 8, 14, and 21 explicitly asked the participants a yes/no question about the feasibility or
validity of the model. Item 8 asked whether the colleague will successfully investigate the role of
countercurrent heat exchange in the finger using the resistor network model. Out of the 12 experts
interviewed, seven explicitly said the colleague won't be successful because the first model has
limitations, one said the model is a good start, two said the model is OK, and two didn't explicitly
answer the question. A similar level of consensus was achieved for Items 14 and 21. For Item 14,
the majority of experts explicitly stated that the colleague will be successful using the finite
element model; for Item 21, the majority explicitly stated that the simulation results don't agree
with the experimental data.
The other codes listed in Table 1 are derived from questions asking the participant either for
general feedback or about specific aspects of the model, such as the geometry and boundary
conditions. The open-response nature of these questions produces high variability: experts can
choose from many possible features to mention, so even a few experts mentioning the same code
indicates that it is important. Thus, we considered codes mentioned by at least three experts to be
expert consensus items; the other items mentioned by experts were labeled expert non-consensus
items.
Students were then graded based on how well their responses agreed with the experts. Students
were rewarded for highlighting expert consensus items, and penalized for highlighting extraneous
features that experts did not consider to be important. Not all non-consensus items mentioned by
students were considered extraneous, however. In the feedback on Model 1, one student stated that
the resistors should be in parallel, which is incorrect; this response was considered extraneous. In
contrast, when responding to Item 9 ("are there important features missing from the model?"),
one student asked for more material properties. Although this was not mentioned by any of the
experts, we didn't label the response as extraneous because it is a reasonable comment consistent
with a correct understanding of the model. For each group of questions, students were awarded
one point for each expert consensus item and penalized 0.5 points for each extraneous item:
score = N_expert consensus items − 0.5 × N_extraneous items. (2)
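The open-response scoring in Equation 2 can be sketched as follows. The code strings below are invented stand-ins for the emergent codes; in the study, the actual codes came from the rounds of coding described above.

```python
# Minimal sketch of the open-response scoring in Equation 2. The code
# strings are invented stand-ins for the study's emergent codes.
def open_response_score(student_codes, consensus_codes, extraneous_codes):
    # Codes in neither set (reasonable but non-consensus comments)
    # neither add nor subtract points.
    matched = len(student_codes & consensus_codes)
    extraneous = len(student_codes & extraneous_codes)
    return matched - 0.5 * extraneous

consensus = {"missing axial dependence", "unclear boundary conditions"}
extraneous = {"resistors should be in parallel"}
student = {"missing axial dependence",
           "resistors should be in parallel",
           "need more material properties"}  # neutral: in neither set
print(open_response_score(student, consensus, extraneous))  # 1 - 0.5 = 0.5
```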
When coding the questions to identify expert-consensus feedback items, we coded Questions 19
& 22 and Questions 20 & 21 separately, because 19 & 22 were about the experiment and data
collection while 20 & 21 were about the model prediction and validation. However, when
calculating the scores, we grouped questions 19, 20, 21 and 22 together because the questions
asked participants to compare the experimental results with the simulation results and thus belong
to the same overall category. Items 2, 23, and 24 were not coded because there was no clear
expert or student consensus, and interview audio files suggested that experts did not interpret
those questions in the manner intended. These questions will be eliminated or significantly
revised in future iterations.
There were a few experts and students who did not complete all of the questions in the Qualtrics
survey. In those cases, we used the recorded interview to fill in the missing responses. After using
this approach, we had only one missing response, from one student, for one of the options in Item 4
(information needed). We left that response blank when calculating the score, so the student did
not gain or lose points for it. One expert misinterpreted the
questions about the two models and the experimental data and two experts had incomplete
responses to questions about Model 1 and questions about experiments and simulation. Their
responses were excluded when calculating the statistics and plotting the scores in the Results
section.
Table 1: Rubric for scoring the assessment based on responses from eleven experts. The feasibility and
validity items were mentioned by at least seven experts. The other rubric items were mentioned by at
least three experts. The contributing questions refer to the items in Figure 2.

Feedback on Model 1 (Resistor Network Model) — Items 7-11:
• Feasibility: No, colleague won't be successful using this model (a)
• Missing axial dependence
• Problems with lumped flows or missing capillaries
• Missing internal convective resistance
• Unclear boundary conditions (Ta, Tv not specified)

Feedback on Model 2 (Finite Element Model) — Items 13-17:
• Feasibility: Yes, colleague will be successful using this model (b)
• Remove transient term in equation
• Neglect circumferential conduction (in the theta direction)
• Neglect axial conduction
• Difficulty in estimating external convection coefficient
• Problems with lumped flows or missing capillaries

Feedback on Experiment and Simulation — questions about experiment and data collection (Items 19 & 22):
• Asking about error bars
• Asking about measurement probe
• Asking about experimental control: environment
• Asking about subject differences

Feedback on Experiment and Simulation — questions about model prediction and validation (Items 20 & 21):
• Validity: No, model doesn't match experimental data (c)
• Large temperature mismatch between model and experiment
• Asking about error bars in simulation result
• Asking about model parameter: perfusion rate

(a) Experts explicitly stating colleague won't be successful : experts explicitly stating colleague will be successful = 7 : 2
(b) Experts explicitly stating colleague will be successful : experts explicitly stating colleague won't be successful = 7 : 1
(c) Experts explicitly stating model doesn't match experimental data : experts explicitly stating model matches = 8 : 0
Results
For the three closed-response questions, the average scores of the students are all lower than those
of the experts, and the standard deviations of the student scores are larger than those of the experts.
The sensitivity question is the best at distinguishing between expert and student responses, with
the average score for students two standard deviations below the average score for experts. The
assumptions question has the smallest separation between the expert and student performances,
with the average score for students only half of one standard deviation below the average score
for experts. The descriptive statistics for the closed-response questions are included in Table 2.
The maximum possible score was calculated by selecting the expert consensus choice for all of
the options. The number of items indicates the number of sub-questions (or items) in a question.
For example, in the information question, 32 different pieces of information were listed (such as
outside air temperature and average blood temperature in the vein) and the participant was asked
to select from Definitely Want, Might Want, and Don't Need for each piece of information.
Table 2: Descriptive statistics for expert and student scores in closed-response questions (Items 3-5).

                                Assumptions   Information   Sensitivity
                                (Item 3)      (Item 4)      (Item 5)
Max Possible Score                 1.58          7.75          2.00
Min Possible Score                -1.58         -7.75         -2.00
Number of Items                   10            32             9
Experts    Number of Subjects     12            12            12
           Average                 0.99          5.07          1.25
           Standard Dev.           0.60          0.88          0.38
Students   Number of Subjects     12            12            12
           Average                 0.75          4.00          0.49
           Standard Dev.           0.75          1.73          0.63
To compare the performance of experts and students across these three questions that were on
different scales, we rescaled the expert and student scores using the expert average and expert
standard deviation:
scaled score = (score − mean(expert scores)) / stdv(expert scores). (3)
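The expert-referenced scaling in Equation 3 can be sketched as below. Both expert and student scores are shifted and scaled by the expert mean and standard deviation; we assume the sample standard deviation here, since the paper does not specify, and the score values are invented.

```python
from statistics import mean, stdev

# Minimal sketch of the expert-referenced scaling in Equation 3.
# We assume the sample standard deviation; the scores are invented.
def scale_scores(scores, expert_scores):
    mu, sigma = mean(expert_scores), stdev(expert_scores)
    return [(s - mu) / sigma for s in scores]

expert_scores = [0.9, 1.1, 1.0, 1.2]
student_scores = [0.4, 0.8]
scaled_experts = scale_scores(expert_scores, expert_scores)
scaled_students = scale_scores(student_scores, expert_scores)
# By construction the scaled expert scores average to 0, so a negative
# scaled score means a result below the expert average.
```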
The scaled scores for Assumptions, Information, and Sensitivity are plotted in Figure 3. The box
encompasses the 25th to the 75th percentiles of each score. Note that the thick center line in the
box plot indicates the median, and thus may differ from zero.
One reason the assumptions and information-needed questions show low separation between
experts and students is that different experts interpreted the assumptions and information
differently. For example, "ignore axial heat transfer" is ambiguous. Several experts didn't make
this assumption because axial advection (heat transported by the flow of warm blood) shouldn't
be ignored; however, in the feedback on Model 2, several experts recommended ignoring axial
conduction. If we were to change the wording of the assumptions question to "ignore axial
conduction" or "ignore axial advection", we would likely get better consensus among experts.
Furthermore, experts won't necessarily make a simplifying assumption if they think the data can
be easily obtained. For example, for "assume specific heat and density of blood and flesh are the
same" and "assume thermal conductivity of blood and flesh are the same", some experts didn't
make these assumptions because they thought the data would be easy to find.
Figure 3: Comparison of scaled expert and student scores of closed-response questions. The scores
are scaled by subtracting the average expert score then dividing by the expert standard deviation. The
median student score is lower than the median expert score on all items. The sensitivity question shows
the clearest differentiation between students and experts.
For the open-response questions, we calculated the scores for three groups of questions: Model 1,
Model 2, and experiment & simulation. The statistics of the expert and student scores are
included in Table 3. The maximum possible score is the total number of expert consensus items
for each group of questions. The number of subjects differs because some expert responses were
missing or discarded, as described in the Analysis section. For all of the open-response question
groups, the average scores for the students are lower than those of the experts. Model 2 questions
have the largest separation between expert and student performances, with the average student
score more than one standard deviation below that of the experts. Model 1 questions and
experiment & simulation questions have smaller separation between expert and student
performances, with the average student score less than one standard deviation below that of the
experts. Importantly, although the average scores for the Model 1 and Model 2 questions are very
similar for experts, the average Model 2 student score is only one third of the average Model 1
student score. Model 2 is the finite element model, which is more complicated than the resistor
network model, and students did not perform as well on it.
We plotted the scaled scores in Figure 4. The scores are rescaled according to Eqn. 3. One
striking feature of the box plot is that for Model 2 questions, the 75th percentile of the student
score is lower than the 25th percentile of the expert score, again highlighting the large separation
between experts and students for questions regarding the finite element model.
Out of the 12 students in the pilot study, only two students mentioned "remove transient term",
and only one student mentioned "neglect axial conduction" for Model 2. This implies that
students are not very good at making simplifying assumptions for the finite element model. For
Model 1, however, students are much better at identifying the model flaw: "missing axial
Table 3: Descriptive statistics for expert and student scores in open-response questions.

                                Model 1        Model 2        Experiment and
                                Questions      Questions      Simulation
                                (Items 7-11)   (Items 13-17)  (Items 19-22)
Max Possible Score                5              6              8
Experts    Number of Subjects     9             11              9
           Average                2.39           2.41           3.89
           Standard Dev.          1.27           1.41           1.27
Students   Number of Subjects    12             12             12
           Average                1.54           0.50           2.83
           Standard Dev.          1.22           0.71           1.47
Figure 4: Comparison of scaled expert and student scores of open-response questions. The scores
are scaled by subtracting the average expert score then dividing by the expert standard deviation. The
median student score is lower than the median expert score for all question groups. The Model 2
questions show the clearest differentiation between students and experts.
dependence", with 6 students mentioning this in their response.
Discussion
Across the three closed-response questions and the three groups of open-response questions,
students on average score lower than experts. In particular, the sensitivity question and Model 2
questions have the largest separation between expert performance and student performance.
Although the questions for Model 1 and Model 2 are nearly identical, with the only difference
being the specific feature highlighted in Items 10 and 17, the average Model 2 student score is
only one third of the average Model 1 student score. For Model 1, all of the expert consensus
codes point out flaws in the model (e.g., "missing axial dependence"), whereas only one of the
five expert consensus codes for Model 2 is about model flaws ("problems with lumped flows or
missing capillaries"). The consensus codes for model flaws correspond to the decision "what are
the important underlying features?" in the list of decisions described by Price et al.26. The
majority of the expert consensus codes for Model 2 suggest simplifications to the model (e.g.,
"remove transient term in equation"), which corresponds to the decision "what approximations
and simplifications to make?" in the same list26. The differences in student performance on the
Model 1 and Model 2 questions suggest that students are good at identifying important missing
features but need more practice making appropriate simplifications. We hypothesize this is
related to which decisions students had opportunities to practice in the heat transfer course.
Specifically, most textbook problems require students to use equations to model a heat transfer
problem, and students are given feedback from the professor on whether the equations and
calculations are correct (i.e., whether they missed any key features). However, students are rarely
given models that are correct but too complicated and asked to simplify them. Results from the
pilot study suggest that in future heat transfer courses, students need more opportunities to
practice a variety of problem-solving decisions, such as making assumptions and making
simplifications and approximations.
The analysis of responses to the closed-response and open-response questions reveals differences
in the development of the predictive framework26 by the experts and students. In particular, for
the sensitivity question, the average score for experts is higher than that for students, and the
responses across experts are more consistent. This is likely because experts have more experience
solving authentic problems, which builds better intuition about which features one should pay
close attention to.
Limitations and Future Work
Although the current heat transfer assessment was able to distinguish between problem-solving
skills of experts and students, several aspects of the assessment need to be changed to improve the
level of consensus among the experts and make the questions less ambiguous. First, as mentioned
in the Results section, some of the closed-response items are ambiguous and result in different
interpretations by different experts. Second, the items in the assumptions question are all
reasonable assumptions and don't include any options that are obviously wrong. In the next
version of the assessment, we will revise the items in the closed-response questions to remove
ambiguity and to include a few obviously wrong assumptions, sensitivity variables, and pieces of
information needed that were mentioned by students. These items come either from the interview
transcripts corresponding to the closed-response questions or the extraneous items we coded for
the open-response questions. The advantage of including wrong options mentioned by students is
that experts are very unlikely to choose these options while students are likely to choose these
options, resulting in bigger separation between expert and student performances for the
closed-response questions. Third, for the open-response questions, we plan to remove repetitive
questions that generated similar responses from the participants to make the assessment
shorter.
In addition to revising the assessment and doing pilot studies with experts and students using the
revised assessment, we also plan to conduct pre- and post-tests in undergraduate heat transfer
courses to study how well students' problem-solving skills improve after taking the course. The
assessment will be included as one of the questions on the first and last homework assignments in
the course. Furthermore, we plan to develop a closed-response version of all of the questions in
the assessment to automate the scoring of the assessment. This will significantly reduce the time
needed to analyze the student responses and enable a wider distribution of the assessment.
Responses from initial pilot testing, think-aloud interviews with students, and expert responses
will be used to replace the open-response questions with "choose-many" multiple-choice
questions. This will ensure that the questions include reasonable distractors generated by students
as well as answers chosen by a consensus of experts.
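As one illustration of how the planned automated scoring could work, the sketch below scores a "choose-many" item by comparing a respondent's selections against an expert-consensus key that also contains student-generated distractors. The function name, the item options, and the penalty-based scoring rule are illustrative assumptions for demonstration, not the rubric used in this study.

```python
# Illustrative sketch of automated scoring for a "choose-many" item.
# The +1/-1 scoring rule and the option labels below are assumptions,
# not the study's actual rubric.

def score_choose_many(selected, correct, distractors):
    """Score one item: +1 per correct option selected, -1 per
    distractor selected, floored at 0, normalized to [0, 1]."""
    selected = set(selected)
    raw = len(selected & set(correct)) - len(selected & set(distractors))
    return max(raw, 0) / len(correct)

# Hypothetical item: options A-C form the expert-consensus key,
# D-E are distractors drawn from student responses.
correct = {"A", "B", "C"}
distractors = {"D", "E"}

print(score_choose_many({"A", "B", "D"}, correct, distractors))  # prints 0.3333333333333333
```

A rule like this rewards agreement with the expert consensus while penalizing the selection of student-generated distractors, which is what would separate expert from student response patterns in automated scoring.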
Further refinement of this assessment and wide distribution in classrooms will be an important
advance in engineering education research, as it will provide a reliable way to measure
problem-solving, an important but not guaranteed outcome of an engineering education. It can
be used to answer many different research questions, e.g., are there differences in outcomes
between traditional heat transfer courses and heat transfer courses that focus on developing
problem solving skills? It can also be used as an assessment tool for various engineering
departments to decide whether their undergraduate programs are adequately preparing students
for the workplace. Improving our ability to measure problem-solving is an important step in being
able to improve the way we teach problem-solving to undergraduate students and prepare them
for engineering careers. We hope to encourage other educators to use this assessment in their
courses to measure how well they are preparing their students to solve real-world engineering
problems.
References
[1] ABET, "Criteria for accrediting engineering programs, 2019–2020," 2019.
[2] C. L. Dym, "Design, systems, and engineering education," International Journal of Engineering Education,
vol. 20, no. 3, pp. 305–312, 2004.
[3] C. L. Dym, A. M. Agogino, O. Eris, D. D. Frey, and L. J. Leifer, "Engineering design thinking, teaching, and
learning," Journal of Engineering Education, vol. 94, no. 1, pp. 103–120, 2005.
[4] D. Jonassen, J. Strobel, and C. B. Lee, "Everyday problem solving in engineering: Lessons for engineering
educators," Journal of Engineering Education, vol. 95, no. 2, pp. 139–151, 2006.
[5] D. H. Jonassen, "Designing for decision making," Educational Technology Research and Development, vol. 60,
no. 2, pp. 341–359, 2012.
[6] N. Shin, D. H. Jonassen, and S. McGee, "Predictors of well-structured and ill-structured problem solving in an
astronomy simulation," Journal of Research in Science Teaching, vol. 40, no. 1, pp. 6–33, 2003.
[7] H. J. Passow, "Which ABET competencies do engineering graduates find most important in their work?"
Journal of Engineering Education, vol. 101, no. 1, pp. 95–118, 2012.
[8] Q. Symonds, "The global skills gap in the 21st century," 2018. [Online]. Available:
https://www.qs.com/portfolio-items/the-global-skills-gap-in-the-21st-century/
[9] C. Grant and B. Dickson, "Personal skills in chemical engineering graduates: the development of skills within
degree programmes to meet the needs of employers," Education for Chemical Engineers, vol. 1, no. 1, pp.
23–29, 2006.
[10] A. J. Dutson, R. H. Todd, S. P. Magleby, and C. D. Sorensen, "A review of literature on teaching engineering
design through project-oriented capstone courses," Journal of Engineering Education, vol. 86, no. 1, pp. 17–28,
1997.
[11] C. L. Dym, "Learning engineering: Design, languages, and experiences," Journal of Engineering Education,
vol. 88, no. 2, pp. 145–148, 1999.
[12] National Research Council, "Discipline-based education research: Understanding and improving learning in
undergraduate science and engineering," Washington, DC: The National Academies, 2012.
[13] W. K. Adams, "Development of a problem solving evaluation instrument; untangling of specific problem
solving skills," Unpublished doctoral dissertation, University of Colorado, 2007.
[14] C. J. Harris, J. S. Krajcik, J. W. Pellegrino, and A. H. DeBarger, "Designing knowledge-in-use assessments to
promote deeper learning," Educational Measurement: Issues and Practice, vol. 38, no. 2, pp. 53–67, 2019.
[15] F. Fischer, C. A. Chinn, K. Engelmann, and J. Osborne, Scientific Reasoning and Argumentation: The Roles of
Domain-specific and Domain-general Knowledge. Routledge, 2018.
[16] P. Kind and J. Osborne, "Styles of scientific reasoning: A cultural rationale for science education?" Science
Education, vol. 101, no. 1, pp. 8–31, 2017.
[17] ACT CAAP Technical Handbook. Iowa City, IA: ACT, 2007.
[18] S. Klein, R. Benjamin, R. Shavelson, and R. Bolus, "The collegiate learning assessment: Facts and fantasies,"
Evaluation Review, vol. 31, no. 5, pp. 415–439, 2007.
[19] Educational Testing Service (ETS), MAPP User's Guide. Princeton, NJ: Educational Testing Service, 2007.
[20] D. P. Simon and H. A. Simon, "Individual differences in solving physics problems," in Children's Thinking:
What Develops?, R. S. Siegler, Ed. Lawrence Erlbaum Associates, Inc., 1978, pp. 325–348.
[21] J. Piaget, Success and Understanding. Cambridge, MA: Harvard University Press, 1978.
[22] J. Lave and E. Wenger, Situated Learning: Legitimate Peripheral Participation. Cambridge University Press,
1991.
[23] L. B. Resnick, "Shared cognition: Thinking as social practice," in Perspectives on Socially Shared Cognition,
L. B. Resnick, J. M. Levine, and S. D. Teasley, Eds. American Psychological Association, 1991, pp. 1–20.
[24] G. Lintern, B. Moon, G. Klein, and R. R. Hoffman, "Eliciting and representing the knowledge of experts," in
The Cambridge Handbook of Expertise and Expert Performance, 2nd ed., K. A. Ericsson, R. R. Hoffman,
A. Kozbelt, and A. M. Williams, Eds. Cambridge, UK: Cambridge University Press, 2018, pp. 165–191.
[25] K. Mosier, U. Fischer, R. R. Hoffman, and G. Klein, "Expert professional judgments and 'naturalistic decision
making'," in The Cambridge Handbook of Expertise and Expert Performance, 2nd ed., K. A. Ericsson, R. R.
Hoffman, A. Kozbelt, and A. M. Williams, Eds. Cambridge, UK: Cambridge University Press, 2018, pp.
453–475.
[26] A. M. Price, C. J. Kim, E. W. Burkholder, A. V. Fritz, and C. E. Wieman, "A detailed characterization of the
expert problem-solving process in science and engineering: Guidance for teaching and assessment," CBE–Life
Sciences Education, vol. 20, no. 3, p. ar43, 2021.
[27] A. M. Price, E. W. Burkholder, S. Salehi, C. J. Kim, V. Isava, M. P. Flynn, and C. E. Wieman, "An accurate and
practical method for measuring science and engineering problem-solving expertise," Submitted.
[28] E. Burkholder, A. Price, M. Flynn, and C. Wieman, "Assessing problem solving in science and engineering
programs," in Proceedings of the Physics Education Research Conference, 2019.
[29] W. K. Adams and C. E. Wieman, "Development and validation of instruments to measure learning of
expert-like thinking," International Journal of Science Education, vol. 33, no. 9, pp. 1289–1312, 2011.
[30] Complete Anatomy App. [Online]. Available: https://3d4medical.com/press-category/complete-anatomy
[31] K. A. Ericsson and H. A. Simon, "Verbal reports as data," Psychological Review, vol. 87, no. 3, p. 215, 1980.