WESTERBERG, VM Page 1 of 13

INTRODUCTION TO PSYCHOLOGICAL RESEARCH - 2

Research report extract: Friedenberg, M.A., & Kendler, B. (1999). Double-blind study of the possible proximity effect of sucrose on skeletal muscle strength. Perceptual and Motor Skills, 89, 966-968.

Student's t test and chi square analyses.

Westerberg, V.M.
Date: 7 May 2010
Question 1

Critically evaluate the main conclusion of the study: Strengths, weaknesses and overall determination of the trustworthiness of the study.

The validity of the design of any experimental research study is a fundamental part of the scientific method. Without a valid design, valid scientific conclusions cannot be drawn. Internal validity is an estimate of the degree to which conclusions about causal relationships (e.g. cause and effect) can be made, based on the measures used, the research setting and the design. Experiments with high internal validity lead to trustworthy results.

There are factors that can strengthen or weaken internal validity. Among the former are the use of a standardised scale, an adequate number of variable levels (≥2, otherwise the variable cannot vary), a correct choice of magnitude levels and a sensitive variable relationship (linear, curvilinear). Internal validity can be weakened by poor construct validity (inadequate cause-effect relationship), poor control of extraneous variables (noise), sample bias, drop-outs (not notified, not managed), invalid or unreliable measures, and inadequate statistical analysis (missing data, wrong calculations).

We are also asked to evaluate the main conclusion of the study. Statistical conclusion validity is the degree to which conclusions reached about relationships between variables are justified. This involves adequate internal validity, that is, ensuring adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures. Conclusion validity is only concerned with whether there is any kind of relationship at all between the variables being studied, as it may only be a correlation.

Construct validity involves the quality of choices made about the particular forms of the independent and dependent variables. These choices will affect the quality of research findings.
Threats to construct validity can arise from the choice of treatment (the operationalization of the independent variable, IV, and the delivery of the experiment) and from the choice of outcome measure (the operationalization of the dependent variable, DV, and the administration of the measurement). Inadequate operationalization of the IV, such as a lack of reliability (values varying from measurement to measurement), obscures the relationship being studied. We have not been given the results of the measurements, and therefore it is not possible to assess this point. The IV should be representative, the procedure must show an adequate operational representation of the theoretical construct of interest, and it must have a measurable impact on research participants.

Further threats to the internal validity of a study are experiment artifacts, those that arise from the presentation of the study to the participants or from the research setting: How motivated are the researcher and each participant? Are the instructions clear? Is the environment adequate for the chosen experiment? Is there a language or disability barrier?

Lack of control of extraneous and moderating variables can interfere with internal validity (i.e. with the attempt to isolate causal relationships). Events outside the experiment or between repeated measures of the dependent variable (extraneous effects or history) may affect participants' responses to experimental procedures. They can affect participants' attitudes and behaviours in such a way that it becomes impossible to determine whether any change in the dependent measures is due to the independent variable or to the historical event. One example is the nutritional status of participants. The study says that volunteers were in a fasting state and that they did not take any breakfast because the experiment was done relatively early in the morning. That is not a good enough reason to assume that they actually had not eaten. If in fact, as presumed by the researchers, participants had been fasting, they would have been weaker and might have found it difficult to hold the bottles, altering the outcome of the experiment in favour of muscle weakness. Pre- and postprandial status could be moderating variables to consider if the test is repeated.

As the experiment progresses and time passes, maturation takes place. Maturation effects are not specific to particular events and range from growing hungrier to becoming more tired, uninterested, discouraged, etc. These events may change the way a subject reacts to the independent variable, so that upon completion of the study the researcher may not be able to determine whether a discrepancy is due to time or to the independent variable.

Taking this test repeatedly (practice effects) would not make participants any wiser; they cannot learn anything from the previous hold that they can apply to the next one. Practice effects should not be expected in this test.

Researchers used a commercial cable tensiometer to measure maximal voluntary isometric contraction (MVIC) of the biceps brachii for the operationalization (instrumentation) of the DV. Every measure should give stable results. If the variation in measurements is large, the usefulness of the measure is compromised, and so is the result of the study. Additionally, the dependent variable should be sensitive enough to detect any relevant difference in outcome.
An isometric exercise means maintaining a muscle contraction for a period of time, involving the action of a large number of different joints and muscle groups. The study claims that it was specifically measuring contraction of one single muscle, the biceps brachii. For the purpose of single-muscle evaluation, dynamometers have been developed that provide more specific, but still not conclusive, information on individual muscles. Detailed information about the characteristics of the tensiometer used has not been provided.

Additionally, values obtained from MVIC testing are difficult to interpret at present, as normative data are limited. Another question is whether MVIC testing was performed using the Quantitative Muscle Assessment system and whether age- and sex-related reference values were calculated. Age could alter the results because younger people are more resistant to fatigue. Sex could influence the outcome of the test because, for a given weight, men are stronger than women and more resistant to fatigue. When you are fatigued you have to work harder (contract the muscles more) to keep up with the workload. When fatigue is so intense that it weakens the individual, he or she may want to cease the activity and may not be able to complete the test.

Looking at the choice of test to measure muscle strength, a tensiometer (or dynamometer) seems adequate. Electromyography would be the ideal test to evaluate muscle strength, but it is not ethically acceptable for this purpose in humans. It would be expected that the tensiometer be well calibrated and display interval data. As for the scale of measurement, the International System of Units (SI) appears to have been applied, but inconsistently. The containers (bottles) are said to be "45-gm bottles". If the authors meant grams, they should have written "g" or, better yet, used the SI base unit: 0.045 kg. If the bottles were identical, they would have the same size, shape and colour and ideally, for the purpose of this experiment, the same weight. But if the bottles had the same shape and size, they could hardly have had the same weight, as the relative density of sand is highly heterogeneous, depending on its constituents (types of minerals), degree of compaction and humidity, whereas the density of sucrose is homogeneous (δ = 1.587 g/cm³). This means that the test results would not be attributable to the close proximity of sucrose to the skeletal muscle, but to the difference in weight of the bottles. The results would be accounted for by the sand bottle being lighter and easier to lift, requiring less muscle tension, which was mistakenly interpreted as sucrose proximity to skeletal muscle increasing strength. Finally, bottle size should have been expressed in units of volume (litres or cm³), not in grams, which are a unit of mass.

Instrumentation effects, such as changes in the calibration of a measurement tool or changes in the observers or scorers, may produce changes in the obtained measurements. No mention is made of this. If any instrumentation changes occurred, or if the tensiometer lost its calibration because of repeated use, the internal validity of the main conclusion would be affected, as alternative explanations could account for the results obtained.

We do not know, because it is not mentioned in the study, whether possible outliers took part in the test, that is, individuals stronger or weaker than average (than the mean). This would affect the internal validity of the conclusion, as the reduction in power could be attributable to the presence of outliers and not to sucrose proximity to skeletal muscle.

The occurrence of drop-outs was not mentioned (Were there any? What were the characteristics of those who dropped out? At which point did they drop out?). If there were any, they could have influenced the results if it had been the weaker participants, like women, who dropped out, leaving the strongest volunteers to carry on with the test, who might find the bottle of sucrose comparatively light and therefore be able to hold it with less effort than weaker individuals. If dropping out leads to relevant bias between groups, then alternative explanations are possible that account for the observed differences.

Experimenter bias was considerably reduced through the use of a double-blind study design, in which the experimenter is not aware of the condition to which each participant is exposed. This is a strength of this experiment. In a double-blind experiment, neither the individuals nor the researchers know who belongs to the control group and who to the experimental group. Only after all the data have been recorded (and in some cases, analyzed) do the researchers learn which individuals are which. Performing an experiment in double-blind fashion is a way to lessen the influence of prejudices and unintentional physical cues on the results (the placebo effect, observer bias, and experimenter's bias).
The key that identifies the subjects and the group they belonged to is kept by a third party and not given to the researchers until the study is over. Additionally, subjects were also blind to the hypothesis to avoid participant bias (e.g. the desire to prove or disprove the research hypothesis).

A further positive point in this study is that the order of sucrose and sand presentations was counterbalanced across subjects, so that order effects, even though they are not removed, are taken into account and controlled.

The choice of study design, within-subject, is also adequate. Within-subject designs have more statistical power than between-subject designs. This kind of design requires fewer participants (good for a small sample like the one in this study) and offers better control of stable participant variables, like age and gender. The sand test was intended to be the control condition in this study, but it really is not, because the result of the sand test is what makes the results of the sucrose test agree with the researchers' hypothesis, reflected in the difference in muscle tension values (the muscle tension recorded for the sucrose condition is greater than that for the sand condition, therefore the researchers infer that sucrose proximity to skeletal muscle affects muscle strength). Moreover, reasons should have been given for the assumption that sand, and not sucrose, is inert. The assumption that the sucrose and the sand bottles weigh the same is incorrect. Both bottles should weigh the same, not just be the same size, shape and colour (see the previous explanation about relative density differences).

The choice of statistical test, Student's t test, is correct. It evaluates whether there is a difference between sample means and whether that difference is a chance effect or not. It is a test of significance. The study mentions that "a paired t test was used to analyse responses", meaning that the study design is a within-group one.

The choice of alpha level, the exploratory level, is correct, and it indicates that the researchers are ready to accept a 5% chance of making a Type I error, that is, of rejecting the null hypothesis when it is true. The way the statistical terms are formulated in the conclusion is inadequate. There is no need to mention units of measurement, like kg, after the mean values. The t value given lacks the degrees of freedom in parentheses. There should be no "vs" in between the results of the two tests. The conclusion should read like this:

The t test indicated that the mean maximal isometric voluntary contraction of the biceps brachii while holding the bottle of sucrose (M = 18.05, SD = 5.46) was significantly greater than when holding the bottle of sand (M = 17.86, SD = 5.27), t(27) = -2.08, p < 0.05.

Given the flaws mentioned in internal validity, one wonders if the calculations have been done correctly. They have not.

The critical t value (α = 0.05, two-tailed test, df = N - 1 = 27) is 2.052.

The study has a within-subject design, as the authors mention that "a paired t test was used to analyse the results" and that "the significant level was set at p<.05".

The observed t value given is t = -2.08.
According to the data provided, the observed t value is 5.27:

\[
t_{obs} = \frac{M_{test} - M_{control}}{(SD_{test} - SD_{control})/\sqrt{N}}
        = \frac{18.05 - 17.86}{(5.46 - 5.27)/\sqrt{28}}
        = \frac{0.19}{0.19/5.29}
        = \frac{0.19}{0.036}
        \approx 5.27
\]

Note that this calculation substitutes the difference between the two standard deviations for the standard deviation of the paired differences, which the extract does not report, so the value obtained should be treated with caution.

This is a positive value, in keeping with the first mean (therapy, or sucrose) being larger than the second mean (control, or sand), implying that sucrose proximity affects muscle strength, not negatively, as previous research had shown, but positively, increasing it. The research hypothesis of the current study only says that sucrose proximity affects muscle strength, not in which direction. The negative t value provided is misleading, apart from wrong.

t observed > t critical. This means that the null hypothesis may be rejected and that the results obtained are not likely to be due to a chance effect.

With regard to the main conclusion of the study, it should be emphasized that internal validity estimates the degree to which conclusions about causal relationships can be made based on the measures used, the research setting and the whole research design. A good experimental technique, in which the effect of an independent variable on a dependent variable is studied under highly controlled conditions, will allow for higher degrees of internal validity and for highly trustworthy results, whatever these may be. Even if the result is that the null hypothesis should be accepted, the test will have shed some light on the hypothesis tested.

In view of the deficient procedure, wrong calculations and weak internal validity of the test, results should be interpreted with caution, and a re-test with a larger, more homogeneous sample and close observance of internal validity criteria and mathematical calculations is highly advisable.

Question 2

Imagine a hypothetical study that compared the muscle strength of a group of people in close proximity to sucrose to the muscle strength of another group of people who were in close proximity to sand (a placebo group).

Using the following hypothetical data carry out an appropriate t test to see if the group means are significantly different.
The dependent variable is the same as in the journal article, with strength being measured in kilograms (kg).

Group 1 (proximity to sucrose): 18, 20, 17, 18, 19, 17, 19, 17, 19, 20 kg
Group 2 (proximity to sand): 18, 19, 17, 17, 15, 19, 16, 15, 17, 18 kg

Show all seven steps of null hypothesis testing. All mathematical calculations must be shown. Report the results as you would in a research report.

Use a two-tailed test and an alpha level of 0.05 (at those steps explain why those choices were made).

Step 1: State the hypotheses.
H0: μ0 = μ1. Null hypothesis: There is no difference in the mean skeletal muscle strength between the sucrose group and the sand group.
H1: μ0 ≠ μ1. Research hypothesis: There is a difference in the mean skeletal muscle strength between the sucrose group and the sand group. (Note: In a two-tailed test the difference is non-directional; change can happen in either direction.)

Step 2: Select an appropriate alpha level: Use 0.05
With an alpha (α) level of 0.05 it is easier to get significant results with small samples, but there is a risk of making a Type I error, that is, the error of rejecting the null hypothesis when it is true. In making a Type I error, we say we are observing a difference when in truth there is none, thus indicating a test of poor specificity. A Type I error is a false positive and can be regarded as an error of excessive credulity.

Step 3: Select the correct statistical test.
If we are asked to determine whether there is a statistically significant difference between two sample means, we will use a t test. The difference in the sample means must be large enough to allow us to say confidently that it reflects a real difference in the population of interest, not attributable to a chance effect.

Types of t test:
Unrelated (independent) t test: between-subject design.
Related (dependent or paired) t test: within-subject design.
Single sample t test: compares a group mean with some known value from previous research or a previous test.

The problem test has a between-subject design, and an unrelated (independent) t test should be used.

Step 4: Check the test statistic assumptions.
Interval or ratio data?

Interval data are measurements where the difference between two values is meaningful and follows a linear scale. For example, in physics, a temperature of 0 degrees on the Celsius or Fahrenheit scale does not mean no temperature; in biology, a pH of 0.0 does not mean no acidity. Interval data are continuous data where differences are interpretable, ordered and follow a constant scale, but there is no "natural" zero meaning "absence of". Examples are temperature (in Celsius or Fahrenheit), dates and pH.

Ratio data are ordered, follow a constant scale and have a natural zero, so that ratios between values are interpretable. Examples are height, weight, age and length.

The dependent variable of the problem study measures strength in weight units (kilograms). Weight is ratio data.

Step 5: Calculate the critical value.
Alpha level = 0.05
Two-tailed test
Degrees of freedom (df) = (n1 - 1) + (n2 - 1) or (n1 + n2) - 2
The degrees of freedom are the number of observations that are free to vary and supply independent bits of information. A critical t value table can be consulted to obtain the requested value.
df = (n1 + n2) - 2 = (10 + 10) - 2 = 18
Critical t value (α = 0.05, two-tailed test, df = 18) = ±2.101 (= ±2.10)
t critical = ±2.10
(Note: The ± sign in the critical t value only indicates direction. For statistical purposes, and to compare this value with the observed t, only the numerical or absolute value will be taken into account.)

Step 6: Calculate the test statistic observed value.
tobs = observed t, MT = mean of the therapy (sucrose) group, MC = mean of the control (sand) group, nT = sample size of the therapy (sucrose) group, nC = sample size of the control (sand) group, SD = standard deviation, varT = variance (= SD²) of the therapy (sucrose) group, varC = variance (= SD²) of the control (sand) group.
(Note: Only the final value of interest, in this case tobs, will be rounded to the second decimal, so as to avoid excessive error deviations in calculations.)

\[
t_{obs} = \frac{M_T - M_C}{SD_{pooled}\sqrt{1/n_T + 1/n_C}},
\qquad
SD_{pooled} = \sqrt{\frac{var_T + var_C}{2}} \quad \text{(when the two groups are the same size)}
\]

\[
SD_{pooled} = \sqrt{\frac{SD_T^2 + SD_C^2}{2}} = \sqrt{\frac{1.173^2 + 1.449^2}{2}} = 1.318
\]

\[
t_{obs} = \frac{18.4 - 17.1}{1.318\sqrt{1/10 + 1/10}}
        = \frac{1.3}{1.318\sqrt{2/10}}
        = \frac{1.3}{1.318\sqrt{0.2}}
        = \frac{1.3}{1.318 \times 0.447}
        = \frac{1.3}{0.589}
        = 2.207 \; (= 2.21)
\]

tobs = 2.21

Let us now evaluate the results: tobs = 2.21 > tcrit = 2.10. Therefore we reject the null hypothesis and say there is a difference in the means of the two groups that is not due to a chance effect.

Step 7: State the outcome of the test.
(Note: Values have been rounded to the second decimal.)

i.- For a formal report
The skeletal muscle strength of the sucrose group (M = 18.40, SD = 1.17) was significantly greater than the skeletal muscle strength of the sand group (M = 17.10, SD = 1.45), t(18) = 2.21, p < 0.05.
Therefore we reject the null hypothesis and say there is a difference in the means of the two groups that is not due to a chance effect. The probability of making a Type I error (saying there is a difference in skeletal muscle strength when in fact there is none) is ≤ 5% which, for the purpose of this study, is acceptable.

ii.- In ordinary language.
The control group (the sand group) had an average skeletal muscle strength of 17.10 kg, compared to an average skeletal muscle strength of 18.40 kg for the therapy group (the sucrose group). These results speak in favour of the possibility that somehow sucrose proximity to skeletal muscle affects muscle strength, increasing it, and we believe that this result is not likely to have happened by chance.

Question 3

Let us imagine that the machine recording muscle strength could only indicate if a person was "Strong" or "Not Strong". Strong people are those that can exert a force of 18 kg or more. "Not Strong" people exert a force of less than 18 kg.

Using the interval data in Question 2 transform the participants into "Strong" and "Not Strong" categories, and then carry out an appropriate chi square test to see if there is a significant difference between the sucrose and sand groups. Again use an alpha level of 0.05.

Show all seven steps of null hypothesis testing. All mathematical calculations must be shown. Report the results as you would in a research report.

Explain any discrepancy between the results of the t test and the chi square test.

Step 1: State the hypotheses
H0: fo = fe. Null hypothesis. The effect of sucrose on skeletal muscle strength equals the effect of sand on skeletal muscle strength.
H1: fo ≠ fe. Research hypothesis. The effect of sucrose on skeletal muscle strength differs from the effect of sand on skeletal muscle strength.

Step 2: Select an appropriate alpha level: Use 0.05
With an alpha (α) level of 0.05 it is easier to get significant results with small samples, but there is a risk of making a Type I error, that is, the error of rejecting the null hypothesis when it is true. In making a Type I error, we say we are observing a difference when in truth there is none, thus indicating a test of poor specificity. A Type I error is a false positive and can be regarded as an error of excessive credulity.

Step 3: Select the correct statistical test.
Frequency of occurrence of nominal (discrete) data (categories): chi square (χ2) test.

Step 4: Check the test statistic assumptions.
1. The data are frequencies of occurrence (fo), that is, the number of observations per category.
2. The observations must be independent (unrelated data), that is, there must be different people in each category.
3. Expected frequencies (fe) > 5 for every category. That is, there must be a sufficient number of observations in each category. (After the calculations below, two categories do not meet this requirement, and therefore results should be interpreted cautiously and a re-run of the test with a larger sample is recommended.)

Step 5: Calculate the critical value.
Alpha level = 0.05
Two-tailed (non-directional) test.
Degrees of freedom (df) = (#rows - 1)(#columns - 1) = (2 - 1)(2 - 1) = 1.
The degrees of freedom are the number of observations that are free to vary and supply independent bits of information. A critical χ2 value table can be consulted to obtain the requested value.
Critical χ2 value (α = 0.05, df = 1) = 3.84
χ2 critical = 3.84

Step 6: Calculate the test statistic observed value.
fe = [(row total)(column total)] / grand total
χ2 observed = Σ [(fo - fe)² / fe]

CELL                  fo   fe                  fo - fe           (fo - fe)²   (fo - fe)² / fe
Sucrose, strong       7    (11x10)/20 = 5.5    7 - 5.5 = 1.5     2.25         2.25/5.5 = 0.41
Sucrose, not strong   3    (9x10)/20 = 4.5     3 - 4.5 = -1.5    2.25         2.25/4.5 = 0.50
Sand, strong          4    (11x10)/20 = 5.5    4 - 5.5 = -1.5    2.25         2.25/5.5 = 0.41
Sand, not strong      6    (9x10)/20 = 4.5     6 - 4.5 = 1.5     2.25         2.25/4.5 = 0.50
Marginals             20   20                  0                              χ2 observed = 1.82

χ2 observed = 1.82 < χ2 critical = 3.84. Therefore, we accept the null hypothesis and say there is no significant difference in skeletal muscle strength between the sucrose group and the sand group.
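The Step 6 arithmetic can be checked with a short script. What follows is a minimal Python sketch, not part of the original assignment: the function names `strong_counts` and `chi_square` are illustrative assumptions, the scores are the hypothetical Question 2 data, and the critical value 3.84 is the table value quoted in Step 5. No continuity correction is applied, matching the hand calculation.

```python
# Hypothetical Question 2 scores (kg).
sucrose = [18, 20, 17, 18, 19, 17, 19, 17, 19, 20]
sand = [18, 19, 17, 17, 15, 19, 16, 15, 17, 18]

CUTOFF_KG = 18        # "Strong" means 18 kg or more
CHI2_CRITICAL = 3.84  # alpha = 0.05, df = 1 (table value from Step 5)


def strong_counts(scores, cutoff=CUTOFF_KG):
    """Return [number strong, number not strong] for one group."""
    strong = sum(1 for s in scores if s >= cutoff)
    return [strong, len(scores) - strong]


def chi_square(observed):
    """Pearson chi square for a table of observed frequencies.

    Returns the statistic and the table of expected frequencies,
    where fe = (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    return statistic, expected


observed = [strong_counts(sucrose), strong_counts(sand)]
chi2_obs, expected = chi_square(observed)

print(observed)                  # [[7, 3], [4, 6]]
print(expected)                  # [[5.5, 4.5], [5.5, 4.5]]
print(round(chi2_obs, 2))        # 1.82
print(chi2_obs < CHI2_CRITICAL)  # True: the null hypothesis is not rejected
```

The expected table also makes the assumption check visible: two of the four expected frequencies are 4.5, below the threshold of 5 listed in Step 4.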
The sample is too small. Two of the cells had expected frequencies below 5 (fe = 4.5), therefore these findings should be treated with caution, as one of the statistical assumptions was not met.

Step 7: State the outcome of the test.

i.- For a formal report
There is no significant difference in the proportion of participants classified as strong between the sucrose group (70%, n = 7 of 10) and the sand group (40%, n = 4 of 10), χ2(1, N = 20) = 1.82, p > 0.05.
The sample is too small. The larger the sample, the greater the likelihood of getting a significant result. Additionally, two of the cells had expected frequencies below 5, that is, one of the statistical assumptions was not met. Therefore the results should be treated with caution and it would be advisable to repeat the study with a larger sample.

ii.- In ordinary language.
Despite the fact that 7 of the 10 participants (70%) in the sucrose group were classified as strong, compared with only 4 of the 10 participants (40%) in the sand group, this result is probably due to a chance effect. However, the trend in the sample results is in favour of a strength increase in the sucrose group and, in view of the small sample size, it would be prudent to repeat the study with more participants.

Explain any discrepancy between the results of the t test and the chi square test.

The results of the t test speak in favour of sucrose having a moderate positive influence on skeletal muscle strength, whereas the results of the chi square test indicate that the difference may be due to a chance effect. Converting the kilogram scores into two categories discards the information about how far each score lies from the 18 kg cut-off, so the chi square test is less sensitive than the t test applied to the same data. Again, given the small number of participants, a re-run of the test with a larger sample may offer more convincing evidence regarding the influence of sucrose proximity on skeletal muscle strength in one direction or the other, taking into account that previous research had shown an adverse effect of sucrose proximity to skeletal muscle on muscle strength.
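One way to see where the two tests part ways is to compute the t statistic from the raw kilogram scores and then look at what the "Strong" / "Not Strong" recoding keeps. The Python sketch below is illustrative only: the names `independent_t` and `recode` are assumptions introduced here, and the critical value 2.101 is the table value used in Question 2. Computed without rounding intermediate values, the t statistic comes out at about 2.20, slightly below the 2.21 obtained by hand with rounded intermediates; the decision to reject the null hypothesis is unchanged.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical Question 2 scores (kg).
sucrose = [18, 20, 17, 18, 19, 17, 19, 17, 19, 20]
sand = [18, 19, 17, 17, 15, 19, 16, 15, 17, 18]

T_CRITICAL = 2.101  # alpha = 0.05, two-tailed, df = 18 (table value)
CUTOFF_KG = 18      # "Strong" means 18 kg or more


def independent_t(a, b):
    """Pooled-variance t statistic for two independent groups."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def recode(scores, cutoff=CUTOFF_KG):
    """Replace each score with its category label."""
    return ["Strong" if s >= cutoff else "Not Strong" for s in scores]


t_obs = independent_t(sucrose, sand)
print(f"t observed = {t_obs:.2f}")  # t observed = 2.20
print(t_obs > T_CRITICAL)           # True: the null hypothesis is rejected

# The raw scores distinguish five strength levels in the sand group,
# while the recoded data distinguish only two categories.
print(sorted(set(sand)))            # [15, 16, 17, 18, 19]
print(sorted(set(recode(sand))))    # ['Not Strong', 'Strong']
```

After recoding, a participant scoring 15 kg and one scoring 17 kg are treated as identical, which is the information the t test uses and the chi square test does not.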