Research Methodology
What many do not get about this topic
The people who have changed how we think about science and
the world were often rebels with ‘radical ideas’ that
threatened the established order of the predominant worldview
Galileo has been called the "father of modern observational astronomy", the "father of modern
physics", and the "father of science." His observations of the satellites of Jupiter
caused a revolution in astronomy: a planet with smaller bodies
orbiting it did not conform to the principles of Aristotelian
cosmology, which held that all heavenly bodies should circle the
Earth, and it met with opposition from astronomers who doubted
heliocentrism. The matter was investigated by the Roman Inquisition
in 1615, which concluded that heliocentrism was false and contrary
to scripture, placing works advocating the Copernican system on the
Index of banned books and forbidding Galileo from advocating
heliocentrism. Galileo was one of the first modern thinkers to clearly
state that the laws of nature are mathematical
The Rebels
• The first of the great anatomists was Galen of Pergamon (AD 130-
200), who made vast achievements in the understanding of the heart,
the nervous system, and the mechanics of breathing. Because
human dissection was forbidden, he performed many of his
dissections on Barbary apes, which he considered similar enough to
the human form. The system of anatomy he developed was so
influential that it was used for the next 1,400 years. Galen remained
influential into the 16th century, when a young and rebellious
physician began the practice of dissecting real human bodies to
study their inner workings
• Andreas Vesalius came from a line of four prominent family
physicians. Vesalius and other like-minded anatomy students would
raid the gallows of Paris for half-decomposed bodies and skeletons
to dissect. Rather than considering dissection a lowering of his
prestige as a doctor, Vesalius prided himself on being the only
physician to directly study human anatomy since the ancients.
Although he respected Galen, Vesalius often found that his own
study of human bodies contradicted Galen's descriptions
The Rebels
• Like the work of his fellow revolutionary scientists, Vesalius’ masterpiece
was met with harsh criticism. Many of these criticisms
understandably came from the church, but the most strident
of all came from Galenic anatomists. These critics insisted
that Galen was in no way incorrect: if the human
anatomy of which he wrote differed from that
demonstrated by Vesalius, it was because the human body had
changed in the time between the two. In response to the
harsh criticism of his work, Vesalius vowed never again to
bring forth truth to an ungrateful world. In the same year
that he published De humani corporis fabrica (1543), he burned the
remainder of his unpublished works, further criticisms of
Galen, and preparations for his future studies. He left
the medical school, married, and lived out the rest of his
life conservatively as a court physician (source: Brain Blogger)
Not what but who you know
Louis Pasteur was a French chemist and microbiologist renowned for his
discoveries of the principles of vaccination, microbial
fermentation, and pasteurization. His medical discoveries
provided direct support for the germ theory of disease and
its application in clinical medicine; he is popularly known as the
"father of microbiology".
In 1847 Ignaz Semmelweis was given a two-year appointment as an assistant in
obstetrics with responsibility for the First Division of the
maternity service of the vast Allgemeines Krankenhaus teaching hospital
in Vienna. There he observed that women delivered by
physicians and medical students had a much higher rate (13–18%)
of post-delivery mortality (called puerperal fever or
childbed fever) than women delivered by midwife trainees or
midwives (2%).
Agree to Disagree (disagreeably)
• This case-control analysis led Semmelweis to
consider several hypotheses. He concluded that the
higher rates of infections in women delivered by
physicians and medical students were associated
with the handling of corpses during autopsies
before attending the pregnant women. This was not
done by the midwives. He associated the exposure
to cadaveric material with an increased risk of
childbed fever, and conducted a study in which the
intervention was hand washing.
Who dares challenge the existing dogma?
• Dr Semmelweis initiated a mandatory hand washing
policy for medical students and physicians. In a
controlled trial using a chloride of lime solution, the
mortality rate fell to about 2%—down to the same
level as the midwives. Later he started washing the
medical instruments and the rate decreased to about
1%. His superior, Professor Klein, did not accept his
conclusions. Klein thought the lower mortality was
due to the hospital’s new ventilation system.
• Semmelweis did not get his assistant professorship
renewed in 1849. He was offered a clinical faculty
appointment (privatdozent) without permission to
teach from cadavers. He returned home to Budapest.
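The rate comparison above lends itself to a simple two-proportion z test. A minimal sketch in Python: the rates are from the slide, but the cohort sizes (3,000 births per division) are hypothetical round numbers chosen only to make the comparison concrete.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for comparing two independent proportions (pooled)."""
    x1, x2 = p1 * n1, p2 * n2
    pooled = (x1 + x2) / (n1 + n2)              # pooled proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 13% mortality (physicians) vs 2% (midwives): an enormous z statistic.
z_physicians_vs_midwives = two_proportion_z(0.13, 3000, 0.02, 3000)

# After mandatory hand washing, both divisions sit at about 2%.
z_after_handwashing = two_proportion_z(0.02, 3000, 0.02, 3000)
```

With any plausible cohort sizes the before-intervention gap is far beyond chance, which is what made Klein's ventilation explanation so hard to defend.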
Misconception # 2
• The popular belief is that the material and methods in this
course are abstract and have little to do with important
issues in everyday life. My question: why do we not use
these methods to examine difficult questions?
• Terrorism (how it develops, how to prevent it-airport security)
• Torture (Is it effective? Does it provide useful information?)
• How can we best prevent rape and assault?
• Are there gun control approaches that reduce gun violence?
• Such questions are not addressed adequately by the ideas
and tools provided by this field, mainly because people
maintain a view of this field as ‘academic and irrelevant’
Also, many times we make assumptions that go untested and may
turn out to be incorrect; see Rebecca Smith-Bindman's work on
unregulated radiation doses in CT scans
Misconception # 3
• Numbers drive the ideas
• Actually it is the ideas that drive the numbers
• Numbers can describe and quantify; they can tell us
about differences between individuals and/or
groups and accurately describe changes that
occur. The research ideas and tools in a class
such as this can also help us distinguish between true
and false claims and identify those claims that are
significant and meaningful.
An Epidemic of False Claims (Scientific American, May 7, 2011)
• False positives and exaggerated results in peer-
reviewed scientific studies have reached epidemic
proportions in recent years. The problem is rampant
in economics, the social sciences and even the
natural sciences, but it is particularly egregious in
biomedicine.
• Many studies that claim some drug or treatment is
beneficial have turned out not to be true. We need
only look to conflicting findings about beta-carotene,
vitamin E, hormone treatments, Vioxx and Avandia.
Even when effects are genuine, their true magnitude
is often smaller than originally claimed.
An Epidemic of False Claims
• Research is fragmented, competition is fierce and emphasis
is often given to single studies instead of the big picture.
• Much research is conducted for reasons other than the
pursuit of truth. Conflicts of interest abound, and they
influence outcomes. In health care, research is often
performed at the behest of companies that have a large
financial stake in the results. Even for academics, success
often hinges on publishing positive findings.
What is the usefulness of this course?
• Claims are made all the time regarding products and
processes, and sometimes they stir controversy
• A new study into the efficiency and reliability of wind
farms has concluded that a campaign against them is
not supported by the evidence
• Internet marketers of acai berry weight-loss pills and
colon cleansers will pay $1.5 million to settle charges of
deceptive advertising and unfair billing, the Federal
Trade Commission announced today. The FTC complaint
alleged that two individuals and five related companies
deceptively claimed that their Acai Pure supplement
would cause rapid and substantial weight loss, and that
their Colotox colon cleanser would prevent colon cancer.
The scientific method
• A body of techniques for investigating phenomena,
acquiring new knowledge, or correcting and
integrating previous knowledge. To be scientific, a
method must be based on empirical and
measurable evidence subject to specific principles
of reasoning. (Empiricism: the view that knowledge comes only or primarily from sensory experience.)
• Although procedures vary from one field of inquiry
to another, identifiable features distinguish
scientific inquiry from other methods of obtaining
knowledge.
• These features apply across fields, from Anthropology to Zoology
The scientific method
• The scientific approach recognizes that both intuition and
authority can be sources of ideas but does not
unquestionably accept something as true based on a
person’s prestige or authority pg 3-5
• The fundamental characteristic of scientific method is
empiricism-the idea that knowledge is based on observations
and that these observations can be measured (creating data
or a set of data) pg 5
• Science is adversarial. Since hypotheses
must be testable, researchers conduct studies and then publish their
results, allowing others to review them and decide for
themselves the validity and reliability of the data and the
conclusions drawn from them pg 6
• Scientific evidence is peer reviewed-Editors of the journal
examine the research submitted to determine its validity
Pseudoscience
• Hypotheses generated are typically not testable
• Methodology is not scientific and validity of data is
questionable
• Supportive evidence tends to be anecdotal and/or rely
on "so-called" experts
• Conflicting evidence is ignored
• Language used sounds scientific
• Claims tend to be vague, rationalize strongly held
beliefs, and appeal to preconceived ideas
• Claims are never revised pg 9
Scientific Inquiry
• Researchers propose hypotheses (a tentative idea
that must be tested) pg19 as explanations of
phenomena, and design experimental studies to test
these hypotheses via predictions which can be
derived from them
• Scientific inquiry is generally intended to be as
objective as possible in order to reduce biased
interpretations of results. Another basic expectation is
to document and share all data and methodology so
they are available for careful scrutiny by other
scientists, giving them the opportunity to verify
results by attempting to reproduce them (replicate
results)
Scientific Inquiry: Scientists Are Funny
• “The history of biochemistry is a chronicle of controversies.
These controversies exhibit a common pattern. There is a
complicated hypothesis, which usually entails an element of
mystery and several unnecessary assumptions. This is opposed
by a more simple explanation, which contains no unnecessary
assumptions.
• The complicated one is always the popular one at first, but the
simpler one, as a rule, eventually is found to be correct. This
process frequently requires ten to twenty years. The reason for
this long time lag was explained by Max Planck.
He remarked that scientists never change
their minds, but eventually they die”
--John Northrop, biochemist
Hypotheses and Theories
• A hypothesis is a conjectural (if-then) statement, while a
theory is a systematic body of ideas about a particular topic
or phenomenon pg 19
• A question is asked that may refer to an observation (e.g. Do
aggressive video games increase aggression in adolescents
and young adults?) or may be in the form of an open-ended
question (what strategies are best for coping with natural
disasters?)
• We then make conjectures (hypotheses) and test them to see
whether our specific predictions conform to what happens in
the real world
• Theories encompass wider domains of inquiry that may bind
many independently derived hypotheses together in a
coherent, supportive structure. Theories, in turn, may help
form new hypotheses or place groups of hypotheses into
context
Basic Steps of Scientific Inquiry
• Define a question
• Gather information and resources (observe)
• Form an explanatory hypothesis
• Test the hypothesis by performing an experiment and
collecting data in a reproducible manner
• Analyze the data
• Interpret the data and draw conclusions that serve as a
starting point for new hypothesis
• Publish results
• Retest (frequently done
by other scientists)
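The steps above can be caricatured as a loop. This schematic Python sketch (function names and toy data are invented for illustration, not a literal recipe) shows how an unsupported hypothesis feeds back into revision rather than "proof":

```python
def inquiry_cycle(run_experiment, analyze, supports_hypothesis):
    """One pass through the basic steps of scientific inquiry (schematic)."""
    data = run_experiment()                 # test the hypothesis, collect data
    result = analyze(data)                  # analyze the data
    if supports_hypothesis(result):         # interpret and draw conclusions
        return "publish and invite replication"
    return "revise the hypothesis or methods and retest"

# Toy run: hypothesis "this coin favours heads" against stand-in flip data.
flips = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
outcome = inquiry_cycle(
    run_experiment=lambda: flips,
    analyze=lambda d: sum(d) / len(d),
    supports_hypothesis=lambda heads_rate: heads_rate > 0.5,
)
```

Note that even the "publish" branch ends in replication, not proof: a supported hypothesis is never a proven one.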
Examples of Pseudoscience
• Expectations that 2012 would bring large-scale disasters or
even the end of the world
• Ancient Astronauts - Proposes that aliens have visited the
earth in the past and influenced our civilization
• Astrology - Belief that humans are affected by the position of
celestial bodies
• Flat Earth Society - Claims the Earth is flat and disc-shaped
• Moon Landing Conspiracy - Contends the original moon
landing was faked
• Bermuda Triangle - An area where unexplained events, like
disappearances of ships and airplanes, have occurred
• Cryptozoology - The search for Bigfoot (Yeti), the Loch Ness
monster, El Chupacabra and other creatures that biologists believe do
not exist
Some More Controversies
• Mayan Calendar predictions for 2012
• Crystal healing
• Hypnosis – a state of extreme relaxation and inner focus in which a
person is unusually responsive to suggestions made by the
hypnotist. The modern practice has its roots in the idea of animal
magnetism, or mesmerism, originated by Franz Mesmer.
Mesmer's explanations were thoroughly discredited, and to this
day there is no agreement among researchers whether hypnosis
is a real phenomenon or merely a form of participatory role-
enactment
The Geocentric Model & The Wanderers
• Most of the time we see Mars, Jupiter and Saturn moving around the Sun in
the same direction as the Earth, but during the relatively short time that the
Earth overtakes one of these planets, that planet appears to be moving
backward. As the Greeks noticed discrepancies between the way planets
moved and the basic geocentric model, they began adjusting the model and
creating variations on the original. In these models, planets and other
celestial bodies move in circles that have been superimposed onto circular
orbits around the Earth
• http://www.lasalle.edu/~smithsc/Astronomy/retrograd.html
The Earth Moved
• The solution Ptolemy proposed to these discrepancies came in the form
of a mad but clever proposal: planets were attached not to the concentric
spheres themselves, but to circles attached to the concentric spheres
• The Ptolemaic system, the most well-known version of the geocentric
model, was a complex interaction of circles. Ptolemy held that each
planet moved around a small circle, termed an epicycle, while the
epicycle itself moved along a bigger circle–the deferent–around the Earth.
• However, in practice, even this was not enough to account for the detailed
motion of the planets on the celestial sphere! In more sophisticated
epicycle models further "refinements" were introduced. In some cases,
epicycles were themselves placed on epicycles
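The deferent-and-epicycle construction is just a sum of two circular motions, which is why it can reproduce retrograde loops. A minimal sketch, with illustrative (not historical) radii and angular rates:

```python
import math

def geocentric_longitude(t, R=10.0, r=4.0, Omega=1.0, omega=5.0):
    """Apparent longitude of a planet on an epicycle (radius r, rate omega)
    riding a deferent (radius R, rate Omega) centred on the Earth.
    All parameter values are illustrative."""
    x = R * math.cos(Omega * t) + r * math.cos(omega * t)
    y = R * math.sin(Omega * t) + r * math.sin(omega * t)
    return math.atan2(y, x)

def longitude_steps(n=2000, t_max=2 * math.pi):
    """Successive changes in apparent longitude; a negative step means the
    planet appears to move backward (retrograde) as seen from Earth."""
    ts = [t_max * i / n for i in range(n + 1)]
    lams = [geocentric_longitude(t) for t in ts]
    steps = []
    for a, b in zip(lams, lams[1:]):
        d = (b - a + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
        steps.append(d)
    return steps

steps = longitude_steps()
# The planet mostly drifts forward, but periodically appears to reverse.
```

The retrograde intervals appear whenever the epicycle's tangential speed exceeds the deferent's at the point nearest Earth, which is exactly the behaviour the Greeks were trying to capture.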
The Day the Earth Stood Still
• Ptolemaic geocentric theory describes and correctly predicts: one
could confidently predict when a planet’s apparent motion would
come to a halt and turn around, and for how long it would seem to
move backwards. The theory predicts but does not explain HOW or
WHY the planets move as they do
• Correlation and prediction do not establish causality
• Navigation unaffected
• Occam’s razor or the law of parsimony
• Once Kepler proposed the theory of elliptical orbits, heliocentrism
became such a simple model compared to Ptolemy's unwieldy
cycles and epicycles, that heliocentrism rapidly gained in popularity
and quickly became the dominant theory
Joy of Research
Conducting Research-Library
Research and Journals
Chapter Two
Hypothetically Speaking
Researchers generally test a hypothesis (a tentative idea or
question that can be supported or refuted) and then design a
study to test the hypothesis. The researcher also makes a
prediction regarding the outcome of the experiment pg 19
If the prediction is not confirmed the researcher will either
reject the hypothesis or conduct further research using
different methods pg 19
However, if the results of the study confirm the prediction the
hypothesis is supported but not proven
Constructing the study
• Participants in the study are Subjects pg 20
• Participants in survey research are respondents
• Those who help researchers understand a particular
culture or organization are informants
• Participants are often more fully described by
characterizing them as students, employees,
residents, patients etc.
• Other terms for subjects include
respondents, informants pg 20
Sources of Ideas
• Common sense-The things we all believe to be true
although such notions do not always turn out to be
correct (also popular beliefs-the 5 sec rule pg 20-21 )
• Observation- Listening to music with degrading sexual lyrics
predicts a range of early sexual behavior
• Serendipity - luck? pg 21. Pavlov's accidental discovery that
dogs salivate to stimuli other than food; Otto Loewi
and the discovery of acetylcholine (at the time it was generally
accepted that neurons were connected by synapses, and
initially most neurophysiologists believed that signal
transmission between cells was electrical). Another example:
the accidental discovery of medications in the 1950s
• Theories
• Past research
Sense-Common and Otherwise
• Common sense is often made up of prejudice and snap
judgment; it is not always useful, and even when it is
useful it can certainly be irrational
• Testing a commonsense idea can be useful since such
ideas do not always turn out to be true
• Stress theory of ulcers: As peptic ulcers became more
common in the 20th century, doctors increasingly
linked them to the stress of modern life. Medical
advice during the latter half of the 20th century was,
essentially, for patients to take antacids and modify
their lifestyle. In the 1980s Australian clinical
researcher Barry Marshall discovered that the
bacterium H. pylori caused peptic ulcer disease,
leading him to win a Nobel Prize in 2005
Another Crazy Idea
• Immovable continents: Prior to the middle of the 20th century
scientists believed the Earth’s continents were stable and did not
move. This began to change in 1912 with Alfred Wegener’s
formulation of the continental drift theory, and later and more
properly the elucidation of plate tectonics during the 1960s
• Accident and serendipity - Pavlov did not set out to discover classical
conditioning but was studying the digestive system and found that
dogs would salivate to a neutral stimulus when paired with food
Theories
• Theory-a systematic body of ideas about a particular
topic or phenomenon with a consistent structure that
has two functions pg22
• 1) Theories organize and explain various facts and
descriptions or observations putting them into a
coherent framework (system)
• 2) Theories generate new knowledge by guiding our
observations and generating new hypotheses Theories
are living and dynamic
(and the theory can be modified to account for new data)
• Theories vs. hypotheses: a theory consists of much
more than a simple idea and is grounded in prior
research, often with several consistent hypotheses
Theories (and facts) change
• Theories can be modified by new discoveries. Example: the
original conception of long-term memory as a permanent,
fixed storage place was modified when Loftus (1979)
demonstrated that memories could be influenced by how
subjects were questioned pg 23. Participants viewed a
simulated automobile accident and were later asked either
"Did you see the broken headlight?" or "Did you see a
broken headlight?" Subjects were more likely to answer yes
to the first version
• Memories can also be induced so memory is not simply a
record of what happened
• Relevant to Criminal Justice system and police procedures
Theories and data
• Under sources of ideas (pg 23, top), Cozby and Bates cite the research
of Buss (2007) proposing that males feel more intense
jealousy when a partner's infidelity is physical,
while females are more jealous of emotional
infidelity. This is consistent with evolutionary
theory
• Females are more threatened by men who would form an
emotional bond with another partner and withdraw support
and resources; males are more threatened that they might
have to care for a child who does not share any of their
genes (taken from evolutionary theory, pg 23)
Past Research
• “Becoming familiar with a body of research on a
topic is perhaps the best way to generate ideas for
new research” pg24
• Becoming familiar with a particular body of research
allows you to see inconsistencies
• What you know about one research area may be
applied to another research area
• Researchers refine and expand on known and
published research
• Replication - An attempt to repeat a finding using
a different setting, a different demographic
group (age, sex, etc.), or different methodology
• Research is also stimulated by practical problems
that may have immediate applications
Examining Data critically
• Example of facilitated communication in
which a ‘facilitator’ held the hand of an
autistic child to help press keys on a
keyboard or otherwise assist in communication
• Montee et al. (1995) constructed a study with three conditions:
(1) both child and facilitator were shown the same picture
and the child was asked to identify it (by using the keyboard)
assisted by the facilitator; (2) only the child saw the picture; (3) the
child and facilitator saw different pictures (unknown to the
facilitator). Results: pictures were correctly identified only
in condition 1
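The logic of this design can be made explicit in a few lines. In this sketch (the function name, argument names, and "dog"/"cat" pictures are invented for illustration, not from the paper), we model the hypothesis that the facilitator, even unwittingly, drives the typed answers, and check what each condition would then produce:

```python
def typed_answer(child_sees, facilitator_sees, answers_come_from="facilitator"):
    """If the facilitator guides the typing, the typed answer reflects
    what the facilitator saw, not what the child saw."""
    return facilitator_sees if answers_come_from == "facilitator" else child_sees

# Condition 1: both see the same picture -> answer looks correct.
cond1 = typed_answer(child_sees="dog", facilitator_sees="dog")

# Condition 3: they see different pictures -> answer tracks the facilitator,
# so it is wrong with respect to what the child saw.
cond3 = typed_answer(child_sees="dog", facilitator_sees="cat")
```

Correct answers only in condition 1, and condition-3 answers matching the facilitator's picture, is exactly the pattern the study observed, which is why the design is so decisive.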
Evaluating Web Information
• Is the site associated with a major educational
institution or is it sponsored by one individual or
organization and if so what may be the bias of that
person or organization (e.g. Disabled People's International)
• Is the information provided by those responsible for
the site? What are their qualifications?
• Is the information current?
• Do links from the site lead
to legitimate organizations? Pg 35
Journals and Library Research
• Most papers submitted for publication in major
journals are rejected (during peer review)
• Peer review - editors of the journal review the article
and also send it to other experts in the field to
review pg 25. Due to limited space and the number of articles
received, most articles submitted are rejected
• Journals usually specialize in one or two areas (view pg 26)
• Databases: PsycINFO, Science Citation Index, Social Sciences
Citation Index, PubMed
Literature Review
• A “literature review” reviews the scholarly
literature on a specific topic by summarizing
and analyzing published work on that topic. A
literature review has several purposes:
• 1) To evaluate the state of research on a topic
• 2) To familiarize readers and students with
what has already been done in the field
• 3) To suggest future research directions or
gaps in knowledge
Traditional and Open Access journals
• In traditional, subscriber-pays publishing, the publisher,
who holds the copyright to an article, pays most
printing and distribution costs and, in order to read an
article, the journal subscriber pays fees, whether for
hard-copy or online versions. Sometimes an author is
required to pay printing page charges for complex
graphics or color presentations.
• “Open access” publishing generally
means that the author or publisher,
who holds the copyright to an article,
grants all users unlimited, free access
to, and license to copy and distribute,
a work published in an open access
journal usually on-line
Traditional and Open Access journals
• Traditional publishing - Individuals and libraries are charged fees to access
the article. Depending on the contract you sign as an author, you may not
be able to distribute copies of your article or post it online.
• The now-common usage of the term "open access" means freely available
for viewing or downloading by anyone with access to the internet.
• The UK Wellcome Trust (a global charitable foundation) assumes that “the benefits
of research are derived principally from access to research results”, and
therefore that “society as a whole is made worse off if access to scientific
research results is restricted”
• Problems of traditional and open access
• Sending papers to reviewers who are sympathetic (traditional)
• Payment for publication (by authors) could create conflicts of interest and
have a negative impact on the perceived neutrality of peer review, as there
would be a financial incentive for journals to publish more articles (open access)
• Open Access is also often seen as a solution to the situation where many
libraries have been forced to cut journal subscriptions because of price
increases
Traditional vs. Open Access Publishing
• Controversies about open access publishing and
archiving confront issues of copyright and
governmental competition with the private sector.
• Traditional publishers typically charge readers
subscriber fees to fund the costs of publishing and
distributing hard-copy and/or online journals.
• In contrast, most open access systems
charge authors publication fees and
give readers free online access to the
full text of articles
Good and Bad sources
Anatomy of a Research Article
Abstract, Introduction, Method
Section, Results Section and
Discussion (Conclusions)
Abstract and Introduction
• Abstract – a summary of the report which
typically runs no more than 120 words. It includes
information about the hypothesis, the procedure
of the study and a summary of results (there may
be some information about the discussion)
• Introduction: The researcher outlines the problem,
including past research and theories relevant to
the problem. Expectations are listed (usually in
the form of hypotheses) pg 35
Method Section
• The method section is divided into subsections as determined
by the author and dependent on the complexity of the study
and its design. Sometimes there is an overview of the design
explained to the reader
• The next section describes the characteristics of the
participants (number of subjects, male/female etc.)
• The next subsection describes the procedure, the materials
or instruments used, how data was recorded.
• Additional subsections are used as necessary to describe
equipment, procedures or other information to be included
• Details of all relevant information must be included to allow
other researchers to replicate the study
Results and Discussion
• Results - In this section the researcher presents the
findings, usually in three ways. First, there is a
narrative summary. Second, there is a statistical
description. Third, tables are presented. “Statistics are
only a tool the researcher uses. . .” Not understanding
how the calculations were performed is not a
deterrent to reading and understanding the logic
behind the design and statistical procedures used
• Discussion - The researcher reviews the research from
various perspectives, determining whether the research
supports the hypothesis or not and offering explanations
in either case, including what may have gone wrong in the
study. There is also usually a comparison with past research,
and there may be suggestions for practical applications of
the research findings
The Quick Guide-copyrighted
• Introduction: 1) what is known; 2) what is not
known, which this study addresses
• Methods: who, where, what. Who are the
subjects (describe them)? Where did they come
from? What did you do with them (often dividing
them into groups such as experimental and
control)?
• Results-What happened? (e.g. which group did
better)
• Discussion - What do the results mean?
Interpretation of the study is in this section
Ethical Research-Chapter 3
• Beneficence-The principle which states the need to
maximize benefits and minimize harm pg40
• Risk-benefit analysis - What is the potential harm?
Does confidentiality hold?
Was there informed consent?
Milgram’s Methodology
• Through a rigged drawing, the participant was assigned the role of
teacher while the confederate was always the learner. The
participant watched as the experimenter strapped the learner to a
chair in an adjacent room and attached electrodes to the learner’s
arm. The participant’s task was to administer a paired associate
learning test to the learner through an intercom system.
• Participants sat in front of an imposing shock generator and were
instructed to administer an electric shock to the learner for each
incorrect answer. Labels above the 30 switches that spanned the
front of the machine indicated that the shocks ranged from 15 to
450 volts in 15-volt increments. Participants were instructed to start
with the lowest switch and to move one step up the generator for
each successive wrong answer.
Milgram’s Methodology
• The subjects believed that for each wrong answer, the learner was
receiving actual shocks. In reality, there were no shocks. After the
confederate was separated from the subject, the confederate set up a
tape recorder integrated with the electro-shock generator, which
played pre-recorded sounds for each shock level. After a number of
voltage level increases, the actor started to bang on the wall that
separated him from the subject. After banging on the
wall several times and complaining about his heart condition,
all responses by the learner would cease
• At this point, many people indicated their desire to stop the
experiment and check on the learner. Some test subjects paused at
135 volts and began to question the purpose of the experiment. Most
continued after being assured that they would not be held
responsible
• After the 330-volt shock, the learner no longer screamed or protested when
receiving a shock, suggesting that he was physically incapable of
responding. The major dependent variable was the point in the procedure
at which the subject refused to continue (the maximum shock administered)
Milgram’s Methodology
Deception
• If at any time the subject indicated his desire to halt the
experiment, he was given a succession of verbal prods by
the experimenter, in this order
• Please continue.
• The experiment requires that you continue.
• It is absolutely essential that you continue.
• You have no other choice, you must go on
• If the subject still wished to stop after all four successive
verbal prods, the experiment was halted. Otherwise, it
was halted after the subject had given the maximum 450-
volt shock three times in succession
• The experimenter also gave special prods if the teacher made specific comments. If the teacher asked whether
the learner might suffer permanent physical harm, the experimenter replied, "Although the shocks may be
painful, there is no permanent tissue damage, so please go on."
Ethical Research
• Milgram summarized the experiment in his 1974 article, "The
Perils of Obedience": "The legal and philosophic aspects of
obedience are of enormous importance, but they say very
little about how most people behave in concrete situations. I
set up a simple experiment at Yale University to test how
much pain an ordinary citizen would inflict on another person
simply because he was ordered to by an experimental
scientist. . . . The extreme willingness of adults to go to almost
any lengths on the command of an authority constitutes the
chief finding of the study and the fact most urgently
demanding explanation. . . . Relatively few people have the
resources needed to resist authority"
• Milgram (1974) maintained that the key to obedience had
little to do with the authority figure’s manner or style. Rather,
he argued that people follow an authority figure’s commands
when that person’s authority is seen as legitimate.
Data can surprise us
• Before conducting the experiment, Milgram polled fourteen Yale
University senior-year psychology majors to predict the behavior
of 100 hypothetical teachers. All of the poll respondents believed
that only a very small fraction of teachers (the range was from
zero to 3 out of 100, with an average of 1.2) would be prepared to
inflict the maximum voltage. Milgram also informally polled his
colleagues and found that they, too, believed very few subjects
would progress beyond a very strong shock.
• Milgram also polled forty psychiatrists from a medical school and
they believed that by the tenth shock, when the victim demands
to be free, most subjects would stop the experiment. They
predicted that by the 300 volt shock, when the victim refuses to
answer, only 3.73 percent of the subjects would still continue and
they believed that "only a little over one-tenth of one percent of
the subjects would administer the highest shock on the board."
The relevance of Milgram
• Milgram sparked direct critical response in the
scientific community by claiming that "a
common psychological process is centrally
involved in both [his laboratory experiments
and Nazi Germany] events."
• There are psychological processes which can
disengage morality from conduct
Criticism of Milgram
• In addition to their scientific value, the obedience
studies generated a great deal of discussion because of
the ethical questions they raised (Baumrind, 1964;
Fischer, 1968; Kaufmann, 1967; Mixon, 1972). Critics
argued that the short-term stress and potential long-
term harm to participants could not be justified.
• In his defense, Milgram (1974) pointed to follow-up
questionnaire data indicating that the vast majority of
participants not only were glad they had participated in
the study but said they had learned something
important from their participation and believed that
psychologists should conduct more studies of this type
in the future. Nonetheless, current standards for the
ethical treatment of participants clearly place Milgram’s
studies out of bounds (Elms, 1995).
Mechanisms of moral disengagement.
A. Bandura
• Theory of Moral Disengagement seeks to analyze the means through which
individuals rationalize their unethical or unjust actions
• Moral justification- turns killing into a moral act, typically when non-violent
acts appear to be ineffective and when there is a serious threat to a person's
way of life. Justification can take many forms and can be framed as a
service to humanity or for the greater good of the community
• Displacement of Responsibility- Group decision making can diffuse
responsibility. Personal responsibility is obscured
• Disregard for Consequences- People minimize the consequences of acts they
are responsible for. It's easier to hurt others when they are not visible
• Dehumanization- People find violence easier if they don't consider their
victims as human beings. The road to terrorism is gradual
• Euphemistic labeling- using terms that are less negative or might be viewed as
positive to make actions seem less harmful. This sort of labeling also
serves to limit or reduce actors' responsibility for their actions
• Advantageous comparison- people who engage in reprehensible acts make
it seem less objectionable by comparing it to something perceived as being
worse
Some criticisms of Milgram
• Professor James Waller, Chair of Holocaust and Genocide Studies at Keene
State College, formerly Chair of Whitworth College Psychology Department,
expressed the opinion that Milgram experiments do not correspond well to the
Holocaust events
• The subjects of Milgram experiments, wrote James Waller (Becoming Evil),
were assured in advance, that no permanent physical damage would result
from their actions. The Holocaust perpetrators, however, were fully aware of the
lethal, irreversible nature of their hands-on killing and maiming of the victims.
• The laboratory subjects themselves did not know their victims and were not
motivated by racism. On the other hand, the Holocaust perpetrators displayed
an intense devaluation of the victims through a lifetime of personal
development.
• Those serving punishment at the lab were not sadists, nor hate-mongers, and
often exhibited great anguish and conflict in the experiment, unlike the
designers and executioners of the Final Solution who had a clear "goal" on
their hands, set beforehand.
• The experiment lasted for an hour, with no time for the subjects to contemplate the
implications of their behavior. Meanwhile, the Holocaust lasted for years with
ample time for a moral assessment of all individuals and organizations involved.
Risks of Research (continued)
• Procedures that can cause physical harm are rare, while
those that involve psychological stress are much more
common (refer to Schachter's study on stress and affiliation).
If stress is possible, the researcher must use all possible
safeguards to help subjects deal with the stress and also include
a debriefing session pg 42
• Loss of privacy/confidentiality- Data should be stored
securely and made anonymous if possible; if not,
care should be taken to separate identifying data from
actual data pg43
• Concealed Observation- Is it ethical to use data taken from
public web sites or those which require some identification?
Risks of Research- Informed Consent
• Informed Consent- implies that potential subjects
should be provided with all information that might
influence their decision to participate in the study pg44
• Informed consent forms generally include 1) purpose of
research 2) procedures involved 3) risk/benefits
4) any compensation 5) confidentiality 6) assurance of
voluntary participation and permission to withdraw from
study 7) contact information for subjects to ask questions
• To make the form easier to understand, it should not be written
in the first person – "I understand that participation is
voluntary" (first person). Instead: "Participation in this study
is voluntary" pg44
Deception and Informed Consent
• Deception occurs when there is active
misrepresentation of information. In the Milgram
experiment there were two examples pg47
• 1) Subjects were told the study was about memory
and learning while it was actually about obedience
• 2) Subjects were not told they would be delivering
shocks to confederates (Milgram created a false reality for subjects)
• Milgram’s study took place before informed consent
became routine. Might “honest” informed consent have
resulted in a different outcome? Would it have
biased the sample?
Deception and Ethics
• The concepts of informed consent and debriefing have
become standard and more explicit pg48
• While false cover stories are still commonly used, especially
in Social Psychology, the use of deception
is decreasing overall for three reasons:
• 1) researchers have become more interested in cognitive
variables rather than emotional ones and adopt practices
more similar to those in cognitive studies which involve less
deception (memory research)
• 2) there is greater sensitivity and awareness of ethical
issues and how they should be handled in research
• 3) Review boards at universities are more stringent about
approving research involving deception and want to know
whether alternatives are available
Alternatives to Deception
• Role Playing- takes different forms. Subjects may be given a
description of a situation and asked how they would respond, or asked to
predict how real participants would react pg50
• However, it is not easy to predict one's own
behavior, especially when some undesirable
behavior is being studied (e.g. conformity, aggression)
• Most people overstate their
altruistic tendencies
Alternatives to Deception
• Simulations-enactment of some real situation
(can still pose ethical problems)
• Zimbardo prison experiment 1971 Stanford
• “Our planned two-week investigation into the
psychology of prison life had to be ended
prematurely after only six days because of what the
situation was doing to the college students who
participated. In only a few days, our guards became
sadistic and our prisoners became depressed and
showed signs of extreme stress”- Philip Zimbardo
http://www.prisonexp.org/
Alternatives to Deception
• Honest Experiments-behavior studied without
elaborate deception (e.g. speed dating used to
study romantic attraction)
• Subjects agree to have their behavior studied and
know the hypotheses of the researchers
• Use situations in which people seek assistance, e.g. assign
students to different conditions of skill
improvement (on-line or in-class help)
• Use naturally occurring events to test hypotheses
(e.g. New York residents given PTSD checklist to
determine if they were different from Wash D.C.
residents after 9/11 attacks)
Sample selection and ethics
• Justice principle- Any decisions to include or exclude certain
people from a research study must be made solely on scientific
grounds (e.g. Tuskegee Syphilis Study 1932-1972) pg 52-54
• According to the rules of the U.S. Dept. of Health and Human
Services, all institutions that receive federal funds must have an
Institutional Review Board (IRB) responsible to review research
proposed and conducted by that institution (even if it is not
conducted on site at that institution)
• IRB must have at least 5 members with at least one member from
outside the institution. Exceptions to IRB review include:
• 1) Research in which there is no risk (anonymous questionnaires, surveys, etc.) is
exempt from IRB review
• 2) Research with minimal risk (risk no greater than that encountered in
daily life) is routinely approved by the IRB. All other research with
greater than minimal risk is fully reviewed and requires safeguards such as
informed consent. See Table 3.1 pg 54 Assessment of Risk
IRB impact on Research
• Some researchers may be frustrated over the
sometimes long process of review with numerous
requests for revisions and clarifications.
• These IRB policies apply to all areas of research so that
the caution necessary for some medical research is
applied to other research with less risk
• Some studies indicate that students
who have participated in research
studies are more lenient in their
judgments of the ethics of the
experiment than the researchers
themselves or the IRB members pg55
Risk-Benefits of Clinical Research
• Clinical trials involving new drugs are commonly
classified into four phases
Risk-Benefits of Clinical Research
•Phase I: Researchers test a new drug or treatment in a small
group of people for the first time to evaluate its safety,
determine a safe dosage range, and identify side effects.
•Phase II: The drug or treatment is given to a larger group of
people to see if it is effective and to further evaluate its safety.
•Phase III: The drug or treatment is given to large groups of
people to confirm its effectiveness, monitor side effects,
compare it to commonly used treatments, and collect
information that will allow the drug or treatment to be used
safely.
•Phase IV: Studies are done after the drug or treatment has
been marketed to gather information on the drug's effect in
various populations and any side effects associated with long-
term use (source NIH U.S. Library of Medicine)
Risk-Benefits of Clinical Research
• More common than physical stress is psychological stress
(Schachter’s study (1959) on anxiety and affiliation). In the
study there were two conditions: high anxiety and low
anxiety. In the high-anxiety condition, researchers emphasized the
ominous and expected pain of the electric shock experiment;
in the low-anxiety condition they made it seem nearly painless
• Subjects were to rate their anxiety level, and then decide if
they prefer being alone or with others before the electric
shock tests would begin. Lastly they were given the choice to
be let out of the experiment (without credit for their psych
class).
• Results- 63% of subjects in the high-anxiety condition chose to
wait together, but only 33% chose to be together in the
low-anxiety condition
Risk-Benefits of Clinical Research
• Psychological stress-Social psychology experiments
(deception)
• Giving unfavorable feedback about S’s personality
or asking about traumatic or unpleasant events
• The Bystander Intervention Model predicts that
people are more likely to help others under certain
conditions.
Social Psychology-Psychological harm/stress
• Bystander intervention research
• Many factors influence people's willingness to help,
including the ambiguity of the situation, perceived cost,
diffusion of responsibility, similarity, mood and gender,
attributions of the causes of need, and social norms.
• Situational ambiguity. In ambiguous situations (i.e., where it is
unclear whether there is an emergency), people are much less
likely to offer assistance than in situations involving a
clear-cut emergency (Shotland & Heinold, 1985). They are
also less likely to help in unfamiliar environments than in
familiar ones
• Perceived cost. The likelihood of helping increases as the
perceived cost to ourselves declines (Simmons, 1991). We
are more likely to lend our class notes to someone whom
we believe will return them than to a person who doesn't
appear trustworthy
Social Psychology-Psychological harm/stress-
Bystander intervention research
• Diffusion of responsibility-The presence of others may diffuse the
sense of individual responsibility. It follows that if you suddenly felt
faint and were about to pass out on the street, you would be more
likely to receive help if there are only a few passers-by present
than if the street is crowded with pedestrians. With fewer people
present, it becomes more difficult to point to the "other guy" as
the one responsible for taking action. If everyone believes the
other guy will act, then no one acts
• Similarity- People are more willing to help others whom they
perceive to be similar to themselves—people who share a
common background and beliefs. They are even more likely to
help others who dress like they do than those in different attire
(Cialdini & Trost, 1998). People also tend to be more willing to help
their kin than to help non-kin (Gaulin & McBurney, 2001).
• Mood- People are generally more willing to help others when they
are in a good mood
Social Psychology-Psychological harm/stress-
Bystander intervention research
• Gender. Despite changes in traditional gender roles, women in need
are more likely than men in need to receive assistance from strangers
• Attributions of the cause of need. People are much more likely to
help others they judge to be innocent victims than those they believe
have brought their problems on themselves (Batson, 1998). Thus,
they may fail to lend assistance to homeless people and drug addicts
whom they feel "deserve what they get."
• Social norms. Social norms prescribe behaviors that are expected of
people in social situations (Batson, 1998). The social norm of "doing
your part" in helping a worthy cause places a demand on people to
help, especially in situations where their behavior is observed by
others (Gaulin & McBurney, 2001). For example, people are more
likely to make a charitable donation when they are asked to do so by
a co-worker in full view of others than when they receive an appeal in
the mail in the privacy of their own home
APA Ethics Code Research with Humans
and Animals
• APA ethics code-Psychologists are committed to
increasing scientific and professional knowledge of
behavior and people’s understanding of themselves
and others and to the use of such knowledge to
improve the condition of individuals, organizations and
society pg55
• Five general principles of the APA ethics code relate to
beneficence, responsibility, integrity, justice and
respect for the rights and dignity of others
• Of the ten ethical standards concerning conduct, the
focus here is on the 8th Ethical Standard, Research and
Publication
Ethics and Research with Humans
• Institutional approval-IRB
• Informed consent includes purpose of experiment,
right to decline or withdraw from study,
consequences of declining, risks, benefits,
confidentiality, incentives for participation and
contact information
• Psychologists conducting intervention research clarify
the nature of the treatment, services available to the
control group, how treatment and control
groups will be formed, alternatives for those wishing to
withdraw or not participate, and any compensation
offered for participation pg56
Ethics in Research with Humans (continued)
• 8.05 Psychologists may dispense with informed
consent when there is no risk of harm or only
anonymous questions or observations are used and
confidentiality is protected pg57
• 8.06 Psychologists avoid offering excessive financial
or other inducements, and if a professional service is
offered, its nature, risks and obligations are clarified
• 8.07 Psychologists do not use deception unless it
can be justified by prospective scientific or other
value and no reasonable alternatives are available.
No deception is allowed in research that is expected
to cause physical pain or severe emotional distress
Nuremberg Code
• At the end of World War II, 23 Nazi doctors and scientists
were put on trial for the murder of concentration camp
inmates who were used as research subjects. Of the 23
professionals tried at Nuremberg, 15 were convicted, 7 were
condemned to death by hanging, 8 received prison sentences
from 10 years to life, and 8 were acquitted
• Ten points describing required elements for conducting
research with humans became known as the Nuremberg Code
• 1) Informed consent is essential 2) Research should be based
on prior animal work. The risks should be justified by the
anticipated benefits. 3) Only qualified scientists must conduct
research. 4) Physical and mental suffering must be avoided.
• 5) Research in which death or disabling injury is expected
should not be conducted
Ethics and Animal Research
• Approximately 7% of articles in Psych Abstracts
(PsycINFO) involve animals
• Animals commonly used to test effects of drugs, to
study physiological mechanisms and genetics
• 95% of animals in research are rats, mice and birds
• Animal Rights groups have become more active
• Environmental conditions for animals can be more easily
controlled than for humans
• It is more difficult to monitor a human’s behavior than an
animal’s behavior
• Most scientists agree that animal research benefits humans
Top Five Reasons to Stop Animal Testing- PETA
• It’s unethical to sentence 100 million thinking, feeling animals to life in a
laboratory cage and intentionally cause them pain, loneliness, and fear.
• It’s bad science. The Food and Drug Administration reports that 92 out of
every 100 drugs that pass animal tests fail in humans.
• It’s wasteful. Animal experiments prolong the suffering of people waiting
for effective cures by misleading experimenters and squandering precious
money, time, and resources that could have been spent on human-relevant
research.
• It’s archaic. Forward-thinking scientists have developed humane, modern,
and effective non-animal research methods, including human-based
microdosing, in vitro technology, human-patient simulators, and
sophisticated computer modeling, that are cheaper, faster, and more
accurate than animal tests.
• The world doesn’t need another eyeliner, hand soap, food ingredient, drug
for erectile dysfunction, or pesticide so badly that it should come at the
expense of animals’ lives.
Ethics and Animal Research
• 8.09 Psychologists acquire, care for, use and dispose of
animals in compliance with federal, state and local
regulations and with professional standards pg59-60
• Psychologists ensure appropriate consideration for animal’s
comfort, health and humane treatment
• All individuals under the supervision of a psychologist using
animals have received instruction in research methods as
well as the care, maintenance and handling of the species
being used
• Surgery is performed under appropriate anesthesia
minimizing infection and pain and subjecting animals to pain
or stress must be justified scientifically
• When an animal’s life must be terminated it must be done
rapidly minimizing pain and according to accepted procedure
Misrepresentation-Fraud and Plagiarism
• Fabrication of data is fraud, which is most commonly detected
when other scientists cannot replicate the results of a study pg 62-63
• Fraud is not considered a major problem in science (it is still rare),
in part because researchers know that others will read their
reports and conduct their own studies; if found guilty of
fraud, a researcher’s reputation and career are seriously damaged
• No independent agencies exist to check on the activities of scientists
• Plagiarism-misrepresenting another’s work as your own but
can include a paragraph or even a sentence that is copied
without a reference. Even if you paraphrase you must cite
your source
• Szabo (2004)- 50% of British university students believed that
using the internet for academically dishonest activities is
acceptable
Fundamental Research Issues-chp4
• Variable – any event, situation, behavior or individual
characteristic that varies. Any variable must have at least
two levels or values pg69
• There are two broad classes of variables- those that vary in
quality and those that vary in quantity; for example,
gender is a qualitative variable and intelligence is a
quantitative variable
• Common variables studied are reaction time, memory,
self-esteem, stress etc.
• Discrete variables can have only a finite set of values (no
fractional values), e.g. sex, political affiliation, number of
children. Continuous variables can take any value,
including fractional values, e.g. height, weight, abilities, IQ
Fundamental Research Issues
• Operational definition- The set of procedures used to
measure or manipulate a variable pg71
• Many measurements are indirect and we infer from them
(We do not really measure temperature but the length of a
column of mercury and infer temperature from that)
• Pain is a subjective state but we can create measures to infer
how much pain someone is experiencing
• Wong-Baker FACES rating
scale
• To determine an operational definition we often ask “how
does one behave if one possesses that trait?”
• Operational definitions force scientists to discuss abstract
concepts in concrete terms and to communicate with each
other using agreed-upon concepts (how good your operational definition is = construct validity)
Relationships between Variables
• Validity that refers to the degree to which a test
or other measure assesses what it
claims to measure is known as construct validity.
Does the operational definition reflect the true
meaning of the variable? pg71
• Validity which refers to whether you can generalize
your results to other populations or situations is
known as external validity (generalizability) pg85
Common Threats to Validity
• History--the specific events which occur between
the first and second measurement.
• Maturation--the processes within subjects which
act as a function of the passage of time. i.e. if the
project lasts a few years, most participants may
improve their performance regardless of treatment.
• Testing--the effects of being measured may change
the behavior or performance of the subject.
• Instrumentation--the changes in the instrument,
observers, or scorers which may produce changes in
outcomes.
Threats to Validity (continued)
• Statistical regression-It is also known as regression
to the mean. This threat is caused by the selection
of subjects on the basis of extreme scores or
characteristics. "Give me the forty worst students and I
guarantee that they will show immediate
improvement right after my treatment."
• Selection of subjects--the biases which may result
in selection of comparison groups. Randomization
(Random assignment) of group membership is a
counter-attack against this threat
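The "forty worst students" boast above is regression to the mean in action, and it can be demonstrated with a small simulation. The score model below (stable ability plus random luck) and all of its numbers are my own illustration, not data from the text:

```python
import random

random.seed(1)

# Hypothetical model: each observed score = stable ability + random luck.
ability = [random.gauss(100, 10) for _ in range(1000)]
test1 = [a + random.gauss(0, 10) for a in ability]
test2 = [a + random.gauss(0, 10) for a in ability]

# Select the 40 worst scorers on the first test (extreme group),
# then "treat" them with nothing at all and retest.
worst = sorted(range(1000), key=lambda i: test1[i])[:40]
mean1 = sum(test1[i] for i in worst) / 40
mean2 = sum(test2[i] for i in worst) / 40

# The extreme group "improves" on retest purely because the bad luck
# that put them in the bottom 40 does not repeat.
print(round(mean1, 1), round(mean2, 1))
```

Random assignment to treatment and control groups is what guards against mistaking this artifact for a real treatment effect.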
Relationships Between Variables
• Relationships between variables 1) Positive Linear
Relationship 2) Negative Linear Relationship
3) No Relationship and 4) Curvilinear Relationship pg72
• Positive linear Relationship Increases in one variable are accompanied by increases
in a second variable
• Negative linear Relationship Increases of one variable are accompanied by
decreases in a second variable
• No Relationship Levels of one variable are not related to levels of a second variable
• Curvilinear Relationship Increases in one variable are accompanied by systematic
increases and decreases in a second variable pg73-74
Correlation Coefficient
• Correlation refers to the degree to which
variables are related to one another
• Correlated variables are those which tend to vary
together; correlation ≠ causality
• Classic spurious correlations: Mexican lemon imports "prevent" highway deaths;
obesity "caused" the debt bubble; pirates "cause" global warming; the
number of radios tracks the number of people in asylums
Correlation-Scatter Plot
• 1 is a perfect positive correlation
• 0 is no correlation (the values don't seem linked at all)
• -1 is a perfect negative correlation
The value shows how strong the correlation is and whether it is positive or negative
The local ice cream shop keeps track of how much ice cream
they sell versus the temperature on that day. Here are their
figures for the last 12 days:
Ice Cream Sales vs Temperature
Temperature °C Ice Cream Sales
14.2° $215
16.4° $325
11.9° $185
15.2° $332
18.5° $406
22.1° $522
19.4° $412
25.1° $614
23.4° $544
18.1° $421
22.6° $445
17.2° $408
Correlation example
• You can easily see that warmer weather is associated
with more sales; the relationship is strong but not
perfect. The correlation is 0.9575
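The 0.9575 figure can be reproduced directly from the table. The `pearson_r` helper below is a generic sketch of the product-moment formula, not code from the text:

```python
import math

# Temperature and sales figures from the table above.
temp = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

def pearson_r(x, y):
    """Pearson product-moment correlation: r = SP / sqrt(SSx * SSy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)

print(round(pearson_r(temp, sales), 4))  # ≈ 0.9575
```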
There has been a heat wave! It gets so hot that people aren't
going near the shop, and sales start dropping.
• The correlation calculation only
works well for relationships that
follow a straight line. The
calculated value of correlation
is 0. But we can see the data
follows a nice curve that reaches
a peak around 25° C. But the
correlation calculation is not
"smart" enough to see this
• If you make a Scatter Plot, and
look at it, you may see more
than the correlation value says.
• Make your own scatterplot
• http://www.alcula.com/calculators/statistics/scatter-plot/
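A quick numeric illustration of that blind spot, using invented heat-wave data (not the shop's actual figures) that rise to a peak at 25 °C and fall off symmetrically:

```python
import math

# Invented curved data: sales climb to a peak near 25 °C, then drop.
temp = [10, 15, 20, 25, 30, 35, 40]
sales = [100, 300, 450, 500, 450, 300, 100]

n = len(temp)
mx, my = sum(temp) / n, sum(sales) / n
sp = sum((t - mx) * (s - my) for t, s in zip(temp, sales))
ssx = sum((t - mx) ** 2 for t in temp)
ssy = sum((s - my) ** 2 for s in sales)
r = sp / math.sqrt(ssx * ssy)

# A strong, obvious relationship -- yet the linear correlation is ~0,
# because rises and falls cancel out in SP.
print(round(abs(r), 4))
```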
Random Variation
• Random variability refers to uncertainty in events pg76
• Random Variability-Variability of a process (which is
operating within its natural limits) caused by many
irregular and erratic (and individually unimportant)
fluctuations or chance factors that (in practical
terms) cannot be anticipated, detected, identified,
or eliminated.
• Research attempts to identify systematic
relationships between variables (reducing random
variability)
Dispersion Sum of Squares
In statistics, statistical dispersion (also called statistical
variability or variation) is variability or spread in a variable
Subject    X     X²    x = X − mean    x²
   1       0      0        −5          25
   2       1      1        −4          16
   3       2      4        −3           9
   4       4     16        −1           1
   5       5     25         0           0
   6       6     36         1           1
   7       7     49         2           4
   8       8     64         3           9
   9       8     64         3           9
  10       9     81         4          16
 N=10    ΣX=50  ΣX²=340   Σx=0     SS = Σx² = 90

s² = SS / (N−1) = 90 / 9 = 10;   s = √10 ≈ 3.16
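The arithmetic in the table can be checked in a few lines, using both the definitional formula and the computational shortcut:

```python
# Scores from the table above.
scores = [0, 1, 2, 4, 5, 6, 7, 8, 8, 9]

n = len(scores)
mean = sum(scores) / n                       # 50 / 10 = 5.0
ss = sum((x - mean) ** 2 for x in scores)    # sum of squared deviations = 90

# Computational shortcut: SS = sum(X^2) - (sum(X))^2 / N = 340 - 250
ss_shortcut = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

variance = ss / (n - 1)                      # s^2 = 90 / 9 = 10.0
print(ss, ss_shortcut, variance)
```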
Experimental vs Nonexperimental Methods
• In nonexperimental methods, relationships are
studied by observing or measuring the
variables of interest directly (recording responses to
questions, examining collected data). Much of the
data is correlational- e.g. students who work longer
hours have lower GPAs. Variables are measured but
not manipulated
• Experimental method involves direct manipulation
and control of variables. The two variables do not
just vary together; one variable is introduced to
determine how it affects the second variable pg78
Nonexperimental Method
• Two limitations of Nonexperimental method
• 1) We are usually measuring covariation
(correlation), which makes it difficult to determine
the direction of cause and effect. (Consider a negative correlation between
anxiety and exercise: does anxiety reduce exercise, or does exercise reduce anxiety?
If exercise reduces anxiety, then starting an exercise program would be a good way
to reduce anxiety; but if anxiety causes people to stop exercising, then forcing
someone to exercise may not reduce their anxiety)
• 2) We have the problem of a third variable
(suppressor variable)pg 78-80 (in the example of anxiety and exercise a
third variable such as higher income may lead to both the lowering of anxiety and
increase in exercise). Classic example: industrialization is a third
variable linking falling birth rates with falling stork populations
Class exercise – Interpret the correlation between shy sons and talkative
mothers (r=Positive correlation-Talkative mothers have shy sons)
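The third-variable problem can be made concrete with a simulation. In the model below, income is the only causal force: it independently raises exercise and lowers anxiety, and exercise and anxiety never influence each other, yet they come out strongly correlated. The model and all its numbers are my own illustration:

```python
import math
import random

random.seed(2)

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / math.sqrt(sum((a - mx) ** 2 for a in x) *
                          sum((b - my) ** 2 for b in y))

# Income independently raises exercise and lowers anxiety.
income = [random.gauss(0, 1) for _ in range(2000)]
exercise = [z + random.gauss(0, 1) for z in income]
anxiety = [-z + random.gauss(0, 1) for z in income]

r = pearson_r(exercise, anxiety)
print(round(r, 2))  # strongly negative, although neither causes the other
```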
Third Variable (Suppressor) Problem
• Direction of cause and effect not always crucial. If you
are interested in making predictions while unable to
manipulate variables it is still valuable (e.g. Astronomy)
• Example pg 79- Two causal patterns are possible in the
correlation of Similarity: Liking
• 1) Similarity causes people to like each other
• 2) Liking causes people to become more similar
• However, a 3rd variable is undesirable
because it influences the relationship between the
variables that an experimenter is examining
(extraneous variable), making interpretation of the
relationship unclear (example: research on wine
drinking and heart protection)
Confounding variables and Correlation
• One limitation of nonexperimental methods is that
measures are indirect (and correlational), making it
difficult to determine the direction of cause and
effect pg80 (A perceived relationship between an independent variable and a
dependent variable that has been misestimated due to the failure to account for a
confounding factor is termed a spurious relationship)
• The most common measure of correlation is the Pearson
product-moment correlation coefficient (r)
http://www.alcula.com/calculators/statistics/correlation-coefficient/
• r = SP / √(SSx · SSy), where SP = ΣXY − (ΣX)(ΣY)/N
X = 2, 4, 4, 5, 7, 8 (ΣX = 30); Y = 5, 9, 9, 11, 15, 17 (ΣY = 66)
ΣXY = (2)(5) + (4)(9) + (4)(9) + (5)(11) + (7)(15) + (8)(17) = 378
SP = 378 − (30)(66)/6 = 48; SSx = 24; SSy = 96
r = 48 / √((24)(96)) = 48 / 48 = 1.00
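The worked example can be verified step by step; the code is a direct transcription of the SP/SS shortcut formulas:

```python
import math

# Data from the worked example above.
X = [2, 4, 4, 5, 7, 8]
Y = [5, 9, 9, 11, 15, 17]
n = len(X)

sp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n   # 378 - 330 = 48
ssx = sum(x ** 2 for x in X) - sum(X) ** 2 / n                # 174 - 150 = 24
ssy = sum(y ** 2 for y in Y) - sum(Y) ** 2 / n                # 822 - 726 = 96

r = sp / math.sqrt(ssx * ssy)   # 48 / sqrt(2304) = 48 / 48
print(r)  # 1.0
```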
Confounding Variables
• Confounding variable- an extraneous
variable (uncontrolled) in a statistical model that
correlates (directly or inversely) with both variables
being studied pg80 (A perceived relationship
between an independent variable and a dependent
variable that has been misestimated due to the
failure to account for a confounding factor is
termed a spurious relationship)
• If you eliminate the confounding variable you
eliminate alternative or competing explanations
Correlation
Correlation and Prediction
• Correlation refers to the degree of relationship
between two variables
• Regression- (Multiple) regression is a statistical tool
used to derive the value of a criterion variable from several
independent, or predictor, variables. It is the
simultaneous combination of multiple factors to
assess how and to what extent they affect a certain
outcome (y = b0 + b1X1 + b2X2 + b3X3 + ...)
• “The terms correlation, regression and prediction
are so closely related in statistics that they are often
used interchangeably”- J. Roscoe
• Construct a regression model predicting student grades,
with grades as the dependent variable (y)
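A minimal multiple-regression sketch using NumPy least squares. The grade records and both predictors are invented for illustration only, not data from the text:

```python
import numpy as np

# Hypothetical records: hours studied, classes attended, course grade (y).
hours    = np.array([2, 5, 1, 8, 6, 3, 7, 4])
attended = np.array([10, 12, 6, 14, 13, 9, 14, 11])
grade    = np.array([65, 78, 55, 92, 84, 70, 90, 75])

# Design matrix with an intercept column: grade ~ b0 + b1*hours + b2*attended
X = np.column_stack([np.ones_like(hours), hours, attended])
coef, *_ = np.linalg.lstsq(X, grade, rcond=None)

predicted = X @ coef
print(np.round(coef, 2))  # intercept and the two regression weights
```

The fitted weights combine both predictors simultaneously, which is exactly what separates multiple regression from a pair of simple correlations.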
Latitude is significantly associated with the
prevalence of multiple sclerosis: a meta-analysis
• Background There is a striking latitudinal gradient in multiple sclerosis (MS)
prevalence, but exceptions in Mediterranean Europe and northern
Scandinavia, and some systematic reviews, have suggested that the gradient
may be an artefact. The authors sought to evaluate the association between
MS prevalence and latitude by meta-regression
• Epidemiologic studies have
shown a positive correlation
of multiple sclerosis (MS)
prevalence with latitude.
However, no causal
association has been found
• In statistics, a meta-analysis
refers to methods that focus
on contrasting and combining
results from different studies,
in the hope of identifying patterns among study results
Vitamin D and its immunoregulatory role in
multiple sclerosis-Niino M,Drugs Today (Barc). 2010 Apr
• Mapping the distribution of multiple sclerosis (MS) reveals a high
prevalence of the disease in high-latitude areas, suggesting a positive
relationship between vitamin D and MS. Vitamin D is known to play
an important role in bone and mineral homeostasis. It has recently
been reported that several types of immune cells express vitamin D
receptors and that vitamin D has strong immune-modulating effects.
Vitamin D and its analogues inhibited experimental autoimmune
encephalomyelitis (EAE, an animal model of MS) and there have been
reports of small clinical trials on the treatment of MS with vitamin D.
• Furthermore, there have been discussions on the association
between vitamin D levels and MS and about the genetic risk of
vitamin D receptor (VDR) gene polymorphisms in MS. The current
review discusses the immunological functions of vitamin D, the
association between vitamin D and MS and expectations regarding
the role of vitamin D in future treatments of MS
Sunlight and vitamin D for bone health and prevention of
autoimmune diseases, cancers, and cardiovascular disease-Michael F
Holick, Am J Clin Nutr 2004
• Vitamin D is taken for granted and is assumed to be plentiful in a healthy
diet. Unfortunately, very few foods naturally contain vitamin D, and only a
few foods are fortified with vitamin D. This is the reason why vitamin D
deficiency has become epidemic for all age groups in the United States and
Europe. Vitamin D deficiency not only causes metabolic bone disease
among children and adults but also may increase the risk of many common
chronic diseases.
• Solar ultraviolet B photons are absorbed by 7-dehydrocholesterol in the
skin, leading to its transformation to previtamin D3, which is rapidly
converted to vitamin D3
• Once formed, vitamin D3 is metabolized in the liver to 25-hydroxyvitamin D3
and then in the kidney to its biologically active form, 1,25- dihydroxyvitamin
D3. Vitamin D deficiency is an unrecognized epidemic among both children
and adults in the United States.
• Although chronic excessive exposure to sunlight increases the risk of nonmelanoma
skin cancer, the avoidance of all direct sun exposure increases the risk of vitamin D
deficiency, which can have serious consequences.
Vitamin D and multiple sclerosis
Hayes CE et al. Proc Soc Exp Biol Med. 1997 Oct;216(1):21-7
• This theory can explain the striking geographic distribution of
MS, which is nearly zero in equatorial regions and increases
dramatically with latitude in both hemispheres. It can also
explain two peculiar geographic anomalies, one in Switzerland
with high MS rates at low altitudes and low MS rates at high
altitudes, and one in Norway with a high MS prevalence inland
and a lower MS prevalence along the coast.
• Ultraviolet (UV) light intensity is higher at high altitudes,
resulting in a greater vitamin D3 synthetic rate, thereby
accounting for low MS rates at higher altitudes. On the
Norwegian coast, fish is consumed at high rates and fish
oils are rich in vitamin D3.
Experimental Method
• The experimental method reduces ambiguity by
manipulating one variable and measuring the other
• Example: Exercise and Anxiety-One group
exercises daily for a week and another group does
not exercise (experimental vs control group);
anxiety would then be measured in both groups (discuss limits of this design) pg81
• The experimental method attempts to eliminate the influence
of potentially confounding variables by holding constant all
aspects of the experiment except the manipulated variable,
and by ensuring that any variable that cannot be held constant
has effects that are random
(random variables) give example
Randomization
• The number of potential confounding variables is
infinite but the experimental method attempts to
deal with this problem through randomization
which ensures that the extraneous confounding
variable is as likely to affect one group as it is the
other. Any variable that cannot be held constant can
be controlled by randomization pg82
• Example: If an experiment is conducted over several
days, the researcher can randomize the order in which
the various experimental conditions are scheduled
(or can use a crossover) so that one group is not
consistently studied in the morning or the afternoon
Random assignment
• What makes random assignment so powerful is that it
greatly decreases systematic error, error that varies
with the independent variable
• Extraneous variables that vary with the levels of the
independent variable are the most dangerous type in terms
of challenging the validity of experimental results. These
types of extraneous variables have a special name,
confounding variables. For example, instead of randomly assigning
students, the instructor may test the new strategy in the gifted
classroom and test the control strategy in a regular class. Clearly, ability
would most likely vary with the levels of the independent variable. In this
case pre-knowledge would become a confounding extraneous variable
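The classroom example above can be sketched as a simple random-assignment procedure; the roster and group labels below are hypothetical.

```python
import random

# Random assignment sketch: each student is equally likely to land in
# either group, so extraneous variables such as ability are unlikely to
# vary systematically with the independent variable.
students = [f"student_{i}" for i in range(40)]  # hypothetical roster
random.seed(0)          # fixed seed only so the example is reproducible
random.shuffle(students)

treatment = students[:20]   # new teaching strategy
control = students[20:]     # standard strategy

print(len(treatment), len(control))  # → 20 20
```

Contrast this with assigning the gifted classroom to the new strategy: there, ability would covary with the independent variable and become a confounding variable.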
Independent and Dependent Variables
• In research the variables are believed to have a cause
and effect relationship so that one variable is
considered the cause (independent) while the other
variable is considered the effect (dependent variable) pg83
• The independent variable is manipulated while the
dependent variable is measured
• The independent variable is manipulated by the
experimenter and the subject has no control over it
(what the subject does is dependent on the variable
manipulated by the experimenter)
What are the independent & dependent variables in the class article?
What are the operational definitions of terms in the study?
Internal and External Validity
• Validity concerns the extent to which you are measuring
what you claim to be measuring
• Internal validity is a property of scientific studies which
reflects the extent to which a causal conclusion based on a
study is warranted, and requires three elements pg85
• Temporal precedence-The causal variable (independent) is
manipulated before the effect is observed/measured on the
dependent variable
• Covariation-There must be some covariation between the
two variables which is shown when subjects show some
effect different than the control conditions
• Alternative explanations are eliminated (which means that
confounding variables are eliminated or controlled)
• External validity refers to what extent the results
can be generalized aka Generalizability
• Can the results of a study be replicated with other
operational definitions, different subjects, different
settings
• Researchers most interested in internal validity,
establishing a relationship between two variables,
may more likely conduct the study in a lab setting
with a restricted sample while a researcher more
interested in external validity might conduct a
nonexperimental design with a more diverse
sample
External Validity
Laboratory vs Field Experiments
• Lab experiments allow a high degree of control, but the
setting may be too artificial and may limit the questions
that can be answered or the generality of results
• In field experiments the independent variable is
manipulated in a natural setting (see study pg87 top: a
confederate coughs, or does not, near passersby, who are
then asked to rate their perceived risk of contracting a
serious disease or having a heart attack)
• While it is more difficult to eliminate extraneous and
confounding variables in field studies there is less danger of
artificiality limiting the conclusions drawn from the study
Ethical and Practical Considerations
• In certain cases experimentation is unethical or
impractical (e.g. child rearing practices) and
variables are observed and measured as they occur
• When certain social variables are studied people
are frequently categorized into groups based on
their experience (example: in studying corporal
punishment, groups were formed by who was
spanked and who was not as a child, an ex post
facto design, i.e. after the fact). Since no random
assignment was made, this would not be an
experimental design pg88
Variables and Describing and
Predicting Behavior
• Subject variables are characteristics of the subjects
such as age, gender, ethnic group (categorical) and are
nonexperimental by nature
• Since a major goal is to describe behavior, studies can
be conducted with simple observations and
manipulations (examples: Piaget's observations and Buss's
(2007) study describing the reasons people reported
having sex) pg88
• Multiple methods-Since no study is a perfect test of a
hypothesis, multiple studies using multiple methods
with similar conclusions increase our confidence in the
findings pg89
Statistical Procedures in Measurement
• Good research is inevitably dependent on
measurement
• Measurement devices or tests have at least three
essential attributes
• Standardization-test administered to well-defined
group and their performance represents the norm
(norm group) (standardization often includes the use of standard scores z
scores T scores etc. discussed in a later section)
• Validity-A test is valid when it measures what it is
intended to measure
• Reliability-refers to the test’s precision in measuring
The problem of standardization-Diagnostic CT scans:
assessment of patient, physician, and radiologist awareness of
radiation dose and possible risks-Radiology. 2004 May;231(2):393-8. Epub
2004 Mar 18 Lee,CL et al.
• PURPOSE: To determine the awareness level concerning radiation dose and possible risks associated with computed
tomographic (CT) scans among patients, emergency department (ED) physicians, and radiologists.
• MATERIALS AND METHODS:
• Adult patients seen in the ED of a U.S. academic medical center during a 2-week period with mild to moderate
abdominopelvic or flank pain and who underwent CT were surveyed after acquisition of the CT scan. Patients were
asked whether or not they were informed about the risks, benefits, and radiation dose of the CT scan and if they
believed that the scan increased their lifetime cancer risk. Patients were also asked to estimate the radiation dose for
the CT scan compared with that for one chest radiograph. ED physicians who requested CT scans and radiologists who
reviewed the CT scans were surveyed with similar questions and an additional question regarding the number of years
in practice. The χ² test of independence was used to compare the three respondent groups regarding perceived
increased cancer risk from one abdominopelvic CT scan.
• RESULTS:
• Seven percent (five of 76) of patients reported that they were told about risks and benefits of their CT
scan, while 22% (10 of 45) of ED physicians reported that they had provided such information. Forty-
seven percent (18 of 38) of radiologists believed that there was increased cancer risk, whereas only 9%
(four of 45) of ED physicians and 3% (two of 76) of patients believed that there was increased risk
(χ²(2) = 41.45, P < .001). All patients and most ED physicians and radiologists were unable to
accurately estimate the dose for one CT scan compared with that for one chest radiograph.
• CONCLUSION:
• Patients are not given information about the risks, benefits, and radiation dose for a CT scan. Patients,
ED physicians, and radiologists alike are unable to provide accurate estimates of CT doses regardless of
their experience level
Radiation Dose Associated With Common Computed Tomography
Examinations and the Associated Lifetime Attributable Risk of Cancer
Rebecca Smith-Bindman Arch Intern Med. 2009;169(22):2078-2086
• Background Use of computed tomography (CT) for diagnostic evaluation has increased dramatically
over the past 2 decades. Even though CT is associated with substantially higher radiation exposure than
conventional radiography, typical doses are not known. We sought to estimate the radiation dose
associated with common CT studies in clinical practice and quantify the potential cancer risk associated
with these examinations.
• Methods We conducted a retrospective cross-sectional study describing radiation dose associated with
the 11 most common types of diagnostic CT studies performed on 1119 consecutive adult patients at 4
San Francisco Bay Area institutions in California between January 1 and May 30, 2008. We estimated
lifetime attributable risks of cancer by study type from these measured doses.
• Results Radiation doses varied significantly between the different types of CT studies. The overall
median effective doses ranged from 2 millisieverts (mSv) for a routine head CT scan to 31 mSv for a
multiphase abdomen and pelvis CT scan. Within each type of CT study, effective dose varied significantly
within and across institutions, with a mean 13-fold variation between the highest and lowest dose for
each study type. The estimated number of CT scans that will lead to the development of a cancer varied
widely depending on the specific type of CT examination and the patient's age and sex. An estimated 1
in 270 women who underwent CT coronary angiography at age 40 years will develop cancer from that
CT scan (1 in 600 men), compared with an estimated 1 in 8100 women who had a routine head CT scan
at the same age (1 in 11 080 men). For 20-year-old patients, the risks were approximately doubled, and
for 60-year-old patients, they were approximately 50% lower.
• Conclusion Radiation doses from commonly performed diagnostic CT
examinations are higher and more variable than generally quoted, highlighting the
need for greater standardization across institutions.
Measurement Concepts Chp 5
• Reliability refers to the consistency, precision or
stability of a measure of behavior pg96 Are the results
the same or very similar each time you measure a variable?
• Measures that change or fluctuate are not reliable
(assuming change is not due to the variable changing)
• Any measure has two parts:
1) true score, the real value of the variable,
and 2) measurement error, which shows
up as greater variability
• Researchers cannot use unreliable measures (Duh!)
• Reliability is increased when we increase the number of
items in our measure, survey or test
Measuring Reliability
• We can measure reliability using the Pearson product
moment correlation coefficient pg98
• To calculate reliability we must have at least two scores on the
measure across individuals. If the measure is reliable the two scores
should be similar for each of the individuals studied (a high positive
correlation; for most measures the coefficient should be at least .80) pg 98
• Types of Reliability
• 1) Test-Retest-Measures the same individuals at two or more points
in time, then calculates the Pearson product moment r between the
scores. Test-Retest reliability is sometimes called a coefficient of
stability in that it measures how stable the trait being measured is
(Discuss some threats to validity for this measure). This is not a good
measurement for traits considered to be in a state of flux, or when
events occur between the two administrations of the test
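The test-retest computation can be sketched directly; the two score lists below are hypothetical, and the Pearson r formula is the standard one.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the same five people, two weeks apart.
time1 = [12, 18, 25, 30, 41]
time2 = [14, 17, 27, 29, 40]

r = pearson_r(time1, time2)
print(round(r, 3))  # well above the .80 rule of thumb noted above
```

Each person keeps roughly the same rank and score across administrations, which is exactly what a high coefficient of stability reflects.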
Measuring Reliability
• 2) Equivalent Form-Can avoid problems associated
with Test-Retest by giving equivalent forms of the
same test to the same set of people, calculating the
correlation between the two scores. You can
administer the two tests close in time (something
you cannot do with Test-Retest).
• However to the extent that the two forms are not
totally equivalent a new source of error is
introduced. Equivalent forms usually yield lower
estimates of reliability than Test-Retest (why?)
see next slide with two forms of Rey Complex Figure
Rey Complex Figures Form A & B
Measuring Reliability
• Split-Half Reliability-Test is administered once, then the test is
split in half, each half scored separately, and a Pearson r is calculated
between the two half scores
• Split-Half-correlation between the first and second half of the measurement
• Odd-Even correlation between the even items & odd items of a measurement
• In either case only one administration is required and the coefficient is
determined by the internal components of the test (aka internal
consistency reliability)
• Split-half not meaningful in speed tests (in which most items are not
difficult and score depends on how many items answered correctly e.g.
algebra test) Coefficient of reliability is inflated*
• Item-Total correlations-Look at the correlation between each item
score with the total score, based on all items (also measures internal consistency)
• Cronbach’s alpha-is a coefficient of internal consistency. It is
equivalent to the average of all possible split-half coefficients and is a
function of the number of test items and the average
inter-correlation among the items pg99-100
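Cronbach's alpha can be computed from raw item scores with the standard formula α = (k/(k-1))·(1 - Σ item variances / total-score variance); the 5-respondent, 4-item data below are hypothetical.

```python
# Hypothetical scale data: rows = respondents, columns = items.
scores = [
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 2],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)   # population variance

k = len(scores[0])                                   # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))  # → 0.953
```

Because the four items rank respondents almost identically, alpha is high; adding more items that correlate with the rest would raise it further, matching the earlier point that reliability increases with the number of items.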
Interrater Reliability
• In research in which raters observe behaviors and make ratings or
judgments, those judgments are compared, and their degree of agreement determines interrater reliability
• Bandura (1961) conducted a study to investigate if social behaviors (i.e.
aggression) can be acquired by imitation 36 boys and 36 girls were tested
from the Stanford University Nursery School aged between 3 to 6 years
old. The role models were one male adult and one female adult
• Under controlled conditions, Bandura arranged for 24 boys and girls to
watch a male or female model behaving aggressively towards a toy called
a 'Bobo doll'. The adults attacked the Bobo doll in a distinctive manner -
they used a hammer in some cases, and in others threw the doll in the air
and shouted "Pow, Boom". Another 24 children were exposed to a non-
aggressive model, and the final 24 children were used as a control group and
not exposed to any model at all.
• To test the inter-rater reliability of the observers, 51 of the children were
rated by two observers independently and their ratings compared. These
ratings showed a very high reliability correlation (r = 0.89), which
suggested that the observers had good agreement about the behavior of
the children
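The interrater check mirrors the test-retest computation: correlate the two observers' independent ratings of the same subjects. Only the r = 0.89 figure comes from the study; the counts below are invented.

```python
import math

# Hypothetical counts of aggressive acts recorded independently by two
# observers watching the same six children.
obs1 = [12, 3, 7, 15, 0, 9]
obs2 = [11, 4, 7, 14, 1, 10]

n = len(obs1)
m1, m2 = sum(obs1) / n, sum(obs2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(obs1, obs2))
r = cov / (math.sqrt(sum((a - m1) ** 2 for a in obs1))
           * math.sqrt(sum((b - m2) ** 2 for b in obs2)))
print(round(r, 3))  # high r = the observers largely agree
```

If the two observers disagreed on which children were most aggressive, r would drop, and any conclusions built on those ratings would be suspect.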
Construct Validity of Measures pg101
• Construct Validity is concerned with whether our methods
of studying variables are accurate (is our operational
definition valid?) also see pg 90. Does our method actually
measure the construct it was intended to measure?
Measures of (construct)Validity/ Valid=True
• Construct Validity
• Refers to the accuracy of our measurements and operational definition-
Indicators of Construct Validity –Is our method of measuring a variable accurate
• Face Validity-The item appears to accurately measure the variable defined.
Appearance is not sufficient to conclude that a measure is accurate. Some
measures, such as surveys in popular magazines have questions that may look
reasonable (have face validity) but tell you very little-Cosmopolitan Surveys
1) What Guys Secretly Think of Your Hair & Makeup: The truth revealed! 2) 20 Dresses He Will Love
3) What He Thinks When He Walks Through Your Door (4) 7 Facebook Habits that Guys Hate 5) 78 Ways to Turn Him On
6) The Secret to Getting Any Guy (7)How to be a Total Man Magnet (8) Sexy Summer Hair Ideas (9)Meet a New Guy by
Summer! (10)How to Decode His Body Language http://www.cosmopolitan.co.uk/quizzes/how-hot-headed-are-you-quiz
Little if any empirical evidence exists to support the conclusions in these articles
Content Validity- How well does the content of a test sample the situations about
which conclusions are drawn. Requires some expertise to define a “universe of interest”,
careful drawing of a sample of ideas from this universe and the preparation of test items
that match these ideas-Compare the content of the measure with the universe of
content that defines that construct pg103 (For example, the content of the SAT Subject Tests™ is evaluated by committees
made up of experts who ensure that each test covers content that matches all relevant subject matter in its academic discipline)
Both face validity and content validity focus on determining if the content of a
measure is appropriate to the construct being measured
Validity continued
• Content Validity-Statistical methods may be applied to help
determine content validity. A test constructor may perform a correlation
between the score on each item and the score on the total test. Test
items that are not consistent with the total are either revised or
eliminated
• Predictive Validity (a type of Criterion Validity)-A measure is used to
predict later performance, so that one measure occurs earlier than the
other (e.g. LSAT scores and performance in Law School)
• Concurrent Validity applies to validation studies in which the two
measures are administered at approximately the same time (for
example, an employment test may be administered to a group of workers and then
the test scores can be correlated with the ratings of the workers' supervisors taken
on the same day or in the same week. The resulting correlation would be a
concurrent validity coefficient) pg104
• Concurrent validity and predictive validity are two types of criterion-related validity
in which scores are correlated or measured against an external criterion . The
difference between concurrent validity and predictive validity rests solely on the
time at which the two measures are administered.
Validity continued
• Convergent Validity-Defines how well one set of scores on a
measure are related to another set of scores measuring the
same or similar concepts
• measures of constructs that theoretically should be related to each other are, in
fact, observed to be related to each other
• Discriminant Validity-measures of constructs that theoretically should
not be related to each other are, in fact, observed to not be related
to each other pg104 (compare Convergent and Discriminant Validity to differential diagnosis)
• Convergent and discriminant validity are both considered subcategories or
subtypes of construct validity - neither one alone is sufficient for
establishing construct validity
• Imagine you are under the assumption that those that would buy your product again are
satisfied, as that would be what is expected. Testing for convergent validity in a survey may
look like this:
• Question 1: Would you buy product X again if given the chance?
• Question 2: How satisfied are you with product X?
• If they say yes to the first question, but they do not score the product very highly in the
second question, the question may have failed the validity test
Validity continued
• Divergent validity is designed to see if you get the expected opposite result,
because that should also help confirm that the question is working
the way you intended. For example:
• Question 1: Do you wish you did not own product X?
• Question 2: Would you buy product X again if given the chance?
• If they answered yes to the first question and yes to the second question, it
would imply that the question was too confusing, because you did not
receive the opposite response you expected. This would be a failure of divergent validity
• A major impetus to the study of validity was provided a half century ago by Campbell &
Fiske (1959), who introduced the multitrait-multimethod (MTMM) matrix as a means for
construct validation. The MTMM method can be used when multiple traits are examined
simultaneously and each of them is assessed by a given set of measures or measurement
methods (e.g., Eid, 2000; Marsh & Hocevar, 1983). As shown initially by Campbell and
Fiske, and further elaborated by subsequent authors, two types of validity coefficients are
of special interest when the MTMM matrix is utilized in the validation process—
convergent validity and discriminant validity coefficients.
• Reactivity-A measure is reactive if awareness of being measured
changes an individual's behavior This is what threat to validity?
• History? Maturation? Testing? Selection (of subjects)? Regression?
Relationship between Reliability
and Validity
• Validity is the extent to which a test measures what
it is supposed to measure, while reliability is the
consistency and precision with which it measures the variable(s)
• You can have reliability without validity but you
cannot have validity without reliability
Association of Facilities of Medicine of Canada AFMC
• Validity of concepts such as illness or disease
• Cultural conventions affect where the boundary between
disease and non-disease is placed: menopause may be
considered a health issue in North America, but symptoms
are far less commonly reported in Japan.
• Improvements in health have not reduced the demands on
doctors. Instead, doctors are called on to broaden the
scope of what they treat. Conditions, previously not
regarded as medical problems, such as hyperactivity in
children, infertility in young couples, weight gain in middle-
aged adults, or the various natural effects of aging, now
commonly lead patients to consult their doctor; the list is
likely to expand.
Validity of Diagnostic Labels
• Non-Disease?
• In 2002, the British Medical Journal stimulated a debate over the
appropriate expectations to place on doctors and on how to define the
limits of medicine. Richard Smith, editor of the Journal, surveyed readers
to collect examples of non-diseases, and found almost two hundred.
• He defined non-disease in terms of "a human process or problem that
some have defined as a medical condition but where people may have
better outcomes if the problem or process was not defined in that way."
Examples include burnout, chemical sensitivity, genetic deficiencies,
senility, loneliness, bags under the eyes, work problems, baldness, freckles,
and jet lag.
• Smith’s purpose was to emphasize that disease is a fluid concept with no
clear boundaries. He noted various dangers in being over-inclusive in
defining disease:
• when people are diagnosed with a disease and become patients they could
be denied insurance, lose their job, have their body invaded in the name of
therapy, or be otherwise stigmatised.
• The debate is covered in the British Medical Journal, April 13, 2002; vol. 324: pages 859-866 and 883-907.
Measures of Validity (continued)
• Predictive Validity-extent to which a score on a
scale or test predicts scores on some criterion
measure
• Predictive Validity Concerns
tests that are intended to
predict future performance
(GRE, LSAT). The construct
validity of the measure is
shown if it predicts future behavior
False Positives-False Negatives
• Biomedical Research Imaging Center at the University of North
Carolina at Chapel Hill School of Medicine-Etta Pisano
• American Cancer Society issued new guidelines that recommend an
annual MRI screen in addition to an annual mammography for
women at high risk of breast cancer.
• But, because the false-positive rate of MRIs was relatively high --
about 11 percent in the new study -- the authors don't recommend
MRI as a screening tool for the general population.
• National Cancer Institute-Even though breast cancer is the most
common noncutaneous cancer in women, fewer than 5 per 1,000
women actually have the disease when they are screened. Therefore,
even with a specificity of 90%, most abnormal mammograms are
false-positives
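The NCI point above can be checked with a simple base-rate calculation: with prevalence 5 per 1,000 and specificity 90%, most abnormal mammograms are false positives. Sensitivity is not given in the text, so the 0.90 used here is an assumption for illustration.

```python
# Base-rate check of the NCI claim, per 1,000 women screened.
prevalence = 5 / 1000     # from the text
specificity = 0.90        # from the text
sensitivity = 0.90        # assumed for illustration only

per_1000 = 1000
with_disease = per_1000 * prevalence             # 5 women
without_disease = per_1000 - with_disease        # 995 women

true_positives = with_disease * sensitivity            # 4.5
false_positives = without_disease * (1 - specificity)  # 99.5

# Positive predictive value: P(disease | abnormal mammogram)
ppv = true_positives / (true_positives + false_positives)
print(round(ppv, 3))  # → 0.043
```

Roughly 96% of abnormal results are false positives under these numbers, purely because the disease is rare in the screened population, not because the test is poor.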
Effectiveness of Positron Emission Tomography for the
Detection of Melanoma Metastases ANNALS OF SURGERY Vol. 227, No. 5, 764-771, 1998, Holder, W. et al.
• The purpose of this study was to determine the sensitivity, specificity, and clinical
utility of 18F 2-fluoro-2-deoxy-D-glucose (FDG) total body positron emission
tomography (PET) scanning for the detection of metastases in patients with
malignant melanoma (melanoma causes the majority (75%) of deaths related to skin cancer).
• Introduction-Recent preliminary reports suggest that PET using FDG may be more
sensitive and specific for detection of metastatic melanoma than standard
radiologic imaging studies using computed tomography (CT). PET technology is
showing utility in the detection of metastatic tumors from multiple primary sites
including breast, lung, lymphoma, and melanoma. However, little information is
available concerning the general utility, sensitivity, and specificity of PET scanning
of patients with metastatic melanoma.
• Methods One hundred three PET scans done on 76 nonrandomized patients having
AJCC (American Joint Committee on Cancer) stage II to IV melanoma were prospectively evaluated.
Patients were derived from two groups. Group 1 (63 patients) had PET, CT (chest and
abdomen), and magnetic resonance imaging (MRI; brain) scans as a part of staging
requirements for immunotherapy protocols. Group 2 (13 nonprotocol patients) had PET,
CT, and MRI scans as in group 1, but for clinical evaluation only. PET scans were done
using 12 to 20 mCi of FDG given intravenously. Results of PET scans were compared to
CT scans and biopsy or cytology results.
Effectiveness of PET tumor detection
• Malignant tumors generally have greater rates of glucose
utilization and overall metabolism than normal tissues. FDG
is a glucose analogue that is taken up by rapidly dividing cells.
• Most melanomas are rapid users of glucose; in fact,
melanoma cells in vitro demonstrate a higher FDG uptake
than any other tumor type.
• PET scanning uses tracers that emit positrons (positively charged
electrons) that are very short-lived. They are produced in medical
cyclotrons or accelerators to be used quickly after
preparation. The half-life of 18F is 109 minutes.
• Positrons rapidly combine with negative electrons and are
annihilated. This process produces a pair of 511-KeV photons
emitted 180° to one another that are then detected by the PET
scanner. A computer then processes the images so that they can
be displayed and interpreted
PET False Positives False Negatives
• False negatives occur in 1) patients who have
hyperglycemia 2) Tumors that are slow-growing or
have a large necrotic component may have decreased
FDG uptake.
• False positives are caused by 1) urinary excretion of
the isotope (administered radioiodine is excreted mainly by the urinary system, and so all dilations,
diverticuli and fistulae of the kidney, ureter and bladder may produce radioiodine retention (Shapiro, Rufini et al. 2000))
2) Patients who are unusually muscular or have an
increased resting muscle tone take up FDG at a much
higher rate than persons with relaxed musculature.
• Back to the study- The purpose of this study was to determine prospectively the
sensitivity, specificity, and clinical utility of FDG total body PET scanning for the
detection of metastases in patients with malignant melanoma by comparing PET to
double-contrast CT scans and histologically or cytologically correlating these
findings.
Effectiveness of PET in Melanoma Detection
• Methods (continued)
• Sensitivity was defined as the proportion of patients
with metastatic melanoma who
had a positive PET scan.
• Specificity was defined as the proportion of patients
who did not have metastatic melanoma who had a
negative PET scan
• FDG was synthesized using the Siemens RDS negative ion cyclotron and CPCU
automated chemistry module. 18F as fluoride was produced using a proton-
neutron (p,n) reaction on 95% enriched 18O water. 18F-FDG was synthesized in
the CPCU using the modified Hamacher synthesis (mannose triflate/18F-fluoride
reaction). The product was delivered pure, sterile, and in an injectable form. Each lot of
18F-FDG was analyzed to confirm radionuclide, radiochemical, and chemical
purity as well as sterility and pyrogenicity. The product conformed with United States
Pharmacopeia monograph standards. Huh?
• Results
• The accuracy of CT scanning for melanoma
lung metastases was equivalent to that of PET
scanning. However, PET scanning was superior
to CT scanning in identifying melanoma
metastases to regional and mediastinal lymph
nodes, liver, and soft tissues. (The mediastinum is the
cavity that separates the lungs from the rest of the chest. It contains the
heart, esophagus, trachea, thymus, and aorta)
Results (continued)
                       PET    CT
Total scans            103    92
Evaluable scans        100    92
True-positive scans     49    26
False-positive scans     8     7
True-negative scans     40    38
False-negative scans     3    21
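Applying the study's own definitions of sensitivity and specificity to the counts reported above:

```python
# Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
# using the TP/FP/TN/FN counts from the results table.
def sens_spec(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

pet = sens_spec(tp=49, fp=8, tn=40, fn=3)
ct = sens_spec(tp=26, fp=7, tn=38, fn=21)

print([round(v, 2) for v in pet])  # → [0.94, 0.83]
print([round(v, 2) for v in ct])   # → [0.55, 0.84]
```

The two scans have similar specificity, but PET's much higher sensitivity (far fewer false negatives) is what drives the study's conclusion that PET is superior for detecting metastases.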
Discussion
• CT scanning is widely used for the detection of metastases in a variety of
malignant neoplasms, including melanoma. The primary value of CT scanning is
the clear delineation of anatomic detail. A particular problem with CT scanning is
that small lymph nodes or small metastases may not be detectable or may appear
to be of normal size and configuration, while enlarged nodes and other masses
may be due to inflammation and nonmalignant processes. These findings
contribute to both the false-positive and false negative rates reported for CT scans.
CT scanning for detection of both primary and metastatic disease in the lung is
generally very good for lesions in the lung parenchyma.
• PET scanning as currently done does not reveal the anatomic detail of CT scanning.
However, imaging of even extreme anatomic detail often cannot discern benign
from malignant processes particularly with smaller 1cm lesions. The value of PET
scanning lies in the visualization of high metabolic activity of rapidly growing
tumors such as melanoma. With close clinical correlation and tissue confirmation,
PET scanning is an extremely useful tool to evaluate high-risk melanoma patients
for the development of metastases
• Conclusion PET is superior to CT in detecting melanoma metastases and has a role
as a primary strategy in the staging of melanoma.
Accuracy and reliability of forensic latent fingerprint decisions
• The criminal justice system relies on the skill of latent print examiners as expert
witnesses. Currently, there is no generally accepted objective measure to assess
the skill of latent print examiners
• The interpretation of forensic fingerprint evidence relies on the expertise of latent
print examiners. The National Research Council of the National Academies and the
legal and forensic sciences communities have called for research to measure the
accuracy and reliability of latent print examiners’ decisions. Here, we report on the
first large-scale study of the accuracy and reliability of latent print examiners’
decisions, in which 169 latent print examiners each compared approximately 100
pairs of latent and exemplar fingerprints from a pool of 744 pairs.
• Latent prints (“latents”) are friction ridge impressions (fingerprints, palmprints, or
footprints) left unintentionally on items such as those found at crime scenes.
Exemplar prints (“exemplars”), generally of higher quality, are collected under
controlled conditions from a known subject using ink on paper or digitally with a
livescan device. Latent print examiners compare latents to exemplars, using their
expertise rather than a quantitative standard to determine if the information content is
sufficient to make a decision.
Proceedings of the National Academy of Sciences of the United States of America
(PNAS), Ulery, B., et al., March 2011
Accuracy and reliability of forensic
latent fingerprint decisions
• Latent print examination can be complex because latents are often small,
unclear, distorted, smudged, or contain few features; can overlap with
other prints or appear on complex backgrounds; and can contain artifacts
from the collection process. Because of this complexity, experts must be
trained in working with the various difficult attributes of latents
• Five examiners made false positive errors for an overall false positive rate
of 0.1%. Eighty-five percent of examiners made at least one false negative
error for an overall false negative rate of 7.5%. Independent examination of
the same comparisons by different participants (analogous to blind
verification) was found to detect all false positive errors and the majority of
false negative errors in this study. Examiners frequently differed on
whether fingerprints were suitable for reaching a conclusion.
Types of Variables Discrete vs Continuous
• Discrete vs. Continuous
• A discrete variable is one with a well-defined finite set of
possible values, called states. Examples are: the number of
dimes in a purse, a statement which is either “true” or
“false”, which party will win the election, the country of
origin, voltage output of a digital device, and the place a
roulette wheel stops.
• A continuous variable is one which can take on a value
between any other two values, such as: indoor
temperature, time spent waiting, water consumed, color
wavelength, and direction of travel. A discrete variable
corresponds to a digital quantity, while a continuous
variable corresponds to an analog quantity
Variables and Measurement Scales
• We want to determine if there is a relationship between our
independent variable (chosen and/or manipulated by the
experimenter) and the dependent variable (measuring some aspect
or behavior of our subject(s))
• Four Kinds of Measurement Scales
• Nominal scales- When measuring using a nominal scale, one simply
names or categorizes responses (nominal variables are categorical).
Gender, handedness, favorite color, and religion are examples of
variables measured on a nominal scale. The essential point about
nominal scales is that they do not imply any ordering among the
responses. For example, when classifying people according to their
favorite color, there is no sense in which green is placed "ahead of"
blue. Responses are merely categorized. Nominal scales embody the
lowest level of measurement. In an experiment the independent
variable is often a nominal or categorical variable pg106 (example on pg
107: Group 1 participated in meditation, Group 2 did not; all subjects underwent MRI. The
independent variable was participation/no participation, a nominal (categorical) variable)
Variables and Measurement Scales
• Ordinal Scales- allow us to rank order the levels of a variable
(category) being studied. However nothing is specified about
the magnitude of the interval between the two measures so
that in a rank order no particular value is attached to the
intervals between numbers (horse race; First, Second, Third)
• Ordinal scales fail to capture important information that will be
present in the other scales we examine. In particular, the
difference between two levels of an ordinal scale cannot be
assumed to be the same as the difference between two other
levels. In a satisfaction scale ranking a customer’s satisfaction
for a product, the difference between the responses "very
dissatisfied" and "somewhat dissatisfied" is probably not
equivalent to the difference between "somewhat dissatisfied"
and "somewhat satisfied."
• Example pg107 Movie rating system from one to four checks
Variables and Measurement Scales
• Interval scales are numerical scales in which intervals have the same
interpretation throughout in that the intervals between the numbers
are equal in size. As an example, consider the Fahrenheit scale of
temperature. The difference between 30 degrees and 40 degrees
represents the same temperature difference as the difference
between 80 degrees and 90 degrees. This is because each 10-degree
interval has the same physical meaning. However, there is no absolute
zero on the scale (in this case the zero does not indicate an absence
of temperature but is only an arbitrary reference point) pg107
• Since an interval scale has no true zero point, it does not make sense
to compute ratios of temperatures. For example, there is no sense in
which the ratio of 40 to 20 degrees Fahrenheit is the same as the
ratio of 100 to 50 degrees; no interesting physical property is
preserved across the two ratios; it does not make sense to say that 80
degrees is "twice as hot" as 40 degrees.
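The point about ratios can be checked directly: converting the same two temperatures to Celsius changes the "ratio," because the scale's zero point is arbitrary. A small illustrative sketch:

```python
# Why ratios are meaningless on an interval scale: the "twice as hot"
# claim does not survive a change of units with a different arbitrary zero.
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

ratio_f = 80 / 40                    # 2.0 on the Fahrenheit scale
ratio_c = f_to_c(80) / f_to_c(40)    # 6.0 on the Celsius scale
print(ratio_f, ratio_c)              # the "ratio" depends on where zero sits
```

On a ratio scale (e.g., Kelvin, weight, reaction time) this ratio would be identical in any unit, which is exactly what the next slide's ratio-scale examples rely on.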
Variables and Measurement Scales
• Ratio scales- The ratio scale of measurement is the most
informative scale. It is an interval scale with the additional property
that its zero position indicates the absence of the quantity being
measured. Often these include physical measures such as length,
weight or time (Since ratios are allowed you can say someone is
twice as fast or slow as someone else)pg108
• With interval and ratio scales you can make quantitative
distinctions that allow you to talk about amounts of the variable
• Since money has a true zero point, it makes sense to say that
someone with 50 cents has twice as much money as someone with
25 cents (weight, time, length are also ratio scale measures)
• Since many variables in behavioral science are less precise, ratio
scales are often not achieved. However, since statistical tests for
interval and ratio variables are the same, the real question becomes
whether you can achieve an interval scale of measurement for your study
so that you can use (usually) more powerful statistical tests
Cramped Synchronized General Movements in Preterm
Infants as an Early Marker for Cerebral Palsy Ferrari,F Arch Pediatr
Adolesc Med. 2002
• Objective To ascertain whether specific abnormalities (ie,
cramped synchronized general movements [GMs]) can
predict cerebral palsy and the severity of later motor
impairment in preterm infants affected by brain lesions.
• Design Traditional neurological examination was
performed, and GMs were serially videotaped and blindly
observed for 84 preterm infants with ultrasound
abnormalities from birth until 56 to 60 weeks'
postmenstrual age. The developmental course of GM
abnormalities was compared with brain ultrasound findings
alone and with findings from neurological examination, in
relation to the patient's outcome at age 2 to 3 years.
Cramped Synchronized General Movements in Preterm Infants as an
Early Marker for Cerebral Palsy
• An early prediction of cerebral palsy will lead to earlier enrollment in
rehabilitation programs. Unfortunately, reliable identification of
cerebral palsy in very young infants is extremely difficult.10 It is
generally reported that cerebral palsy cannot be diagnosed before
several months after birth11-15 or even before the age of 2 years.16
• A so-called silent period, lasting 4 to 5 months or more, and a period of
uncertainty until the turning point at 8 months of corrected age have
also been identified.12-13 The neurological symptoms observed in the
first few months after birth in preterm infants who will develop
cerebral palsy are neither sensitive nor specific enough to ensure
reliable prognoses.
• Irritability, abnormal finger posture, spontaneous Babinski reflex,17-18
weakness of the lower limbs,19 transient abnormality of tone,12-13,20-24
and delay in achieving motor milestones11 are some of the neurological
signs that have been described in these high-risk preterm infants
Early Marker for Cerebral Palsy continued
• Results Infants with consistent or predominant (33 cases) cramped
synchronized GMs developed cerebral palsy. The earlier cramped
synchronized GMs were observed, the worse was the neurological
outcome. Transient cramped synchronized character GMs (8 cases)
were followed by mild cerebral palsy (fidgety movements were
absent) or normal development (fidgety movements were present).
Consistently normal GMs (13 cases) and poor repertoire GMs (30
cases) either led to normal outcomes (84%) or cerebral palsy with
mild motor impairment (16%). Observation of GMs was 100%
sensitive, and the specificity of the cramped synchronized GMs was
92.5% to 100% throughout the age range, which is much higher than
the specificity of neurological examination.
• Conclusions Consistent and predominant cramped synchronized GMs
specifically predict cerebral palsy. The earlier this characteristic
appears, the worse is the later impairment
Observational Methods Chp 6
• Observational methods are generally either
quantitative (focus on behaviors that can be
quantified) or qualitative (focus on people behaving
in natural settings-samples usually smaller than for
quantitative methods)
• Naturalistic observation-individuals observed in
their natural environment=field work/field
observation-researchers do not attempt to
influence events pg116
• Researcher is interested in, first, describing people,
settings, and events and, second, analyzing what was
observed. Naturalistic observation=qualitative
Observational Methods
• Researcher decides whether to be a participant or nonparticipant
observer-Field research is often very time consuming and
inconvenient, and often takes place in unfamiliar environments
• Jane Goodall- Instead of numbering the chimpanzees she
observed, she gave them names. Claiming to see individuality
and emotion in chimpanzees, she was accused of
anthropomorphism
• Hunter Thompson and the Hell's Angels- Thompson became converted to their
motorcycle mystique, and was so intrigued, as he puts it, that 'I was no
longer sure whether I was doing research on the Hell's Angels or being
slowly absorbed by them.' He remained close with the Angels for a year,
but ultimately the relationship waned. It ended for good after several
members of the gang gave him a savage beating or "stomping" over a
remark made by Thompson to an Angel named Junkie George, who was
beating his wife. Thompson said: "Only a punk beats his wife." The
beating stopped only when senior members of the club ordered it
Methodological Issues in Observation
• Coding-researcher chooses behavior and describes and measures
that behavior with a coding system pg119 In systematic observation
usually two or more raters are used to code behavior pg120
• Sampling-Event recording simply tallies the frequency of a given
behavior during the observation period. Interval recording
similarly captures frequency, but divides the observation period
into segments and counts the number of segments in which the
target behavior is displayed, either throughout the interval or at a
particular time point in the interval. Duration recording measures
the length of time a behavior lasts
• Functional behavior assessment, an observational strategy,
assesses antecedents, frequency, duration, and consequences of
the aggressive behavior for the target child and others in the
environment to determine the functions that the aggressive
behavior serves for the child. In spite of obvious benefits of direct
observation, the strategy can be limited by several problems
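The three sampling strategies above can be sketched against a hypothetical observation log; the episode times, session length, and interval size below are made-up values for illustration:

```python
# Hypothetical log: (start, stop) times in seconds for each episode
# of the target behavior during a 180-second observation session.
events = [(12, 15), (40, 49), (95, 96), (130, 142)]
session_length = 180
interval_size = 30

# Event recording: tally how often the behavior occurred.
event_count = len(events)

# Duration recording: total time the behavior lasted.
total_duration = sum(stop - start for start, stop in events)

# Interval recording: count 30-second segments in which the
# behavior appeared at any point.
intervals_with_behavior = sum(
    any(start < (i + 1) * interval_size and stop > i * interval_size
        for start, stop in events)
    for i in range(session_length // interval_size)
)

print(event_count, total_duration, intervals_with_behavior)  # 4 25 4
```

Note how the three measures answer different questions about the same session: how often, for how long, and how evenly spread across the observation period.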
Methodological Issues in Observation
• Behaviors must be clearly defined, and observers must be trained to
fully understand the exact behaviors that are to be captured.
Observer bias or the tendency to see what one expects to see is
especially troublesome in direct observation of aggression
• In a study conducted by Baron (1976) an accomplice failed to move
his vehicle for 15 seconds after the traffic signal at preselected
intersections turned green. The reactions of passing motorists to this
unexpected delay were recorded by two observers seated in a
second parked car at the intersection using a tape recorder to
determine the frequency, duration and latency of horn honking of
motorists (Video recording has become very popular)
• Reactivity- the possibility that the presence of the observer will
affect behavior can be minimized by concealed observation with
small cameras and microphones pg120 What threat to validity does this represent?
Methodological Issues in Observation
• Case study-observational method applied to an individual.
Presents individual's history, symptoms, characteristic
behavior, response to treatment pg121
• Case studies may or may not include
naturalistic observation-In
Psychology/Psychiatry
the case study is usually a description
of the patient with an historical
account of some event pg121
• Case study often done when individual
possesses a rare or unusual
condition, especially one involving memory, language, or social function
• Mania after termination of epilepsy treatment:
a case report see file
Archival Research
• Uses previously compiled information to answer
research questions; the researcher does not collect
original data. Use of public records, databases or
other written records (e.g. Census Bureau)
• Survey Archives-stored surveys, such as political
surveys from polling organizations or the National
Science Foundation-Researcher may not be able to
afford collecting and tabulating all this data
• Two major problems with archival data- It may be
difficult to obtain desired records, and it is
difficult to be certain how accurate the information
collected by others is pg124
Survey Research Chp 7
• Survey research uses questionnaires and interviews to ask
people to give information about themselves: attitudes,
beliefs, demographic variables (age, gender, income, etc.).
It assumes that people are willing and able to provide truthful and
accurate answers pg130
• Survey research can be a good complement
to experimental research
• Some researchers ask questions without
considering what useful information will be gained by such questions
• Response Set-Tendency to respond to all questions from a particular
point of view “Faking good”-social desirability leads respondent to
answer in most socially acceptable way
• If researcher communicates honestly,
assures confidentiality and promises
feedback participants can be expected
to provide honest answers pg131
Survey Research
• Attitudes and Beliefs surveys ask people to evaluate certain issues/situations/people
• Consumer Reports We conduct many surveys by selecting a random sample
from the approximately 7 million readers who subscribe to Consumer Reports
and/or to ConsumerReports.org, who are some of the most consumer-savvy
people in the nation.
• Some surveys focus on behavior (how many times did you exercise this week?)
• Question Wording-Many of the problems in surveys stem from the wording
and include 1) use of unfamiliar technical terms 2) vague or imprecise terms
3) ungrammatical sentences 4) run on sentences that overload memory
5) using misleading information
• Subtle wording differences can produce great differences in results.
“Could,” “should,” and “might” all sound about the same, but may
produce big differences in agreement with a question.
• Strong words such as “force” and “prohibit” represent control or action
and can bias your results: “The government should force you to pay
taxes.” Different cultural groups may respond differently. One recent study
found that while U.S. respondents skip sensitive questions, Asian
respondents often discontinue the survey entirely-source qualtrics.com
Survey Research
• Questions need to be simple and easy to understand. “And,”
“or,” or “but” within a question usually make it overly complex pg132-133
• Avoid 1) double barreled questions-questions that ask two
things at once 2) Loaded questions leading people to
respond in a certain way “Do you favor eliminating the wasteful
excesses in the public school budget”? Do you approve of the President’s
oppressive immigration policy? A leading question suggests to the
respondent that the researcher expects or desires a certain answer.
The respondent should not be able to discern what type of answer
the researcher wants to hear 3) Negative Wording- Do you feel the
city should not approve the proposed women’s shelter? -Agreeing with the
question means disagreement with the proposal and can
confuse people 4) Yea-saying and Nay-saying (Response Set)-
a tendency to agree or disagree with all questions. When
respondents notice that they have answered several questions the same way, they
assume the next questions can be answered that way too-reversing the wording of some items can counter this pg133
http://www.surveymonkey.com/s.asp?u=952783415975
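Countering yea-saying by reversing the wording of some items requires reverse-scoring those items before analysis, so that "agree" on a reversed item counts the same as "disagree" on the original. A minimal sketch, assuming a hypothetical 5-point Likert scale:

```python
# Reverse-score a negatively worded item on a 5-point Likert scale:
# reversed = (high + low) - score, so 5 ("strongly agree") becomes 1,
# 1 becomes 5, and the midpoint 3 is unchanged.
def reverse_score(score, low=1, high=5):
    return (high + low) - score

responses = [5, 4, 2, 1, 3]  # hypothetical raw answers to a reversed item
print([reverse_score(r) for r in responses])  # [1, 2, 4, 5, 3]
```

After reverse-scoring, a respondent who blindly agrees with everything no longer inflates (or deflates) the scale total in one direction.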
Responses to Questions
• Closed-ended questions have a limited number of responses, are
more structured, and are easier to code; written answers are the same for
all respondents (yes-no, agree-disagree). Fixed number of response alternatives
• Open-ended questions are harder to categorize and code. Frequently
the different types of questions give different response patterns
and different conclusions pg134-135
• In a poll conducted after the presidential election in 2008, people responded very
differently to two versions of this question: “What one issue mattered most to you
in deciding how you voted for president?” One was closed-ended and the other
open-ended. In the closed-ended version, respondents were provided five options
(and could volunteer an option not on the list). When explicitly offered the
economy as a response, more than half of respondents (58%) chose this answer;
only 35% of those who responded to the open-ended version volunteered the
economy. Moreover, among those asked the closed-ended version, fewer than
one-in-ten (8%) provided a response other than the five they were read; by
contrast fully 43% of those asked the open-ended version provided a response not
listed in the closed-ended version of the question. -Pew Research Center
Researchers will sometimes conduct a pilot study using open-ended questions to discover which answers are most common.
They will then develop closed-ended questions that include the most common responses as answer choices
Responses to Questions
• In addition to the number and choice of response options offered, the
order of answer categories can influence how people respond to
closed-ended questions. Research suggests that in telephone surveys
respondents more frequently choose items heard later in a list (a
“recency effect”).
• In the example discussed above about what issue mattered most in
people’s vote (previous slide), the order of the five issues in the
closed-ended version of the question was randomized so that no one
issue appeared early or late in the list for all respondents.
Randomization of response items does not eliminate order effects,
but it does ensure that this type of bias is spread randomly
• Questions with ordinal response categories – those with an underlying order
(e.g., excellent, good, only fair, poor OR very favorable, mostly favorable,
mostly unfavorable, very unfavorable) – are generally not randomized
because the order of the categories conveys important information to help
respondents answer the question. Generally, these types of scales should be
presented in order so respondents can easily place their responses along the
Wording and Order of Questions
• "Thinking of your teachers in high school, would you say that the female
teachers were more empathetic with regard to academic and personal
problems than the male teachers, or were they less empathetic?" The other
group responded to a question with the direction reversed: "Thinking of
your teachers in high school, would you say that the male teachers were
more empathetic with regard to academic and personal problems than the
female teachers, or were they less empathetic?" Responses were measured
on a nine-point scale ranging from "less empathetic" (1) to "more
empathetic" (9). Not only were the mean ratings statistically different, but
when female teachers were the subject, 41 percent of respondents felt that
the female teachers were more empathetic than male teachers; when male
teachers were the subject, only 9 percent of respondents felt that female
teachers were more empathetic than the male teachers. The direction of
comparison significantly affected the results obtained when the authors
compared soccer with tennis and tennis with soccer on which was the more
exciting sport-Wanke, Schwarz and Noelle-Neumann (1995)-authors
concluded that respondents generally "focus on the features that
characterize the subject of comparison and make less use of the features
that characterize the referent of the comparison."
Wording and Order of Questions
• A researcher wishing to increase the variability and thereby make it harder
for statistics to demonstrate significant differences among stimuli (e.g.,
comparing different brands of tissues) can accomplish this by using scales
with too many points. A two-point scale, on the other hand, used with a
stimulus that subjects can actually rate on many gradations will result in a
very imprecise measurement. This will make it very difficult to find
differences among means. For example, will there be a significant difference
between the mean ratings for the presidencies of Abraham Lincoln and
William Clinton if the scale consists of only two points, "good" and "bad"?
• Waddell (1995) suggested that traditional customer satisfaction
measurement scales ask the wrong question by focusing on "How am
I doing?" rather than "How can I improve?" He claims that consumers
usually rate products/services as being better when using
performance or satisfaction scales and that these scales often
produce high average scores. Neal (1999) posited that satisfaction
measures cannot be used to predict loyalty since loyalty is a behavior
and satisfaction is an attitude-RATING THE RATING SCALES-
H.Friedman Journal of Marketing Management, Vol. 9:3, Winter 1999
Rating Scales
• Rating scales ask people to provide quantity or
“how much” Rating scales provide a set of
categories designed to elicit information about
a quantitative or a qualitative attribute.pg135
• Simplest form presents people with five or seven
response alternatives with the endpoints on the scale
labeled to define the extremes
• Am I the greatest professor ever?
strongly agree __ __ __ __ __ __ __ strongly disagree
• Graphic rating scale- requires a mark along a continuous 100
millimeter line that is anchored at either end with descriptors
Rating Scales
• Semantic differential scale-
respondents rate any concept
on a series of bipolar adjectives
using a 7 point scale
• Almost anything can be measured using this technique-
concepts are measured along three basic dimensions 1)
evaluation (good-bad) 2) activity (fast-slow) 3) potency (weak-strong)
• Nonverbal scales for children
• Labeling response alternatives
Researchers may provide labels to more clearly define the meaning of each
alternative-the middle alternative is a neutral point
half-way between the endpoints
Rating Scales
• There are instances in which you may not want a
balanced scale
• Example pg137 In comparison with other graduates
how would you rate this student’s potential
Lower 50% upper 50% upper 25% upper 10% upper 5%
_________ _________ _________ _________ ________
• Most of the alternatives ask the rater to place someone within the upper 25%, as
students in this group tend to be highly motivated and professors tend
to rate them positively
• High-frequency vs. low-frequency scales-alternatives indicate different
frequencies of the variable
How often do you exercise?
Less than once a month about once a month once every two weeks once a week
________ _______ ________ _______
Questionnaires & Surveys
• Questionnaires should be professional and neatly
typed with clear response alternatives In
sequencing the questions it is best to ask the most
interesting questions first, questions on a particular
topic grouped together and demographic questions
presented last pg138
• Administer the questionnaire first to a small group
of friends, colleagues for their feedback
• Questionnaires are in written form and may be
given to groups or individuals while surveys can be
written or given as interviews
Questionnaires & Surveys
• Questionnaires given to groups(classes, meetings, job
orientation) have the advantage of having ‘captive audiences’
that are likely to complete the questionnaire and the
researcher is usually present to answer questions pg139
• Mail questionnaires/surveys- Inexpensive but often with a
low return rate due to distractions, low interest, and no one
being present to answer questions or provide clarification
• Internet questionnaires/surveys-Responses are sent
immediately to researcher Problems exist with 1) sampling
People interested in the topic can complete the form and
polling organizations sample from collected databases-Are
the results similar to traditional methods? (2) Do people
misrepresent themselves? (seems unlikely but no way to know)
Questionnaires & Surveys
• Interviews-Because an interview involves interaction
between people it is more likely that a person will
agree to answer questions versus a mailed questionnaire
pg140
• The interviewer can answer questions and provide
clarification
• Problems with interviewer bias-interviewer may
react positively or negatively to answers
(inadvertently) or might influence answer due to
characteristics (age,sex,race etc.)
or bias could lead interviewers
to see what they want to see
Types of Interviews
• Face-to-face interviews-Expensive and time consuming. Interviewer
may have to travel to person’s home or person to office-Likely to be
used when sample size is small
• Telephone interviews- Most large-scale surveys are done via telephone,
which are less expensive than face-to-face interviews and allow data to
be collected relatively quickly as many interviewers can work on the
same survey at once-In computer assisted telephone interview (CATI)
systems the questions appear on the computer screen and the data
are entered directly for analysis
• Focus group interviews- 6-10 persons together for 2-3 hours usually
selected because they share a particular interest or knowledge of a
topic Often receive an incentive to compensate for time and traveling.
Questions often open-ended and asked of everyone-plus advantage of
group interaction. Interviewer must be skilled in dealing with
individuals who wish to dominate discussion or hostility between
members. Discussions often recorded and later analyzed. Although they
provide a great deal of data they are also costly and time consuming pg142
Surveys to study changes over time
• Surveys usually study one point in time, but because
some questionnaires are given every year researchers can track
changes (also can use a panel study of the same
group of people over time)
Autism rating items
• Before age 3, did the child ever imitate another person?
• 1. Yes, waved bye-bye
• 2. Yes, played pat-a-cake
• 3. Yes, other ( ___________________________ )
• 4. Two or more of above (which? 1____2____3____ )
• 5. No, or not sure_______________________________
• (Age 2-4) Does child hold his hands in strange postures?
• 1. Yes, sometimes or often 2. No________________
• (Age 3-5) Does child sometimes line things up in precise
evenly-spaced rows and insist they not be disturbed?
• 1. No 2. Yes 3. Not sure
CARS Childhood Autism Rating Scale (sample item)
0 No evidence of difficulty or abnormality in relating to people. The child's behavior is
appropriate for his or her age. Some shyness, fussiness, or annoyance at being told
what to do may be observed, but not to an atypical degree.
1.5 (if between these points)
2 Mildly abnormal relationships. The child may avoid looking the adult in the eye, avoid
the adult or become fussy if interaction is forced, be excessively shy, not be as
responsive to the adult as is typical, or cling to parents somewhat more than most
children of the same age.
2.5 (if between these points)
3 Moderately abnormal relationships. The child shows aloofness (seems unaware of
adult) at times. Persistent and forceful attempts are necessary to get the child's attention
at times. Minimal contact is initiated by the child.
3.5 (if between these points)
4 Severely abnormal relationships. The child is consistently aloof or unaware of what the
adult is doing. He or she almost never responds or initiates contact with the adult. Only
the most persistent attempts to get the child's attention have any effect.
ADHD rating scale
• ADHD
Sampling
• One way to describe the amount of possible sampling error is to use
interval estimation. Assuming that sampling errors are normally distributed
you can establish a range of values on either side of the point estimate
(sample) and then determine the probability that the parameter (value) lies
within this range. This probability is expressed as a percentage and is called
the level of confidence
• 95% of the total area under the curve lies within plus or minus two standard
deviations, with less than 5% outside those values. If the point estimate
(sample) were 30 and the standard deviation were 4, you could be 95%
certain that the population value is within 22-38 (95% confidence interval)
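The interval in the example above can be reproduced directly; the plus-or-minus-two-standard-deviations rule of thumb gives 22-38, and the exact normal critical value 1.96 gives a slightly narrower interval:

```python
# Interval estimate from the example: point estimate 30, standard deviation 4.
point_estimate, sd = 30, 4

# Rule-of-thumb 95% interval: plus or minus two standard deviations.
low, high = point_estimate - 2 * sd, point_estimate + 2 * sd
print(f"approx. 95% CI: {low} to {high}")  # approx. 95% CI: 22 to 38

# The exact normal critical value 1.96 narrows it slightly.
low96, high96 = point_estimate - 1.96 * sd, point_estimate + 1.96 * sd
print(f"exact:  {low96:.1f} to {high96:.1f}")  # exact:  22.2 to 37.8
```

The "2" in the rule of thumb is just a rounded 1.96, which is why the two intervals nearly coincide.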
Sampling from a population
• Since studying entire populations would be an enormous
undertaking, we sample from the population and infer what the
population is like based on the data obtained from the sample
(using statistical significance)
• Simple Random Sampling Every member of the population has
an equal probability of being selected-if 1,000 people in
population, everyone has a 1/1000 chance to be selected. In
conducting phone interviews researchers have a computer-generated
list of phone numbers
Random Number Generator: Assume we have a population of 500 subjects and we want a
sample of 30. Select a column and row starting point and use 3 digits to include all possible outcomes
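A random number table is one way to draw such a sample; software is another. A minimal sketch of drawing 30 of 500 subjects with equal probability using Python's standard library (the seed value is arbitrary, kept only so the draw is reproducible):

```python
import random

# Simple random sample: 30 subjects from a population of 500 (IDs 1-500),
# each with an equal 30/500 chance of selection.
random.seed(42)  # arbitrary seed for a reproducible draw
sample = random.sample(range(1, 501), k=30)

print(sorted(sample))
assert len(set(sample)) == 30              # no subject drawn twice
assert all(1 <= s <= 500 for s in sample)  # all IDs come from the population
```

`random.sample` draws without replacement, which matches the usual survey-sampling requirement that no subject appear twice.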
Sampling
• Stratified random sampling- The population is divided into
subgroups (strata) and members from each strata are randomly
selected. The subgroups should represent a dimension that is
relevant to the research e.g. If you are conducting a survey of
sexual attitudes you may want to stratify on the basis of age,
gender and amount of education as these factors are related to
sexual attitudes (attributes such as height are not relevant to
the research) pg146
• Stratified sampling also has the advantage of building in
representation of all groups. Out of 10,000 students on campus
10% foreign students on a student visa then you will need at
least 100 from this group in a sample of 1,000 students
• Sometimes researchers will “oversample” from a small
subgroup to ensure their representation in the sample
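The 10%-foreign-student example above can be sketched as proportional stratified sampling: draw from each stratum separately, allocating sample slots in proportion to stratum size. The population lists below are hypothetical stand-ins for the 10,000 students:

```python
import random

# Hypothetical campus: 9,000 domestic students (IDs 0-8999) and
# 1,000 foreign students (IDs 9000-9999).
random.seed(1)  # arbitrary seed for reproducibility
population = {"domestic": list(range(9000)), "foreign": list(range(9000, 10000))}
sample_size, pop_size = 1000, 10000

sample = []
for stratum, members in population.items():
    # Proportional allocation: the stratum's share of the population
    # determines its share of the sample (900 domestic, 100 foreign).
    k = round(sample_size * len(members) / pop_size)
    sample.extend(random.sample(members, k))

print(len(sample))  # 1000, with exactly 100 from the foreign stratum
```

Oversampling a small stratum, as the last bullet describes, would simply mean choosing a larger `k` for that stratum than proportional allocation gives (and reweighting later in analysis).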
Sampling distributions
• If we have a very large population we may draw a random
sample of 30 from this population and determine some statistic
(e.g. the mean). Then we repeat the process 1,000 times, producing
1,000 random samples of size 30 with the corresponding 1,000
sample statistics. A frequency distribution can be drawn up,
similar to a frequency distribution of any type of score, resulting
in a model called the (theoretical) sampling distribution of the
statistic (in this case the sampling distribution of the means)
• The expected value of any statistic, the predicted value which
would give the least error over many samples, is the mean of the
sampling distribution. The standard error of any statistic is the
standard deviation of its sampling distribution (Source: Roscoe, chapter 19)
• The standard error of the sample is an estimate of how far the
sample mean is likely to be from the population mean, whereas
the standard deviation of the sample is the degree to which
individuals within the sample differ from the sample mean
Sampling Distribution
• If you took all of these separate means and calculated an overall
mean for the whole lot, you would end up with a value that was
the same as the population mean (the mean you’d get if you
could measure every one of them)
• The arithmetic mean of a sufficiently large number of iterates of
independent random variables, each with a well-defined expected value
and well-defined variance, will be approximately normally distributed
(Central Limit Theorem). As the sample size increases, the variance of
the sampling distribution decreases, producing a relatively
normal distribution
https://www.khanacademy.org/math/probability/statistics-
inferential/sampling_distribution/v/central-limit-theorem
Central Limit Theorem
• The Central Limit Theorem (CLT for short) basically says
that for non-normal data, the distribution of the sample
means has an approximate normal distribution, no
matter what the distribution of the original data looks
like, as long as the sample size is large enough (usually
at least 30) and all samples have the same size.
• The use of an appropriate sample size and the central
limit theorem help us to get around the problem of data
from populations that are not normal. Thus, even
though we might not know the shape of the distribution
where our data comes from, the central limit theorem
says that we can treat the sampling distribution as if it
were normal
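A quick simulation illustrates the point: sample means from a clearly non-normal population pile up around the population mean, with a spread equal to the standard error (a sketch; the uniform-digit population is an arbitrary choice):

```python
import random
import statistics

random.seed(1)
# A clearly non-normal population: 100,000 uniform digits 0-9
population = [random.randint(0, 9) for _ in range(100_000)]

# 1,000 random samples of size 30; record each sample mean
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(1000)]

pop_mean = statistics.mean(population)
print(round(pop_mean, 2))                        # about 4.5
print(round(statistics.mean(sample_means), 2))   # about the same
print(round(statistics.stdev(sample_means), 2))  # about sigma/sqrt(30), i.e. ~0.52
```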
Sampling
• Cluster Sampling- is a sampling technique where the entire
population is divided into groups, or clusters, and a random
sample of these clusters is selected. After the clusters are
chosen, all observations/individuals in the selected clusters
are included in the sample. pg147
• Cluster sampling is typically used when the researcher
cannot get a complete list of the members of a population
they wish to study but can get a complete list of groups or
'clusters' of the population. It is also used when a random
sample would produce a list of subjects so widely scattered
that surveying them would prove to be far too expensive,
(for example, people who live in different postal districts in the UK)
• You could get a list of all classes taught (each class is a cluster),
take a random sample of classes from this list and have all
members (students) of the chosen classes complete your survey
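The class-roster example can be sketched in code (the helper name cluster_sample and the rosters are made up for illustration):

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters; every member of a chosen
    cluster goes into the sample (hypothetical helper)."""
    rng = random.Random(seed)
    return [member
            for cluster in rng.sample(clusters, n_clusters)
            for member in cluster]

# Hypothetical class rosters: 40 classes of 25 students each
classes = [[f"class{c}_student{s}" for s in range(25)] for c in range(40)]
sample = cluster_sample(classes, 4)
print(len(sample))  # 100 -- 4 classes x 25 students
```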
Nonprobability Sampling
• In probability sampling the probability of every
member being selected is knowable; in nonprobability
sampling the probability of being selected is not
known and the techniques are arbitrary. A population may
be defined but little effort is expended to ensure
the sample accurately represents it
• nonprobability sampling does
not involve random selection
• Nonprobability sampling is
cheap and convenient
• Three types 1) Haphazard 2) Purposive 3) Quota
Nonprobability Sampling
• Haphazard or Convenience Sampling (Accidental, Judgment)
• Select a sample that is convenient e.g. students walking into the campus café
• Seen in the traditional "man (person) on the street" interviews
conducted frequently by television news programs to get a quick
(although nonrepresentative) reading of public opinion. (use of college
students in much psychological research is primarily a matter of
convenience).
• In clinical practice, we might use clients who are available to us as our
sample. In many research contexts, we sample simply by asking for
volunteers. Clearly, the problem with all of these types of samples is
that we have no evidence that they are representative of the
populations we're interested in generalizing to -- and in many cases we
would clearly suspect that they are not. People sampled, such as TV
viewers, may be different from the general population (Fox News,
MSNBC) and are often asked about controversial issues such as
abortion, taxes, gun regulation, and wars, which induce certain
types of people to respond
Nonprobability Sampling
• In purposive sampling, we sample with a purpose in mind. We usually
would have one or more specific predefined groups we are seeking.
For instance, have you ever run into people in a mall or on the street
who are carrying a clipboard and who are stopping various people
and asking if they could interview them? Most likely they are
conducting a purposive sample (and most likely they are engaged in
market research). They might be looking for Caucasian females
between 30-40 years old. They size up the people passing by and
anyone who looks to be in that category they stop to ask if they will
participate. One of the first things they're likely to do is verify that the
respondent does in fact meet the criteria for being in the sample.
• Purposive sampling can be very useful for situations where you need
to reach a targeted sample quickly and where sampling for
proportionality is not the primary concern. With a purposive sample,
you are likely to get the opinions of your target population, but you
are also likely to overweight subgroups in your population that are
more readily accessible
Nonprobability Sampling
• Quota sampling- A sample is chosen that reflects the numerical
composition of various subgroups in the population (the technique
is similar to stratified sampling but without random sampling-
you are collecting data in a haphazard way) pg 148
• Quota sampling is a method of sampling widely used in
opinion polling and market research. Interviewers are each
given a quota of subjects of a specified type to attempt to
recruit. For example, an interviewer might be told to go out
and select 20 adult men and 20 adult women, 10 teenage
girls and 10 teenage boys so that they could interview them
about their television viewing.
• It suffers from a number of methodological flaws, the most
basic of which is that the sample is not a random sample and
therefore the sampling distributions of any statistics are
unknown
Evaluating Samples
• Even using random sampling does not ensure sample is
representative. Error derives from two sources-
1) Sampling frame used 2) poor response rates
• Sampling frame- The actual population of individuals (or
clusters) from which a random sample will be drawn. Rarely
will this perfectly coincide with the population of interest as
some biases will be introduced- You are compiling a list of
phone numbers to call during the day from the directory and
will exclude those with unlisted numbers, those without
phones and those who are not home during the day
• Response rate- percentage of people in the sample who respond
(complete the phone or mail survey). Mail surveys have lower
response rates than phone surveys. Can increase the response
rate with an explanatory postcard before the survey arrives, a
second mailing of the survey, or a stamped self-addressed envelope (SASE) pg 150
Experimental Design-Chapter 8
• Researcher manipulates the independent variable (usually to
create groups) and then compares the groups in terms of
their scores on the dependent variable (outcome measure)
while keeping all other variables constant through direct
experimental control or randomization- If scores on the
dependent variable are different then the researcher can
conclude that the difference was due to the difference
between groups and no other cause (and the experiment will
have internal validity) pg157-8
• A Confounding variable varies along with the independent
variable. Confounding occurs when the effects of the
independent variable and an uncontrolled variable are
intertwined so you cannot determine which causes the effect
Basic Experiments
• The simplest experimental design has two variables, the
independent and the dependent, with the independent
variable having a minimum of two levels, an experimental
and a control group. This type of experiment can take one of
two possible forms: 1) posttest only design or 2) pretest-
posttest design
• Obtain two equivalent groups (random assignment), introduce the
independent variable and then measure the effect of the independent
variable on the dependent variable- random assignment to groups or
assign the same subjects to both groups (CIT study with cross-over design)
Posttest only vs Pretest-Posttest design
• After groups formed (experimental and control)
must choose two levels of the independent variable
(treatment for the experimental group and no
treatment for the control group) e.g. Experimental
group gets treatment to stop smoking and control
group does not
• Pretest-Posttest designs- the only difference
between the posttest only and pretest-posttest
design is that in the latter a pretest is given before
the experimental manipulation is introduced
Posttest only vs Pretest-Posttest
• The pretest-posttest design makes it easier to
assume the groups are equal at the beginning of the
experiment. However if you have randomly
assigned subjects to the different groups using a
sufficiently large sample the groups should be equal
without using a pretest
• Generally need a minimum
of 20-30 subjects pg160
Posttest only vs Pretest-Posttest
advantages and disadvantages
• Advantages Pretest-Posttest
• While randomization is expected to produce equivalent
groups this assumption may go unmet with small
sample sizes and a pretest can increase the likelihood
of equivalency
• Pretest may be necessary for assignment to groups so
that those that score low or high on any pretest can be
randomly assigned to conditions
• The comparison of pretest to posttest allows each
subject to be evaluated in terms of change between the
measures (with no pretest such comparison is not
possible)
Posttest only vs Pretest-Posttest
advantages and disadvantages
• Pretests help determine the effects of attrition
(dropout) –Can examine pretest scores of dropouts
to determine if their scores differed from those
completing the study
• Disadvantages Pretest
• A pretest may be time consuming
• A pretest may sensitize (alert) the subjects to the
hypothesis which can result in changing a subject’s
behavior in the study (can disguise the pretest as
part of another study or embed the pretest in a
series of irrelevant measures-time consuming)
Posttest only vs Pretest-Posttest
advantages and disadvantages
• Solomon four group design- Half the
subjects receive only the posttest and
the other half receive both pretest and
posttest. If there is no impact of the
pretest, the posttest scores will be the same in the two
control groups (with and without pretest) see table 8.1 pg 162
• Repeated measures has advantage of needing fewer subjects
which decreases the effects of natural variation between
individuals upon the results. Repeated subject designs are
commonly used in longitudinal studies, over the long term, in
educational tests where it is important to ensure that
variability is low and in research on such functions as
perception involving only a few subjects often receiving
extensive training pg164
Between group design vs.
Repeated Measures design
• Between-group design is an
experiment that has two or
more groups of subjects, each
being tested under a different
condition simultaneously-
each subject is in either the
treatment (experimental) group or the control group pg163
• A repeated-measures design
is one in which multiple, or
repeated, measurements are
made on each subject, e.g.
weekly blood pressures, or
each subject measured after receiving
each level of the independent variable
Between group design vs.
Repeated Measures design
• In the between groups design subjects are assigned to each
of the conditions using random assignment
http://www.randomizer.org/form.htm
• In repeated measures the same individual participates in all
of the groups. These studies are more sensitive to finding
statistically significant results-Even if you have randomly
selected and assigned subjects to conditions in the
between groups design there is still individual variation
(naturally occurring “random error”-differences between the subjects
assigned to the different groups) which may make the effect of the
independent variable unclear but when testing the same
person in different conditions (versus different persons in
different conditions) this random error is eliminated
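Random assignment like that produced by randomizer.org can be sketched with the standard library (the helper name random_assign is hypothetical):

```python
import random

def random_assign(subjects, conditions, seed=0):
    """Shuffle the subject pool, then deal subjects out evenly
    across the conditions (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    groups = {c: [] for c in conditions}
    for i, subject in enumerate(pool):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

groups = random_assign(range(40), ["treatment", "control"])
print(len(groups["treatment"]), len(groups["control"]))  # 20 20
```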
Between group design vs.
Repeated Measures design
• One limitation of repeated measures is that the conditions must
be presented in a particular sequence which could result in an
order effect-the order of presenting the treatments affects the
dependent (outcome) variable (maybe a subject performs
better in the second condition because of practice in the first
condition (practice effect) or performed poorer in the second
condition due to fatigue (fatigue effect) or that the first
treatment influences the second treatment (carryover effect)
• Carryover effect occurs when the first condition produces a
change that is still influencing the person when the second
condition is introduced
Between group design vs.
Repeated Measures design
• Experiment- Subjects are presented with a list of words and
asked to recall as many words as they can. In one condition, the
words are presented one word per second; in the other
condition, the words are presented two words per second. The
question is whether or not having performed in one condition
affects performance in the second condition. Perhaps learning
the first list of words will interfere with learning the second list
because it will be hard to remember which words were in each
list. Or maybe the practice involved learning one list will make it
easier to learn a second list. In either case, there would be a
carryover effect: performance on the second list would be
affected by the experience of being given the first list
• Such effects are dealt with through counterbalancing or
extended time intervals between conditions presented serially
Repeated Measures-types of counterbalancing
• Complete counterbalancing-
All possible orders of
presentation are included in
the experiment pg165-166
• Latin Square- A Latin square is an n x n table filled with n different symbols in such a way
that each symbol occurs exactly once in each row and exactly once in each column.
Each condition appears at each ordinal position (1st, 2nd, 3rd, etc.)
and occurs exactly once in each row and once in each
column
• Using a Latin square controls for most
order effects without having to include
all possible orders (in a balanced Latin square each condition
precedes and follows every other condition exactly once)
• Time Interval- longer rest periods counteract fatigue and practice
effects but require a greater commitment to participate
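A balanced Latin square of the kind described (each condition once per row, once per column, and immediately preceding every other condition exactly once) can be generated for an even number of conditions; a sketch:

```python
def balanced_latin_square(conditions):
    """Balanced Latin square for an even number of conditions:
    each condition appears once per row and once per column, and
    immediately precedes every other condition exactly once."""
    n = len(conditions)
    assert n % 2 == 0, "the balanced construction needs an even n"
    # Standard first row of ordinal positions: 1, 2, n, 3, n-1, 4, ...
    first = [0]
    lo, hi = 1, n - 1
    for i in range(1, n):
        if i % 2 == 1:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
    # Each subsequent row shifts every condition up by one (mod n)
    return [[conditions[(x + r) % n] for x in first] for r in range(n)]

for row in balanced_latin_square(["A", "B", "C", "D"]):
    print(row)
```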
Matched Pairs Design
• Rather than using random assignment to groups you can first
match subjects on a variable (achieving equivalency in this manner
rather than through randomization) and avoid repeated
measures/counterbalanced designs pg169
• Example study: 1,000 subjects each receive one of two treatments-
a placebo or a cold vaccine. The 1,000 subjects are grouped into
500 matched pairs. Each pair is matched on gender and age.
For example, Pair 1 might be two women,
both age 21. Pair 2 might be two men,
both age 21. Pair 3 might be two women,
both age 22
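The pairing procedure can be sketched in code (all subject data here are made up; odd leftovers within a gender-age cell simply go unmatched):

```python
import random

random.seed(3)
# Made-up subject pool; each pair will be matched on gender and age
subjects = [{"id": i,
             "gender": random.choice(["M", "F"]),
             "age": random.randint(20, 24)} for i in range(1000)]

# Group subjects by the matching variables
cells = {}
for s in subjects:
    cells.setdefault((s["gender"], s["age"]), []).append(s)

# Form pairs within each cell; randomly assign one member of each
# pair to placebo and the other to vaccine
pairs = []
for members in cells.values():
    random.shuffle(members)
    for i in range(0, len(members) - 1, 2):
        pair = [members[i], members[i + 1]]
        random.shuffle(pair)              # random assignment within the pair
        pairs.append((pair[0], pair[1]))  # (placebo, vaccine)

print(len(pairs))  # close to 500; leftovers in odd-sized cells go unmatched
```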
Conducting Experiments Chp9
• Selecting research participants- Determining sample
size. Sampling error is a function of sample size and
the error tends to be smaller for larger samples-The
larger your sample size, the more sure you can be
that their answers truly reflect the population. This
indicates that for a given confidence level, the
larger your sample size, the smaller your confidence
interval.
• http://www.raosoft.com/samplesize.html
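Calculators like the one linked above typically use the standard formula for estimating a proportion, with a finite-population correction; a sketch (p = 0.5 is the conservative default, z = 1.96 for 95% confidence):

```python
import math

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Required sample size for estimating a proportion, with a
    finite-population correction; the textbook formula most online
    calculators implement (p = 0.5 is the conservative choice)."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(sample_size(10000))  # 370 -- smaller margins demand larger samples
```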
Manipulating the Independent variable
• Straightforward manipulations- Subjects are selected and
assigned to conditions. The conditions are constructed to
represent different levels (e.g. high versus low level of
difficulty for material to be learned, high versus low levels of
subject motivation, subjects are categorized as ‘experts’ or
‘naïve’)
• Generally easier to interpret results when the manipulation
is straightforward (without accounting for possible subtleties
in staged manipulations –experimenter effects etc.) pg179
• Most research uses this type of manipulation pg177
• Cost of Manipulation- Straightforward manipulations involve
less presentation of verbal or written material while running
the study with groups of subjects-this is less costly pg181
Staged Manipulations and Confederates
• Staged manipulations are used to create some psychological
state (frustration, anger etc.)- Zitek et al. and 'sense of
entitlement': subjects playing a video game "lost" either when
the game crashed (unfair condition) or because the game
was too difficult (fair condition). Subjects in the unfair
condition later claimed more money than other subjects
when competing against others on a different task
• Confederates frequently used in
staged manipulations-Conformity
experiments-Asch study in which
confederates gave incorrect
judgments on line length before
subjects responded pg178
Strength of the Manipulation
• The simplest design has two levels of the independent
variable. The stronger the manipulation the more likely
differences will be greater between the groups
• Social psychology experiment in which subjects interact with
similar or dissimilar confederates to determine relationship
between similarity and liking. If you have a 10 point scale of
similarity the strongest manipulation would be to assign
subjects to interact with either confederates of level 1
similarity (group A) or level 10 (group B)- When attempting to
determine if a relationship exists a strong manipulation may
be the best choice. However, the strongest manipulation may
not represent real-life situations and therefore show low
external validity. Also, a strong manipulation of
variables such as fear or anxiety may raise ethical concerns
(what is the threat to validity in strong manipulations?)
Measuring the Dependent variable
• Types of Measures
• Self-report measures- used to measure attitudes, judgments,
emotional states, attributions
• Behavioral measures-direct observations of behaviors-rate of
behavior, reaction time, duration pg181
• Physiological measures- recordings of bodily responses-
EEG, EMG, GSR, MRI, fMRI
• Multiple measures- Most studies
use more than one measure
(what were they in the studies discussed in class?)
In a study of health-related behaviors,
multiple measures were taken on number of
illness days, doctor visits and medication (aspirin) taken pg183
• Multiple measures are a common everyday experience- people who are considering buying a
house look at the house's age, condition, location, style, features, and construction, as well as the price of nearby homes.
Doctors diagnosing an illness use multiple assessments: the patient's medical history, lab tests, and the patient's answers to questions
Multiple Measures
• Sensitivity of the dependent variable-The dependent
variable should be sensitive enough to detect differences
between groups. Simple yes or no questions are much less
sensitive than scaled question items (in forced-choice yes-no formats people
tend to say yes even if they have some negative feelings, and gradations of feeling
are not detected) pg183-4
• Tasks can be made too difficult or too easy. Ceiling effect-
task is so easy that everyone does
well and the independent variable
seems to have no effect. Floor effect-
task so difficult that almost nobody
does well- Freedman et al.: crowding
did not have an effect on cognitive performance, but in
later research when subjects were asked to perform more
complex tasks crowding did lower performance
Measures-Cost & Additional controls
• Some measures are more costly than others
• While self-report measures involve generally
inexpensive measures (paper and pencil, ready
questionnaires) other measures more costly-
interrater observations require video equipment
and at least two observers to view tapes and code
behavior-physiological measures require often
expensive equipment
• While a control group is considered the minimum
requirement for a true experiment (RCT) other
types of controls are often needed to address
potentially confounding factors
Subject and Experimenter Effects
• Demand characteristics- some aspect of the
experiment which might convey the purpose
of the study, leading the subject to act to
confirm or disconfirm your hypothesis
• This may be countered by deception/cover
stories, use of unrelated filler items in a
questionnaire, use of field studies or
observation. Can also question subjects about
their perception of the study pg185
Experimental controls
• Placebo groups-groups not receiving the treatment
in the study
• The placebo effect refers to the phenomenon in
which some people experience some type of
benefit after the administration of a placebo (a
substance with no known medical effects)
• In certain instances, when the benefits of a drug or
treatment are evident, you must give the treatment
to the control (placebo) group as soon as those
subjects/patients in the group have completed their
part in the study. Has the placebo effect gotten
stronger over time?
Placebos without Deception: A Randomized Controlled Trial in
Irritable Bowel Syndrome-Kaptchuk,T., et al. PLoS One. 2010; 5(12)
• Placebo treatment can significantly influence subjective
symptoms. However, it is widely believed that response to placebo
requires concealment or deception. We tested whether open-label
placebo (non-deceptive and non-concealed administration) is
superior to a no-treatment control with matched patient-provider
interactions in the treatment of irritable bowel syndrome (IBS)
• Open-label placebo produced significantly higher mean (±SD)
global improvement scores (IBS-GIS) at both 11-day midpoint
(5.2±1.0 vs. 4.0±1.1, p&lt;.001) and at 21-day endpoint (5.0±1.5 vs. 3.9±1.3, p=.002)
• Placebos administered without deception may be an effective
treatment for IBS. Further research is warranted in IBS, and
perhaps other conditions, to elucidate whether physicians can
benefit patients using placebos consistent with informed consent
• http://www.cbsnews.com/news/treating-depression-is-there-a-
placebo-effect/
Subject and Experimenter Effects
• Experimenter's bias or experimenter effects, is a subjective bias
towards a result expected by the human experimenter. These
effects may occur when the experimenter knows which condition
the subjects are in
• Experimenter might unintentionally treat subjects in the different
groups differently (verbally or non-verbally) or the experimenter
may record or interpret the data and results of the different
groups differently (Rosenthal's study of 'bright' vs. 'dull' rats, 1966;
Langer & Abelson, 1974: psychologists rated a person in a video as
more disturbed when told it was a patient versus a job applicant)
pg187
• Can minimize effect by running all conditions simultaneously,
automating procedures or by making observations single-blind
(subject unaware of condition he/she is in) or double-blind (neither
subject nor experimenter knows the condition of any subject)
Experimental controls-additional considerations
• Writing of research proposal allows you to organize and
plan a study (Introduction & Methods) pg189
• Pilot studies-a limited trial with a small number of
subjects-can ask subjects for feedback
• Manipulation check- by using self-report,
behavioral or physiological measures
you can measure the strength of the
manipulation in the pilot study
(while it might be distracting in
the actual study) and, if you
obtain non-significant results, determine whether it was due to a
problem in defining/manipulating the independent variable pg190
• Debriefing also provides you with subject feedback
Complex Experimental Designs Chp 10
• Experimental designs with only two levels of the
independent variable provide limited information about the
relationship between the independent and dependent
variables (review high (medium) low anxiety and test
performance and curvilinear relationships)
• If a curvilinear relationship is predicted then at least three
levels of a variable must be used as many curvilinear
relationships exist in psychology (example of fear and
attitude change-increasing the amount of fear aroused by a
persuasive message increases attitude change only up to a
moderate level after which further increases in fear arousal
actually reduce attitude change) pg 198
Factorial Designs
• Designs with multiple levels of the independent variable are
more representative of actual events
• Factorial designs are designs with more than one independent
variable (factor). All levels of each independent variable are
combined with all levels of the other independent variable(s) pg199
• A researcher might be interested in the effect of whether or not a
stimulus person (shown in a photograph) is smiling on
ratings of the friendliness of that person. The researcher might
also be interested in whether the stimulus person
looking directly at the camera makes a difference.
• In a factorial design, the two levels of the first independent variable
(smiling and not smiling) would be combined with the two levels of the
second (looking directly or not) to produce four distinct conditions:
smiling and looking at the camera, smiling and not looking at the
camera, not smiling and looking at the camera, and not smiling and not
looking at the camera
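The four conditions follow mechanically from crossing the levels; a sketch using the standard library:

```python
from itertools import product

# The two independent variables from the example, two levels each
smile = ["smiling", "not smiling"]
gaze = ["looking at camera", "not looking at camera"]

conditions = list(product(smile, gaze))
for condition in conditions:
    print(condition)
print(len(conditions))  # 4; a 2 x 3 design would give 6, a 3 x 3 gives 9
```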
Interpretation of Factorial Designs
• Two types of effects are studied in a factorial design:
the main effect and the interaction effect. If there are two
independent variables there is a main effect for each of
them pg200
• Main effect- the overall effect of one independent
variable on the dependent variable; there is one for
each independent variable. In the example of therapy
type and therapy duration there is a main effect for
therapy type and a main effect for duration of therapy
• Interaction effects occur when there is an interaction
between the two independent variables such that the
effect of one independent variable depends on the level
of the other independent variable
Factorial Designs
A 2 x 2 factorial design with four experimental conditions:

                             Type of Therapy (A)
                             Behavioral   Cognitive
Duration of      Short       n = 50       n = 50
Therapy (B)      Long        n = 50       n = 50

A design with two independent variables with one variable at two levels and
the other at three is a 2 x 3 factorial design with six conditions. A 3 x 3 design
will have nine conditions
Factorial Designs

                             Type of Therapy (A)
                             Behavioral   Cognitive
Duration of      Short       n = 50       n = 50
Therapy (B)      Long        n = 50       n = 50

In the above experiment the type of psychotherapy (cognitive vs. behavioral) gives
the main effect for the first independent variable (therapy type), and the duration
of psychotherapy (short vs. long) gives a second main effect (therapy duration)
Interpretation of Factorial Designs
• In the experiment, the main effect of type
(cognitive vs. behavioral) is the difference between
the average score for the cognitive group and the
average score for the behavioral group … ignoring
duration. That is, short-duration subjects and long-
duration subjects are combined together in
computing these averages. The main effect of
duration is the difference between the average
score for the short-duration group and the average
score for the long-duration group … this time
ignoring type.
Interpretation of Factorial Designs
We see that the subjects in the cognitive
conditions scored higher on average than
the subjects in the behavioral conditions
indicating a main effect for Therapy type
This 2x 2 factorial design has four
experimental conditions-short duration
behavioral therapy, long duration
behavioral therapy, short duration cognitive
therapy and long duration cognitive therapy
Interpretation of Factorial Designs
• Interaction effect- whenever the effect of one
independent variable depends on the level of the other
pg201-If cognitive psychotherapy is better than behavioral
psychotherapy when the therapy is short but not when
the therapy is long, then there is an
interaction between type and duration
of therapy When we say “it depends”
we are indicating that some type
of interaction is at work. You
would like to go to Vegas if you have
enough money and you have
completed your assignments pg202
Interpretation of Factorial Designs
• Effects are all independent of each other. A 2x2
factorial experiment might result in no main effects
and no interaction, one main effect and no
interaction, two main effects and no interaction, no
main effects and an interaction, one main effect
and an interaction, or two main effects and an
interaction. In looking at results presented in a
design table or (more importantly) a graph, you can
interpret what happened in terms of main effects
and interactions.
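The same interpretation can be done numerically from the four cell means; a sketch (the cell means below are made up for the therapy example):

```python
# Hypothetical cell means for the 2 x 2 therapy design
# (rows: duration, columns: therapy type); the numbers are made up
means = {("short", "behavioral"): 4.0, ("short", "cognitive"): 7.0,
         ("long",  "behavioral"): 6.0, ("long",  "cognitive"): 6.5}

# Main effect of therapy type: average over duration, then compare
cognitive  = (means[("short", "cognitive")]  + means[("long", "cognitive")])  / 2
behavioral = (means[("short", "behavioral")] + means[("long", "behavioral")]) / 2
print(cognitive - behavioral)  # 1.75 -> main effect of type

# Main effect of duration: average over type, then compare
long_dur  = (means[("long", "behavioral")]  + means[("long", "cognitive")])  / 2
short_dur = (means[("short", "behavioral")] + means[("short", "cognitive")]) / 2
print(long_dur - short_dur)    # 0.75 -> main effect of duration

# Interaction: does the effect of type depend on duration?
type_effect_short = means[("short", "cognitive")] - means[("short", "behavioral")]
type_effect_long  = means[("long", "cognitive")]  - means[("long", "behavioral")]
print(type_effect_short - type_effect_long)  # 2.5 -> nonzero, so an interaction
```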
Factorial Designs with Manipulated
and Nonmanipulated variables
• One common type of factorial design includes both
experimental (manipulated) and nonexperimental
(nonmanipulated) variables. These designs
investigate how different people respond to certain
situations- how the effect of the manipulated
(independent) variable differs across personal
characteristics or attributes (age, gender, personality types etc.)
• Person X Situation studies
• Extroverts get excited about parties Introverts get
anxious
Person X Situation Effects Type D personality in
patients with coronary artery disease Vukovic et al. Danubina 2014 Mar;26
BACKGROUND: During the past decade studies have shown that Type D personality is associated
with increased risk of cardiac events, mortality and poor quality of life. Some authors suggested
that depression and Type D personality have substantial phenomenological overlap.
SUBJECTS AND METHODS: The sample consisted of non-consecutive case series of seventy nine
patients with clinically stable and angiographically confirmed coronary artery disease (CAD),
who had been admitted to the Clinic of Cardiology, University Clinical Centre, from May 2006 to
September 2008. The patients were assessed by the Type-D scale (DS14), The Beck Depression
Inventory (BDI), and provided demographic information. Risk factors for CAD were obtained
from cardiologists. (Type D (distressed) Negative affect (worry,anxiety) and social inhibition)
RESULTS: The findings of our study have shown that 34.2% patients with CAD could be classified
as Type D personality. The univariate analysis has shown that the prevalence of Type D
personality was significantly higher in individuals with unstable angina pectoris and myocardial
infarction (MI) diagnoses (p=0.02). Furthermore, some components of metabolic syndrome
were more prevalent in patients with Type D personality: hypercholesterolemia (p=0.00),
hypertriglyceridemia (p=0.00) and hypertension (p=0.01). Additionally, the distribution of
depression in patients with a Type D personality and a non-Type D personality were statistically
significantly different (p=0.00).
CONCLUSION: To our knowledge, this study is the first one to describe the prevalence and
clinical characteristics of the Type D personality in patients with CAD in this region of Europe.
We have found that the prevalence of Type D personality in patients with CAD is in concordance
with the other studies.
Person by Situation Interaction effects
Furnham et al. examined the distracting effect of television
on cognitive processing (studying) in introverts and
extraverts. Both extraverts and introverts performed
better in silence, but extraverts performed better than
introverts in the presence of television distraction
Is there a main effect? Is there an interaction effect?
Factorial designs with both manipulated independent
variables and subject variables
Researchers recognize that a better
understanding of behavior
requires knowledge of both
situational variables and the personal attributes of people pg204
Interactions and Moderator Variables
• Moderator variables influence the relationship
between two other variables. A moderator is a variable
(z) whereby x and y have a different relationship
at the various levels of z; note that this is essentially
what is entailed in an interaction. A moderator variable
influences the strength of a relationship between two
other variables, and a mediator variable explains the
relationship between the two other variables
• Whereas moderator
variables specify when certain effects will hold,
mediators speak to how or why such effects occur
• (Baron & Kenny, 1986, p. 1176).
Mediate vs. Moderate
• Mediating variable: Synonym for intervening variable.
Example: Parents transmit their social status to their children
directly, but they also do so indirectly, through education:
Parent’s status ➛ child’s education ➛ child’s status: education
is a mediating variable (mediators explain)
Moderating variable A variable that influences,
or moderates, the relation between two other
variables and thus produces an interaction effect. a moderator
is a third variable that affects the correlation of two variables
If we were to replicate the Asch conformity experiment with a
female subject and found that her answers (Y variable) were
not affected by the confederates’ answers (X variable), then we
could say that gender is a Moderator (M) in this case
https://www.youtube.com/watch?v=3ymkfDBwel0
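The moderation idea above can be sketched numerically: the slope of y on x differs across the levels of the moderator z. A minimal Python sketch with hypothetical (made-up) conformity data — the variable names and values are illustrative only:

```python
# Minimal sketch: a moderator (z) changes the x-y relationship across its levels.
# Hypothetical data: conformity score (y) vs. confederate pressure (x), by group.

def slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

pressure = [1, 2, 3, 4, 5]
conform_group_a = [2, 4, 6, 8, 10]   # strong x-y relationship in group A
conform_group_b = [5, 5, 5, 5, 5]    # no x-y relationship in group B

print(slope(pressure, conform_group_a))  # 2.0
print(slope(pressure, conform_group_b))  # 0.0
```

Because the x-y slope is nonzero at one level of the group variable and zero at the other, group membership moderates the effect, which is the classic pattern described above.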
Moderators vs. Confounders
• Moderator: A moderator is a variable (z) whereby x and y have a
different relationship between each other at the various levels of z.
Note that this is essentially what is entailed in an interaction. A
variable that influences, or moderates, the relation
between two other variables and thus produces an
interaction effect.
• Confounder: A third variable that is related to x in a non-causal
manner and is related to y either causally or correlationally. The third
variable (z) is related to y even when x is not present. A confounding
variable is an extraneous variable (i.e., a variable that is not a focus of
the study) that is statistically related to (or correlated with) the
independent variable. A variable that obscures the effects of
another variable.
Let’s review How to control for
confounding variables
• Confounding variable (continued)This is bad because the point
of an experiment is to create a situation in which the only
difference between conditions is a difference in the
independent variable. This is what allows us to conclude that
the manipulation is the cause of differences in the dependent
variable. But if there is some other variable that changes
along with the independent variable, then this confounding
variable could be the cause of any difference
• Controlling confounding variables: Essentially all person
variables can be controlled by random assignment. If you
randomly assign subjects to conditions, then on average they will be
equally intelligent, equally outgoing, equally motivated, and so on
• Confounding variables: https://www.youtube.com/watch?v=B7QdNYLp_E0
Moderator variables
• A moderator variable changes the strength of an effect or
relationship between two variables. Moderators indicate
when or under what conditions a particular effect can be
expected. A moderator may increase the strength of a
relationship, decrease the strength of a relationship, or
change the direction of a relationship. In the classic case, a
relationship between two variables is significant (i.e., non-
zero) under one level of the moderator and zero under the
other level of the moderator. For example, work stress
increases drinking problems for people with a highly
avoidant (e.g., denial) coping style, but work stress is not
related to drinking problems for people who score low on
avoidant coping (Cooper, Russell, & Frone, 1990).
Example of Moderation
• [Diagram: Stress → Depression, with Social Support moderating the path]
One of the clearest examples of moderation was presented by
Cohen and Wills (1985). They argued that the social support
literature (to that point in 1985) had neglected to consider the role
of social support as a moderator of the stress to adjustment
relationship. This moderation relationship is often depicted as shown above
• This schematic suggests that the relationship between stress and
depression may differ in strength at different levels of social
support. In other words, stress may be more strongly associated
with depression under conditions of low social support
compared to conditions of high social support.
Outcomes of a 2 X 2 Factorial Design
• Two levels to each of two
independent variables We must
determine if there is a significant
main effect for variables A, B and an
interaction effect between the variables
• In the example to the right there
is a Main Effect for Both Room
Temperature and Test Difficulty
but no interaction effect.
Main effects and interaction effects
• We see that the six subjects in the
cognitive conditions scored three
points higher on average than the
six subjects in the behavioral
conditions. This is the main effect
of the type of psychotherapy. To see
the main effect of the duration of
psychotherapy, we compare the
average score in the short condition with the average score in the
long condition, now computing these averages across subjects in
the cognitive and behavioral conditions. We see that the six
subjects in the long conditions scored three points higher on
average than the six subjects in the short conditions. This is the
main effect of the duration of psychotherapy
Main Effects Therapy Type X Duration
Below are the same results plotted in the form of a bar graph. The main effect of type is
indicated by the fact that the two cognitive bars are higher on average than the two
behavioral bars. The main effect of duration is indicated by the fact that the two long-
duration (dark) bars are higher on average than the two short-duration (light) bars
Main Effects and Interaction Effects
Parallel lines in these types of graphs
indicate that there are main effects in the
results, but no interactions. If the lines are
not parallel this is indicative of an
interaction.
"Do students do better on hard tests or
easy tests?" "It depends: in a fifty-degree
room there is no difference, but in a ninety-
degree room they do much better on easy
tests." Interaction effect
Students do best when the test is easy and
the temperature is 90 degrees. Interaction
effect
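The logic above — compare marginal means for the main effects, and compare the simple effects for the interaction — can be sketched for any 2 x 2 table of cell means. A minimal Python sketch; the test-difficulty/temperature cell means below are hypothetical numbers chosen to illustrate the crossover pattern, not data from the text:

```python
# Sketch: read main effects and the interaction off a 2 x 2 table of cell means.
# Rows = levels of factor A, columns = levels of factor B.

def effects(cells):
    (a1b1, a1b2), (a2b1, a2b2) = cells
    a_marginals = ((a1b1 + a1b2) / 2, (a2b1 + a2b2) / 2)   # row (A) means
    b_marginals = ((a1b1 + a2b1) / 2, (a1b2 + a2b2) / 2)   # column (B) means
    main_a = a_marginals[1] - a_marginals[0]
    main_b = b_marginals[1] - b_marginals[0]
    # Interaction: does the simple effect of B differ across levels of A?
    interaction = (a2b2 - a2b1) - (a1b2 - a1b1)
    return main_a, main_b, interaction

# Rows = room temperature (50, 90 degrees), columns = test (hard, easy):
print(effects(((80, 80), (90, 60))))  # (-5.0, -15.0, -30): nonzero interaction
print(effects(((2, 4), (4, 6))))      # (2.0, 2.0, 0): parallel lines, no interaction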
Music is as distracting as noise: the differential distraction of
background music and noise on the cognitive test performance
of introverts and extraverts Furnham, 2002
• Previous research has found that introverts' performance on complex cognitive
tasks is more negatively affected by distracters, e.g. music and background
television, than extraverts' performance. This study extended previous research by
examining whether background noise would be as distracting as music. In the
presence of silence, background garage music and office noise, 38 introverts and 38
extraverts carried out a reading comprehension task, a prose recall task and a
mental arithmetic task. It was predicted that there would be an interaction
between personality and background sound on all three tasks: introverts would do
less well on all of the tasks than extraverts in the presence of music and noise but
in silence performance would be the same. A significant interaction was found on
the reading comprehension task only, although a trend for this effect was clearly
present on the other two tasks. It was also predicted that there would be a main
effect for background sound: performance would be worse in the presence of
music and noise than silence. Results confirmed this prediction. These findings
support the Eysenckian hypothesis of the difference in optimum cortical arousal in
introverts and extraverts.
• What was the subject variable? What was the manipulated variable? Was there a
main effect? Was there an interaction effect?
ANOVA
• A procedure known as the Analysis of Variance
(ANOVA) is used to assess the statistical significance
of main effects and interaction in a factorial design pg207
• the ANOVA can be used for factorial designs (or
designs which employ more than one IV). Note that,
in this context, an IV is often referred to as a factor.
The factorial design is very popular in the social
sciences. It has a few advantages over single variable
designs. The most important of these is that it can
provide some unique and relevant information about
how variables interact or combine in the effect they
have on the dependent variable
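The partitioning that the ANOVA performs can be sketched for a balanced factorial design (equal n per cell): the total variability is split into sums of squares for each factor, their interaction, and within-cell error, and each F ratio divides a factor's mean square by the within-cell mean square. A minimal sketch in plain Python with hypothetical data showing a pure main effect of factor A:

```python
# Sketch of the sums-of-squares decomposition behind a two-way ANOVA
# for a balanced factorial design (equal n per cell); no stats library.

def two_way_ss(data):
    """data[i][j] = list of scores in cell (A level i, B level j)."""
    a = len(data)          # levels of factor A
    b = len(data[0])       # levels of factor B
    n = len(data[0][0])    # scores per cell (balanced design assumed)
    cell = [[sum(c) / n for c in row] for row in data]          # cell means
    a_mean = [sum(row) / b for row in cell]                      # A marginals
    b_mean = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]  # B marginals
    grand = sum(a_mean) / a
    ss_a = n * b * sum((m - grand) ** 2 for m in a_mean)
    ss_b = n * a * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = n * sum((cell[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_within = sum((x - cell[i][j]) ** 2
                    for i in range(a) for j in range(b) for x in data[i][j])
    return ss_a, ss_b, ss_ab, ss_within

# Hypothetical trials-to-criterion scores, pure main effect of maternal diet (A):
data = [[[2, 2], [2, 2]],   # 0% EDC: adolescent, adult cells
        [[4, 4], [4, 4]]]   # 35% EDC
print(two_way_ss(data))  # (8.0, 0.0, 0.0, 0.0): only SS_A is nonzero
```

In a full analysis each sum of squares would be divided by its degrees of freedom and compared against MS-within via an F test; the sketch only shows where the variability lands.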
ANOVA example
• The human literature had shown that children diagnosed
with Fetal Alcohol Syndrome (FAS) were more active and
impulsive than children not receiving this diagnosis. They
also seemed to have a more difficult time controlling
themselves (i.e., self restraint). These problems typically
become less severe as the child ages. Were the behavioral
abnormalities observed in the children with FAS due to the
fact that their mothers consumed alcohol while they were
pregnant or due to nutritional factors (since the diet of an
alcoholic is typically not wholesome & well balanced)?
Another possible causal factor of the abnormalities
observed is spousal abuse. Offspring of rodents given
alcohol when pregnant show similar morphological and
behavioral changes to that observed in humans
Study of Alcohol on Learning
• We will have two IVs or factors and each will
have two levels (or possible values). The table
below illustrates the design. Note that EDC
refers to Ethanol Derived Calories
                             Age (factor B)
                             Adolescent   Adult
Maternal Diet (factor A)
  Chocolate Milk (0% EDC)       n=5        n=5
  White Russian (35% EDC)       n=5        n=5
• This is an example of a 2x2 factorial design with 4 groups (or cells),
each of which has 5 subjects. This is the simplest possible factorial
design. The Dependent Variable (DV) used was a Passive Avoidance
(PA) task. Rats are nocturnal, burrowing creatures and thus, they
prefer a dark area to one that is brightly lit. The PA task uses this
preference to test their learning ability. The apparatus has two
compartments separated by a door that can be lifted out. One of the
compartments has a light bulb which is controlled by the
experimenter. The floor can be electrified and the rat receives a brief,
mild electric shock
ANOVA example
• The first trial The rat is placed in the compartment with the
light bulb as shown below. When the trial begins, three
things happen. The door is raised, the light is turned on, and
a stopwatch is started
Within a few seconds of the door
being raised, the rat will typically
sniff around and begin to move
into the darker compartment
(without the light). When the rat
has completely entered the darker
compartment, the door is closed
and the brief, mild shock is
administered. The goal is for the
rat to learn not to move into the
darker compartment. In other
words, by remaining passive, the
rat can avoid the shock, hence the
term passive avoidance
ANOVA example
• For our purposes, we will use a criterion of 180 seconds as our
operational definition of learning PA. That is, when the rat
remains in the brightly lit compartment for 3 minutes, we will say
that it has learned the task, and what we measure is the number
of trials it takes the rat to do this. (Note that a
smart rat will take fewer trials to learn.)
Thus, the PA task was chosen as the DV
because it can be thought of as a measure of
"self restraint." The first possibility is that nothing
is significant
                         Age (factor B)
                   Adolescent   Adult   A marginals
Maternal Diet (factor A)
  (0% EDC)             3          3          3
  (35% EDC)            3          3          3
  B marginals          3          3
ANOVA example continued
• The second possibility is that the main effect of
factor A is significant. Here is one possible
representation of this outcome
                         Age (factor B)
                   Adolescent   Adult   A marginals
Maternal Diet (factor A)
  (0% EDC)             2          2          2
  (35% EDC)            4          4          4
  B marginals          3          3
Notice that the A marginals show a
difference of two and thus the main
effect of factor A is significant. The
animals receiving alcohol in utero took
more trials to learn PA than controls.
The fact that the effect is consistent
across both levels of factor B tells us
that there is no interaction. In graphical
form, the two lines would be parallel.
ANOVA example continued
• The next possibility is that the main effect of factor B is
significant. Here is
one possible
representation of
this outcome
                         Age (factor B)
                   Adolescent   Adult   A marginals
Maternal Diet (factor A)
  (0% EDC)             4          2          3
  (35% EDC)            4          2          3
  B marginals          4          2
Notice that the B marginals
show a difference of two and
thus the main effect of factor B
is significant. The older animals
took fewer trials to learn PA
than the younger animals.
The fact that the effect is
consistent across both levels
of factor A tells us that there is no
interaction
ANOVA example continued
• The next possibility is that both main effects are significant.
Here is one possible representation of this outcome
                         Age (factor B)
                   Adolescent   Adult   A marginals
Maternal Diet (factor A)
  (0% EDC)             3          1          2
  (35% EDC)            5          3          4
  B marginals          4          2
Notice that both sets of marginals
show a difference of two and thus
main effects are significant. The
animals receiving alcohol in utero
took more trials to learn PA than
controls and the older animals took
fewer trials to learn PA than the
younger animals. The fact that both
of these main effects are consistent
across the levels of the remaining
factor tells us that there is no
interaction
ANOVA example continued
• The next possibility is that the interaction is significant.
Here is one possible representation of this outcome
                         Age (factor B)
                   Adolescent   Adult   A marginals
Maternal Diet (factor A)
  (0% EDC)             2          4          3
  (35% EDC)            4          2          3
  B marginals          3          3
Notice that both sets of marginals
show no difference, thus neither main
effect is significant. However, some of
the cell means do differ by two. The
animals receiving alcohol in utero
took more trials to learn PA when
young and fewer when older than
controls. In other words, the effects of
prenatal alcohol depended on the
age of the animal when tested.
Whenever the effect of one factor
depends upon the levels of another,
there is an interaction.
ANOVA example continued
• The next possibility is the interaction and the main effect of
factor A are significant as shown below
                         Age (factor B)
                   Adolescent   Adult   A marginals
Maternal Diet (factor A)
  (0% EDC)             1          3          2
  (35% EDC)            5          3          4
  B marginals          3          3
Notice that the B marginals show no
difference, thus the main effect of B is
not significant. The A marginals do
show a difference of two which
demonstrates a main effect of factor A.
This tells us that the animals that
received alcohol in utero took longer to
learn PA than the animals that didn't.
However, the cell means tell the real
story here. That is, the effect depends
on age. The animals receiving alcohol
in utero took more trials to learn PA
when young but were normal when
older when compared to controls.
Independent Groups, Repeated
Measures and Mixed Factorial designs
• In a 2 x 2 Factorial design with four conditions for an
Independent Group (between-subjects) design, a different
group of subjects will be assigned to each of the four conditions.
Following the example on pg208 if you have a 2 x 2 design with
10 subjects in each condition you will need 40 subjects total
2 x 2 Independent Groups (Between-Subjects) Design
                  Var B Level 1     Var B Level 2
Var A Level 1     S1–S10            S11–S20
Var A Level 2     S21–S30           S31–S40
Independent Groups, Repeated
Measures and Mixed Factorial designs
• In a repeated measures (within-subjects) design the same
subjects will participate in ALL conditions
2 x 2 Repeated Measures (Within-Subjects) Design
                  Var B Level 1     Var B Level 2
Var A Level 1     S1–S10            S1–S10
Var A Level 2     S1–S10            S1–S10
Independent Groups, Repeated
Measures and Mixed Factorial designs
• In a 2 x 2 mixed Factorial design ten different subjects are
assigned to Levels 1 and 2 of Variable A, but Variable B is
repeated measures, with subjects assigned to each of the two
levels of Variable A receiving both levels of Variable B
2 x 2 Mixed Factorial Design
                  Var B Level 1     Var B Level 2
Var A Level 1     S1–S10            S1–S10
Var A Level 2     S11–S20           S11–S20
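The three assignment schemes above differ in how many subjects they require for the same cell size. A minimal Python sketch of that bookkeeping (the function name and interface are illustrative):

```python
# Sketch: total subjects required for an a x b factorial with n per cell,
# under between-subjects, within-subjects, and mixed designs.

def subjects_needed(levels_a, levels_b, n_per_cell, design):
    cells = levels_a * levels_b
    if design == "between":   # independent groups: new subjects in every cell
        return cells * n_per_cell
    if design == "within":    # repeated measures: every subject in every cell
        return n_per_cell
    if design == "mixed":     # factor A between subjects, factor B within subjects
        return levels_a * n_per_cell
    raise ValueError(design)

for d in ("between", "within", "mixed"):
    print(d, subjects_needed(2, 2, 10, d))  # 40, 10, 20
```

This matches the text: a 2 x 2 design with 10 subjects per condition needs 40 subjects run between subjects, 10 run within subjects, and 20 in the mixed case.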
Increasing the Number of Levels of
an Independent Variable
• You can increase the complexity of the basic 2 x 2
Factorial design by increasing the number of levels
of one or more of the independent variables pg209
Example of 2 x 3 Factorial Design
• Dr. Sy Cottick investigated driver frustration under
low, medium, and high density traffic conditions
and under traffic flow controlled by a police officer
or a traffic signal (2 conditions of Traffic Control X 3
conditions of Traffic Density). The measure of
frustration was the
number of horns honked
by drivers before receiving
the right-of-way at a
controlled intersection.
2 X 3 Factorial Example
• Is there a Main Effect for Traffic Density?
• Yes The average number of horn honks increases as
traffic density increases
• Is there a main effect of type of controlled intersection?
• Yes People honk more often at signal controlled
intersections than at officer controlled intersections
Traffic           Type of controlled intersection
Density             Officer        Signal
Low                    2              4        Mean = 3
Medium                 4              6        Mean = 5
High                   8             10        Mean = 9
                  Mean = 4.67   Mean = 6.67
2 X 3 Factorial Example
• Is there an interaction between traffic density and
type of controlled intersection?
• No The same difference in horn honks between officer
and signal exists at each level of traffic density, so there is
no interaction.
[Graph: number of horn honks (0–12) vs. traffic density (Low, Medium, High), with separate lines for Officer and Signal; the lines are parallel]
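The marginal means for this 2 x 3 example can be recomputed directly from the cell values, along with the check for an interaction:

```python
# Sketch: marginal means for the 2 x 3 traffic example.
# Rows = traffic density, columns = (officer, signal) horn-honk means.

honks = {"Low": (2, 4), "Medium": (4, 6), "High": (8, 10)}

row_means = {d: sum(v) / 2 for d, v in honks.items()}
officer_mean = sum(v[0] for v in honks.values()) / 3
signal_mean = sum(v[1] for v in honks.values()) / 3

print(row_means)                                       # {'Low': 3.0, 'Medium': 5.0, 'High': 9.0}
print(round(officer_mean, 2), round(signal_mean, 2))   # 4.67 6.67
# Signal minus officer is 2 honks at every density level -> no interaction:
print({d: v[1] - v[0] for d, v in honks.items()})
```

The constant signal-minus-officer difference at every density level is exactly the "parallel lines" pattern: both main effects are present, but there is no interaction.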
it is not always possible or
practical to do an RCT
(randomized clinical trial). It may
not be ethical to do an RCT in
some cases (for example,
tobacco use), and it may be too
expensive, especially for early or
exploratory studies. This 2 x 2
factorial design has four
experimental conditions
Single-Case, Quasi-Experimental and
Developmental Research Chapter 11
• While the classic experimental design includes
subjects randomly assigned to independent
variable conditions, with a dependent
variable (outcome) measure and all other variables
held constant, three types of special research
situations exist
• 1) Single-Case 2) Quasi-Experimental and
3) Developmental Research
Single-subject, N=1 Designs
• Single-subject research is experimental rather than
correlational or descriptive, and its purpose is to
document causal, or functional, relationships
between independent and dependent variables.
Single-subject research employs within- and
between-subjects comparisons to control for major
threats to internal validity and requires systematic
replication to enhance external validity (Martella,
Nelson, & Marchand-Martella, 1999).
• (Each participant serves as his or her own control).
• Single-subject research requires operational
descriptions of the participants, setting, and the
process by which participants were selected (Wolery & Ezell, 1993)
Single Case Experimental Designs
• Early work in single subject
designs credited to B.F. Skinner
with many case studies or
single case designs in clinical
counseling and educational
settings
• Single case studies begin with
a baseline measure (control)
followed by a manipulation
• In order to determine if the
treatment was effective there
is a reversal design A-B-A pg216
Single Case Designs ABA Designs
• A baseline and Observation
• B Treatment or Intervention
• A Withdrawal of Treatment
• The ABA design can be further improved by ABAB design
and can be
extended out even
further ABABAB as
a single reversal
may not be
powerful enough
Single Case Designs
• A single reversal may not be enough
but in addition the observed effect
may have been due to a random
fluctuation in behavior which would
justify multiple withdrawals and treatments pg 217-218
Unlike group studies, single-case designs frequently involve multiple
repeated observations of the subject(s)
• Multiple Baseline Designs
• In certain instances it is unethical to reverse treatment
that reduces dangerous or illegal behaviors such as
drug/alcoholism or sexual deviancy. In such cases it
may be necessary to demonstrate the effectiveness of
treatment with a multiple baseline design
Multiple Baseline Designs
One variation of multiple baseline
designs is across subjects in which
the behavior of several subjects is
measured over time and the
treatment is introduced at a
different time for each subject.
Change takes place over various
subjects ruling out random effects
Another version is a multiple baseline across behaviors
Several different behaviors of a single subject are
measured over time. At different times the same
manipulation is applied to each of the behaviors
Multiple Baseline Designs
• Multiple baselines across behaviors- A reward or token
system could be applied to different behaviors of the
same subject/patient Different ones for grooming,
socialization, appropriate speech pg219
• A third variation of the multiple baseline is across
situations, in which the same behavior is measured in
different settings and the treatment is introduced at a
different time in each setting
Single-Case Designs
• Procedures with any one subject can be replicated with
other subjects enhancing the generalizability or external
validity (or replicated across settings). This is often done in
research
• Sidman (1960) suggests to present data from each single
case design separately and not try to group the means of all
the individuals as such means may be misleading (e.g. The
treatment may have been effective in changing the
behavior of some individuals but not others)
• Within education, single-subject research has been used
not only to identify basic principles of behavior (e.g.,
theory), but also to document interventions (independent
variables) that are functionally related to change in socially
important outcomes (dependent variables; Wolf, 1978).
Help Line Evaluation
• An evaluation was conducted of the impact of different
methods of agency outreach on the number of phone calls
received by a help line (information and referral). The
baseline period represented a time in which there was no
outreach; rather, knowledge about the help line seemed to
spread by word of mouth. The B phase represented the
number of calls after the agency had sent notices about its
availability to agencies serving older adults and families.
During the C phase, the agency ran advertisements using
radio, TV, and print media. Finally, during the D phase,
agency staff went to a variety of different gatherings, such
as community meetings or programs run by different
agencies, and described the help line.
Evaluation of Help Line-Glatthorn
EXHIBIT 7-14 Multiple Treatment Design
[Graph: number of calls (0–60) per week over weeks 1–20, across the baseline and B, C, D phases]
Phone calls did not increase appreciably after notices were sent to other
professionals or after media efforts, but they did increase dramatically in the
final phase of the study. This graph demonstrates how tricky the
interpretation of single-subject data can be. A difficulty in coming to a
conclusion with such data is that only adjacent phases can be compared so
that the effect for nonadjacent phases cannot be determined. One
plausible explanation for the findings is that sending notices to
professionals and media efforts at outreach were a waste of resources in
that the notices produced no increase in the number of calls relative to
doing nothing, and advertising produced no increase relative to the
notices. Only the meetings with community groups and agency-based
presentations were effective, at least relative to the advertising. An
alternative interpretation of the findings is that the order of the activities
was essential. There might have been a carryover effect from the first two
efforts that added legitimacy to the third effort. In other words, the final
phase was effective only because it had been preceded by the first two
efforts. If the order had been reversed, the impact of the outreach efforts
would have been negligible. A third alternative is that history or some
other event occurred that might have increased the number of phone calls.
ASSESSMENT OF DEVIANT AROUSAL IN ADULT MALE SEX OFFENDERS
WITH DEVELOPMENTAL DISABILITIES- Reyes et al. JOURNAL OF APPLIED
BEHAVIOR ANALYSIS, 2006,39,173-188
• Some statistics regarding very broad characteristics of sex offenders
are available (i.e., age, race, etc.), but are limited due to the wide
variability in this population. In general, the demographic
characteristics of sex offenders seem to match those of nonoffenders
• Ten individuals, residing in a treatment facility specializing in the
rehabilitation of sex offenders with developmental disabilities,
participated in an arousal assessment involving the use of the penile
plethysmograph. All of these individuals had been accused of
committing one or more sexual offenses and had been found
incompetent to stand trial. The arousal assessments involved
measuring change in penile circumference to various categories of
stimuli both appropriate (adult men and women) and inappropriate
(e.g., 8- to 9-year-old boys and girls). Before each session, the
technician was required to calibrate the penile strain gauge to ensure
accurate measurement. The video clips were presented one at a time
in one of three predetermined orders
ASSESSMENT OF DEVIANT AROUSAL
• Differentiated deviant arousal was characterized as
showing arousal in the presence of a particular age and
gender category that was higher than the arousal to other
categories and to the neutral stimulus. Differentiated
arousal patterns were also consistently higher than arousal
levels to the neutral stimulus. Undifferentiated deviant
arousal was characterized as showing
similar arousal levels to deviant and
non deviant stimuli that was higher
than the arousal in the presence of the
neutral stimulus. The arousal assessments
showed that not all of the participants were differentially
aroused by the deviant stimuli
ASSESSMENT OF DEVIANT AROUSAL
• Specific targets for teaching are identified. Thus, skills
training can be conducted to teach avoidance of high-risk
situations (e.g., being in situations with children of a certain age group)
• Second, the assessment results could be used to
evaluate the effects of commonly used, but poorly
validated, treatments. For example, classical
conditioning, which typically involves pairing unpleasant
odors with deviant arousal, has been commonly used but
has not been validated.
• Third, the effects of presession masturbation could be
tested to determine whether ejaculation serves as an
establishing operation or an abolishing operation for
sexual stimuli as reinforcing (or at least as arousing) stimuli
Program Evaluation
• Program evaluation is a method for collecting,
analyzing, and using information to answer
questions about projects, policies and programs,
particularly about their effectiveness and efficiency.
• The question that needs to
be answered is whether or
not the programs people are
funding, implementing, voting for, receiving or
objecting to are producing
the intended effect. The main
focus is outcome evaluation
which determines if the
program was effective pg221
Program Evaluation
• Evaluation is the systematic application of scientific
methods to assess the design, implementation,
improvement or outcomes of a program (Rossi &
Freeman, 1993; Short, Hennessy, & Campbell, 1996).
The term "program" may include any organized action
such as media campaigns, service provision,
educational services, public policies, research projects.
• Rossi et al. (2004) identified five types of evaluations
each attempting to answer different questions
1) Needs Assessment 2) Program Theory Assessment
3) Process Evaluation 4) Outcome Evaluation
5) Efficiency Assessment
Needs Assessment
• A needs assessment is a part of planning processes
determining if there are problems that need to be addressed
in a target population (Is adolescent drug abuse a problem in
the community?) – a general 12-step process. Data may come
from surveys, interviews, statistical data provided by various
agencies pg221
• Confirm the issue and audiences
• Establish the planning team
• Establish the goals and objectives
• Characterize the audience
• Conduct information and literature search
• Select data collection methods
• Determine the sampling scheme
• Design and pilot the collection instrument
• Gather and report data; Analyze data; Manage data
• Synthesize data and create report
Program Theory
• Program evaluation often involves collaboration of
researchers, service providers and prospective
clients of the program to determine that the
proposed program does actually address the needs
of the target population in appropriate ways.
• Example cited in assessing the needs of homeless
men and women in NYC: men needed help with
drinking or drug problems, handling money and
social skills, while women needed help with health
problems. Any designed program must take
these factors into account and provide a rationale
for how homeless individuals will benefit from the
program
Process Evaluation
• When the program is under way, the evaluation researcher
monitors it to determine whether it is being implemented as
intended. Is the program doing what it is supposed to do? The types of
questions asked when designing a process evaluation are
different from those asked in outcome evaluation. The
questions underlying process evaluation focus on how well
interventions are being implemented. Typical questions
asked include, but are not limited to:
• What intervention activities are taking place?
• Who is conducting the intervention activities?
• Who is being reached through the intervention activities?
• What inputs or resources have been allocated or mobilized for program
implementation?
• What are possible program strengths, weaknesses, and areas that need
improvement?
Outcome Evaluation (Impact Assessment)
• Outcome evaluations measure to what degree
program objectives have been achieved (i.e. short-
term, intermediate, and long-term objectives). This
form of evaluation assesses what has occurred
because of the program, and whether the program
has achieved its outcome objectives. pg223
An outcome evaluation focused on tobacco prevention
activities can measure the following elements
Changes in intended and actual tobacco-related behaviors
Changes in people’s attitude toward, and beliefs about, tobacco
Changes in people’s awareness and support for interventions
and policy or advocacy efforts
True experimental designs may not always be possible in these conditions and
quasi-experimental designs and single-case designs may offer good alternatives
Program Evaluation Efficiency Assessment
The final program evaluation question addresses efficiency
assessment pg222. Once shown that a program does have its
intended effect, the researcher must determine if it is worth
the resources that must be dedicated to it: Cost vs. Benefits
When Bad things Happen to Good Intentions
• The Drug Abuse Resistance Education DARE reviewed
• When it became known that the prestigious American
Journal of Public Health planned to publish the study,
DARE strongly objected and tried to prevent publication.
"DARE has tried to interfere with the publication of this.
They tried to intimidate us," the publication director
reported (also see pg230 text)
• The U.S. Department of Education prohibits schools from
spending its funding on DARE because the program is
completely ineffective in reducing alcohol and drug use.
DARE was declared ineffective by the U.S. General
Accounting Office, the U.S. Surgeon General, the National
Academy of Sciences, and the U.S. Department of
Education. -David J. Hanson, Ph.D., http://www.alcoholfacts.org/DARE.html
An outcome evaluation of Project DARE
Christopher Ringwalt, Susan T. Ennett and Kathleen D. Holt, Health Educ. Res. (1991) 6(3): 327-337
• This paper presents the results of an evaluation of the effects of the
Drug Abuse Resistance Education (DARE)Project, a school-based drug
use prevention program, in a sample of fifth and sixth graders in
North Carolina. DARE is distinguished by its use of specially trained,
uniformed police officers to deliver 17 weekly lessons in the
classroom. The evaluation used an experimental design employing
random assignment of 20 schools to either a DARE or no-DARE
condition, pre- and post-testing of both groups, attrition assessment,
adjustments for school effects, and control for non-equivalency
between comparison groups.
• DARE demonstrated no effect on adolescents' use of alcohol,
cigarettes or inhalants, or on their future intentions to use these
substances. However, DARE did make a positive impact on adolescents'
awareness of the costs of using alcohol and cigarettes, perceptions of the
media's portrayal of these substances, general and specific attitudes towards
drugs, perceived peer attitudes toward drug use, and assertiveness.
How effective is drug abuse resistance education? A
meta-analysis of Project DARE outcome evaluations
S. T. Ennett et al., Am J Public Health. 1994 September; 84(9): 1394–1401
This study used meta-analytic techniques to review eight methodologically
rigorous DARE evaluations
INTRODUCTION Project DARE (Drug Abuse Resistance Education) is the
most widely used school-based drug use prevention program in the United
States, but the findings of rigorous evaluations of its effectiveness have not
been considered collectively. METHODS. We used meta-analytic techniques
to review eight methodologically rigorous DARE evaluations. Weighted
effect size means for several short-term outcomes also were compared
with means reported for other drug use prevention programs. RESULTS.
The DARE effect size for drug use behavior ranged from .00 to .11 across
the eight studies; the weighted mean for drug use across studies was .06.
For all outcomes considered, the DARE effect size means were substantially
smaller than those of programs emphasizing social and general
competencies and using interactive teaching strategies. CONCLUSIONS.
DARE's short-term effectiveness for reducing or preventing drug use
behavior is small and is less than for interactive prevention programs
Effect Size
• Consider an experiment conducted by Dowson (2000) to investigate time of
day effects on learning: do children learn better in the morning or
afternoon? A group of 38 children were included in the experiment. Half
were randomly allocated to listen to a story and answer questions about it
(on tape) at 9am, the other half to hear exactly the same story and answer
the same questions at 3pm. Their comprehension was measured by the
number of questions answered correctly out of 20.
• The average score was 15.2 for the morning group, 17.9 for the afternoon
group: a difference of 2.7. But how big a difference is this? If the outcome
were measured on a familiar scale, such as GCSE grades, interpreting the
difference would not be a problem. If the average difference were, say, half
a grade, most people would have a fair idea of the educational significance
of the effect of reading a story at different times of day. However, in many
experiments there is no familiar scale available on which to record the
outcomes. The experimenter often has to invent a scale or to use (or adapt)
an already existing one - but generally not one whose interpretation will be
familiar to most people
Effect Size
• One way to get over this problem is to use the amount of variation in scores
to contextualize the difference. If there were no overlap at all and every
single person in the afternoon group had done better on the test than
everyone in the morning group, then this would seem like a very substantial
difference. On the other hand, if the spread of scores were large and the
overlap much bigger than the difference between the groups, then the
effect might seem less significant. Because we have an idea of the amount
of variation found within a group, we can use this as a yardstick against
which to compare the difference. This idea is quantified in the calculation of
the effect size. effect size is a measure of the strength of a phenomenon
• The concept of effect size already appears in everyday language. For example, a weight loss
program may boast that it leads to an average weight loss of 30 pounds. In this case, 30
pounds is the claimed effect size. Another example is that a tutoring program may claim that
it raises school performance by one letter grade. This grade increase is the claimed effect
size of the program. CALCULATE EFFECT SIZE: http://www.uccs.edu/~lbecker/
Robert Coe, University of Durham
http://www.leeds.ac.uk/educol/documents/00002182.htm
effect size r
Small 0.10
Medium 0.30
Large 0.50
Effect Size
• The concept is illustrated in Figure 1, which shows two possible ways the difference
might vary in relation to the overlap. If the difference were as in graph (a) it would be
very significant; in graph (b), on the other hand, the difference might hardly be
noticeable. In Dowson's time-of-day effects experiment, the standard deviation (SD) =
3.3, so the effect size was (17.9 - 15.2)/3.3 = 0.8. An effect size is exactly equivalent to a
'Z-score' of a standard Normal distribution. For example, an effect size of 0.8 means that
the score of the average person in the experimental group is 0.8 standard deviations
above the average person in the control group, and hence exceeds the scores of 79% of
the control group. With the two groups of 19 in the time-of-day effects experiment, the
average person in the 'afternoon' group (i.e. the one who would have been ranked 10th
in the group) would have scored about the same as the 4th highest person in the
'morning' group. The basic formula to calculate the effect size is to subtract the mean
of the control group from that of the experimental group and then divide the
difference by the standard deviation of the scores for the control group
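The Dowson calculation above can be sketched in a few lines of Python; the 79% figure follows from treating the effect size as a z-score and looking up the standard normal CDF (the function name `effect_size` is ours, not from the text):

```python
from math import erf, sqrt

def effect_size(mean_experimental, mean_control, sd):
    """Standardized mean difference: (M_experimental - M_control) / SD."""
    return (mean_experimental - mean_control) / sd

# Dowson's time-of-day experiment: afternoon mean 17.9, morning mean 15.2, SD 3.3
d = effect_size(17.9, 15.2, 3.3)
print(round(d, 1))  # 0.8

# Treating d as a z-score, the standard normal CDF gives the proportion of
# the control group that the average experimental-group member exceeds.
percentile = 0.5 * (1 + erf(d / sqrt(2)))
print(round(percentile * 100))  # 79
```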
Quasi-Experimental Designs
• The experimental method received a big boost in the 1920s from
a young Englishman named Ronald Fisher. Fisher's modern
experimental methods were applied in agricultural research for 20
years or so before they began to be applied in psychology and
eventually in education.
• In the early 1960s, a psychologist, Donald Campbell, and an
educational researcher, Julian Stanley (Campbell & Stanley, 1963),
published a paper that was quickly acknowledged to be a classic.
They drew important distinctions between experiments of the
type Fisher devised and many other designs and methods being
employed by researchers with aspirations to experiments but
failing to satisfy all of Fisher's conditions. Campbell and Stanley
called the experiments that Fisher devised "true experiments."
The methods that fell short of satisfying the conditions of true
experiments they called "quasi-experiments," quasi meaning
seemingly or apparently but not genuinely so.
Quasi-Experimental Designs
• Quasi-experimental designs address the need to study the
effect of an independent variable in settings in which the
controls of true experimental designs cannot be achieved pg222
• A quasi-experiment is an empirical study used to estimate the
causal impact of an intervention on its target population. Quasi-
experimental research shares similarities with the traditional
experimental design or randomized controlled trial, but it
specifically lacks the element of
random assignment to treatment or
control. Instead, quasi-experimental
designs typically allow the researcher
to control the assignment to the
treatment condition, but using some
criterion other than random
assignment (e.g. an eligibility cutoff mark)
Quasi-Experimental Designs
• A short-hand proposed by Cook and Campbell and
adopted by many others uses the following code to
describe quasi-experimental design (not used in text but very common)
• R = randomization
On = observation at time n
X = intervention (i.e. surgery or giving a drug)
The One-Shot Case Study (one-group posttest-only design)
• No control group. This design has virtually no
internal or external validity;
there is no means for determining whether change
occurred as a result of the treatment or program
Example- a training program for employees has only one
group with one intervention and one observation (after the fact)
Treatment Post-test
X O
Quasi-Experimental Designs
• For example, you want to determine whether praising
primary school children makes them do better in arithmetic.
You measure mathematics achievement with a test. To test
this idea, you choose a class of 2nd grade pupils and increase
praising of children and you find that their mathematics score
did increase. You conclude that praising children increases
their mathematics scores.
X O
(praise) (math scores)
• What are the weaknesses of this design?
• 1) Selection: It is possible that the students you selected
as subjects were already good in mathematics.
2) History: If the school had organized a motivation course on
mathematics for these students, it might influence their
performance
Quasi-Experimental Designs
• One-Group Pretest-Posttest Design
• Minimal Control. There is somewhat more structure, there is
a single selected group under observation, with a careful
measurement being done before applying the experimental
treatment and then measuring after. This design has minimal
internal validity, controlling only for selection of subject and
experimental mortality. It has no external validity
O1 X O2
(pretest) (praise) (posttest)
• Using the previous study on praise and math scores, to
ensure that there was no pre-existing difference among the
primary school children, a pretest may be administered. If the
children's scores improved after praising compared to the pretest,
then you can attribute the change to the practice of praising
Quasi-Experimental Designs
• O1 X O2
(pretest) (praise) (posttest)
• What are the weaknesses for this design?
• 1) Maturation: If time between the pretest and posttest is
long, it is possible that the subjects may have matured
because of developmental changes.
• 2) Testing: Sometimes the period between the pretest and
the posttest is too short and there is the possibility that
subjects can remember the questions and answers
(carryover effect)
• It may not be ethical to do a RCT (e.g. tobacco use)
• Although Campbell and Stanley used the term control group,
others prefer the term comparison group to emphasize the
difference between this design and a RCT
Quasi-Experimental Designs
• The Nonequivalent Control Groups design
uses a control group, but it is
selected from existing
natural groups
• Example- one group is given a medicine, whereas the
control (comparison) group is given none. If different
dosages of a medicine are tested, the design can be
based around multiple groups. Such a design is limited in
scope and contains many threats to validity. It is very
poor at guarding against assignment bias since it does
not use random assignment and is also subject to
selection bias. Because it's often likely that the groups
are not equivalent, this design was named
the nonequivalent groups design to remind us of that potential nonequivalence
Nonequivalent Control Group
Pretest-Posttest Design
In general, however, non-equivalent groups are usually chosen to
be as similar as possible to each other, which helps to control
extraneous variables. For example, if we are comparing
cooperative learning to standard learning classroom techniques
we probably would not use a daytime class as our cooperative
learning group and an evening class as our standard lecture
group pg228
However, if we add a pretest we can improve this design. This
Nonequivalent Control Group Pretest-Posttest design gives us the
advantage of comparing the control group to the experimental
group, but this is still not a true RCT, as assignment to groups is
not random
Nonequivalent Control Group
Pretest-Posttest Design
• The nonequivalent control
group design still lacks
random assignment but can
be improved by matching
subjects (similar to matched
pairs designs). If we match
subjects on multiple variables and combine the scores we
produce a propensity score (propensity score matching)
• Matching attempts to mimic randomization by making the
groups receiving treatment and not-treatment more
comparable pg229
A story of Nonequivalence
• Two heart surgeons walk into a room.
• − The first surgeon says, “Man, I just finished my 100th
heart surgery!”.
− The second surgeon replies, “Oh yeah, I finished my
100th heart surgery last week. I bet I'm a better surgeon
than you. How many of your patients died within 3
months of surgery? Only 10 of my patients died.”
− First surgeon smugly responds, “Only 5 of mine died,
so I must be the better surgeon.”
− Second surgeon says, “My patients were probably
older and had a higher risk than your patients.”
Propensity Score
• In the statistical analysis of observational data,
propensity score matching (PSM) is a statistical
matching technique that attempts to estimate the
effect of a treatment, or other intervention by
accounting for the covariates that predict receiving
the treatment. PSM attempts to reduce the bias
due to confounding variables that could be found in
an estimate of the treatment effect obtained from
simply comparing outcomes among units that
received the treatment versus those that did not.
The technique was first published by Paul Rosenbaum and Donald Rubin in
1983
2007Jan05 GCRC Research-Skills Workshop 301
[Figure: number of publications in PubMed containing the phrase "Propensity Score" by year, 1983-2006]
Propensity Score example
Consider an HIV database:
– E+: patients receiving a new antiretroviral drug (N=500). Exper. Gr.
– E-: patients not receiving the drug (N=10,000). Control gr.
– D+: mortality. Dependent variable
Need to manually measure CD4. (CD4 = T-helper cells, which send signals to other types of immune
cells, including CD8 killer cells; CD4 cells send the signal and CD8 cells destroy the infectious particle)
May be potential confounding by other HIV drugs as well as
other prognostic factors
• Limitations
Propensity score methods work better in larger samples to
attain distributional balance of observed covariates.
– In small studies, imbalances may be unavoidable.
Including irrelevant covariates in the propensity model may reduce
efficiency; bias may occur; the treatment effect may be non-uniform
Propensity Score example
• Option 1:
– Collect blood samples from all 10,500 patients.
– Costly & impractical.
• Option 2:
– For all patients, estimate Pr(E+|other HIV drugs & prognostic
factors).
– For each E+ patient, find E- patient with closest propensity score.
– Continue until all E+ patients match with E- patient.
– Collect blood sample from 500 propensity-matched pairs.
• Panel of 7 specialists in critical care specified variables
related to decision
• age, sex, yrs of education, medical insurance, primary & secondary
disease category, admission dx
• Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus
propensity score when the number of events is low and there are multiple confounders. Am
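Option 2's matching step can be sketched as a greedy nearest-neighbor match on precomputed propensity scores. This is a minimal sketch with made-up IDs and scores (real PSM would first estimate the scores, e.g. by logistic regression on the covariates):

```python
def greedy_match(treated, controls):
    """Pair each treated unit with the unused control whose propensity
    score is closest (matching without replacement)."""
    available = dict(controls)
    pairs = {}
    for t_id, t_score in treated.items():
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_id] = c_id
        del available[c_id]  # each control can be matched only once
    return pairs

# Hypothetical propensity scores: Pr(receiving the new drug | covariates)
treated = {"T1": 0.80, "T2": 0.30}
controls = {"C1": 0.75, "C2": 0.32, "C3": 0.10}
print(greedy_match(treated, controls))  # {'T1': 'C1', 'T2': 'C2'}
```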
Interrupted Time Series
• A time series is simply a
set of measurements of
a variable taken at various
points in time
• In an interrupted
time-series design, a time series (the dependent
variable) is interrupted (usually near the middle) by
the manipulation of the independent variable.
• This design uses several waves of observation before
and after the introduction of the independent
(treatment) variable X
• O1 O2 O3 O4 X O5 O6 O7 O8
Interrupted Time Series Control Series Design
• Control Series Design pg230
• The addition of a second time series for a
comparison group helps to provide a check on some
of the threats to validity of the Single Interrupted
Time Series Design (previous slide), especially history
• Group A: O1 O2 O3 O4 X O5 O6 O7 O8
Group B: O1 O2 O3 O4 - O5 O6 O7 O8
• This design is like a pretest-posttest design but with
multiple pretests and multiple posttests. The advantage
of this approach is that it provides greater confidence
that the change in the dependent variable was caused
by the manipulation and is not just a random
fluctuation.
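One common way to analyze a control series design is a difference-in-differences comparison (a technique we name here; the text only describes the design): the pre-to-post change in Group A minus the change in Group B. A minimal sketch with invented O1-O8 values:

```python
def mean(xs):
    return sum(xs) / len(xs)

def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """Change in the treated series minus change in the comparison series;
    the comparison series absorbs shared history effects."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(comp_post) - mean(comp_pre))

# Hypothetical observations O1-O4 (pre) and O5-O8 (post) for each group
group_a_pre, group_a_post = [10, 11, 10, 11], [15, 14, 15, 16]  # treated
group_b_pre, group_b_post = [10, 10, 11, 11], [12, 11, 12, 13]  # comparison
print(diff_in_diff(group_a_pre, group_a_post, group_b_pre, group_b_post))  # 3.0
```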
Developmental Research Designs
• Developmental research studies how individuals change
as a function of age. Researchers can adopt two general
approaches to studying individuals of different ages: they
might select groups of people who are similar
in most respects but differ only in age
• Cross-sectional studies are designed to look at a
variable at a particular point in time. Longitudinal
studies involve taking multiple measures over an
extended period of time, while cross-sectional research
is focused on looking at variables at a specific point in
time. Cross-sectional designs are more common, as they
cost less and provide immediate results, allowing
comparisons across various groups
Developmental Research Designs
• Disadvantages of cross-sectional research: the researcher must
infer that the differences among age groups are due to
development, but this variable (development) is not directly
observed; it is based on comparisons of different cohorts of
individuals
• A cohort is a group of people who share a common
characteristic or experience within a defined period
(e.g., are born on a certain day or period, are exposed to a drug
or vaccine or pollutant, or undergo a certain medical procedure)
Cross-Sectional, Longitudinal & Sequential Studies
• Longitudinal studies are the best way to study
changes as people grow older and also the best way
to study how scores on a variable at one age are
related to another variable at a later age although
attrition (loss of subjects) from the study is often
problematic
• Sequential method combines the cross-sectional
and longitudinal methods. In a study by Orth et al.,
different age groups were formed and compared
(e.g. 25-34; 35-44; 45-54 etc.) (cross-sectional) but
then each person is measured a second time
(longitudinal)
Self-Esteem Development From Young Adulthood to
Old Age: A Cohort-Sequential Longitudinal Study
Orth, Trzesniewski & Robins , JPSP, 2010, Vol. 98, No. 4, 645–658
• The authors examined the development of self-esteem from young
adulthood to old age. Data came from the Americans’ Changing Lives
study, which includes 4 assessments across a 16-year period of a
nationally representative sample of 3,617 individuals aged 25 years to
104 years. Latent growth curve analyses indicated that self-esteem
follows a quadratic trajectory across the adult life span, increasing during
young and middle adulthood, reaching a peak at about age 60 years, and
then declining in old age. No cohort differences in the self-esteem
trajectory were found. Women had lower self-esteem than did men in
young adulthood, but their trajectories converged in old age. Whites and
Blacks had similar trajectories in young and middle adulthood, but the
self-esteem of Blacks declined more sharply in old age than did the self-
esteem of Whites. More educated individuals had higher self-esteem
than did less educated individuals, but their trajectories were similar.
Moreover, the results suggested that changes in socioeconomic status
and physical health account for the decline in self-esteem that occurs in
old age
Quadratic Trajectory
Controlling for Threats to Validity pg224-227
• 1) History: did some other current event affect the change in the
dependent variable?
• 2) Maturation: were changes in the dependent variable due to normal
developmental processes?
• 3) Statistical Regression: did subjects come from very low or high
performing groups?
• 4) Selection: were the subjects self-selected or nonrandomly assigned
into experimental and control groups, which could affect the dependent variable?
• 5) Experimental Mortality: did some subjects drop out? did this affect
the results?
• 6) Testing: Did the pre-test affect the scores on the post-test?
• 7) Instrumentation: Did the measurement method change during the
research?
• 8) Design contamination: did the control group find out about the
experimental treatment? did either group have a reason to want to make
the research succeed or fail?
Odds Ratio
• In statistics, the odds ratio (usually abbreviated “OR”) is one of three main ways to
quantify how strongly the presence or absence of property A is associated with the
presence or absence of property B in a given population. If each individual in a
population either does or does not have a property “A”, (e.g. "high blood pressure”),
and also either does or does not have a property “B” (e.g. “moderate alcohol
consumption”) where both properties are appropriately defined, then a ratio can be
formed which quantitatively describes the association between the presence/absence
of "A" (high blood pressure) and the presence/absence of "B" (moderate alcohol
consumption) for individuals in the population. This ratio is the odds ratio (OR) and
can be computed following these steps:
• 1) For a given individual that has "B" compute the odds that the same individual has
"A"
• 2) For a given individual that does not have "B" compute the odds that the same
individual has "A"
• 3) Divide the odds from step 1 by the odds from step 2 to obtain the odds ratio (OR)
• If the OR is greater than 1, then having “A” is considered to be “associated” with
having “B” in the sense that having “B” raises (relative to not having “B”) the
odds of having “A”. Note that this is not enough to establish that B is a contributing
cause of “A”: it could be that the association is due to a third property, “C”, which is a
contributing cause of both “A” and “B”
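The three steps above translate directly into code. A minimal sketch using hypothetical counts from a 2x2 table (the numbers are invented for illustration):

```python
def odds_ratio(a, b, c, d):
    """2x2 table counts:
    a = has A and has B        b = lacks A and has B
    c = has A and lacks B      d = lacks A and lacks B"""
    odds_a_given_b = a / b              # step 1: odds of A among those with B
    odds_a_given_not_b = c / d          # step 2: odds of A among those without B
    return odds_a_given_b / odds_a_given_not_b  # step 3

# Hypothetical data: among 30 moderate drinkers (B), 10 have high blood
# pressure (A) and 20 do not; among 45 non-drinkers, 5 do and 40 do not.
print(odds_ratio(10, 20, 5, 40))  # 4.0, i.e. A is associated with B
```

Note that the three steps reduce to the familiar cross-product (a × d) / (b × c).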
Understanding Research Results Chp 12
• Because experimenters must calculate the size of differences
that chance is likely to produce and compare them with the
differences they actually observe, they necessarily become
involved with probability theory and its application to
statistics
• True experiments satisfy three conditions: the experimenter
sets up two or more conditions whose effects are to be
evaluated subsequently; persons or groups of persons are
then assigned strictly at random, that is, by chance, to the
conditions; the eventual differences between the conditions
on the measure of effect (for example, the pupils'
achievement in each of two or more learning conditions) are
compared with differences of chance or random magnitude
G.Glass Arizona State University
Understanding Research Results
• Statistics used in two ways to understand and
interpret research
• 1) Statistics are used to describe data
• 2) Statistics are used to draw inferences
• Review Scales of Measurement (which have important implications
for the way data are described and analyzed)
• Nominal scales–categorical, do not imply any ordering among the responses
• Ordinal Scales-rank order the levels of a variable (category) being studied.
nothing is specified about the magnitude of the interval between the two
measures
• Interval scales -intervals have the same interpretation throughout in that the
intervals between the numbers are equal in size. However there is no
absolute zero on the scale
• Ratio scales- most informative scale. An interval scale with the additional property
that its zero position indicates the absence of the quantity being measured
Understanding Research Results
• Three basic ways to describe results of variables studied
• 1) Comparing group percentages (e.g. percent of males vs females who
like to travel)
• 2) Correlating scores of individuals on two variables
(e.g. do students sitting in the front of the class receive better grades)
• 3) Comparing group means (mean number of aggressive acts by children who
witnessed an adult model aggression compared to mean number of aggressive acts by children
who did not witness an adult model be aggressive)
• Frequency Distributions- indicate the number of
individuals who receive each possible score on a
variable (pg243). Often these distributions are graphed.
Raw Data- Data collected in original form.
Frequency- The number of times a certain value or class of values occurs.
Frequency Distribution- The organization of raw data in table form with classes
and frequencies
Graphing Frequency Distributions
• Pie Charts-The frequency determines the size of the slice
Graphing Frequency Distributions
• Bar Graphs- a separate bar for
each piece of information.
X-axis: horizontal; Y-axis: vertical. Bar
graphs are used when the x-axis variable is nominal
• Frequency polygons- a line used to
represent the distribution of
frequency scores. Line graphs are used
when the x-axis values are numeric pg247
Graphing Frequency Distributions
• Histogram- uses bars to display a frequency
distribution. Values are continuous (versus bar
graph) with bars drawn next to each other
Descriptive Statistics
Descriptive statistics is the discipline of quantitatively
describing the main features of a collection of data or the
quantitative description itself. Descriptive statistics are
distinguished from inferential statistics (or inductive
statistics), in that descriptive statistics aim to summarize a
sample, rather than use the data to learn about the
population that the sample of data is thought to represent
Must have at least two statistics (characteristics of a
sample) to describe a data set: 1) a measure of central
tendency and 2) a measure of variability or dispersion.
Measures of central tendency include the mean, median
and mode, while measures of variability include the
standard deviation (or variance) pg245-6
Descriptive Statistics
• The mean is an appropriate indicator of central
tendency only when scores are measured on an
interval or ratio scale because the actual values
of the numbers are used in calculating the
statistic
Common Symbols (Greek)
• μ (mu) refers to a population mean; and x̄ (x-bar), to a
sample mean.
• σ sigma (lower case)refers to the standard deviation
of a population; and s, to the standard deviation of
a sample
• N is the number of elements in a population.
n is the number of elements in a sample
• Σ is the summation symbol, used to compute sums
over a range of values. Σx or Σxi refers to the sum of
a set of n observations. Thus, Σxi = Σx = x1 + x2 + . . . + xn
Common Symbols (Greek)
• Α α alpha
• Β β beta
• Γ γ gamma
• Δ δ delta
• Ε ε epsilon
• Ζ ζ zeta
• Θ θ theta
• Κ κ kappa
• Λ λ lambda
• Μ μ mu
• Π π pi
• Ρ ρ rho
• Σ σ sigma
• Φ φ phi
• Χ χ chi
• Ψ ψ psi
• Ω ω omega
Central Tendency and Variability (dispersion)
• A measure of central tendency is a single value that
attempts to describe a set of data by identifying the
central position within that set of data. The mean
(or average) is the most popular and well-known pg245
measure of central tendency. Its use is most often
with continuous data. The mean is equal to the sum
of all the values in the data set divided by the
number of values in the data set. So, if we have n
values in a data set with values x1, x2, ..., xn,
the sample mean, usually denoted by x̄, is x̄ = (x1 + x2 + ... + xn) / n
Median and Mode
Table 1.
• 37 33 33 32 29 28 28 23 22 22 22 21 21 21 20
20 19 19 18 18 18 18 16 15 14 14 14 12 12 9 6
The median is the midpoint of a distribution: the same
number of scores is above the median as below it. For the
data in Table 1, there are 31 scores. The 16th highest
score (which equals 20) is the median because there are
15 scores below the 16th score and 15 scores above the
16th score. The median can also be thought of as the
50th percentile. The mode is the most frequently
occurring value. For the data in Table 1, the mode is 18
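The Table 1 figures can be checked with Python's standard `statistics` module:

```python
from statistics import mean, median, mode

# The 31 scores from Table 1
scores = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20,
          20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]

print(round(mean(scores), 2))  # 20.45
print(median(scores))          # 20, the 16th of the 31 ordered scores
print(mode(scores))            # 18, which occurs four times
```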
Variability (Dispersion)
• The terms variability, spread, and dispersion are
synonyms, and refer to how spread out a distribution is.
• Range- The difference between
the highest and lowest score
• Variance- Variability can also be defined in terms of
how close the scores in the distribution are to the
middle of the distribution. Using the mean as the
measure of the middle of the distribution, the pg 246
variance is defined as the average squared difference
of the scores from the mean. The standard deviation
is simply the square root of the variance
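These three definitions (range, variance as the average squared deviation from the mean, standard deviation as its square root) can be written out directly; the data set below is a made-up example whose mean is 5:

```python
from math import sqrt

def describe_spread(xs):
    """Return (range, population variance, standard deviation)."""
    m = sum(xs) / len(xs)                               # mean
    variance = sum((x - m) ** 2 for x in xs) / len(xs)  # average squared deviation
    return max(xs) - min(xs), variance, sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe_spread(data))  # (7, 4.0, 2.0)
```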
Correlation and Prediction
• Correlation refers to the degree of relationship
between two variables
• Regression-(Multiple) regression is a statistical tool
used to derive the value of a criterion from several
other independent, or predictor, variables. It is the
simultaneous combination of multiple factors to assess
how and to what extent they affect a certain outcome
(y=X1+X2+X3. . . ETC.)
• “The terms correlation, regression and prediction are
so closely related in statistics that they are often used
interchangeably”- J. Roscoe
• How would you test the hypothesis that “enhanced
interrogation” results in useful intelligence? What model
would you use? RCT? Correlation? Regression?
Correlation and strength of relationships
• A correlation coefficient is a statistic that describes
how strongly variables are related to one another.
The most familiar correlation coefficient is the
Pearson product-moment coefficient. Pearson's r is
a measure of the linear correlation (dependence) pg248
between two variables X and Y (does not describe
curvilinear relationships)
Correlation-Scatter Plot
• 1 is a perfect positive correlation
• 0 is no correlation (the values don't seem linked at all)
• -1 is a perfect negative correlation
The value shows how strong the correlation is and whether it is positive or negative
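Pearson's r can be computed straight from its definition (the covariance of the paired scores divided by the product of their spreads). The paired scores below are invented for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rows = [1, 2, 3, 4, 5]          # e.g. seating row in class (hypothetical)
grades = [70, 68, 65, 63, 58]   # hypothetical grades
print(round(pearson_r(rows, rows), 2))    # 1.0: a perfect positive correlation
print(round(pearson_r(rows, grades), 2))  # -0.98: grades fall toward the back
```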
Correlation and strength of relationships
Restriction of range- One issue is that one variable or the
other is sampled over too narrow a range. This
restriction of range, as it is called, makes the relationship
seem weaker than it is. Suppose we want to know the
correlation between a test such as the SAT and freshman
GPA. We collect SAT test scores from applicants and
compute GPA at the end of the freshman year. If we use
the SAT in admissions and reject applicants with low
scores, we will have range restriction because there will
be nobody in the sample with low test scores. If
individuals in your sample are very similar you will have a
restriction of range. Trying to understand the correlates of
intelligence will be difficult if everyone in your sample is
very similar in intelligence
Effect Size
• Effect Size refers to the strength of association between variables.
The Pearson r correlation coefficient is one indicator of effect size; it
indicates the strength of the linear association between two variables
pg 252 Cozby & Bates
• The concept of effect size already appears in everyday language. For
example, a weight loss program may boast that it leads to an average
weight loss of 30 pounds. In this case, 30 pounds is the claimed effect
size. Another example is that a tutoring program may claim that it
raises school performance by one letter grade. This grade increase is
the claimed effect size of the program. These are both examples of
"absolute effect sizes", meaning that they convey the average
difference between two groups without any discussion of the
variability within the groups. For example, if the weight loss program
results in an average loss of 30 pounds, it is possible that every
participant loses exactly 30 pounds, or half the participants lose 60
pounds and half lose no weight at all.
Effect Size
Effect size-a measure of the strength of a phenomenon (for example
the change in an outcome after experimental intervention).
The Pearson r correlation coefficient is one indicator of effect size;
it indicates the strength of the linear association between two
variables
Correlation coefficients indicating small effects range from .10 to .20;
medium effects are around .30; large effects are above .40 (others say .50)
Sometimes the squared value of r is reported, which transforms the
value into a percentage (this is also referred to as the percent of
shared variance between the two variables). The correlation
between gender and weight is about .70 (males weighing more
than females); squaring the value of .70 results in .49. Therefore
49% of the difference in weight between males and females is
accounted for by gender
Effect Size
• An effect size is a measure that describes the
magnitude of the difference between two groups. Effect
sizes are particularly valuable in best practices research
because they represent a standard measure by which all
outcomes can be assessed
• An effect size is typically calculated by taking the difference
in means between two groups and dividing that number by
their combined (pooled) standard deviation. Intuitively, this
tells us how many standard deviations’ difference there is
between the means of the intervention (treatment) and
comparison conditions; for example, an effect size of .25
indicates that the treatment group outperformed the
comparison group by a quarter of a standard deviation.
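The calculation described above (mean difference over pooled SD) can be sketched in a few lines. The data here are hypothetical scores invented for illustration, and the function name is ours:

```python
import math

def cohens_d(group1, group2):
    """Effect size: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical treatment vs. comparison outcome scores
treatment  = [12, 14, 15, 13, 16, 14]
comparison = [11, 12, 13, 12, 14, 10]

d = cohens_d(treatment, comparison)
# Here the means are 14 and 12 with pooled SD ~1.41, so d is about 1.41:
# the treatment group outperformed the comparison group by ~1.4 SDs
```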
Effect Size continued
• An effect size of 0.33 denotes that a treatment led
to a one-third of a standard deviation improvement
in outcome. Similarly, an effect size of 0.5 denotes a
one-half of a standard deviation increase in
outcome. Because effect sizes are based upon
these mean and standard deviation scores it allows
direct comparisons across studies
• Cohen's d is an effect size used to indicate the standardized difference between two means
• http://www.uccs.edu/~lbecker/
Regression Equations
• The terms correlation, regression and prediction are so
closely related in statistics that they are often used
interchangeably (J. Roscoe). Regression equations are
calculations used to predict a person's score on one
variable when that person's score on another variable is
already known (Cozby & Bates, pg 253)
• Linear regression attempts to model the relationship between two
variables by fitting a linear equation to observed data. One variable is
considered to be an explanatory variable, and the other is considered
to be a dependent variable. For example, a modeler might want to
relate the weights of individuals to their heights using a linear
regression model. A linear regression line has an equation of the form
Y = a + bX, where X is the explanatory variable and Y is the
dependent variable. The slope of the line is b, and a is the intercept
(the value of y when x = 0).
https://www.youtube.com/watch?v=ocGEhiLwDVc
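The least-squares estimates for Y = a + bX can be computed directly: b is the sum of cross-products of deviations over the sum of squared deviations of X, and a = mean(Y) − b·mean(X). A minimal sketch with hypothetical height/weight data (the numbers are invented, not from the slides):

```python
def fit_line(x, y):
    """Least-squares estimates for the line Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # slope: sum of cross-products over sum of squared X deviations
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx          # intercept: the value of Y when X = 0
    return a, b

# Hypothetical data: heights (inches) as X, weights (pounds) as Y
heights = [60, 62, 64, 66, 68]
weights = [115, 120, 130, 135, 150]

a, b = fit_line(heights, weights)
predicted = a + b * 65       # predicted weight for a 65-inch person
```

With these numbers the fitted line is Y = −142 + 4.25X, so a 65-inch person is predicted to weigh 134.25 pounds.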
Linear Regression
• Intercept-the value at which the
fitted line crosses the y-axis
Multiple Correlation/Regression
• Multiple linear regression attempts to model the
relationship between two or more explanatory
variables and a response variable by fitting a linear
equation to observed data. The dependent variable is
affected by more than one independent variable
• In simple linear regression, a criterion variable (Y) is
predicted from one predictor variable (X). In multiple
regression, the criterion is predicted by two or more
variables: Y = a + b1X1 + b2X2 + b3X3
• Example Y= Health rating of chosen city.
X1 = death rate per 1000 residents
X2 = doctor availability per 100,000 residents
X3 = hospital availability per 100,000 residents
X4 = annual per capita income in thousands of dollars
X5 = population density people per square mile
Multiple Correlation/Regression
• Pew Research Center survey on Happiness (Y)-Results
of multiple regression: Married people are happier
than unmarrieds. People who worship frequently are
happier than those who don’t. Republicans are happier
than Democrats. Rich people are happier than poor
people. Whites and Hispanics are happier than blacks.
Sunbelt residents are happier than those who live in
the rest of the country.
• Also found some interesting non-correlations. People
who have children are no happier than those who
don’t, after controlling for marital status. Retirees are
no happier than workers. Pet owners are no happier
than those without pets
Correlation/Regression
Path Diagrams
[Path diagrams] Simple correlation: Parental Support → Happiness, r = .38 (R = .38)
Multiple regression: Parental Support (r = .38) and Self-esteem (r = .30) together predict Happiness, R = .45
Partial Correlation
• Extraneous or confounding variables are controlled
in experimental research by keeping them constant
or through randomization. This is harder to do in
nonexperimental research. pg 256
• One technique to control for such variables in non
experimental research is to use partial correlation
• A partial correlation is a correlation between the
two variables of interest with the influence of the
third variable removed from or “partialed out of”
the original correlation –which tells you what the
correlation between the primary variables would be
if the third variable were held constant pg 256
Partial Correlation
• In simple correlation, we measure the strength of the linear
relationship between two variables, without taking into
consideration the fact that both these variables may be
influenced by a third variable.
• The calculation of the partial correlation co-efficient is
based on the simple correlation co-efficient. However,
simple correlation coefficient assumes linear relationship.
Generally this assumption is not valid especially in social
sciences, as linear relationship rarely exists in such
phenomena
• It may be of interest to know if there is any
correlation between X and Y that is NOT due
to their both being correlated with Z. To do
this you calculate a partial correlation.
Partial Correlation
• If you calculate the correlation for subjects on each of three
variables, X, Y, and Z and obtain the following
• X versus Y: rXY = +.50 r2XY = .25
• X versus Z: rXZ = +.50 r2XZ = .25
• Y versus Z: rYZ = +.50 r2YZ = .25
• For each pair of variables (XY, XZ, and YZ),
the variance overlap is 25%
• Partial correlation is a procedure that allows us to measure the region
of three-way overlap precisely, and then to remove it from the picture
in order to determine what the correlation between any two of the
variables would be (hypothetically) if they were not each correlated
with the third variable. Alternatively, you can say that partial
correlation allows us to determine what the correlation between any
two of the variables would be (hypothetically) if the third variable
were held constant.
Partial Correlation
• rXY·Z = [rXY − (rXZ)(rYZ)] / (sqrt[1 − r²XZ] × sqrt[1 − r²YZ])
• rXY·Z = [.50 − (.50)(.50)] / (sqrt[1 − .25] × sqrt[1 − .25]) = .25/.75
• rXY·Z = +.33 (therefore r²XY·Z = .11)
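The partial-correlation formula takes only the three pairwise correlations as input, so it is easy to compute. A minimal sketch (the function name partial_r is ours, not from any library):

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z partialed out."""
    return (r_xy - r_xz * r_yz) / (
        math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2))

# The worked example: all three pairwise correlations equal .50
r = partial_r(0.50, 0.50, 0.50)   # (.50 - .25) / .75 = +.33
```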
Structural Equation Modeling (SEM)
• SEMs are suited to both theory testing and theory
development. Measurement is recognized as difficult and
error-prone. Compared to regression and factor analysis,
SEM is a relatively young field, having its roots in papers
that appeared only in the late 1960s. As such, the
methodology is still developing, and even fundamental
concepts are subject to challenge and revision. This rapid
change is a source of excitement for some researchers
and a source of frustration for others.
• Researchers typically construct path diagrams to
represent the model being tested. Path Diagrams play a
fundamental role in structural modeling. Path diagrams
are like flowcharts. They show variables interconnected
with lines(arrows) that are used to indicate causal flow
Structural Equation Modeling (SEM)
• Structural equation models go beyond
ordinary regression models to incorporate
multiple independent and dependent
variables as well as hypothetical latent
constructs that clusters of observed variables might represent
http://www.youtube.com/watch?v=ZuX_QzZGjf0 (start at 4:23, end at 11:30)
Structural Equation Modeling (SEM)
• Interpretation of path coefficients: First of all, they are not
correlation coefficients. X and Y are converted to z-scores before
conducting a simple regression analysis
(path coefficients are regression coefficients
converted into standardized z scores).
• Interpreting path coefficients-Suppose we have a network with a
path connecting from region A to region B. The meaning of the
path coefficient (e.g., 0.81) is this: if region A increases by one
standard deviation from its mean, region B would be expected to
increase by 0.81 its own standard deviations from its own mean
while holding all other relevant regional connections constant.
With a path coefficient of -0.16, when region A increases by one
standard deviation from its mean, region B would be expected to
decrease by 0.16 its own standard deviations from its own mean
while holding all other relevant regional connections constant
• One of the nice things about SPSS is that it will allow you to start with
a correlation matrix (you don’t need the raw data)
Score Transformations-A score has meaning
only as it is related to other scores
Feet | Inches
5.00 | 60
6.25 | 75
5.50 | 66
5.75 | 69
• Often it is necessary to transform data from one
measurement scale to another. For example,
you might want to convert height measured in feet
into height measured in inches. The table shows the heights of four people measured in
both feet and inches. To transform feet to inches, you simply
multiply by 12. (Similarly, to transform inches to feet, you divide by 12.)
Some conversions require that you multiply by a number and then add a
second number. A good example of this is the transformation between
degrees Centigrade and degrees Fahrenheit. The table below converts the F
temperatures of 4 US cities to C, using the formula C = (F − 32) × 5/9:
City | °F | °C
Houston | 54 | 12.22
Chicago | 37 | 2.78
Minneapolis | 31 | -0.56
Miami | 78 | 25.56
Score Transformations
The figure below shows a plot of degrees Centigrade as a
function of degrees Fahrenheit. Notice that the points form a
straight line. Such transformations are therefore called linear
transformations. Many transformations are not linear. With
nonlinear transformations, the points in a plot of the transformed
variable against the original variable would not fall on a straight line.
Examples of nonlinear transformations are: square root, raising to a
power, or logarithm.
Question: Transforming distance in miles into distance in feet is
a linear transformation. True or False?
True. This is a linear transformation because you multiply the distance in miles by 5,280
feet/mile
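Both conversions above are linear transformations: multiply by a constant, optionally add a constant. A minimal sketch checking the city table's values (function names are ours):

```python
def f_to_c(f):
    """Linear transformation: subtract 32, then multiply by 5/9."""
    return (f - 32) * 5 / 9

def miles_to_feet(miles):
    """Linear transformation: multiply by the constant 5,280."""
    return miles * 5280

# Reproduce the four-city table from the slide
celsius = {city: round(f_to_c(f), 2)
           for city, f in {"Houston": 54, "Chicago": 37,
                           "Minneapolis": 31, "Miami": 78}.items()}
```

Running this reproduces the tabled values (Houston 12.22, Chicago 2.78, Minneapolis -0.56, Miami 25.56), and miles_to_feet(1) gives 5280.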
Linear vs Nonlinear Score Transformations
• Transforming a variable involves using a mathematical operation to
change its measurement scale.
• Linear transformation. A linear transformation preserves linear
relationships between variables. Therefore, the correlation
between x and y would be unchanged after a linear
transformation. Examples of a linear transformation to variable x
would be multiplying x by a constant, dividing x by a constant, or
adding a constant to x.
• Nonlinear transformation. A nonlinear transformation changes
(increases or decreases) linear relationships between variables
and, thus, changes the correlation between variables. Examples of
a nonlinear transformation of variable x would be taking the
square root of x or the reciprocal of x. A logarithmic scale is a scale
of measurement that displays the value of a physical quantity
using intervals corresponding to orders of magnitude, rather than
a standard linear scale
Linear vs Nonlinear Score Transformations
The Richter magnitude scale (often shortened to Richter
scale) was developed to assign a single number to quantify
the energy that is released during an earthquake.
The scale is a base-10 logarithmic scale. An earthquake that
measures 5.0 on the Richter
scale has a shaking amplitude
10 times larger than one that
measures 4.0, and
corresponds to a 31.6 times
larger release of energy
http://www.matter.org.uk/schools/Content/S
eismology/richterscale.html
Linear vs Nonlinear Score Transformations
• Transforming scores from raw scores into transformed
scores has two purposes: 1) It gives meaning to the
scores and allows some kind of interpretation of the
scores, 2) It allows direct comparison of two scores
• Linear transformation: as one variable changes, the other
changes in equal proportion. Converting raw scores into
percentile ranks is one way of transforming scores; however, the
scale of the percentile rank is a non-linear
transformation of that of the raw score, meaning that
at different regions on the raw score scale, a gain of 1
point may not correspond to a gain of one unit or the
same magnitude on the percentile rank scale
Percentile Rank Transformation
• PR = (100/N)(cf − f/2); PR of a score of 17 = (100/150)(64 − 21/2) ≈ 36
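The percentile-rank formula on this slide can be checked directly. In the sketch below (function name ours), cf is the cumulative frequency up to and including the score's interval, f is the frequency at the score, and N is the total number of scores, matching the slide's example values:

```python
def percentile_rank(cf, f, n):
    """PR = (100/N) * (cf - f/2)."""
    return (100 / n) * (cf - f / 2)

# The slide's example for a score of 17: N = 150, cf = 64, f = 21
pr = percentile_rank(cf=64, f=21, n=150)   # (100/150) * 53.5, about 35.7
```

The exact value is 35.67, which the slide rounds to 36.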
Linear Score Transformations
• By itself, a raw score or X value provides very little
information about how that particular score compares
with other values in the distribution.
• A score of X = 53, for example, may be a relatively low
score, or an average score, or an extremely high score
depending on the mean and standard deviation for the
distribution from which the score was obtained.
• If the raw score is transformed into a z-score, however,
the value of the z-score tells exactly where the score is
located relative to all the other scores in the
distribution. The formula for computing the z-score for
any value of X is z = (X − μ) / σ
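The formula makes the point about relative standing concrete: the same raw score yields very different z-scores under different distributions. A minimal sketch (the distribution parameters are hypothetical):

```python
def z_score(x, mu, sigma):
    """Standardize a raw score: its distance from the mean in SD units."""
    return (x - mu) / sigma

# The same raw score of X = 53 in two hypothetical distributions:
z1 = z_score(53, mu=50, sigma=2)    # 1.5 SDs above the mean: a high score
z2 = z_score(53, mu=60, sigma=10)   # 0.7 SDs below the mean: a low score
```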
Linear Score Transformations-Z Scores
• z = 0 is in the center (at the mean), and the extreme
tails correspond to z-scores of approximately –2.00
on the left and +2.00 on the right.
• Although more extreme z-score values are possible,
most of the
distribution is
contained between
z = –2.00 and
z = +2.00.
• M=0,SD=1
z-Scores as a Standardized Distribution
The advantage of standardizing distributions is that
two (or more) different distributions can be made the
same.
– For example, one distribution has μ = 100 and σ = 10, and
another distribution has μ = 40 and σ = 6.
– When these distribution are transformed to z-scores, both
will have μ = 0 and σ = 1.
– A z-score of +1.00
specifies the same
location in all
z-score distributions.
Understanding Research Results
Statistical Inference Chp 13
• Inferential statistics allow researchers to assess
1) how well their results reflect the larger population
(do the differences observed in the sample means
reflect the differences in the population means?) and
2) the likelihood that their results are repeatable
(replicable)
• Even in establishing the equivalence between
groups (via controlling certain variables and
randomization) the difference between the sample
means is almost never zero (equivalence is not
perfect)
Statistical Inference
• In using statistical inference we begin with a null
and a research hypothesis
• Null hypothesis H0 - there is no relationship
between two measured phenomena (it is assumed
true until evidence indicates otherwise) H0: μ1 = μ2
• Research or alternative hypothesis H1: μ1 ≠ μ2;
it can be just the negation of the null hypothesis
• If we can determine that the null hypothesis is
incorrect then we can accept the alternate
(research) hypothesis which is that the independent
variable did have an effect on the dependent
variable
Statistical significance, probability and
sampling distributions
• A significant result is one that has a very low
probability of occurring by chance if the population
means are equal
• Using probability theory and the normal curve, we can
estimate the probability of being wrong
• Probability is the likelihood of the occurrence of some
event. The probability required for significance is called
the alpha level with the most common alpha
probability used being set at .05 (the outcome of the
study is considered significant when there is a probability
of .05 or less that the results were due to chance-statistical
significance is based on probability distributions)
Statistical significance, probability
and sampling distributions
• The Sampling distribution is the probability
distribution of a given statistic based on a random
sample
• The more observations sampled the more likely you
are to obtain an accurate estimate of the true
population value
• http://onlinestatbook.com/stat_sim/sampling_dist/
Statistical Tests t-Test and F test
• The t-distribution is a family of
continuous probability distributions that arise when
estimating the mean of a normally
distributed population in situations where
the sample size is small and population standard
deviation is unknown
• t-Test assumes continuous data
(interval or ratio)
Statistical Tests t-Test and F test
• The t-value is calculated using the
formula as shown; the t-value equals
the difference between the group means divided by the
standard error of the difference (based on the pooled standard deviation)
• Degrees of freedom-The number of degrees of freedom is
equal to the number of observations minus the number of
algebraically independent linear restrictions placed on them
• In an array of four scores (2, 3, 5, and 6), knowing the mean
(M = 4), only the first three scores are free to vary while the
last score drawn is not free to vary. Therefore df = 3 (df = n - 1)
• http://web.mst.edu/~psyworld/texample.htm best
• http://faculty.clintoncc.suny.edu/faculty/michael.gregory/fil
es/shared%20files/Statistics/Examples_t_Test.htm Use # 3
Statistical Tests t-Test and F test
• One-tailed versus two-tailed tests: if the test statistic is
always positive (or zero), only the one-tailed test is
generally applicable, while if the test statistic can assume
positive and negative values, both the one-tailed and two-tailed
tests are of use. If you are hypothesizing a difference
but not predicting direction, then it will be a two-tailed test
• An example of when one would want to use a two-tailed
test is at a candy production/packaging plant. Let's say the
candy plant wants to make sure that the number of
candies per bag is around 50. The factory is willing to
accept between 45 and 55 candies per bag. It would be too
costly to have someone check every bag, so the factory
selects random samples of the bags, and tests whether the
average number of candies exceeds 55 or is less than 45
Example of t-Test
• Hypothesis: people who are allowed to sleep for only four
hours will score significantly lower than people who are
allowed to sleep for eight hours on a cognitive skills test.
Sixteen subjects are recruited in the sleep lab and randomly
assigned to one of two groups. In one group subjects sleep
for eight hours and in the other group subjects sleep for four
hours, and all are given a cognitive test the next day
• df = (n − 1) + (n − 1) = 14
Group | Scores
8 hours sleep group (X) | 5 7 5 3 5 3 3 9
4 hours sleep group (Y) | 8 1 4 6 6 4 1 2
• Mx = 5, My = 4
α (1 tail) 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005
α (2 tail) 0.1 0.05 0.02 0.01 0.005 0.002 0.001
df
1 6.3138 12.7065 31.8193 63.6551 127.3447 318.4930 636.0450
2 2.9200 4.3026 6.9646 9.9247 14.0887 22.3276 31.5989
3 2.3534 3.1824 4.5407 5.8408 7.4534 10.2145 12.9242
4 2.1319 2.7764 3.7470 4.6041 5.5976 7.1732 8.6103
5 2.0150 2.5706 3.3650 4.0322 4.7734 5.8934 6.8688
6 1.9432 2.4469 3.1426 3.7074 4.3168 5.2076 5.9589
7 1.8946 2.3646 2.9980 3.4995 4.0294 4.7852 5.4079
8 1.8595 2.3060 2.8965 3.3554 3.8325 4.5008 5.0414
9 1.8331 2.2621 2.8214 3.2498 3.6896 4.2969 4.7809
10 1.8124 2.2282 2.7638 3.1693 3.5814 4.1437 4.5869
11 1.7959 2.2010 2.7181 3.1058 3.4966 4.0247 4.4369
12 1.7823 2.1788 2.6810 3.0545 3.4284 3.9296 4.3178
13 1.7709 2.1604 2.6503 3.0123 3.3725 3.8520 4.2208
14 1.7613 2.1448 2.6245 2.9768 3.3257 3.7874 4.1404
15 1.7530 2.1314 2.6025 2.9467 3.2860 3.7328 4.0728
16 1.7459 2.1199 2.5835 2.9208 3.2520 3.6861 4.0150
17 1.7396 2.1098 2.5669 2.8983 3.2224 3.6458 3.9651
18 1.7341 2.1009 2.5524 2.8784 3.1966 3.6105 3.9216
19 1.7291 2.0930 2.5395 2.8609 3.1737 3.5794 3.8834
20 1.7247 2.0860 2.5280 2.8454 3.1534 3.5518 3.8495
21 1.7207 2.0796 2.5176 2.8314 3.1352 3.5272 3.8193
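The sleep example can be worked through with a pooled two-sample t. The sketch below (function name ours) computes t for the data above and compares it against the table's critical value of 2.1448 (df = 14, two-tailed α = .05):

```python
import math

def pooled_t(x, y):
    """Independent-samples t: mean difference over the standard error
    of the difference, using the pooled variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)   # df = n1 + n2 - 2 = 14
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

eight_hours = [5, 7, 5, 3, 5, 3, 3, 9]   # Mx = 5
four_hours  = [8, 1, 4, 6, 6, 4, 1, 2]   # My = 4

t = pooled_t(eight_hours, four_hours)
critical = 2.1448   # df = 14, two-tailed alpha = .05, from the table above
significant = abs(t) > critical
```

For these data t comes out to about 0.85, well below the critical value, so the one-point mean difference is not statistically significant.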
Statistical Tests t-Test and F test
• The F test is an extension of the t test. If a study has only one
independent variable with two groups, then F and t are
basically identical. F is used with more than two levels of an independent
variable and when there are two or more independent
variables in a factorial design. Similar to the t, the larger the F
ratio, the more likely it is that the results are significant
• The F-test is designed to test if two population variances are
equal. It does this by comparing the ratio of two variances
(Analysis of Variance-ANOVA) Each Mean Square = SS/df
• http://www.chem.utoronto.ca/coursenotes/analsci/StatsTuto
rial/ftest.html
Zebras Taking Flight
• A z-test is used for testing the mean of a population
versus a standard, or comparing the means of two
populations, with large (n ≥ 30) samples whether you
know the population standard deviation or not
• A t-test is used for testing the mean of one population
against a standard or comparing the means of two
populations
• An F-test is used to compare two populations' variances.
The samples can be any size. It is the basis of ANOVA.
The F-test is designed to test whether two population
variances are equal, and it plays an
important role in the analysis of variance
Chi-square test
• The Chi-square test is intended
to test how likely it is that an
observed distribution is due to chance
• The "t" test and the F test are called parametric tests. They
assume certain conditions about the parameters of the
population from which the samples are drawn (they assume
interval or ratio data).
• Parametric and nonparametric statistical procedures test
hypotheses involving different assumptions
• Parametric statistics test hypotheses based on the
assumption that the samples come from populations that
are normally distributed. Nonparametric tests make fewer and
less stringent assumptions than their parametric counterparts.
Nonparametric tests usually result in loss of efficiency
Chi-Square example
• Suppose that the ratio of male to female students in the
Science Faculty is exactly 1:1, but in the Pharmacology
Honors class over the past ten years there have been 80
females and 40 males. Is this a significant departure from
expectation? Now we must compare our X² value with a chi-squared
value in the X² table with n - 1 degrees of freedom
(where n is the number of categories, i.e. 2 in our case:
males and females). If our calculated value of X² exceeds the
critical value (3.84 for df = 1 at p = .05), then we have a significant difference
                      | Female | Male | Total
Observed numbers (O)  | 80     | 40   | 120
Expected numbers (E)  | 60     | 60   | 120
O − E                 | 20     | −20  | 0
(O − E)²              | 400    | 400  |
(O − E)² / E          | 6.67   | 6.67 | 13.34 = X²
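The whole table reduces to one sum. A minimal sketch (function name ours) for the Pharmacology example:

```python
def chi_square(observed, expected):
    """X^2 = sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 80 females and 40 males observed; 60/60 expected under a 1:1 ratio
x2 = chi_square([80, 40], [60, 60])
critical = 3.84   # df = 1, p = .05
departs_from_expectation = x2 > critical
```

The exact sum is 13.33 (the slide's 13.34 comes from rounding each 6.6667 term to 6.67 before adding); either way it far exceeds 3.84, so the sex ratio departs significantly from 1:1.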
Degrees of Freedom | Probability, p
0.99 0.95 0.05 0.01 0.001
1 0.000 0.004 3.84 6.64 10.83
2 0.020 0.103 5.99 9.21 13.82
3 0.115 0.352 7.82 11.35 16.27
4 0.297 0.711 9.49 13.28 18.47
5 0.554 1.145 11.07 15.09 20.52
6 0.872 1.635 12.59 16.81 22.46
7 1.239 2.167 14.07 18.48 24.32
8 1.646 2.733 15.51 20.09 26.13
9 2.088 3.325 16.92 21.67 27.88
10 2.558 3.940 18.31 23.21 29.59
11 3.05 4.58 19.68 24.73 31.26
12 3.57 5.23 21.03 26.22 32.91
13 4.11 5.89 22.36 27.69 34.53
14 4.66 6.57 23.69 29.14 36.12
15 5.23 7.26 25.00 30.58 37.70
16 5.81 7.96 26.30 32.00 39.25
Statistical Significance
• The goal of a test is to allow you to make a decision about
your results. Significance levels show you how likely a result is
due to chance. The most common level, used to mean
something is good enough to be believed, is .95 (α = .05). This
means there is a probability of .05 or less that a result
this extreme would occur by chance if the null hypothesis
were true. When you have a large sample size, very
small differences will be detected as significant (.05 is the
traditional level chosen).
• The more analyses you perform on a data set, the more results
will meet "by chance" the conventional significance level. For
example, if you calculate many correlations between different
variables, then you should expect to find by chance that one in
every 20 correlation coefficients is significant at the p ≤ .05 level,
even if the values of the variables were totally random and those
variables do not correlate in the population
Type I and Type II Errors
• The decision to reject the null hypothesis is based on
probabilities rather than certainties. In reviewing the decision
matrix below there are two possible decisions (reject or
accept the null Hypothesis) and two possible truths (the null
hypothesis is true or false). There are also two correct
decisions (correctly accepting the H0 when
it is true and correctly rejecting the H0 when
it is false) and two errors
• Type I error-we reject the H0 when it is true and
Type II error we accept the H0 when it is false
• Decision matrix
Type I and Type II Errors
• A test's probability of making a Type I error is denoted by α. A test's
probability of making a Type II error is denoted by β
• Type I errors occur when we obtain a large value (t or F) by chance
and we incorrectly decide that the independent variable had an effect. When the
significance level set to reject the H0 is .05, the probability of a
Type I error is .05 (α). The rate of the Type II error is denoted by the
Greek letter β (beta) and is related to the power of a test (which
equals 1−β). The power of a
statistical test is the probability
that it correctly rejects the null
hypothesis when the null
hypothesis is false (i.e. the probability
of not committing a Type II error).
-blood tests for a disease will falsely
detect the disease in some proportion of
people who don't have it, and will fail to detect the disease in some proportion of
people who do have it
Type I and II errors
• If a jury in a criminal trial must decide guilt or innocence, the example of
error type remains the same (pg 274-5). H0 = person is innocent
Decision | H0 true (innocent) | H0 false (guilty)
Reject H0 (find guilty) | Type I error (α) | Correct decision (1 − β)
Accept H0 (find innocent) | Correct decision (1 − α) | Type II error (β)
• Type I error: rejecting the null when it is true. We may obtain a large t or F value by
chance; the Type I error rate is determined by the choice of significance level (α). With α = .05,
then 5 out of 100 times (1 out of 20) we may make this mistake. We can change α to .01 to
lessen this error. A Type II error occurs when we accept the null but the null is incorrect;
the probability of a Type II error is β. Lowering the significance level (e.g. to .001)
makes it more difficult to reject the null hypothesis, decreasing the chances of a Type I
error but increasing the chances of a Type II error. (Use the decision grid for marriage: which error is worse?)
Choosing a Significance Level
• Researchers traditionally use either a .05 or a .01
significance level. For a juror, which type of error is more
serious: Type I or Type II? For a physician: Type I or Type II?
Juror (H0 = not guilty):
Found guilty (incorrect) | Type I error (false positive)
Found guilty (correct) | correct decision
Found innocent (correct) | correct decision
Found innocent (incorrect) | Type II error (false negative)
Physician (H0 = no operation needed):
Operate incorrectly | Type I error (false positive)
Operate correctly | correct decision
Don't operate correctly | correct decision
Don't operate incorrectly | Type II error (false negative)
Significance
• Research is designed to demonstrate that there is a
relationship between variables, not to say that the variables
are unrelated (i.e. accepting the null hypothesis)
• A study may come up with nonsignificant results when
there is an effect (Type II error) due to an inadequate
explanation to subjects, a weak manipulation, or a measure
of the dependent variable that is not reliable, etc. (see threats to
validity). A meaningful result is more likely to be overlooked
when the significance level is very low (.001): Type II error pg 278
• Type II errors may result from too small sample sizes and
effect sizes. However, while nonsignificant results do not
necessarily indicate that the null hypothesis is correct,
significant results do not necessarily indicate a meaningful
relationship. As your sample size increases, so does the
likelihood of obtaining statistically significant results, even for very small effects
Long-term psychosocial consequences of false-positive
screening mammography Brodersen J & Siersma VD, Ann Fam Med. 2013 Mar-Apr;11(2):106-15
• PURPOSE: Cancer screening programs have the potential of intended beneficial effects, but
they also inevitably have unintended harmful effects. In the case of screening mammography,
the most frequent harm is a false-positive result. Prior efforts to measure their psychosocial
consequences have been limited by short-term follow-up, the use of generic survey
instruments, and the lack of a relevant benchmark-women with breast cancer.
• METHODS: In this cohort study with a 3-year follow-up, we recruited 454 women with
abnormal findings in screening mammography over a 1-year period. For each woman with an
abnormal finding on a screening mammogram (false and true positives), we recruited another
2 women with normal screening results who were screened the same day at the same clinic.
These participants were asked to complete the Consequences of Screening in Breast Cancer-a
validated questionnaire encompassing 12 psychosocial outcomes-at baseline, 1, 6, 18, and 36
months.
• RESULTS: Six months after final diagnosis, women with false-positive findings reported
changes in existential values and inner calmness as great as those reported by women with a
diagnosis of breast cancer (Δ = 1.15; P = .015; and Δ = 0.13; P = .423, respectively). Three years after being
declared free of cancer, women with false-positive results consistently reported greater
negative psychosocial consequences compared with women who had normal findings in all 12
psychosocial outcomes (Δ >0 for 12 of 12 outcomes; P <.01 for 4 of 12 outcomes)
• CONCLUSION: False-positive findings on screening mammography cause long-term
psychosocial harm: 3 years after a false-positive finding, women experience psychosocial
consequences that range between those experienced by women with a normal mammogram
and those with a diagnosis of breast cancer
Choosing a sample size: Power analysis
• We can select a sample size on the basis of desired
probability of correctly rejecting the null hypothesis This
probability is called the power of the statistical test
Power =1-p(Type II error)
• Power refers to the probability that your test will find a
statistically significant difference when such a difference
actually exists. In other words, power is the probability that
you will reject the null hypothesis when you should (and
thus avoid a Type II error). It is generally accepted that
power should be .8 or greater; that is, you should have an
80% or greater chance of finding a statistically significant
difference when there is one
• http://meera.snre.umich.edu/plan-an-evaluation/related-topics/power-analysis-
statistical-significance-effect-size
• http://www.surveysystem.com/sscalc.htm#one
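A common textbook shortcut for choosing a sample size combines the pieces above: the desired power (1 − β), the α level, and the expected effect size d. The sketch below uses the normal-approximation formula n ≈ 2(z_{α/2} + z_{power})² / d² per group; this is an approximation (exact t-based calculations give slightly larger n), and the function name is ours:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample t-test, normal approximation:
    n = 2 * (z_{alpha/2} + z_{power})^2 / d^2, where d is Cohen's d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for power = .80
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Medium effect (d = .5), alpha = .05, power = .80
n = n_per_group(d=0.5)
```

For a medium effect this gives 63 per group (the exact t-based answer often quoted is 64); a large effect (d = .8) needs only about 25 per group, illustrating why small effects demand large samples.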
Replications
• Scientists do not attach too much importance to the results of a
single study. Better understanding comes from integrating the results
of numerous studies of the same variable(s) pg280
• Replicating Milgram: Would People Still Obey Today? Jerry M. Burger, Santa Clara University
• Seventy adults participated in a replication of Milgram’s Experiment 5
up to the point at which they first heard the learner’s verbal protest
(150 volts). Because 79% of Milgram’s participants who went past this
point continued to the end of the shock generator’s range, reasonable
estimates could be made about what the present participants would
have done if allowed to continue.
• Obedience rates in the 2006 replication were only slightly lower than
those Milgram found 45 years earlier. Contrary to expectation,
participants who saw a confederate refuse the experimenter’s
instructions obeyed as often as those who saw no model. Men and
women did not differ in their rates of obedience, but there was some
evidence that individual differences in empathic concern and desire for
control affected participants’ responses.
Replicating Milgram
• 79% of the people who continued past 150 volts (26 of 33)
went all the way to the end of the shock generator’s range. In
short, the 150-volt switch is something of a point of no
return. Nearly four out of five participants who followed the
experimenter’s instructions at this point continued up the
shock generator’s range all the way to 450 volts. This
observation suggests a solution to the ethical concerns about
replicating Milgram’s research. Knowing how people respond
up to and including the 150-volt point in the procedure allows
one to make a reasonable estimate of what they would do if
allowed to continue to the end. Stopping the study within
seconds after participants decide what to do at this juncture
would also avoid exposing them to the intense stress
Milgram’s participants often experienced in the subsequent
parts of the procedure.
Replicating Milgram
• Burger screened out any potential subjects who had taken more
than two psychology courses in college or who indicated
familiarity with Milgram’s research. A clinical psychologist also
interviewed potential subjects and eliminated anyone who
might have a negative reaction to the study procedure.
• In Burger’s study, participants were told at least three times that
they could withdraw from the study at any time and still receive
the $50 payment. Also, these participants were given a lower-
voltage sample shock to show the generator was real – 15 volts,
as compared to 45 volts administered by Milgram.
• Several of the psychologists writing in the same issue of
American Psychologist questioned whether Burger’s study is
truly comparable to Milgram’s, although they acknowledge its
usefulness.
Computer Analysis of Data
• Most analysis is carried out via computer programs
such as SPSS, SAS, SYSTAT, and others, although the
general procedures are very similar in all of the
programs
Selecting the appropriate Statistical Test
• Parametric statistical procedures rely on assumptions about the shape
of the distribution (i.e., assume a normal distribution) in the
underlying population and about the form or parameters (i.e., means
and standard deviations) of the assumed distribution. Nonparametric
statistical procedures rely on no or few assumptions about the shape
or parameters of the population distribution from which the sample
was drawn
• http://www.ats.ucla.edu/stat/mult_pkg/whatstat/choosestat.html
Parametric vs. Nonparametric tests
• Parametric and nonparametric are two broad classifications of statistical
procedures.
• Parametric tests are based on assumptions about the distribution of the
underlying population from which the sample was taken. The most common
parametric assumption is that data are approximately normally distributed.
• Nonparametric tests do not rely on assumptions about the shape or
parameters of the underlying population distribution. If the data deviate
strongly from the assumptions of a parametric procedure, using the
parametric procedure could lead to incorrect conclusions. If you determine
that the assumptions of the parametric procedure are not valid, use an
analogous nonparametric procedure instead.
• The parametric assumption of normality is particularly worrisome for small
sample sizes (n < 30). Nonparametric tests are often a good option for these
data.
• Nonparametric procedures generally have less power for the same sample
size than the corresponding parametric procedure if the data truly are
normal. Interpretation of nonparametric procedures can also be more difficult
than for parametric procedures.
Review of Scales of Measurement
• A categorical variable, also called a nominal variable, is for mutually exclusive,
but not ordered, categories. For example, your study might compare five
different genotypes. You can code the five genotypes with numbers if you want,
but the order is arbitrary and any calculations (for example, computing an
average) would be meaningless.
• An ordinal variable is one where the order matters but not the difference
between values. For example, you might ask patients to express the amount of
pain they are feeling on a scale of 1 to 10. A score of 7 means more pain than a
score of 5, and that is more than a score of 3. But the difference between the 7
and the 5 may not be the same as that between 5 and 3. The values simply
express an order.
• An interval variable is a measurement where the difference between two values
is meaningful. The difference between a temperature of 100 degrees and 90
degrees is the same difference as between 90 degrees and 80 degrees.
• A ratio variable, has all the properties of an interval variable, and also has a
clear definition of 0.0. When the variable equals 0.0, there is none of that
variable. Variables like height, weight, and enzyme activity are ratio variables.
Temperature, expressed in F or C, is not a ratio variable. A temperature of 0.0
on either of those scales does not mean 'no temperature'. A temperature of 100
degrees C is not twice as hot as 50 degrees C, because temperature C is not a ratio
variable. A pH of 3 is not twice as acidic as a pH of 6, because pH is not a ratio variable
Nonparametric vs Parametric Tests
• Nonparametric statistical tests
• Nonparametric statistical tests are used instead of the parametric tests we have considered
thus far (e.g. t-test; F-test), when:
• The data are nominal or ordinal (rather than interval or ratio).
• The data are not normally distributed, or have heterogeneous variance (despite being interval
or ratio).
• The following are some common nonparametric tests:
• Chi-square:
• 1. used to analyze nominal data
• 2. compares observed frequencies to frequencies that would be expected under the null
hypothesis
• Mann-Whitney U
• 1. compares two independent groups on a DV measure with rank-ordered (ordinal) data
• 2. nonparametric equivalent to a t-test
• Wilcoxon matched-pairs test
• 1. used to compare two correlated groups on a DV measured with rank-ordered (ordinal) data
• 2. nonparametric equivalent to a t-test for correlated samples
• Kruskal-Wallis test
• 1. used to compare two or more independent groups on a DV with rank-ordered (ordinal) data
• 2. nonparametric alternative to one-way ANOVA
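Two of the tests above are simple enough to compute by hand. The sketch below is my own illustration in plain Python: the chi-square statistic from observed and expected frequencies, and the Mann-Whitney U from pooled ranks:

```python
def chi_square(observed, expected):
    # Compares observed category frequencies to those expected
    # under the null hypothesis: sum of (O - E)^2 / E.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def mann_whitney_u(group1, group2):
    # Rank all scores together (midranks for ties), sum the ranks of
    # group 1, and convert to U; report the smaller of U1 and U2.
    pooled = sorted(group1 + group2)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        i = j
    r1 = sum(ranks[score] for score in group1)
    u1 = r1 - len(group1) * (len(group1) + 1) / 2
    return min(u1, len(group1) * len(group2) - u1)

print(chi_square([55, 45], [50, 50]))        # 1.0 for a 55/45 coin-flip split
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0: the groups do not overlap
```

In practice the statistic would then be compared against its sampling distribution for a p-value, which is what SPSS, SAS, and the like do for you.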
Generalizing Results Chp14
External Validity is the extent to which findings may be generalized
Even though a researcher randomly assigns participants to
experimental conditions, those subjects are rarely randomly selected
from the general population; subjects are selected because they are
available (e.g., college freshmen and sophomores who must fulfill
course requirements)
Such subjects represent a very restricted population; as older
adolescents, they usually have a developing sense of identity, social
and political attitudes that are still forming, and a high need
for peer approval
These student/subjects are rather homogeneous as a group but
different from older adults. What we know about general principles of
psychological functioning may therefore be limited to a select and unusual group
Although the use of rats is convenient many research findings have
been applied to humans particularly in the fields of memory, sexuality,
drugs, brain function etc.
Generalizing Research Results
• While college students represent a ready group of volunteers,
researchers using other populations are even more dependent on
volunteers than university researchers. Volunteers may be a unique
population
• However, college student populations are increasingly diverse and
representative of society. Studies with certain college populations are
replicated at other colleges using different mixes of students, and
many studies are later replicated with other populations
• Rosenthal and Rosnow (1975) stated that volunteers tend to be more
highly educated, of higher SES, and more sociable
• Different kinds of people volunteer for different kinds of experiments.
Titles of experiments may change who volunteers (e.g. “problem
solving vs. interaction in small groups”) pg 289
• Internet surveys also solicit volunteers, namely those individuals who
use the internet more frequently. Higher internet use is associated with living in
an urban area, being younger and college educated, and having a higher income
Gender and subgroups
• A study published in July 2006 in Genome Research
compared the levels of gene expression in male and
female mice and found that 72 percent of active genes
in the liver, 68 percent of those in fat, 55.4 percent of
the ones in muscle, and 13.6 percent of genes in the
brain were expressed in different amounts in the sexes.
• In an analysis of 163 new drug applications submitted
to the Food and Drug Administration between 1995
and 2000 that included a sex analysis, drug
concentrations in blood and tissues from men and
women in 11 of the drugs varied by as much as 40
percent. However, the applications included no sex-
based dosing recommendations. (Source: Melinda Wenner Moyer, Slate Magazine)
Gender and subgroups
Nature 465, 665 (10 June 2010) editorial
• Admittedly, there can be legitimate reasons to skew the ratios. For
instance, researchers may use male models to minimize the
variability due to the estrous cycle, or because males allow them to
study the Y chromosome as well as the X. And in studies of
conditions such as heart disease, from which female mice are
thought to be somewhat protected by their hormones, scientists
may choose to concentrate on male mice to maximize the outcome
under study
• However justifiable these imbalances may be on a case-by-case
basis, their cumulative effect is pernicious: medicine as it is currently
applied to women is less evidence-based than that being applied to
men. Moreover, hormones made by the ovaries are known to
influence symptoms in human diseases ranging from multiple
sclerosis to epilepsy. Apart from a few large, all-female projects, such
as the Women's Health Study on how aspirin and vitamin E affect
cardiovascular disease and cancer, women subjects remain seriously
underrepresented
Gender and subgroups
• Journals can insist that authors document the sex of
animals in published papers — the Nature journals are at
present considering whether to require the inclusion of
such information. Funding agencies should demand that
researchers justify sex inequities in grant proposals and,
other factors being equal, should favor studies that are
more equitable.
• Drug regulators should ensure that physicians and the
public alike are aware of sex-based differences in drug
reactions and dosages. And medical-school accrediting
bodies should impress on their member institutions the
importance of training twenty-first-century physicians in
how disease symptoms and drug responses can differ by
sex.
Hypothetical study on aggression and
crowding for males and females pg291
[Four panels (A-D): line graphs plotting aggression (low to high) against crowding (low to high), with separate lines for males and females.]
Figure A: males and females essentially equal; no interaction. Figure B: main effect for crowding and also for gender.
Figure C: interaction; crowding increases aggression for males, with no effect for females. Figure D: interaction; a positive
relationship between crowding and aggression for males and a negative relationship for females. In C and D, results for males cannot be generalized to females
Cultural Considerations
Arnett et al. (2008) state that psychology is built on the study
of WEIRD (Western, Educated, Industrialized, Rich,
Democratic) people pg293
Traditional theories of self concept are built upon western
concepts of the self as separate or individualistic while in
some other cultures self-esteem is derived more from the
relationships to others
“Asian-Americans are more likely to benefit from support that
does not involve the sort of intense disclosure of personal
stressful events and feeling that is the hallmark of support in
many European American groups”pg 293
However, many studies find similarities across cultures
Generalizing from Laboratory Settings
• Laboratory research has the advantage of studying
the effect of an independent variable under highly
controlled conditions but does the ‘artificiality’ of
the laboratory limit its external validity?
• Anderson, Lindsay and Bushman (1999) compared
38 pairs of studies for which there were similar
laboratory and field studies on areas including
aggression, helping, memory, and depression and
found that the effect size of the independent
variable on the dependent variable was very similar
in the two types of studies (which raises the
confidence in the external validity of the studies) pg296
Replications
• Replications are a way of compensating for limitations in
generalizing from any single study
• An exact replication is an attempt to precisely follow the
procedures of a study to determine if the same results will be
obtained. An exact replication may be followed when a researcher
is attempting to build on a previous study and wants to be
confident in the external validity of the study to proceed with
his/her own follow-up
• Review the findings of the “Mozart Effect,” in which students who
listened to 10 minutes of a Mozart sonata showed higher
performance on a spatial reasoning task (S-B IQ scale) (Rauscher,
Shaw, and Ky, 1993). There were many failures to replicate the
original result. Alternative explanations are that the effect is
limited to music that also increases arousal, that the original
study made a Type I error (incorrect rejection of the null
hypothesis), or that the results occur only under special conditions pg297
Conceptual Replications
• In a conceptual replication researchers attempt to
understand the relationships between variables
• One way this is accomplished is to redefine the operational
definition of a variable. While the original definition of exposure to
music was 10 minutes of the Mozart Sonata for Two Pianos in D major,
a new operational definition may include a different selection of
Mozart or a different composer
• When conceptual replications produce similar results
this increases our confidence in the external validity of
the original findings and demonstrates that the
relationship between the theoretical variables holds
Generalizations Literature Reviews and Meta-Analyses
• You can evaluate the external validity of a study by conducting a
literature review which summarizes and evaluates a particular
research area. The literature review synthesizes and provides
information which
• 1) summarizes what has been found to date 2) tells the reader
what findings are strongly supported or not in the literature
3) points out inconsistencies in the findings and
4) discusses future direction for this area of research
• Meta-analysis gives a thorough summary of several studies that
have been done on the same topic and provides the reader with
extensive information on whether an effect exists and how large
that effect is. The analysis combines the results of a number of
studies (e.g., by use of effect size). Traditional reviews do not
usually calculate effect sizes or attempt to integrate information
from the different experimental designs used across the studies cited;
the traditional review is a more qualitative approach, while a
meta-analysis is a more quantitative approach pg299
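The core arithmetic of a fixed-effect meta-analysis is an inverse-variance weighted average of the per-study effect sizes. A minimal sketch, assuming Cohen's d with its usual large-sample variance approximation; the study values are invented for illustration:

```python
def combined_effect(studies):
    """Fixed-effect meta-analytic mean of Cohen's d values.
    `studies` is a list of (d, n1, n2); each study is weighted by the
    inverse of var(d) ~= (n1 + n2)/(n1*n2) + d**2 / (2*(n1 + n2))."""
    total_w = total_wd = 0.0
    for d, n1, n2 in studies:
        var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
        w = 1.0 / var_d
        total_w += w
        total_wd += w * d
    return total_wd / total_w

# Three invented studies of the same effect; the largest study (n = 120)
# pulls the combined estimate toward its own d of .55
print(round(combined_effect([(0.40, 30, 30), (0.55, 60, 60), (0.20, 25, 25)]), 2))
```

This is why a meta-analysis can report a single overall effect where a narrative review can only list the individual findings.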
Generalization and Variation
• Variations in the service quality of medical practices
Ly DP & Glied SA Am J Manag Care 2013 Nov 1;19(11)
• There was substantial variation in the service quality of physician
visits across the country. For example, in 2003, the average wait
time to see a doctor was 16 minutes in Milwaukee but more
than 41 minutes in Miami; the average appointment lag for a
sick visit in 2003 was 1.2 days in west-central Alabama but
almost 6 days in Northwestern Washington. Service quality was
not associated with the primary care physician-to-population
ratio and had varying associations with the organization of
practices.
• Conclusions:
• Cross-site variation in service quality of care in primary care has
been large, persistent, and associated with the organization of
practices. Areas with higher primary care physician-to-population
ratios had longer, not shorter, appointment lags.
Regional Differences in Prescribing Quality Among Elder Veterans
and the Impact of Rural Residence Brian C. Lund Journal of Rural Health 29 (2013) 172–179
Regional variation often reflects discrepancies in the implementation
of best practices, and comparisons of high versus low performing
sites may identify mechanisms for improving performance. A recent
analysis of national Medicare data revealed significant regional
variation, with the highest concentration of potentially inappropriate
prescribing found in the Southern United States and the lowest rates
in the Northeast and upper Midwest.22 Similar geographic
distributions of prescribing quality have been previously reported
among older adults in both outpatient and inpatient settings. The
most direct interpretation of these findings is differences in
provider-level characteristics, where different approaches to
pharmacotherapy lead to patients in low performing regions being
exposed to riskier medication regimens. However, prescribing is also
influenced by system-level factors such as differences in health
system organization, access to prescription drug benefits, and higher
copayments for newer (and potentially safer) medications.
"Real World" Atypical Antipsychotic Prescribing Practices in Public
Child and Adolescent Inpatient Settings Elizabeth Pappadopulos et al. Schizophrenia
Bulletin, Vol. 28, No. 1, 2002
• The widespread use of atypical antipsychotics for youth treated
in inpatient settings has been the focus of increasing attention,
concern, and controversy. Atypical antipsychotic medications
have supplanted traditional neuroleptics as first line treatments
for schizophrenia and other psychotic disorders in adult
populations. A similar trend has also been observed in the
treatment of child and adolescent psychiatric patients, although
data on the safety and efficacy of atypical agents in youth are
scarce
• Among child and adolescent inpatients, atypical antipsychotics
are mainly prescribed for aggression rather than for psychosis.
Current debates revolve around whether these agents are
appropriately monitored and managed. In an effort to address
these concerns, a survey was developed and administered to
physicians at four facilities and to a group of 43 expert clinicians
and researchers.
"Real World" Atypical Antipsychotic
Prescribing Practices in Public Child and
Adolescent Inpatient Settings
• Taken together, these studies show that as many as 98
percent of children and adolescents in psychiatric hospitals
are treated with psychotropic medications during their
inpatient stay and approximately 45 percent to 85 percent of
these patients receive multiple medications simultaneously.
Antipsychotics are the most commonly prescribed agents
across most inpatient settings for the treatment of aggression
• While overall rates of psychotropic prescribing (ranging from
68% to 79% of patients) did not differ across inpatient units,
preferences for particular classes of medications varied by
facility. In addition, a higher percentage of patients were
given antipsychotics in the county-university hospital (74%)
than in the State hospital (57%) or the private hospital (35%).
While these trends may be due to differences in the patient
populations treated at each facility, Kaplan and Busner note
that the use of antipsychotics for nonpsychotic disorders was
statistically equivalent across settings.
"Real World" Atypical Antipsychotic Prescribing Practices in
Public Child and Adolescent Inpatient Settings
• Atypical antipsychotics represent a major advance in the
treatment of schizophrenia and psychosis among adults
because of their superior efficacy and side effect profile in
comparison to conventional antipsychotics. However, because
these benefits have not been reliably established in children
(Sikich 2001), antipsychotic prescribing practices for child and
adolescent psychiatric inpatients have largely developed from
clinical experience rather than from scientific evidence.
• A recent literature review shows that published data on
treatments for aggression are primarily from open studies and
case reports. Much of the research conducted involves
aggressive youth with compromised intelligence and is not
easily applied to the general population of youngsters with
aggressive behavior problems.
"Real World" Atypical Antipsychotic Prescribing Practices
in Public Child and Adolescent Inpatient Settings
• Concerns about side effects, such as weight gain, elevated
prolactin levels, and abnormal electrocardiograms, especially
in children, have yet to be resolved by research. In the face of
limited data from clinical trials, intensive study is needed on
factors that influence physicians' antipsychotic prescribing
preferences and that result in unnecessary treatment
variability.
• Taken together, the audit of patient charts reveals much-
needed real-world information about the administration of
antipsychotics and other psychotropic medications in this set
of public inpatient facilities for children and adolescents. The
children and adolescents treated in these settings represent a
particularly severe and comorbid patient population. Despite
the fact that inpatient youth diagnosed with psychosis
accounted for only a fraction (20%) of the population,
antipsychotics were commonly prescribed in this sample and
were often used in combination with other agents.
"Real World" Atypical Antipsychotic prescribing practices
Antipsychotics are administered to children and adolescents in public
inpatient settings in high proportions for complex comorbid
conditions involving aggression. Ironically, this real-world patient
population is excluded from clinical research, leaving clinicians to
rely on clinical experience rather than empirical evidence. The data
reveal that there are great disparities in the use of antipsychotics
across facilities, and this may be due in part to the lack of available
data to guide these practices.
Several findings regarding the administration of psychotropic
medications surprised us and raised important areas of concern. The
number and proportion of medications on admission were very
similar to medication regimens at discharge. One would expect that
after an average stay of more than 3 months, more adjustments
would be made to the medication regimen. The rationales for this
lack of change in the treatment regimen are unclear; this situation
makes it difficult to determine whether and how changes in medication
might affect outcomes
Prescription practices
• The administration of two or more psychotropic medications
(polypharmacy) is also an area of concern. In our chart review, because
the number of medications given to patients tended not to change over
the course of treatment, it is possible that polypharmacy in these facilities
represents treatment inertia. In other words, physicians at these facilities
tend to sustain, rather than initiate, the use of polypharmacy. Patients'
charts did not provide enough information regarding rationale for
physicians' medication strategies, and given that cases are often seen by a
number of physicians, there is little evidence of continuity in medication
use. For example, one study found that nearly half of patients given
risperidone in a State hospital were taken off their medication within 15
days after discharge by their outpatient physician
• A clear rationale for medication strategy was often missing from
medication progress notes. This is particularly important given the great
concern over antipsychotics' side effects, a concern that was repeatedly
raised during focus groups. In these ways, physicians' actual practices did
not match experts' agreed-upon best practices. Many current practices
Prenatal exposure to ultrasound waves impacts
neuronal migration in mice PNASAng et al. August 22, 2006 vol. 103 no. 34
• Neurons of the cerebral neocortex in mammals, including humans, are
generated during fetal life in the proliferative zones and then migrate to
their final destinations by following an inside-to-outside sequence. The
present study examined the effect of ultrasound waves (USW) on neuronal
position within the embryonic cerebral cortex in mice. We used a single
BrdU (Bromodeoxyuridine commonly used in the detection of proliferating cells in living tissues) injection to
label neurons generated at embryonic day 16 and destined for the
superficial cortical layers.
• Our analysis of over 335 animals reveals that, when exposed to USW for a
total of 30 min or longer during the period of their migration, a small but
statistically significant number of neurons fail to acquire their proper
position and remain scattered within inappropriate cortical layers and/or in
the subjacent white matter. The magnitude of dispersion of labeled neurons
was variable but systematically increased with duration of exposure to USW.
These results call for a further investigation in larger and slower-developing
brains of non-human primates and continued scrutiny of unnecessarily long
prenatal ultrasound exposure.
Prenatal Exposure to Ultrasound
Schematic representation of the progression of neuronal migration to the superficial cortical
layers in the normal mouse. (A–D) Most cells labeled with BrdU at E16 arrive in the cortex by
E18, and, by P1, those cells become surpassed by subsequently generated neurons.
Eventually, these cells will settle predominantly in layers 2 and 3 of the cerebrum. (E–H)
Model of the USW effect. When cells generated at E16 are exposed to USW, they slow down
on E17, and some remain in the white matter or are stacked in the deeper cortical layers.
Effect Size
• Effect Size refers to the strength of association between variables.
The Pearson r correlation coefficient is one indicator of effect size; it
indicates the strength of the linear association between two
variables pg 252 Cozby & Bates
• The concept of effect size already appears in everyday language. For
example, a weight loss program may boast that it leads to an average
weight loss of 30 pounds. In this case, 30 pounds is the claimed
effect size. Another example is that a tutoring program may claim
that it raises school performance by one letter grade. This grade
increase is the claimed effect size of the program. These are both
examples of "absolute effect sizes", meaning that they convey the
average difference between two groups without any discussion of
the variability within the groups. For example, if the weight loss
program results in an average loss of 30 pounds, it is possible that
every participant loses exactly 30 pounds, or half the participants
lose 60 pounds and half lose no weight at all
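An effect size that does account for within-group variability is Cohen's d: the mean difference divided by the pooled standard deviation. A quick sketch with invented scores, not data from the text:

```python
import math
from statistics import mean, stdev

def cohens_d(group1, group2):
    # Standardized mean difference: (M1 - M2) / pooled SD.
    # A common rule of thumb reads d of .2 as small, .5 medium, .8 large.
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                          / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

treatment = [34, 38, 40, 29, 35, 37]   # invented scores
control = [30, 32, 28, 25, 31, 27]
print(round(cohens_d(treatment, control), 2))   # a very large effect
```

Because d is standardized, effects measured on different scales (pounds lost, letter grades) become comparable, which is what makes meta-analysis possible.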
Socioeconomic Inequality in the Prevalence of Autism
Spectrum Disorder Durkin MS et al.PLoS One. 2010 Jul 12;5(7)
• The prevalence of ASD increased in a dose-response manner with increasing SES, a
pattern seen for all three SES indicators used to define SES categories
Prevalence per 1,000 of ASD by three SES indicators based on census block group of residence. Thin bars indicate
95% confidence intervals. Within each SES indicator, both the trend test and χ² tests were significant at p < 0.0001. MHI
refers to median household income.
• The main results of this study were consistent with the only study
larger than this to examine the association between ASD risk and an
indicator of SES. That study, published in 2002 by Croen and
colleagues, looked at more than 5000 children with autism
receiving services coordinated by the California Department of
Developmental Services and found a stepwise increase in autism
risk with increasing maternal education
• Epidemiologists long have suspected that associations
between autism and SES are a result of ascertainment bias, on
the assumption that as parental education and wealth
increase, the chance that a child with autism will receive an
accurate diagnosis also increases
• Paranormal phenomena Signal to Noise ratio
Path Analysis
• Path analysis is a straightforward extension of multiple
regression. Its aim is to provide estimates of the
magnitude and significance of hypothesized causal
connections between sets of variables. This is best
explained by considering a path diagram.
• To construct a path diagram we simply write the names
of the variables and draw an arrow from each variable
to any other variable we believe that it affects. We can
distinguish between input and output path diagrams.
An input path diagram is one that is drawn beforehand
to help plan the analysis and represents the causal
connections that are predicted by our hypothesis. An
output path diagram represents the results of a
statistical analysis, and shows what was actually found.
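For the simplest diagram, X -> M -> Y with a direct path X -> Y, the standardized path coefficients can be computed straight from the three pairwise correlations. This sketch and its simulated data are my own illustration, not an example from the text:

```python
import random
from statistics import mean, stdev

def corr(u, v):
    # Pearson correlation between two equal-length lists.
    mu, mv = mean(u), mean(v)
    n = len(u)
    return (sum((a - mu) * (b - mv) for a, b in zip(u, v))
            / ((n - 1) * stdev(u) * stdev(v)))

# Simulate data consistent with the input diagram X -> M -> Y (+ X -> Y)
random.seed(1)
x = [random.gauss(0, 1) for _ in range(2000)]
m = [0.6 * xi + random.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.3 * xi + random.gauss(0, 1) for xi, mi in zip(x, m)]

r_xm, r_xy, r_my = corr(x, m), corr(x, y), corr(m, y)

# Output path diagram: solve the normal equations for the standardized paths
p_mx = r_xm                                   # path X -> M
p_yx = (r_xy - r_my * r_xm) / (1 - r_xm**2)   # direct path X -> Y
p_ym = (r_my - r_xy * r_xm) / (1 - r_xm**2)   # path M -> Y

# The direct and indirect paths together reproduce the total correlation
print(round(p_yx + p_ym * p_mx, 3) == round(r_xy, 3))
```

With more variables the same idea is carried out as a series of multiple regressions, one per arrow-receiving variable.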
Distributions and Central Tendency
Dispersion: Sum of Squares
• Subject   Score (X)   X²    x = X − M    x²
•    1          0         0      −5        25
•    2          1         1      −4        16
•    3          2         4      −3         9
•    4          4        16      −1         1
•    5          5        25       0         0
•    6          6        36       1         1
•    7          7        49       2         4
•    8          8        64       3         9
•    9          8        64       3         9
•   10          9        81       4        16
• N = 10    T = 50    ∑X² = 340    ∑x = 0    ∑x² = 90
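The two routes to the sum of squares in the table, deviations from the mean versus the computational shortcut SS = ∑X² − T²/N, can be verified directly:

```python
scores = [0, 1, 2, 4, 5, 6, 7, 8, 8, 9]   # the ten scores from the table
n = len(scores)
total = sum(scores)          # T = 50
m = total / n                # mean = 5.0

# Definitional formula: sum of squared deviations from the mean
ss_definitional = sum((x - m) ** 2 for x in scores)

# Computational formula: SS = sum(X^2) - T^2 / N
sum_x_sq = sum(x ** 2 for x in scores)    # 340
ss_computational = sum_x_sq - total ** 2 / n

print(ss_definitional, ss_computational)  # 90.0 90.0
```

The computational form avoids calculating each deviation, which is why older texts favor it for hand calculation.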
A Modified Constraint-Induced
Therapy program
• Answer the following questions about the article
• 1) A constraint-induced movement therapy (CIT) program is what kind of intervention (pg1 abstract)
• 2) Describe the Subjects: (how many) children with (what disorder) were placed in
(what kind) of design (pg 1 under Methods in abstract)
• 3) What were the two procedures being compared? _________vs. __________
• 4) What were the two specifically designed tests? Name them __________and ________
• 5)How many times were the tests administered?_____ At what points in the study were they
administered________?
• Was there a significant difference between the groups? (yes or no)?
• Which of the two groups or procedures was more effective?__________
Type out the above questions on a separate sheet and fill in the blanks and turn in the paper with
your name, class & title at the top. Each blank is worth 2 points 12 blanks=25 points (24 + 1
bonus point)
Organization of report/article Appendix A
The body of the paper will have the following sections:
Introduction, Methods, Results, and Discussion
• Introduction includes 1) the problem under study
2) literature review 3) rationale and hypothesis of the study-
Introduction progresses from broad theories and research
findings to specific current details
• Method provides the reader with detailed information about how
the study was conducted. Often there are subsections
describing subjects, apparatus or materials, and the procedure(s)
used. Number and relevant characteristics of subjects are
stated. Any equipment used is described and the procedure
section states how the study was conducted step by step in
temporal order. Methods also describes how extraneous variables
were controlled and how randomization was used
Organization of report/article Appendix A
• Results-In this section you offer the reader a straightforward
description of your analyses with no explanation of the
findings. Present your results in the same order as your
predictions were made in the introduction section. State
which statistical test was used and at what level alpha was
set. In APA style, tables and figures are not presented in the
main body of the manuscript but rather placed at the end of
the paper. Avoid duplicating tables and figures as well as
statements in the text
• Discussion-In this section the interpretations of the results are
described considering what is the relationship between our
results and past research and theory. Explain how the study
either did or did not obtain the results expected, what flaws
and limitations were in the methods used and if you can
generalize your results and the implications for future research
Organization of report/article Appendix A
• Introduction -1) What is known 2) What is not
known that this study addresses
• Methods –Subjects Who are they, Where did you
get them, What did you do with them (how
assigned to groups, conditions etc.)
• Results- What happened? Did the result match the
prediction or not?
• Discussion-What do the results mean (interpret
them) for this study, the field in general and the
future
• Stephan Cowans, a Boston man who spent six years in prison
for the shooting of a police sergeant, was released in 2004
after the discovery that the fingerprint used to convict him
was not his.
• That same year, the FBI mistakenly linked Brandon Mayfield,
an Oregon lawyer, to a fingerprint lifted off a plastic bag of
explosive detonators found in Madrid after commuter train
bombings there killed 191 people. Two weeks after Mayfield’s
arrest, Spanish investigators traced the fingerprint to an
Algerian man.
Diabetes and Cognitive Systems in Older Black and White Persons
• Introduction
• Diabetes has long been associated with impaired cognition in white
individuals and although the prevalence of diabetes is increasing this
association with cognition has not been fully tested in black
individuals
• Methods
• Subjects were older community dwelling persons recruited from
senior and private residential housing in the Chicagoland area. All
subjects were enrolled in 1 of 2 studies of aging and cognition
(Minority Aging Research Study and the Memory and Aging Project
with 336 and 1,187 subjects respectively). After 80 subjects were
eliminated due to a diagnosis of dementia, the remaining subjects
(mean age 73.1 and 79.9 years, mean education 14.8 and 14.3 years,
92.8% white and 6.3% white in the second study and all black in the
first) underwent clinical, neurological and neuropsychological
evaluation including tests of semantic memory, episodic memory,
working memory, perceptual speed and visuospatial abilities

Behavioral research (2)

  • 1.
  • 2.
    What many do not get about this topic: The people who have changed how we think about science and the world were often rebels or had very ‘radical ideas’ which threatened the established order of the predominant world view. Galileo has been called the "father of modern observational astronomy", the "father of modern physics", and the "father of science". His observations of the satellites of Jupiter caused a revolution in astronomy: a planet with smaller planets orbiting it did not conform to the principles of Aristotelian cosmology, which held that all heavenly bodies should circle the Earth, and met with opposition from astronomers, who doubted heliocentrism. The matter was investigated by the Roman Inquisition in 1615, which concluded that heliocentrism was false and contrary to scripture, placing works advocating the Copernican system on the index of banned books and forbidding Galileo from advocating heliocentrism. Galileo was one of the first modern thinkers to clearly state that the laws of nature are mathematical
  • 3.
    The Rebels • The first of the great anatomists was Galen of Pergamon (AD 130-200), who made vast achievements in the understanding of the heart, the nervous system, and the mechanics of breathing. Because human dissection was forbidden, he performed many of his dissections on Barbary apes, which he considered similar enough to the human form. The system of anatomy he developed was so influential that it was used for the next 1400 years. Galen continued to be influential into the 16th century, when a young and rebellious physician began the practice of using real human bodies to study the inner workings of the human body • Andreas Vesalius, who came from a line of four prominent family physicians. Vesalius and other like-minded anatomy students would raid the gallows of Paris for half-decomposed bodies and skeletons to dissect. Rather than considering dissection a lowering of his prestige as a doctor, Vesalius prided himself on being the only physician to directly study human anatomy since the ancients. Although he respected Galen, Vesalius often found that his study of
  • 4.
    The Rebels • Like his fellow revolutionary scientists, Vesalius’ masterpiece was met with harsh criticism. Many of these criticisms understandably came from the church, but the most strident of all came from Galenic anatomists. These critics vowed that Galen was in no way incorrect, and so if the human anatomy of which he wrote was different from that which was proved by Vesalius, it was because the human body had changed in the time between the two. As a response to the harsh criticisms of his work, Vesalius vowed never again to bring forth truth to an ungrateful world. In the same year that he published De humani (1543), he burned the remainder of his unpublished works, further criticisms of Galen, and preparations for his future studies. He left medical school, married, and lived out the rest of his conservative life as a court physician (source: Brain Blogger)
  • 5.
    Not what but who you know • A French chemist and microbiologist renowned for his discoveries of the principles of vaccination, microbial fermentation and pasteurization. His medical discoveries provided direct support for the germ theory of disease and its application in clinical medicine; he is popularly known as the "father of microbiology". In 1847, Ignaz Semmelweis was given a 2-year appointment as an assistant in obstetrics with responsibility for the First Division of the maternity service of the vast Allgemeine Krankenhaus teaching hospital in Vienna. There he observed that women delivered by physicians and medical students had a much higher rate (13–18%) of post-delivery mortality (called puerperal fever or childbed fever) than women delivered by midwife trainees or midwives (2%).
  • 6.
    Agree to Disagree (disagreeably) • This case-control analysis led Semmelweis to consider several hypotheses. He concluded that the higher rates of infections in women delivered by physicians and medical students were associated with the handling of corpses during autopsies before attending the pregnant women. This was not done by the midwives. He associated the exposure to cadaveric material with an increased risk of childbed fever, and conducted a study in which the intervention was hand washing.
  • 7.
    Who dares challenge the existing dogma? • Dr. Semmelweis initiated a mandatory hand washing policy for medical students and physicians. In a controlled trial using a chloride of lime solution, the mortality rate fell to about 2%—down to the same level as the midwives. Later he started washing the medical instruments and the rate decreased to about 1%. His superior, Professor Klein, did not accept his conclusions. Klein thought the lower mortality was due to the hospital’s new ventilation system. • Semmelweis did not get his assistant professorship renewed in 1849. He was offered a clinical faculty appointment (privatdozent) without permission to teach from cadavers. He returned home to Budapest.
  • 8.
    Misconception #2 • The popular belief is that the material and methods in this course are abstract and have little to do with important issues in everyday life. My question: Why do we not use these methods to examine difficult questions? • Terrorism (how it develops, how to prevent it, airport security) • Torture (Is it effective? Does it provide useful information?) • How can we best prevent rape and assault? • Are there gun control approaches that reduce gun violence? • Such questions are not addressed adequately by the ideas and tools provided by this field, mainly because people maintain a view of this field as ‘academic and irrelevant’. Also, many times we make assumptions that go untested and may turn out to be incorrect; see the work on unregulated radiation doses in CT scans by Rebecca Smith-Bindman
  • 9.
    Misconception #3 • Numbers drive the ideas • Actually it is the ideas that drive the numbers • Numbers can describe and quantify, and also tell us about differences between individuals and/or groups, as well as accurately describe changes that occur. The research ideas and tools in such a class as this can also help us distinguish between true and false claims and identify those claims that are significant and meaningful.
  • 10.
    An Epidemic of False Claims - Scientific American, May 7, 2011 • False positives and exaggerated results in peer-reviewed scientific studies have reached epidemic proportions in recent years. The problem is rampant in economics, the social sciences and even the natural sciences, but it is particularly egregious in biomedicine. • Many studies that claim some drug or treatment is beneficial have turned out not to be true. We need only look to conflicting findings about beta-carotene, vitamin E, hormone treatments, Vioxx and Avandia. Even when effects are genuine, their true magnitude is often smaller than originally claimed.
  • 11.
    An Epidemic of False Claims • Research is fragmented, competition is fierce and emphasis is often given to single studies instead of the big picture. • Much research is conducted for reasons other than the pursuit of truth. Conflicts of interest abound, and they influence outcomes. In health care, research is often performed at the behest of companies that have a large financial stake in the results. Even for academics, success often hinges on publishing positive findings.
  • 12.
    What is the usefulness of this course • Claims are made all the time regarding some product or process, and sometimes some controversy • A new study into the efficiency and reliability of wind farms has concluded that a campaign against them is not supported by the evidence • Internet marketers of acai berry weight-loss pills and colon cleansers will pay $1.5 million to settle charges of deceptive advertising and unfair billing, the Federal Trade Commission announced today. The FTC complaint alleged that two individuals and five related companies deceptively claimed that their Acai Pure supplement would cause rapid and substantial weight loss, and that their Colotox colon cleanser would prevent colon cancer.
  • 14.
    The scientific method • A body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. To be scientific, a method must be based on empirical and measurable evidence subject to specific principles of reasoning. Empiricism: knowledge comes only or primarily from sensory experience. • Although procedures vary from one field of inquiry to another, identifiable features distinguish scientific inquiry from other methods of obtaining knowledge. • Anthropology-Zoology
  • 15.
    The scientific method • The scientific approach recognizes that both intuition and authority can be sources of ideas but does not unquestioningly accept something as true based on a person’s prestige or authority (pg 3-5) • The fundamental characteristic of the scientific method is empiricism: the idea that knowledge is based on observations and that these observations can be measured (creating data or a set of data) (pg 5) • Science is adversarial. Since a requirement is that hypotheses must be testable, researchers conduct studies and then publish their results, allowing others to review them and decide for themselves the validity and reliability of the data and the conclusions drawn from them (pg 6) • Scientific evidence is peer reviewed: editors of the journal examine the research submitted to determine its validity
  • 16.
    Pseudoscience • Hypotheses generated are typically not testable • Methodology is not scientific and validity of data is questionable • Supportive evidence tends to be anecdotal and/or rely on “so-called” experts • Conflicting evidence is ignored • Language used sounds scientific • Claims tend to be vague, rationalize strongly held beliefs and appeal to preconceived ideas • Claims are never revised (pg 9)
  • 17.
    Scientific Inquiry • Researchers propose hypotheses (a tentative idea that must be tested, pg 19) as explanations of phenomena, and design experimental studies to test these hypotheses via predictions which can be derived from them • Scientific inquiry is generally intended to be as objective as possible in order to reduce biased interpretations of results. Another basic expectation is to document and share all data and methodology so they are available for careful scrutiny by other scientists, giving them the opportunity to verify results by attempting to reproduce them (replicate results)
  • 18.
    Scientific Inquiry: Scientists are funny • “The history of biochemistry is a chronicle of controversies. These controversies exhibit a common pattern. There is a complicated hypothesis, which usually entails an element of mystery and several unnecessary assumptions. This is opposed by a more simple explanation, which contains no unnecessary assumptions. • The complicated one is always the popular one at first, but the simpler one, as a rule, eventually is found to be correct. This process frequently requires ten to twenty years. The reason for this long time lag was explained by Max Planck. He remarked that scientists never changed their mind, but eventually they die” --John Northrop, biochemist
  • 19.
    Hypotheses and Theories • A hypothesis is a conjectural (if-then) statement while a theory is a systematic body of ideas about a particular topic or phenomenon (pg 19) • A question is asked that may refer to an observation (e.g. Do aggressive video games increase aggression in adolescents and young adults?) or may be in the form of an open-ended question (What strategies are best for coping with natural disasters?) • We then make conjectures (hypotheses), and test them to see if our predictions (specific predictions) conform to what happens in the real world • Theories encompass wider domains of inquiry that may bind many independently derived hypotheses together in a coherent, supportive structure. Theories, in turn, may help form new hypotheses or place groups of hypotheses into context
  • 20.
    Basic Steps of Scientific Inquiry • Define a question • Gather information and resources (observe) • Form an explanatory hypothesis • Test the hypothesis by performing an experiment and collecting data in a reproducible manner • Analyze the data • Interpret the data and draw conclusions that serve as a starting point for new hypotheses • Publish results • Retest (frequently done by other scientists)
  • 21.
    Examples of Pseudoscience • Expectations that 2012 would bring large-scale disasters or even the end of the world • Ancient Astronauts - Proposes that aliens have visited the earth in the past and influenced our civilization • Astrology - Belief that humans are affected by the position of celestial bodies • Flat Earth Society - Claims the Earth is flat and disc-shaped • Moon Landing Conspiracy - Contends the original moon landing was faked • Bermuda Triangle - An area where unexplained events, like disappearances of ships and airplanes, have occurred • Cryptozoology - The search for Bigfoot (Yeti), the Loch Ness monster, El Chupacabra and other creatures that biologists believe do not exist
  • 22.
    Some More Controversies • Mayan Calendar predictions for 2012 • Crystal healing • Hypnosis - a state of extreme relaxation and inner focus in which a person is unusually responsive to suggestions made by the hypnotist. The modern practice has its roots in the idea of animal magnetism, or mesmerism, originated by Franz Mesmer. Mesmer's explanations were thoroughly discredited, and to this day there is no agreement amongst researchers whether hypnosis is a real phenomenon or merely a form of participatory role-enactment
  • 23.
    The Geocentric Model & The Wanderers • Most of the time we see Mars, Jupiter and Saturn moving around the Sun in the same direction as the Earth, but during the relatively short time that the Earth overtakes one of these planets, that planet appears to be moving backward. As the Greeks noticed discrepancies between the way planets moved and the basic geocentric model, they began adjusting the model and creating variations on the original. In these models, planets and other celestial bodies move in circles that have been superimposed onto circular orbits around the Earth • http://www.lasalle.edu/~smithsc/Astronomy/retrograd.html
  • 24.
    The Earth Moved • The solution proposed by Ptolemy to these discrepancies came in the form of a mad but clever proposal: planets were attached, not to the concentric spheres themselves, but to circles attached to the concentric spheres • The Ptolemaic system, the most well-known version of the geocentric model, was a complex interaction of circles. Ptolemy believed that each planet orbited around a circle, which was termed an epicycle, and the epicycle orbits on a bigger circle–the deferent–around the Earth. • However, in practice, even this was not enough to account for the detailed motion of the planets on the celestial sphere! In more sophisticated epicycle models further "refinements" were introduced. In some cases, epicycles were themselves placed on epicycles
  • 25.
    The Day the Earth Stood Still • Ptolemaic geocentric theory describes and correctly predicts: one could confidently predict when a planet’s apparent motion would come to a halt and turn around, and for how long it would seem to move backwards. The theory predicts but does not explain HOW or WHY the planets move as they do • Correlation ~ Prediction ≠ Causality • Navigation unaffected • Occam’s razor or the law of parsimony • Once Kepler proposed the theory of elliptical orbits, heliocentrism became such a simple model compared to Ptolemy's unwieldy cycles and epicycles that heliocentrism rapidly gained in popularity and quickly became the dominant theory
  • 26.
  • 27.
  • 28.
    Hypothetically Speaking • Researchers generally test a hypothesis - a tentative idea or question that can be supported or refuted - and then design a study to test the hypothesis. The researcher also makes a prediction regarding the outcome of the experiment (pg 19). If the prediction is not confirmed, the researcher will either reject the hypothesis or conduct further research using different methods (pg 19). However, if the results of the study confirm the prediction, the hypothesis is supported but not proven
  • 29.
    Constructing the study • Participants in the study are subjects (pg 20) • Participants in survey research are respondents • Those who help researchers understand a particular culture or organization are informants • Participants are often more fully described by characterizing them as students, employees, residents, patients etc. • Other terms for subjects include respondents and informants (pg 20)
  • 30.
    Sources of Ideas • Common sense - The things we all believe to be true, although such notions do not always turn out to be correct (also popular beliefs - the 5 sec rule, pg 20-21) • Observation - Listening to music with degrading sexual lyrics predicts a range of early sexual behavior • Serendipity - Luck? (pg 21) Pavlov? The accidental discovery that dogs salivate to other stimuli besides food. (Otto Loewi and the discovery of acetylcholine: it was generally accepted that neurons were connected by synapses, and initially most neurophysiologists believed that signal transmission between cells was electrical.) Another example: the accidental discovery of medications in the 1950s • Theories • Past research
  • 31.
    Sense - Common and Otherwise • Common sense is often made up of much prejudice and snap judgment, and therefore is not always useful and can certainly be irrational even when it is useful • Testing a commonsense idea can be useful since such ideas do not always turn out to be true • Stress theory of ulcers: As peptic ulcers became more common in the 20th century, doctors increasingly linked them to the stress of modern life. Medical advice during the latter half of the 20th century was, essentially, for patients to take antacids and modify their lifestyle. In the 1980s Australian clinical researcher Barry Marshall discovered that the bacterium H. pylori caused peptic ulcer disease, leading him to win a Nobel Prize in 2005
  • 32.
    Another Crazy Idea • Immovable continents: Prior to the middle of the 20th century scientists believed the Earth’s continents were stable and did not move. This began to change in 1912 with Alfred Wegener’s formulation of the continental drift theory, and later and more properly the elucidation of plate tectonics during the 1960s • Accident and Serendipity - Pavlov did not set out to discover classical conditioning but was studying the digestive system and found that dogs would salivate to a neutral stimulus when paired with food
  • 33.
    Theories • Theory - a systematic body of ideas about a particular topic or phenomenon with a consistent structure that has two functions (pg 22) • 1) Theories organize and explain various facts and descriptions or observations, putting them into a coherent framework (system) • 2) Theories generate new knowledge by guiding our observations and generating new hypotheses. Theories are living and dynamic (and a theory can be modified to account for new data) • Theories vs. Hypotheses: A theory consists of much more than a simple idea and is grounded in prior research, often with several consistent hypotheses
  • 34.
    Theories (and facts) change • Theories can be modified by new discoveries. Example: the original conception of long-term memory as a permanent fixed storage place was modified when Loftus (1979) demonstrated that memories could be influenced by how subjects were questioned (pg 23). Participants viewed a simulated automobile accident and were later asked questions: Did you see the broken headlight? vs. Did you see a broken headlight? Subjects were more likely to answer yes to the first version • Memories can also be induced, so memory is not simply a record of what happened • Relevant to the criminal justice system and police procedures
  • 35.
    Theories and data • Under sources of ideas (pg 23, top), Cozby and Bates cite the research of Buss (2007) proposing that males feel more intense jealousy when a partner is unfaithful due to the physical infidelity, while females are more jealous due to the emotional infidelity. This is consistent with evolutionary theory • Females are more threatened by men who would form an emotional bond with another partner and withdraw support and resources; males are more threatened that they might have to care for a child who does not share any of their genes (taken from evolutionary theory, pg 23)
  • 36.
    Past Research • “Becoming familiar with a body of research on a topic is perhaps the best way to generate ideas for new research” (pg 24) • Becoming familiar with a particular body of research allows you to see inconsistencies • What you know about one research area may be applied to another research area • Researchers refine and expand on known and published research • Replication - An attempt to repeat a finding using a different setting, a different demographic group (age, sex, etc.) or different methodology • Research is also stimulated by practical problems that may have immediate applications
  • 37.
    Examining Data Critically • Example of facilitated communication, in which a ‘facilitator’ held the hand of an autistic child to help press keys on a keyboard or otherwise assist in communication • Montee et al. (1995) constructed a study with three conditions: (1) both child and facilitator were shown the same picture and the child was asked to identify the picture (by using the keyboard) assisted by the facilitator; (2) only the child saw the picture; (3) the child and facilitator saw different pictures (unknown to the facilitator) - Results: pictures were correctly identified only in condition one
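The logic of this three-condition design can be illustrated with a small simulation (Python; the picture names, trial counts, and function name are invented for the sketch). The assumption built into the model is the one the study's results supported: the typed answer reflects what the facilitator saw, not what the child saw.

```python
import random

def run_trial(condition, pictures, rng):
    """Simulate one facilitated-communication trial and report
    whether the typed answer matches the picture the CHILD saw."""
    child_sees = rng.choice(pictures)
    if condition == "same":           # (1) both see the same picture
        facilitator_sees = child_sees
    elif condition == "child_only":   # (2) only the child sees it; facilitator guesses
        facilitator_sees = rng.choice(pictures)
    else:                             # (3) they see different pictures
        facilitator_sees = rng.choice([p for p in pictures if p != child_sees])
    typed_answer = facilitator_sees   # key assumption: the facilitator drives the typing
    return typed_answer == child_sees

pictures = ["dog", "cat", "ball", "car", "tree"]
rng = random.Random(1)
for condition in ("same", "child_only", "different"):
    correct = sum(run_trial(condition, pictures, rng) for _ in range(1000))
    print(condition, correct / 1000)
```

Under this assumption, accuracy is perfect only when the facilitator sees the same picture, near chance when only the child sees it, and zero when they see different pictures, which is the pattern Montee et al. reported.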
  • 38.
    Evaluating Web Information • Is the site associated with a major educational institution, or is it sponsored by one individual or organization, and if so what may be the bias of that person or organization (e.g. Disabled People's International)? • Is the information provided by those responsible for the site? What are their qualifications? • Is the information current? • Do links from the site lead to legitimate organizations? (pg 35)
  • 39.
    Journals and Library Research • Most papers submitted for publication in major journals are rejected (during peer review) • Peer review: editors of the journal review the article and also send it to other experts in the field to review (pg 25). Due to limited space and the number of articles received, most articles submitted are rejected • Journals usually specialize in one or two areas (pg 26) • PsycINFO, Science Citation Index, Social Sciences Citation Index, PubMed
  • 40.
    Literature Review • A “literature review” reviews the scholarly literature on a specific topic by summarizing and analyzing published work on that topic. A literature review has several purposes: • 1) To evaluate the state of research on a topic • 2) To familiarize readers and students with what has already been done in the field • 3) To suggest future research directions or gaps in knowledge
  • 41.
    Traditional and Open Access Journals • In traditional, subscriber-pays publishing, the publisher, who holds the copyright to an article, pays most printing and distribution costs and, in order to read an article, the journal subscriber pays fees, whether for hard-copy or online versions. Sometimes an author is required to pay printing page charges for complex graphics or color presentations. • “Open access” publishing generally means that the author or publisher, who holds the copyright to an article, grants all users unlimited, free access to, and license to copy and distribute, a work published in an open access journal, usually online
  • 42.
    Traditional and Open Access Journals • Traditional publishing - Individuals and libraries are charged fees to access the article. Depending on the contract you sign as an author, you may not be able to distribute copies of your article or post it online. • The now-common usage of the term "open access" means freely available for viewing or downloading by anyone with access to the internet. • The UK Wellcome Trust (a global charitable foundation) assumes that “the benefits of research are derived principally from access to research results”, and therefore that “society as a whole is made worse off if access to scientific research results is restricted” • Problems of traditional and open access: • Sending papers to reviewers who are sympathetic (traditional) • Payment for publication (by authors) could create conflicts of interest and have a negative impact on the perceived neutrality of peer review, as there would be a financial incentive for journals to publish more articles (open access) • Open access is also often seen as a solution to the situation where many libraries have been forced to cut journal subscriptions because of price increases
  • 43.
    Traditional vs. Open Access Publishing • Controversies about open access publishing and archiving confront issues of copyright and governmental competition with the private sector. • Traditional publishers typically charge readers subscriber fees to fund the costs of publishing and distributing hard-copy and/or online journals. • In contrast, most open access systems charge authors publication fees and give readers free online access to the full text of articles
  • 44.
    Good and Bad sources
  • 45.
    Anatomy of a Research Article: Abstract, Introduction, Method Section, Results Section and Discussion (Conclusions)
  • 46.
    Abstract and Introduction • Abstract - a summary of the report, which typically runs no more than 120 words. It includes information about the hypothesis, the procedure of the study and a summary of results (there may be some information about the discussion) • Introduction - The researcher outlines the problem, including past research and theories relevant to the problem. Expectations are listed (usually in the form of hypotheses) (pg 35)
  • 47.
    Method Section • The method section is divided into subsections as determined by the author and dependent on the complexity of the study and its design. Sometimes an overview of the design is explained to the reader • The next subsection describes the characteristics of the participants (number of subjects, male/female, etc.) • The next subsection describes the procedure, the materials or instruments used, and how data were recorded. • Additional subsections are used as necessary to describe equipment, procedures or other information to be included • Details of all relevant information must be included to allow other researchers to replicate the study
  • 48.
    Results and Discussion • Results - In this section the researcher presents the findings, usually in three ways. First there is a narrative summary. Second there is a statistical description. Third, tables are presented. “Statistics are only a tool the researcher uses. . .” Not understanding how the calculations were performed is not a deterrent to reading and understanding the logic behind the design and statistical procedures used • Discussion - The researcher reviews the research from various perspectives, determining whether the research supports the hypothesis or not and offering explanations in either case, including what went wrong in the study. There is also usually a comparison with past research, and there may be suggestions for practical applications of the research findings
  • 49.
    The Quick Guide (copyrighted) • Introduction - 1) What is known 2) What is not known, which this study addresses • Methods - Who, Where, What: Who are the subjects (describe them), where did they come from, and what did you do with them (often divided into groups such as experimental and control) • Results - What happened? (e.g. which group did better) • Discussion - What do the results mean? Interpretation of the study is in this section
  • 50.
    Ethical Research - Chapter 3 • Beneficence - The principle which states the need to maximize benefits and minimize harm (pg 40) • Risk-Benefit Analysis - What is the potential harm? Does confidentiality hold? Was there informed consent?
  • 51.
    Milgram’s Methodology • Through a rigged drawing, the participant was assigned the role of teacher while the confederate was always the learner. The participant watched as the experimenter strapped the learner to a chair in an adjacent room and attached electrodes to the learner’s arm. The participant’s task was to administer a paired associate learning test to the learner through an intercom system. • Participants sat in front of an imposing shock generator and were instructed to administer an electric shock to the learner for each incorrect answer. Labels above the 30 switches that spanned the front of the machine indicated that the shocks ranged from 15 to 450 volts in 15-volt increments. Participants were instructed to start with the lowest switch and to move one step up the generator for each successive wrong answer.
  • 52.
    Milgram’s Methodology • The subjects believed that for each wrong answer, the learner was receiving actual shocks. In reality, there were no shocks. After the confederate was separated from the subject, the confederate set up a tape recorder integrated with the electro-shock generator, which played pre-recorded sounds for each shock level. After a number of voltage level increases, the actor started to bang on the wall that separated him from the subject. After several times banging on the wall and complaining about his heart condition, all responses by the learner would cease. • At this point, many people indicated their desire to stop the experiment and check on the learner. Some test subjects paused at 135 volts and began to question the purpose of the experiment. Most continued after being assured that they would not be held responsible. • After the 330-volt shock, the learner no longer screamed or protested when receiving a shock, suggesting that he was physically incapable of responding. The major dependent variable was the point in the procedure at which the subject refused to continue
  • 53.
    Milgram’s Methodology: Deception • If at any time the subject indicated his desire to halt the experiment, he was given a succession of verbal prods by the experimenter, in this order: • Please continue. • The experiment requires that you continue. • It is absolutely essential that you continue. • You have no other choice, you must go on. • If the subject still wished to stop after all four successive verbal prods, the experiment was halted. Otherwise, it was halted after the subject had given the maximum 450-volt shock three times in succession • The experimenter also gave special prods if the teacher made specific comments. If the teacher asked whether the learner might suffer permanent physical harm, the experimenter replied, "Although the shocks may be painful, there is no permanent tissue damage, so please go on"
Ethical Research • Milgram summarized the experiment in his 1974 article, "The Perils of Obedience": "The legal and philosophic aspects of obedience are of enormous importance, but they say very little about how most people behave in concrete situations. I set up a simple experiment at Yale University to test how much pain an ordinary citizen would inflict on another person simply because he was ordered to by an experimental scientist. . . . The extreme willingness of adults to go to almost any lengths on the command of an authority constitutes the chief finding of the study and the fact most urgently demanding explanation. . . . Relatively few people have the resources needed to resist authority" • Milgram (1974) maintained that the key to obedience had little to do with the authority figure’s manner or style. Rather, he argued that people follow an authority figure’s commands when that person’s authority is seen as legitimate.
Data can surprise us • Before conducting the experiment, Milgram polled fourteen Yale University senior-year psychology majors to predict the behavior of 100 hypothetical teachers. All of the poll respondents believed that only a very small fraction of teachers (the range was from zero to 3 out of 100, with an average of 1.2) would be prepared to inflict the maximum voltage. Milgram also informally polled his colleagues and found that they, too, believed very few subjects would progress beyond a very strong shock. • Milgram also polled forty psychiatrists from a medical school, and they believed that by the tenth shock, when the victim demands to be free, most subjects would stop the experiment. They predicted that by the 300-volt shock, when the victim refuses to answer, only 3.73 percent of the subjects would still continue, and they believed that "only a little over one-tenth of one percent of the subjects would administer the highest shock on the board."
The relevance of Milgram • Milgram sparked direct critical response in the scientific community by claiming that "a common psychological process is centrally involved in both [his laboratory experiments and Nazi Germany] events" • There are psychological processes which can disengage morality from conduct
Criticism of Milgram • In addition to their scientific value, the obedience studies generated a great deal of discussion because of the ethical questions they raised (Baumrind, 1964; Fischer, 1968; Kaufmann, 1967; Mixon, 1972). Critics argued that the short-term stress and potential long-term harm to participants could not be justified. • In his defense, Milgram (1974) pointed to follow-up questionnaire data indicating that the vast majority of participants not only were glad they had participated in the study but said they had learned something important from their participation and believed that psychologists should conduct more studies of this type in the future. Nonetheless, current standards for the ethical treatment of participants clearly place Milgram’s studies out of bounds (Elms, 1995).
Mechanisms of moral disengagement (A. Bandura) • The Theory of Moral Disengagement seeks to analyze the means through which individuals rationalize their unethical or unjust actions • Moral justification- turns killing into a moral act when non-violent acts appear to be ineffective and when there is a serious threat to a person's way of life. Justification can take many forms and can be framed as a service to humanity or as serving the greater good of the community • Displacement of responsibility- group decision making can diffuse responsibility; personal responsibility is obscured • Disregard for consequences- people minimize the consequences of acts they are responsible for. It's easier to hurt others when they are not visible • Dehumanization- people find violence easier if they don't consider their victims to be human beings. The road to terrorism is gradual • Euphemistic labeling- using terms that are less negative, or might even be viewed as positive, to make actions seem less harmful. This sort of labeling also serves to limit or reduce responsibility for one's actions • Advantageous comparison- people who engage in reprehensible acts make them seem less objectionable by comparing them to something perceived as being worse
Some criticisms of Milgram • Professor James Waller, Chair of Holocaust and Genocide Studies at Keene State College, formerly Chair of the Whitworth College Psychology Department, expressed the opinion that the Milgram experiments do not correspond well to the Holocaust events • The subjects of the Milgram experiments, wrote James Waller (Becoming Evil), were assured in advance that no permanent physical damage would result from their actions. However, the Holocaust perpetrators were fully aware of the finality of their hands-on killing and maiming of the victims. • The laboratory subjects themselves did not know their victims and were not motivated by racism. On the other hand, the Holocaust perpetrators displayed an intense devaluation of the victims through a lifetime of personal development. • Those serving punishment at the lab were not sadists, nor hate-mongers, and often exhibited great anguish and conflict in the experiment, unlike the designers and executioners of the Final Solution, who had a clear "goal" on their hands, set beforehand. • The experiment lasted for an hour, with no time for the subjects to contemplate the implications of their behavior. Meanwhile, the Holocaust lasted for years, with ample time for a moral assessment by all individuals and organizations involved.
Risks of Research (continued) • Procedures that can cause physical harm are rare, while those that involve psychological stress are much more common (refer to Schachter’s study on stress and affiliation). If stress is possible, the researcher must use all possible safeguards to help subjects deal with the stress and must also include a debriefing session pg 42 • Loss of privacy/confidentiality- data should be stored securely and made anonymous if possible; if not, care should be taken to separate identifying data from the actual data pg 43 • Concealed observation- is it ethical to use data taken from public web sites, or from those which require some identification?
Risks of Research- Informed Consent • Informed consent implies that potential subjects should be provided with all information that might influence their decision to participate in the study pg 44 • Informed consent forms generally include 1) purpose of research 2) procedures involved 3) risks/benefits 4) any compensation 5) confidentiality 6) assurance of voluntary participation and permission to withdraw from the study 7) contact information for subjects to ask questions • To make the form easier to understand, it should not be written in the first person: instead of "I understand that participation is voluntary" (first person), write "Participation in this study is voluntary" pg 44
Deception and Informed Consent • Deception occurs when there is active misrepresentation of information. In the Milgram experiment there were two examples pg 47 • 1) Subjects were told the study was about memory and learning while it was actually about obedience • 2) Subjects were not told they would be delivering shocks to confederates (Milgram created a false reality for subjects) • Milgram’s study took place before informed consent became routine. Might "honest" informed consent have resulted in a different outcome? Would it have biased the sample?
Deception and Ethics • The concepts of informed consent and debriefing have become standard and more explicit pg 48 • While false cover stories are still commonly used, especially in social psychology, the use of deception is decreasing overall, for three reasons • 1) researchers have become more interested in cognitive variables rather than emotional ones and adopt practices more similar to those in cognitive studies, which involve less deception (e.g. memory research) • 2) there is greater sensitivity to and awareness of ethical issues and how they should be handled in research • 3) review boards at universities are more stringent about approving research involving deception and want to know whether alternatives are available
Alternatives to Deception • Role playing- takes different forms: subjects may be given a description of a situation and asked how they would respond, or asked to predict how real participants would react pg 50 • However, it is not easy to predict one’s own behavior, especially when some undesirable behavior is being studied (e.g. conformity, aggression) • Most people overstate their altruistic tendencies
Alternatives to Deception • Simulations- enactment of some real situation (can still pose ethical problems) • Zimbardo prison experiment, Stanford, 1971 • "Our planned two-week investigation into the psychology of prison life had to be ended prematurely after only six days because of what the situation was doing to the college students who participated. In only a few days, our guards became sadistic and our prisoners became depressed and showed signs of extreme stress" -Philip Zimbardo http://www.prisonexp.org/
Alternatives to Deception • Honest experiments- behavior studied without elaborate deception (e.g. speed dating used to study romantic attraction) • Subjects agree to have their behavior studied and know the hypotheses of the researchers • Use situations in which people seek assistance- assign students to different conditions of skill improvement (e.g. on-line or in-class help) • Use naturally occurring events to test hypotheses (e.g. New York residents given a PTSD checklist to determine if they differed from Washington D.C. residents after the 9/11 attacks)
Sample selection and ethics • Justice principle- any decision to include or exclude certain people from a research study must be made solely on scientific grounds (e.g. Tuskegee Syphilis Study, 1932-1972) pg 52-54 • According to the rules of the U.S. Dept. of Health and Human Services, all institutions that receive federal funds must have an Institutional Review Board (IRB) responsible for reviewing research proposed and conducted by that institution (even if it is not conducted on site at that institution) • An IRB must have at least 5 members, with at least one member from outside the institution. Exceptions to IRB review include • 1) research in which there is no risk (anonymous questionnaires, surveys etc.), which is exempt from IRB review • 2) research with minimal risk (risk no greater than that encountered in daily life), which is routinely approved by the IRB. All other research, with greater than minimal risk, is reviewed and requires safeguards such as informed consent. See Table 3.1 pg 54, Assessment of Risk
IRB impact on Research • Some researchers may be frustrated over the sometimes long process of review, with numerous requests for revisions and clarifications. • These IRB policies apply to all areas of research, so the caution necessary for some medical research is applied to other research with less risk • Some studies indicate that students who have participated in research studies are more lenient in their judgments of the ethics of an experiment than the researchers themselves or the IRB members pg 55
Risk-Benefits of Clinical Research • Clinical trials involving new drugs are commonly classified into four phases
Risk-Benefits of Clinical Research • Phase I: Researchers test a new drug or treatment in a small group of people for the first time to evaluate its safety, determine a safe dosage range, and identify side effects. • Phase II: The drug or treatment is given to a larger group of people to see if it is effective and to further evaluate its safety. • Phase III: The drug or treatment is given to large groups of people to confirm its effectiveness, monitor side effects, compare it to commonly used treatments, and collect information that will allow the drug or treatment to be used safely. • Phase IV: Studies are done after the drug or treatment has been marketed to gather information on the drug's effect in various populations and any side effects associated with long-term use (source: NIH U.S. Library of Medicine)
Risk-Benefits of Clinical Research • More common than physical stress is psychological stress (Schachter’s 1959 study on anxiety and affiliation). The study had two conditions: high anxiety and low anxiety. In the high-anxiety condition, researchers emphasized the ominous and expected pain of the electric-shock experiment; in the low-anxiety condition, they made it seem nearly painless • Subjects were asked to rate their anxiety level, and then to decide whether they preferred to wait alone or with others before the electric-shock tests would begin. Lastly, they were given the choice to be let out of the experiment (without credit for their psych class). • Results: 63% of subjects in the high-anxiety condition wanted to wait together, but only 33% wanted to be together in the low-anxiety condition
Risk-Benefits of Clinical Research • Psychological stress- social psychology experiments (deception) • Giving unfavorable feedback about a subject’s personality or asking about traumatic or unpleasant events • The Bystander Intervention Model predicts that people are more likely to help others under certain conditions.
Social Psychology- Psychological harm/stress • Bystander intervention research • Many factors influence people's willingness to help, including the ambiguity of the situation, perceived cost, diffusion of responsibility, similarity, mood and gender, attributions of the causes of need, and social norms. • Situational ambiguity. In ambiguous situations (i.e., when it is unclear that there is an emergency), people are much less likely to offer assistance than in situations involving a clear-cut emergency (Shotland & Heinold, 1985). They are also less likely to help in unfamiliar environments than in familiar ones • Perceived cost. The likelihood of helping increases as the perceived cost to ourselves declines (Simmons, 1991). We are more likely to lend our class notes to someone whom we believe will return them than to a person who doesn't appear trustworthy
Social Psychology- Psychological harm/stress- Bystander intervention research • Diffusion of responsibility- the presence of others may diffuse the sense of individual responsibility. It follows that if you suddenly felt faint and were about to pass out on the street, you would be more likely to receive help if there are only a few passers-by present than if the street is crowded with pedestrians. With fewer people present, it becomes more difficult to point to the "other guy" as the one responsible for taking action. If everyone believes the other guy will act, then no one acts • Similarity- people are more willing to help others whom they perceive to be similar to themselves: people who share a common background and beliefs. They are even more likely to help others who dress like they do than those in different attire (Cialdini & Trost, 1998). People also tend to be more willing to help their kin than to help non-kin (Gaulin & McBurney, 2001). • Mood- people are generally more willing to help others when they are in a good mood
Social Psychology- Psychological harm/stress- Bystander intervention research • Gender. Despite changes in traditional gender roles, women in need are more likely than men in need to receive assistance from strangers • Attributions of the cause of need. People are much more likely to help others they judge to be innocent victims than those they believe have brought their problems on themselves (Batson, 1998). Thus, they may fail to lend assistance to homeless people and drug addicts whom they feel "deserve what they get." • Social norms. Social norms prescribe behaviors that are expected of people in social situations (Batson, 1998). The social norm of "doing your part" in helping a worthy cause places a demand on people to help, especially in situations where their behavior is observed by others (Gaulin & McBurney, 2001). For example, people are more likely to make a charitable donation when they are asked to do so by a co-worker in full view of others than when they receive an appeal in the mail in the privacy of their own home
APA Ethics Code- Research with Humans and Animals • APA ethics code- psychologists are committed to increasing scientific and professional knowledge of behavior and people’s understanding of themselves and others, and to the use of such knowledge to improve the condition of individuals, organizations and society pg 55 • Five general principles of the APA ethics code relate to beneficence, responsibility, integrity, justice and respect for the rights and dignity of others • Of the ten ethical standards concerning conduct, the focus here is on the 8th Ethical Standard, Research and Publication
Ethics and Research with Humans • Institutional approval- IRB • Informed consent includes the purpose of the experiment, the right to decline or withdraw from the study, consequences of declining, risks, benefits, confidentiality, incentives for participation and contact information • Psychologists conducting intervention research clarify the nature of the treatment, services available to the control group, how treatment and control groups will be formed, alternatives for those wishing to withdraw or not participate, and any compensation offered for participation pg 56
Ethics in Research with Humans (continued) • 8.05 Psychologists may dispense with informed consent when there is no risk of harm or only anonymous questionnaires or observations are used and confidentiality is protected pg 57 • 8.06 Psychologists avoid offering excessive financial or other inducements, and if a professional service is offered, its nature, risks and obligations are clarified • 8.07 Psychologists do not use deception unless it can be justified by prospective scientific or other value and no reasonable alternatives are available. No deception is allowed in research that is expected to cause physical pain or severe emotional distress
Nuremberg Code • At the end of World War II, 23 Nazi doctors and scientists were put on trial for the murder of concentration camp inmates who were used as research subjects. Of the 23 professionals tried at Nuremberg, 15 were convicted: 7 were condemned to death by hanging and 8 received prison sentences from 10 years to life; 8 were acquitted • Ten points describing required elements for conducting research with humans became known as the Nuremberg Code • 1) Informed consent is essential 2) Research should be based on prior animal work; the risks should be justified by the anticipated benefits 3) Only qualified scientists must conduct research 4) Physical and mental suffering must be avoided • 5) Research in which death or disabling injury is expected should not be conducted
Ethics and Animal Research • Approximately 7% of articles in Psych Abstracts (PsycINFO) involve animals • Animals are commonly used to test the effects of drugs and to study physiological mechanisms and genetics • 95% of animals in research are rats, mice and birds • Animal rights groups have become more active • Environmental conditions for animals can be more easily controlled than for humans • It is more difficult to monitor a human’s behavior than an animal’s behavior • Most scientists agree that animal research benefits humans
Top Five Reasons to Stop Animal Testing- PETA • It’s unethical to sentence 100 million thinking, feeling animals to life in a laboratory cage and intentionally cause them pain, loneliness, and fear. • It’s bad science. The Food and Drug Administration reports that 92 out of every 100 drugs that pass animal tests fail in humans. • It’s wasteful. Animal experiments prolong the suffering of people waiting for effective cures by misleading experimenters and squandering precious money, time, and resources that could have been spent on human-relevant research. • It’s archaic. Forward-thinking scientists have developed humane, modern, and effective non-animal research methods, including human-based microdosing, in vitro technology, human-patient simulators, and sophisticated computer modeling, that are cheaper, faster, and more accurate than animal tests. • The world doesn’t need another eyeliner, hand soap, food ingredient, drug for erectile dysfunction, or pesticide so badly that it should come at the expense of animals’ lives.
Ethics and Animal Research • 8.09 Psychologists acquire, care for, use and dispose of animals in compliance with federal, state and local regulations and with professional standards pg 59-60 • Psychologists ensure appropriate consideration of animals’ comfort, health and humane treatment • All individuals under the supervision of a psychologist using animals must have received instruction in research methods as well as the care, maintenance and handling of the species being used • Surgery is performed under appropriate anesthesia, minimizing infection and pain, and subjecting animals to pain or stress must be justified scientifically • When an animal’s life must be terminated, it must be done rapidly, minimizing pain, and according to accepted procedure
Misrepresentation- Fraud and Plagiarism • Fabrication of data is fraud, which is most commonly detected when other scientists cannot replicate the results of a study pg 62-63 • Fraud is not considered a major problem in science (it is still rare), in part because researchers know that others will read their reports and conduct their own studies, and because a researcher found guilty of fraud suffers serious damage to reputation and career • No independent agencies exist to check on the activities of scientists • Plagiarism- misrepresenting another’s work as your own; it can involve even a single paragraph or sentence copied without a reference. Even if you paraphrase, you must cite your source • Szabo (2004)- 50% of British university students believed that using the internet for academically dishonest activities is acceptable
Fundamental Research Issues- chp 4 • Variable – any event, situation, behavior or individual characteristic that varies. Any variable must have two or more levels or values pg 69 • There are two broad classes of variables: those that vary in quality and those that vary in quantity; for example, gender is a qualitative variable and intelligence is a quantitative variable • Common variables studied are reaction time, memory, self-esteem, stress etc. • Discrete variables can take only a finite set of values (no fractional values), e.g. sex, political affiliation, number of children. Continuous variables can take any value, including fractional values, e.g. height, weight, some abilities, IQ
Fundamental Research Issues • Operational definition- the set of procedures used to measure or manipulate a variable pg 71 • Many measurements are indirect, and we infer from them (we do not really measure temperature but the length of a column of mercury, and infer temperature from that) • Pain is a subjective state, but we can create measures from which to infer how much pain someone is experiencing • Wong-Baker FACES rating scale • To determine an operational definition we often ask, "how does one behave if one possesses that trait?" • Operational definitions force scientists to discuss abstract concepts in concrete terms and to communicate with each other using agreed-upon concepts (how good your operational definition is = construct validity)
Relationships between Variables • Validity that refers to the degree to which a test or other measure assesses what it claims to measure is known as construct validity. Does the operational definition reflect the true meaning of the variable? pg 71 • Validity that refers to whether you can generalize your results to other populations or situations is known as external validity (generalizability) pg 85
Common Threats to Validity • History--the specific events which occur between the first and second measurement. • Maturation--the processes within subjects which act as a function of the passage of time, e.g. if the project lasts a few years, most participants may improve their performance regardless of treatment. • Testing--the effects of being measured may change the behavior or performance of the subject. • Instrumentation--the changes in the instrument, observers, or scorers which may produce changes in outcomes.
Threats to Validity (continued) • Statistical regression--also known as regression to the mean. This threat is caused by the selection of subjects on the basis of extreme scores or characteristics ("Give me the forty worst students and I guarantee that they will show immediate improvement right after my treatment") • Selection of subjects--the biases which may result from the selection of comparison groups. Randomization (random assignment) of group membership is a counter-attack against this threat
Relationships Between Variables • Relationships between variables take four forms: 1) positive linear relationship 2) negative linear relationship 3) no relationship and 4) curvilinear relationship pg 72 • Positive linear relationship- increases in one variable are accompanied by increases in a second variable • Negative linear relationship- increases in one variable are accompanied by decreases in a second variable • No relationship- levels of one variable are not related to levels of a second variable • Curvilinear relationship- increases in one variable are accompanied by systematic increases and decreases in a second variable pg 73-74
Correlation Coefficient • Correlation refers to the degree to which variables are related to one another • Correlated variables are those which tend to vary together; correlation ≠ causality • Spurious examples: Mexican lemon imports "prevent" highway deaths; obesity "caused" the debt bubble; pirates "cause" global warming; the number of radios and the number of people in asylums
Correlation- Scatter Plot • 1 is a perfect positive correlation • 0 is no correlation (the values don't seem linked at all) • -1 is a perfect negative correlation • The value shows how strong the correlation is and whether it is positive or negative
The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature
Temperature °C   Ice Cream Sales
14.2°            $215
16.4°            $325
11.9°            $185
15.2°            $332
18.5°            $406
22.1°            $522
19.4°            $412
25.1°            $614
23.4°            $544
18.1°            $421
22.6°            $445
17.2°            $408
Correlation example • You can easily see that warmer weather leads to more sales; the relationship is good but not perfect. The correlation is 0.9575
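As a check on the figure above, Pearson's r can be computed directly from the slide's 12 data points. This is a minimal pure-Python sketch; the function name pearson_r is mine, not from the text:

```python
# Pearson correlation for the ice-cream example (standard library only).
import math

temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))   # sum of products
    ssx = sum((a - mx) ** 2 for a in x)                   # sum of squares of x
    ssy = sum((b - my) ** 2 for b in y)                   # sum of squares of y
    return sp / math.sqrt(ssx * ssy)

print(round(pearson_r(temps, sales), 4))  # ≈ 0.9575, matching the slide
```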
There has been a heat wave! It gets so hot that people aren't going near the shop, and sales start dropping. • The correlation calculation only works well for relationships that follow a straight line. The calculated value of the correlation is 0, but we can see that the data follow a nice curve that reaches a peak around 25°C. The correlation calculation is not "smart" enough to see this • If you make a scatter plot and look at it, you may see more than the correlation value says. • Make your own scatterplot: http://www.alcula.com/calculators/statistics/scatter-plot/
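The curvilinear point can be demonstrated with a small sketch. The numbers are hypothetical, chosen so that sales follow a perfect inverted U peaking at 25°C, as in the description above:

```python
# Hypothetical illustration: a perfectly curvilinear (inverted-U)
# relationship produces a Pearson correlation of 0, even though the
# two variables are strongly related.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)

temp = [22, 23, 24, 25, 26, 27, 28]
sales = [100 - (t - 25) ** 2 for t in temp]   # peaks at 25°C: [91, 96, 99, 100, 99, 96, 91]

print(pearson_r(temp, sales))                 # 0.0 — the linear measure misses the curve
print(round(pearson_r(temp[:4], sales[:4]), 2))  # ≈ 0.96 on the rising half alone
```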
Random Variation • Random variability refers to uncertainty in events pg 76 • Random variability- variability of a process (which is operating within its natural limits) caused by many irregular and erratic (and individually unimportant) fluctuations or chance factors that (in practical terms) cannot be anticipated, detected, identified, or eliminated. • Research attempts to identify systematic relationships between variables (reducing random variability)
Dispersion- Sum of Squares • In statistics, statistical dispersion (also called statistical variability or variation) is variability or spread in a variable

Subject    X     X²    x = X − mean    x²
1          0      0        −5          25
2          1      1        −4          16
3          2      4        −3           9
4          4     16        −1           1
5          5     25         0           0
6          6     36         1           1
7          7     49         2           4
8          8     64         3           9
9          8     64         3           9
10         9     81         4          16
N=10     T=50  ∑X²=340    ∑x=0     SS=∑x²=90

s = √(SS/(N−1)) = √(90/9) ≈ 3.16
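The table's arithmetic can be verified in a few lines. This sketch recomputes SS and the sample standard deviation s = √(SS/(N−1)) from the listed scores:

```python
# Sum of squares and sample standard deviation for the slide's ten scores.
import math

scores = [0, 1, 2, 4, 5, 6, 7, 8, 8, 9]       # N = 10, total T = 50
n = len(scores)
mean = sum(scores) / n                         # 5.0
ss = sum((x - mean) ** 2 for x in scores)      # SS = Σ(X − mean)² = 90
# The computational form agrees: ΣX² − T²/N = 340 − 2500/10 = 90
s = math.sqrt(ss / (n - 1))                    # sample SD = √(90/9) ≈ 3.16

print(ss, round(s, 2))  # 90.0 3.16
```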
Experimental vs Nonexperimental Methods • In nonexperimental methods, relationships are studied by observation or by measuring the variables of interest directly (recording responses to questions, examining collected data). Much of the data is correlational- e.g. students who work longer hours have lower GPAs; variables are measured but not manipulated • The experimental method involves direct manipulation and control of variables. The two variables do not just vary together: one variable is manipulated to determine how it affects the second variable pg 78
Nonexperimental Method • Two limitations of the nonexperimental method • 1) We are usually measuring covariation (correlation), which makes it difficult to determine the direction of cause and effect (negative correlation between anxiety and exercise- does anxiety reduce exercise, or does exercise reduce anxiety? If exercise reduces anxiety, then starting an exercise program would be a good way to reduce anxiety; but if anxiety causes people to stop exercising, then forcing someone to exercise may not reduce their anxiety) • 2) We have the problem of a third variable (suppressor variable) pg 78-80 (in the example of anxiety and exercise, a third variable such as higher income may lead to both the lowering of anxiety and an increase in exercise). Classic example: industrialization underlies the correlation between birth rates and stork populations • Class exercise – interpret the correlation between shy sons and talkative mothers (r = positive correlation- talkative mothers have shy sons)
Third Variable (Suppressor) Problem • The direction of cause and effect is not always crucial. If you are interested in making predictions while unable to manipulate variables, a correlation is still valuable (e.g. astronomy) • Example pg 79- two causal patterns are possible in the correlation between similarity and liking • 1) Similarity causes people to like each other • 2) Liking causes people to become more similar • However, a third variable is undesirable when it influences the relationship between the variables an experimenter is examining (an extraneous variable), leaving the interpretation of the relationship unclear (example: research on wine drinking and heart protection)
Confounding Variables and Correlation • One limitation of nonexperimental methods is that measures are indirect (and correlational), making it difficult to determine the direction of cause and effect pg 80 (A perceived relationship between an independent variable and a dependent variable that has been misestimated due to the failure to account for a confounding factor is termed a spurious relationship) • The most common measure of correlation is the Pearson product-moment correlation coefficient (r) http://www.alcula.com/calculators/statistics/correlation-coefficient/ • r = SP/√(SSx·SSy), where SP = ∑XY − (∑X)(∑Y)/N • Worked example: X = 2, 4, 4, 5, 7, 8 (∑X = 30); Y = 5, 9, 9, 11, 15, 17 (∑Y = 66); ∑XY = 378, so SP = 378 − (30)(66)/6 = 48; SSx = 24, SSy = 96; r = 48/√(24 × 96) = 48/48 = 1.00
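The worked example on this slide can be verified with the computational (SP and sum-of-squares) formulas; the intermediate values below match the hand calculation:

```python
# Verifying the slide's worked Pearson r example: SP / sqrt(SSx * SSy).
import math

X = [2, 4, 4, 5, 7, 8]
Y = [5, 9, 9, 11, 15, 17]
n = len(X)

sp  = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n   # 378 − 330 = 48
ssx = sum(x * x for x in X) - sum(X) ** 2 / n                  # 174 − 150 = 24
ssy = sum(y * y for y in Y) - sum(Y) ** 2 / n                  # 822 − 726 = 96
r = sp / math.sqrt(ssx * ssy)                                  # 48 / 48 = 1.0

print(sp, ssx, ssy, r)  # 48.0 24.0 96.0 1.0
```

The r of exactly 1.00 makes sense on inspection: every Y value equals 2X + 1, a perfect linear relationship.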
Confounding Variables • Confounding variable- an extraneous (uncontrolled) variable in a statistical model that correlates (directly or inversely) with both variables being studied pg 80 (A perceived relationship between an independent variable and a dependent variable that has been misestimated due to the failure to account for a confounding factor is termed a spurious relationship) • If you eliminate the confounding variable, you eliminate alternative or competing explanations
Correlation and Prediction • Correlation refers to the degree of relationship between two variables • Regression- (multiple) regression is a statistical tool used to derive the value of a criterion from several other independent, or predictor, variables. It is the simultaneous combination of multiple factors to assess how and to what extent they affect a certain outcome (y = b0 + b1X1 + b2X2 + b3X3 + . . .) • "The terms correlation, regression and prediction are so closely related in statistics that they are often used interchangeably" -J. Roscoe • Construct a regression model predicting student grades, with student grades as the dependent variable (y)
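As an illustration of regression-based prediction, here is a minimal sketch of the one-predictor case that multiple regression generalizes to several predictors. The hours-studied and grade numbers are hypothetical, invented for the example:

```python
# Simple (one-predictor) least-squares regression: y-hat = a + b*x.
# Hypothetical data: hours studied (x) predicting exam grade (y).
hours = [2, 4, 6, 8, 10]
grades = [65, 70, 80, 85, 95]

n = len(hours)
mx, my = sum(hours) / n, sum(grades) / n
b = sum((x - mx) * (y - my) for x, y in zip(hours, grades)) / sum(
    (x - mx) ** 2 for x in hours
)                   # slope = SP / SSx
a = my - b * mx     # intercept

def predict(x):
    """Predicted grade for x hours of study."""
    return a + b * x

print(b, a, predict(7))  # 3.75 56.5 82.75
```

Multiple regression fits the same least-squares criterion but solves for several slopes at once, one per predictor.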
  • 104.
    Latitude is significantly associated with the prevalence of multiple sclerosis: a meta-analysis • Background: There is a striking latitudinal gradient in multiple sclerosis (MS) prevalence, but exceptions in Mediterranean Europe and northern Scandinavia, and some systematic reviews, have suggested that the gradient may be an artefact. The authors sought to evaluate the association between MS prevalence and latitude by meta-regression • Epidemiologic studies have shown a positive correlation of multiple sclerosis (MS) prevalence with latitude. However, no causal association has been found • In statistics, a meta-analysis refers to methods that focus on contrasting and combining results from different studies, in the hope of identifying patterns among study results
  • 105.
    Vitamin D and its immunoregulatory role in multiple sclerosis - Niino M, Drugs Today (Barc). 2010 Apr • Mapping the distribution of multiple sclerosis (MS) reveals a high prevalence of the disease in high-latitude areas, suggesting a positive relationship between vitamin D and MS. Vitamin D is known to play an important role in bone and mineral homeostasis. It has recently been reported that several types of immune cells express vitamin D receptors and that vitamin D has strong immune-modulating effects. Vitamin D and its analogues inhibited experimental autoimmune encephalomyelitis (EAE, an animal model of MS) and there have been reports of small clinical trials on the treatment of MS with vitamin D. • Furthermore, there have been discussions on the association between vitamin D levels and MS and about the genetic risk of vitamin D receptor (VDR) gene polymorphisms in MS. The current review discusses the immunological functions of vitamin D, the association between vitamin D and MS, and expectations regarding the role of vitamin D in future treatments of MS
  • 106.
    Sunlight and vitamin D for bone health and prevention of autoimmune diseases, cancers, and cardiovascular disease - Michael F Holick, Am J Clin Nutr 2004 • Vitamin D is taken for granted and is assumed to be plentiful in a healthy diet. Unfortunately, very few foods naturally contain vitamin D, and only a few foods are fortified with vitamin D. This is the reason why vitamin D deficiency has become epidemic for all age groups in the United States and Europe. Vitamin D deficiency not only causes metabolic bone disease among children and adults but also may increase the risk of many common chronic diseases. • Solar ultraviolet B photons are absorbed by 7-dehydrocholesterol in the skin, leading to its transformation to previtamin D3, which is rapidly converted to vitamin D3 • Once formed, vitamin D3 is metabolized in the liver to 25-hydroxyvitamin D3 and then in the kidney to its biologically active form, 1,25-dihydroxyvitamin D3. Vitamin D deficiency is an unrecognized epidemic among both children and adults in the United States. • Although chronic excessive exposure to sunlight increases the risk of nonmelanoma skin cancer, the avoidance of all direct sun exposure increases the risk of vitamin D deficiency, which can have serious consequences.
  • 107.
    Vitamin D and multiple sclerosis - Hayes CE et al. Proc Soc Exp Biol Med. 1997 Oct;216(1):21-7 • This theory can explain the striking geographic distribution of MS, which is nearly zero in equatorial regions and increases dramatically with latitude in both hemispheres. It can also explain two peculiar geographic anomalies, one in Switzerland with high MS rates at low altitudes and low MS rates at high altitudes, and one in Norway with a high MS prevalence inland and a lower MS prevalence along the coast. • Ultraviolet (UV) light intensity is higher at high altitudes, resulting in a greater vitamin D3 synthetic rate, thereby accounting for low MS rates at higher altitudes. On the Norwegian coast, fish is consumed at high rates and fish oils are rich in vitamin D3.
  • 108.
    Experimental Method • The experimental method reduces ambiguity by manipulating one variable and measuring the other • Example in Exercise and Anxiety: one group exercises daily for a week and another group does not (experimental vs control group); anxiety is then measured (discuss limits of this design) pg81 • The experimental method attempts to eliminate the influence of potentially confounding variables by holding constant all aspects of the experiment except the manipulated variable, and by ensuring that any variables that cannot be held constant are variables whose effects are random (random variables) (give example)
  • 109.
    Randomization • The number of potential confounding variables is infinite, but the experimental method deals with this problem through randomization, which ensures that an extraneous confounding variable is as likely to affect one group as the other. Any variable that cannot be held constant can be controlled by randomization pg82 • Example: if an experiment is conducted over several days, the researcher can use a random order for scheduling the sequence of the various experimental conditions (or can use a crossover) so that one group is not consistently studied in the morning or the afternoon
  • 110.
    Random Assignment • The thing that makes random assignment so powerful is that it greatly decreases systematic error: error that varies with the independent variable • Extraneous variables that vary with the levels of the independent variable are the most dangerous type in terms of challenging the validity of experimental results. These extraneous variables have a special name: confounding variables. For example, instead of randomly assigning students, an instructor may test the new strategy in the gifted classroom and the control strategy in a regular class. Ability would most likely vary with the levels of the independent variable; in this case prior knowledge would become a confounding extraneous variable
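The logic of random assignment can be sketched in a few lines: shuffle the subject pool before splitting it, so any extraneous subject characteristic (ability, age, prior knowledge) is equally likely to land in either group. The subject IDs and seed below are illustrative:

```python
import random

def randomly_assign(subjects, seed=None):
    # Shuffle a copy of the pool, then split it in half:
    # (experimental group, control group)
    pool = list(subjects)
    random.Random(seed).shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

experimental, control = randomly_assign(range(1, 21), seed=42)
print(len(experimental), len(control))  # 10 10
```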
  • 111.
    Independent and Dependent Variables • In research the variables are believed to have a cause-and-effect relationship, so that one variable is considered the cause (independent variable) while the other is considered the effect (dependent variable) pg83 • The independent variable is manipulated while the dependent variable is measured • The independent variable is manipulated by the experimenter and the subject has no control over it (what the subject does is dependent on the variable manipulated by the experimenter) • What are the independent and dependent variables in the class article? What are the operational definitions of terms in the study?
  • 112.
    Internal and External Validity • Validity concerns the extent to which you are measuring what you claim to be measuring • Internal validity is a property of scientific studies reflecting the extent to which a causal conclusion based on a study is warranted; it requires three elements pg85 • Temporal precedence: the causal (independent) variable is manipulated before the effect is observed/measured on the dependent variable • Covariation: there must be some covariation between the two variables, shown when subjects exhibit an effect different from the control conditions • Alternative explanations are eliminated (which means that confounding variables are eliminated or controlled)
  • 113.
    External Validity • External validity refers to the extent to which the results can be generalized (aka generalizability) • Can the results of a study be replicated with other operational definitions, different subjects, different settings? • Researchers most interested in internal validity (establishing a relationship between two variables) may be more likely to conduct the study in a lab setting with a restricted sample, while a researcher more interested in external validity might conduct a nonexperimental design with a more diverse sample
  • 114.
    Laboratory vs Field Experiments • Lab experiments permit a high degree of control, but the setting may be too artificial and may limit the questions that can be answered or the generality of results • In field experiments the independent variable is manipulated in a natural setting (see study pg87 top: a confederate coughs, or does not, near passersby, who are then asked to rate their perceived risk of contracting a serious disease or having a heart attack) • While it is more difficult to eliminate extraneous and confounding variables in field studies, there is less danger of artificiality limiting the conclusions drawn from the study
  • 115.
    Ethical and Practical Considerations • In certain cases experimentation is unethical or impractical (e.g., child-rearing practices) and variables are observed and measured as they occur • When certain social variables are studied, people are frequently categorized into groups based on their experience (in the example of studying corporal punishment, groups were formed by who was spanked and who was not as a child): an ex post facto design (after the fact). Since no random assignment was made, this would not be an experimental design pg88
  • 116.
    Variables and Describing and Predicting Behavior • Subject variables are characteristics of the subjects such as age, gender, ethnic group (categorical) and are nonexperimental by nature • Since a major goal is to describe behavior, studies can be conducted with simple observations and manipulations (examples of Piaget and Buss's study (2007) describing the reasons people reported having sex) pg88 • Multiple methods: since no single study is a perfect test of a hypothesis, multiple studies using multiple methods with similar conclusions increase our confidence in the findings pg89
  • 117.
    Statistical Procedures in Measurement • Good research is inevitably dependent on measurement • Measurement devices or tests have at least three essential attributes • Standardization: the test is administered to a well-defined group and their performance represents the norm (norm group) (standardization often includes the use of standard scores, z scores, T scores, etc., discussed in a later section) • Validity: a test is valid when it measures what it is intended to measure • Reliability: refers to the test's precision in measuring
  • 118.
    The problem of standardization: Diagnostic CT scans: assessment of patient, physician, and radiologist awareness of radiation dose and possible risks - Radiology. 2004 May;231(2):393-8. Epub 2004 Mar 18. Lee, CL et al. • PURPOSE: To determine the awareness level concerning radiation dose and possible risks associated with computed tomographic (CT) scans among patients, emergency department (ED) physicians, and radiologists. • MATERIALS AND METHODS: • Adult patients seen in the ED of a U.S. academic medical center during a 2-week period with mild to moderate abdominopelvic or flank pain and who underwent CT were surveyed after acquisition of the CT scan. Patients were asked whether or not they were informed about the risks, benefits, and radiation dose of the CT scan and if they believed that the scan increased their lifetime cancer risk. Patients were also asked to estimate the radiation dose for the CT scan compared with that for one chest radiograph. ED physicians who requested CT scans and radiologists who reviewed the CT scans were surveyed with similar questions and an additional question regarding the number of years in practice. The χ² test of independence was used to compare the three respondent groups regarding perceived increased cancer risk from one abdominopelvic CT scan. • RESULTS: • Seven percent (five of 76) of patients reported that they were told about risks and benefits of their CT scan, while 22% (10 of 45) of ED physicians reported that they had provided such information. Forty-seven percent (18 of 38) of radiologists believed that there was increased cancer risk, whereas only 9% (four of 45) of ED physicians and 3% (two of 76) of patients believed that there was increased risk (χ²(2) = 41.45, P < .001). All patients and most ED physicians and radiologists were unable to accurately estimate the dose for one CT scan compared with that for one chest radiograph.
• CONCLUSION: • Patients are not given information about the risks, benefits, and radiation dose for a CT scan. Patients, ED physicians, and radiologists alike are unable to provide accurate estimates of CT doses regardless of their experience level
  • 119.
    Radiation Dose Associated With Common Computed Tomography Examinations and the Associated Lifetime Attributable Risk of Cancer - Rebecca Smith-Bindman, Arch Intern Med. 2009;169(22):2078-2086 • Background: Use of computed tomography (CT) for diagnostic evaluation has increased dramatically over the past 2 decades. Even though CT is associated with substantially higher radiation exposure than conventional radiography, typical doses are not known. We sought to estimate the radiation dose associated with common CT studies in clinical practice and quantify the potential cancer risk associated with these examinations. • Methods: We conducted a retrospective cross-sectional study describing radiation dose associated with the 11 most common types of diagnostic CT studies performed on 1119 consecutive adult patients at 4 San Francisco Bay Area institutions in California between January 1 and May 30, 2008. We estimated lifetime attributable risks of cancer by study type from these measured doses. • Results: Radiation doses varied significantly between the different types of CT studies. The overall median effective doses ranged from 2 millisieverts (mSv) for a routine head CT scan to 31 mSv for a multiphase abdomen and pelvis CT scan. Within each type of CT study, effective dose varied significantly within and across institutions, with a mean 13-fold variation between the highest and lowest dose for each study type. The estimated number of CT scans that will lead to the development of a cancer varied widely depending on the specific type of CT examination and the patient's age and sex. An estimated 1 in 270 women who underwent CT coronary angiography at age 40 years will develop cancer from that CT scan (1 in 600 men), compared with an estimated 1 in 8100 women who had a routine head CT scan at the same age (1 in 11 080 men). For 20-year-old patients, the risks were approximately doubled, and for 60-year-old patients, they were approximately 50% lower.
• Conclusion Radiation doses from commonly performed diagnostic CT examinations are higher and more variable than generally quoted, highlighting the need for greater standardization across institutions.
  • 120.
    Measurement Concepts Chp5 • Reliability refers to the consistency, precision or stability of a measure of behavior pg96. Are the results the same or very similar each time you measure a variable? • Measures that change or fluctuate are not reliable (assuming the change is not due to the variable itself changing) • Any measure has two parts: 1) true score, the real value of the variable, and 2) measurement error, which shows up as greater variability • Researchers cannot use unreliable measures (Duh!) • Reliability is increased when we increase the number of items in our measure, survey or test
  • 121.
    Measuring Reliability • We can measure reliability using the Pearson product-moment correlation coefficient pg98 • To calculate reliability we must have at least two scores on the measure across individuals. If the measure is reliable, the two scores should be similar for each of the individuals studied (a high positive correlation; for most measures the coefficient should be at least .80) pg 98 • Types of Reliability • 1) Test-Retest: measure the same individuals at at least two points in time, then calculate the Pearson product-moment r between the scores. Test-retest reliability is sometimes called a coefficient of stability in that it measures how stable the trait being measured is (discuss some threats to validity for this measure). This is not a good measurement for traits considered to be in a state of flux, or when events occur between the two administrations of the test
  • 122.
    Measuring Reliability • 2) Equivalent Form: avoids problems associated with test-retest by giving equivalent forms of the same test to the same set of people and calculating the correlation between the two scores. You can administer the two tests close in time (something you cannot do with test-retest). • However, to the extent that the two forms are not totally equivalent, a new source of error is introduced. Equivalent forms usually yield lower estimates of reliability than test-retest (why?) see next slide with two forms of the Rey Complex Figure
  • 123.
  • 124.
    Measuring Reliability • Split-Half Reliability: the test is administered once, then split in half; each half is scored separately and a Pearson r is calculated between the scores • Split-half: correlation between the first and second half of the measurement • Odd-even: correlation between the even items and odd items of a measurement • In either case only one administration is required and the coefficient is determined by the internal components of the test (aka internal consistency reliability) • Split-half is not meaningful in speed tests (in which most items are not difficult and the score depends on how many items are answered correctly, e.g., an algebra test); the coefficient of reliability is inflated* • Item-total correlations: look at the correlation of each item score with the total score based on all items (also measures internal consistency) • Cronbach's alpha: a coefficient of internal consistency that averages the split-half coefficients; it is a function of the number of test items and the average inter-correlation among the items pg99-100
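Internal consistency as described above can be sketched directly from item and total-score variances, using the common form α = k/(k−1) · (1 − Σ item variances / variance of total scores). The 5-respondent, 4-item score matrix below is hypothetical:

```python
from statistics import pvariance

def cronbach_alpha(rows):
    # rows: one list of item scores per respondent
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose to per-item columns
    totals = [sum(r) for r in rows]       # each respondent's total score
    return k / (k - 1) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))

scores = [[3, 4, 3, 4],
          [2, 2, 3, 2],
          [4, 5, 5, 4],
          [1, 2, 1, 2],
          [3, 3, 4, 3]]
print(round(cronbach_alpha(scores), 2))  # 0.95 — high internal consistency
```

Adding more (intercorrelated) items raises k and, with it, alpha, which matches the earlier point that reliability increases with the number of items.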
  • 125.
    Interrater Reliability • In research in which raters observe behaviors and make ratings or judgments, those judgments are then compared, and their agreement determines interrater reliability • Bandura (1961) conducted a study to investigate whether social behaviors (i.e., aggression) can be acquired by imitation. 36 boys and 36 girls aged between 3 and 6 years were tested from the Stanford University Nursery School. The role models were one male adult and one female adult • Under controlled conditions, Bandura arranged for 24 boys and girls to watch a male or female model behaving aggressively towards a toy called a 'Bobo doll'. The adults attacked the Bobo doll in a distinctive manner: they used a hammer in some cases, and in others threw the doll in the air and shouted "Pow, Boom". Another 24 children were exposed to a non-aggressive model and the final 24 children were used as a control group and not exposed to any model at all. • To test the inter-rater reliability of the observers, 51 of the children were rated by two observers independently and their ratings compared. These ratings showed a very high reliability correlation (r = 0.89), which suggested that the observers had good agreement about the behavior of the children
  • 126.
    Construct Validity of Measures pg101 • Construct validity is concerned with whether our methods of studying variables are accurate (is our operational definition valid?) also see pg 90. Does our method actually measure the construct it was intended to measure?
  • 127.
    Measures of (Construct) Validity / Valid = True • Construct Validity • Refers to the accuracy of our measurements and operational definitions. Indicators of construct validity ask: is our method of measuring a variable accurate? • Face Validity: the item appears to accurately measure the variable defined. Appearance is not sufficient to conclude that a measure is accurate. Some measures, such as surveys in popular magazines, have questions that may look reasonable (have face validity) but tell you very little. Cosmopolitan surveys: 1) What Guys Secretly Think of Your Hair & Makeup: The truth revealed! 2) 20 Dresses He Will Love 3) What He Thinks When He Walks Through Your Door 4) 7 Facebook Habits that Guys Hate 5) 78 Ways to Turn Him On 6) The Secret to Getting Any Guy 7) How to be a Total Man Magnet 8) Sexy Summer Hair Ideas 9) Meet a New Guy by Summer! 10) How to Decode His Body Language http://www.cosmopolitan.co.uk/quizzes/how-hot-headed-are-you-quiz Little if any empirical evidence exists to support the conclusions in these articles • Content Validity: how well does the content of a test sample the situations about which conclusions are drawn? Requires some expertise to define a "universe of interest", careful drawing of a sample of ideas from this universe, and the preparation of test items that match these ideas. Compare the content of the measure with the universe of content that defines that construct pg103 (For example, the content of the SAT Subject Tests™ is evaluated by committees made up of experts who ensure that each test covers content that matches all relevant subject matter in its academic discipline) • Both face validity and content validity focus on determining whether the content of a measure matches the construct it is intended to capture
  • 128.
    Validity continued • Content Validity: statistical methods may be applied to help determine content validity. A test constructor may perform a correlation between the score on each item and the score on the total test. Test items that are not consistent with the total are either revised or eliminated • Predictive Validity (a type of criterion validity): a measure is used to predict performance, so one measure occurs earlier than the other (LSAT and performance in law school) • Concurrent Validity applies to validation studies in which the two measures are administered at approximately the same time (for example, an employment test may be administered to a group of workers and then the test scores can be correlated with the ratings of the workers' supervisors taken on the same day or in the same week. The resulting correlation would be a concurrent validity coefficient) pg104 • Concurrent validity and predictive validity are two types of criterion-related validity in which scores are correlated with or measured against an external criterion. The difference between concurrent validity and predictive validity rests solely on the time at which the two measures are administered.
  • 129.
    Validity continued • Convergent Validity: defines how well one set of scores on a measure is related to another set of scores measuring the same or similar concepts • Measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other • Discriminant Validity: measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other pg104 (compare convergent and discriminant validity to differential diagnosis) • Convergent and discriminant validity are both considered subtypes of construct validity; neither one alone is sufficient for establishing construct validity • Imagine you assume that those who would buy your product again are satisfied, as that would be what is expected. Testing for convergent validity in a survey may look like this: • Question 1: Would you buy product X again if given the chance? • Question 2: How satisfied are you with product X? • If they say yes to the first question but do not score the product very highly in the second question, the questions may have failed the validity test
  • 130.
    Validity continued • Divergent validity is designed to see if you get the expected opposite result, because that should also help imply that the question is answering in the way you wanted it to answer. For example: • Question 1: Do you wish you did not own product X? • Question 2: Would you buy product X again if given the chance? • If they answered yes to the first question and yes to the second question, it would imply that the question was too confusing, because you did not receive the opposite response you expected. This is divergent validity • A major impetus to the study of validity was provided a half century ago by Campbell & Fiske (1959), who introduced the multitrait-multimethod (MTMM) matrix as a means for construct validation. The MTMM method can be used when multiple traits are examined simultaneously and each of them is assessed by a given set of measures or measurement methods (e.g., Eid, 2000; Marsh & Hocevar, 1983). As shown initially by Campbell and Fiske, and further elaborated by subsequent authors, two types of validity coefficients are of special interest when the MTMM matrix is utilized in the validation process: convergent validity and discriminant validity coefficients. • Reactivity: a measure is reactive if awareness of being measured changes an individual's behavior. This is what threat to validity? • History? Maturation? Testing? Selection (of subjects)? Regression?
  • 131.
    Relationship between Reliability and Validity • Validity is the extent to which a test measures what it is supposed to measure, while reliability is how consistently and precisely it measures • You can have reliability without validity, but you cannot have validity without reliability
  • 132.
    Association of Faculties of Medicine of Canada (AFMC) • Validity of concepts such as illness or disease • Cultural conventions affect where the boundary between disease and non-disease is placed: menopause may be considered a health issue in North America, but symptoms are far less commonly reported in Japan. • Improvements in health have not reduced the demands on doctors. Instead, doctors are called on to broaden the scope of what they treat. Conditions previously not regarded as medical problems, such as hyperactivity in children, infertility in young couples, weight gain in middle-aged adults, or the various natural effects of aging, now commonly lead patients to consult their doctor; the list is likely to expand.
  • 133.
    Validity of Diagnostic Labels • Non-Disease? • In 2002, the British Medical Journal stimulated a debate over the appropriate expectations to place on doctors and how to define the limits of medicine. Richard Smith, editor of the Journal, surveyed readers to collect examples of non-diseases and found almost two hundred. • He defined non-disease as "a human process or problem that some have defined as a medical condition but where people may have better outcomes if the problem or process was not defined in that way." Examples include burnout, chemical sensitivity, genetic deficiencies, senility, loneliness, bags under the eyes, work problems, baldness, freckles, and jet lag. • Smith's purpose was to emphasize that disease is a fluid concept with no clear boundaries. He noted various dangers in being over-inclusive in defining disease: • when people are diagnosed with a disease and become patients they could be denied insurance, lose their job, have their body invaded in the name of therapy, or be otherwise stigmatised. • The debate is covered in the British Medical Journal, April 13, 2002; vol. 324: pages 859-866 and 883-907.
  • 134.
    Measures of Validity (continued) • Predictive Validity: the extent to which a score on a scale or test predicts scores on some criterion measure • Predictive validity concerns tests that are intended to predict future performance (GRE, LSAT). The construct validity of the measure is shown if it predicts future behavior
  • 135.
    False Positives / False Negatives • Biomedical Research Imaging Center at the University of North Carolina at Chapel Hill School of Medicine - Etta Pisano • The American Cancer Society issued new guidelines that recommend an annual MRI screen in addition to annual mammography for women at high risk of breast cancer. • But because the false-positive rate of MRIs was relatively high (about 11 percent in the new study) the authors don't recommend MRI as a screening tool for the general population. • National Cancer Institute: even though breast cancer is the most common noncutaneous cancer in women, fewer than 5 per 1,000 women actually have the disease when they are screened. Therefore, even with a specificity of 90%, most abnormal mammograms are false positives
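The NCI point about base rates can be made concrete with Bayes' rule. The 5-per-1,000 prevalence and 90% specificity come from the slide; the 80% sensitivity is an illustrative assumption, not a figure from the study:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    # P(disease | positive test): true positives over all positives
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(prevalence=0.005, sensitivity=0.80, specificity=0.90)
print(f"{ppv:.1%}")  # only about 4% of abnormal mammograms are true positives
```

Because 99.5% of those screened are disease-free, even a 10% false-positive rate swamps the small number of true positives, which is exactly why most abnormal mammograms are false positives.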
  • 136.
    Effectiveness of Positron Emission Tomography for the Detection of Melanoma Metastases - ANNALS OF SURGERY Vol. 227, No. 5, 764-771, 1998, Holder, W et al. • The purpose of this study was to determine the sensitivity, specificity, and clinical utility of 18F 2-fluoro-2-deoxy-D-glucose (FDG) total body positron emission tomography (PET) scanning for the detection of metastases in patients with malignant melanoma (melanoma causes the majority (75%) of deaths related to skin cancer). • Introduction: Recent preliminary reports suggest that PET using FDG may be more sensitive and specific for detection of metastatic melanoma than standard radiologic imaging studies using computed tomography (CT). PET technology is showing utility in the detection of metastatic tumors from multiple primary sites including breast, lung, lymphoma, and melanoma. However, little information is available concerning the general utility, sensitivity, and specificity of PET scanning of patients with metastatic melanoma. • Methods: One hundred three PET scans done on 76 nonrandomized patients having AJCC (American Joint Committee on Cancer) stage II to IV melanoma were prospectively evaluated. Patients were derived from two groups. Group 1 (63 patients) had PET, CT (chest and abdomen), and magnetic resonance imaging (MRI; brain) scans as part of staging requirements for immunotherapy protocols. Group 2 (13 nonprotocol patients) had PET, CT, and MRI scans as in group 1, but for clinical evaluation only. PET scans were done using 12 to 20 mCi of FDG given intravenously. Results of PET scans were compared to CT scans and biopsy or cytology results.
  • 137.
    Effectiveness of PET Tumor Detection • Malignant tumors generally have greater rates of glucose utilization and overall metabolism than normal tissues. FDG is a glucose analogue that is taken up by rapidly dividing cells. • Most melanomas are rapid users of glucose; in fact, melanoma cells in vitro demonstrate a higher FDG uptake than any other tumor type. • PET scanning uses tracers that emit positrons (positively charged electrons) that are very short-lived. They are produced in medical cyclotrons or accelerators to be used quickly after preparation. The half-life of 18F is 109 minutes. • Positrons rapidly combine with negative electrons and are annihilated. This process produces a pair of 511-keV photons emitted 180° to one another that are then detected by the PET scanner. A computer then processes the images so that they can be interpreted.
  • 138.
    PET False Positives / False Negatives • False negatives occur in 1) patients who have hyperglycemia and 2) tumors that are slow-growing or have a large necrotic component, which may have decreased FDG uptake. • False positives are caused by 1) urinary excretion of the isotope: administered radioiodine is excreted mainly by the urinary system, so all dilations, diverticula and fistulae of the kidney, ureter and bladder may produce radioiodine retention (Shapiro, Rufini et al. 2000), and 2) patients who are unusually muscular or have an increased resting muscle tone, who take up FDG at a much higher rate than persons with relaxed musculature. • Back to the study: the purpose of this study was to determine prospectively the sensitivity, specificity, and clinical utility of FDG total body PET scanning for the detection of metastases in patients with malignant melanoma by comparing PET to double-contrast CT scans and histologically or cytologically correlating these findings.
  • 139.
    Effectiveness of PET in Melanoma Detection • Methods (continued) • Sensitivity was defined as the proportion of patients with metastatic melanoma who had a positive PET scan. • Specificity was defined as the proportion of patients who did not have metastatic melanoma who had a negative PET scan • FDG was synthesized using the Siemens RDS negative-ion cyclotron and CPCU automated chemistry module. 18-Fluorine as fluoride was produced using a proton-neutron reaction on 95% enriched 18-oxygen water. 18F-FDG was synthesized in the CPCU using the modified Hamacher synthesis (mannose triflate/18F-fluoride reaction). The product was delivered pure, sterile, and in an injectable form. Each lot of 18F-FDG was analyzed to confirm radionuclide, radiochemical, and chemical purity as well as sterility and pyrogenicity. The product conformed with United States Pharmacopeia monograph standards. Huh?
  • 140.
    • Results • The accuracy of CT scanning for melanoma lung metastases was equivalent to that of PET scanning. However, PET scanning was superior to CT scanning in identifying melanoma metastases to regional and mediastinal lymph nodes, liver, and soft tissues. (The mediastinum is the cavity that separates the lungs from the rest of the chest. It contains the heart, esophagus, trachea, thymus, and aorta.)
  • 141.
    Results (continued)

                           PET    CT
    Total scans            103    92
    Evaluable scans        100    92
    True-positive scans     49    26
    False-positive scans     8     7
    True-negative scans     40    38
    False-negative scans     3    21
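Using the definitions from the methods slide, the counts above can be turned directly into sensitivity and specificity. A minimal sketch (the helper functions are our own; the counts are from the table):

```python
# Sensitivity: proportion of patients with disease who test positive.
def sensitivity(tp, fn):
    return tp / (tp + fn)

# Specificity: proportion of patients without disease who test negative.
def specificity(tn, fp):
    return tn / (tn + fp)

# PET counts: 49 TP, 3 FN, 40 TN, 8 FP
pet_sens = sensitivity(49, 3)   # ~0.94
pet_spec = specificity(40, 8)   # ~0.83

# CT counts: 26 TP, 21 FN, 38 TN, 7 FP
ct_sens = sensitivity(26, 21)   # ~0.55
ct_spec = specificity(38, 7)    # ~0.84
```

The large sensitivity gap (roughly 0.94 vs. 0.55), driven mostly by CT's 21 false negatives, is the numerical basis for the conclusion that PET is superior for detecting melanoma metastases.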
  • 142.
    Discussion • CT scanning is widely used for the detection of metastases in a variety of malignant neoplasms, including melanoma. The primary value of CT scanning is the clear delineation of anatomic detail. A particular problem with CT scanning is that small lymph nodes or small metastases may not be detectable or may appear to be of normal size and configuration, while enlarged nodes and other masses may be due to inflammation and nonmalignant processes. These findings contribute to both the false-positive and false-negative rates reported for CT scans. CT scanning for detection of both primary and metastatic disease in the lung is generally very good for lesions in the lung parenchyma. • PET scanning as currently done does not reveal the anatomic detail of CT scanning. However, imaging of even extreme anatomic detail often cannot discern benign from malignant processes, particularly with smaller (about 1 cm) lesions. The value of PET scanning lies in the visualization of the high metabolic activity of rapidly growing tumors such as melanoma. With close clinical correlation and tissue confirmation, PET scanning is an extremely useful tool to evaluate high-risk melanoma patients for the development of metastases. • Conclusion: PET is superior to CT in detecting melanoma metastases and has a role as a primary strategy in the staging of melanoma.
  • 143.
    Accuracy and reliability of forensic latent fingerprint decisions • The criminal justice system relies on the skill of latent print examiners as expert witnesses. Currently, there is no generally accepted objective measure to assess the skill of latent print examiners. • The interpretation of forensic fingerprint evidence relies on the expertise of latent print examiners. The National Research Council of the National Academies and the legal and forensic sciences communities have called for research to measure the accuracy and reliability of latent print examiners' decisions. Here, we report on the first large-scale study of the accuracy and reliability of latent print examiners' decisions, in which 169 latent print examiners each compared approximately 100 pairs of latent and exemplar fingerprints from a pool of 744 pairs. • Latent prints ("latents") are friction ridge impressions (fingerprints, palmprints, or footprints) left unintentionally on items such as those found at crime scenes. Exemplar prints ("exemplars"), generally of higher quality, are collected under controlled conditions from a known subject, using ink on paper or digitally with a livescan device. Latent print examiners compare latents to exemplars, using their expertise rather than a quantitative standard to determine whether the information content is sufficient to make a decision. Ulery, B. et al., Proceedings of the National Academy of Sciences of the United States of America (PNAS), March 2011
  • 144.
    Accuracy and reliability of forensic latent fingerprint decisions • Latent print examination can be complex because latents are often small, unclear, distorted, or smudged, or contain few features; can overlap with other prints or appear on complex backgrounds; and can contain artifacts from the collection process. Because of this complexity, experts must be trained in working with the various difficult attributes of latents. • Five examiners made false positive errors, for an overall false positive rate of 0.1%. Eighty-five percent of examiners made at least one false negative error, for an overall false negative rate of 7.5%. Independent examination of the same comparisons by different participants (analogous to blind verification) was found to detect all false positive errors and the majority of false negative errors in this study. Examiners frequently differed on whether fingerprints were suitable for reaching a conclusion.
  • 145.
    Types of Variables: Discrete vs. Continuous • A discrete variable is one with a well-defined, finite set of possible values, called states. Examples: the number of dimes in a purse, a statement that is either "true" or "false", which party will win the election, country of origin, the voltage output of a digital device, and the place a roulette wheel stops. • A continuous variable is one that can take on a value between any other two values, such as indoor temperature, time spent waiting, water consumed, color wavelength, and direction of travel. A discrete variable corresponds to a digital quantity, while a continuous variable corresponds to an analog quantity.
  • 146.
    Variables and Measurement Scales • We want to determine whether there is a relationship between our independent variable (chosen and/or manipulated by the experimenter) and the dependent variable (measuring some aspect or behavior of our subjects). • Four Kinds of Measurement Scales • Nominal scales: when measuring using a nominal scale, one simply names or categorizes responses (nominal variables are categorical). Gender, handedness, favorite color, and religion are examples of variables measured on a nominal scale. The essential point about nominal scales is that they do not imply any ordering among the responses. For example, when classifying people according to their favorite color, there is no sense in which green is placed "ahead of" blue. Responses are merely categorized. Nominal scales embody the lowest level of measurement. In an experiment the independent variable is often a nominal or categorical variable. pg106 (Example on pg107: Group 1 participated in meditation; Group 2 did not. All subjects underwent MRI. The independent variable was participation/no participation, a nominal (categorical) variable.)
  • 147.
    Variables and Measurement Scales • Ordinal scales allow us to rank-order the levels of a variable (category) being studied. However, nothing is specified about the magnitude of the interval between two measures, so in a rank order no particular value is attached to the intervals between numbers (horse race: first, second, third). • Ordinal scales fail to capture important information that will be present in the other scales we examine. In particular, the difference between two levels of an ordinal scale cannot be assumed to be the same as the difference between two other levels. In a satisfaction scale ranking a customer's satisfaction with a product, the difference between the responses "very dissatisfied" and "somewhat dissatisfied" is probably not equivalent to the difference between "somewhat dissatisfied" and "somewhat satisfied." • Example pg107: movie rating system from one to four checks.
  • 148.
    Variables and Measurement Scales • Interval scales are numerical scales in which intervals have the same interpretation throughout: the intervals between the numbers are equal in size. As an example, consider the Fahrenheit scale of temperature. The difference between 30 degrees and 40 degrees represents the same temperature difference as the difference between 80 degrees and 90 degrees. This is because each 10-degree interval has the same physical meaning. However, there is no absolute zero on the scale (in this case the zero does not indicate an absence of temperature but is only an arbitrary reference point). pg107 • Since an interval scale has no true zero point, it does not make sense to compute ratios of temperatures. For example, there is no sense in which the ratio of 40 to 20 degrees Fahrenheit is the same as the ratio of 100 to 50 degrees; no interesting physical property is preserved across the two ratios. It does not make sense to say that 80 degrees is "twice as hot" as 40 degrees.
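The "twice as hot" point can be checked numerically: on an interval scale, ratios change when the arbitrary zero moves. A quick sketch (the temperatures are from the slide's example; the conversion function is our own):

```python
# Convert Fahrenheit to Celsius: same physical temperatures, different zero point.
def f_to_c(f):
    return (f - 32) * 5.0 / 9.0

ratio_f = 80 / 40                     # 2.0 in Fahrenheit ("twice as hot"?)
ratio_c = f_to_c(80) / f_to_c(40)     # 6.0 in Celsius, for the very same temperatures
```

Because the ratio depends entirely on where the scale's zero happens to sit, it carries no physical meaning, which is exactly what distinguishes interval from ratio scales.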
  • 149.
    Variables and Measurement Scales • Ratio scales: the ratio scale of measurement is the most informative scale. It is an interval scale with the additional property that its zero position indicates the absence of the quantity being measured. Often these include physical measures such as length, weight, or time. (Since ratios are allowed, you can say someone is twice as fast or slow as someone else.) pg108 • With interval and ratio scales you can make quantitative distinctions that allow you to talk about amounts of the variable. • Since money has a true zero point, it makes sense to say that someone with 50 cents has twice as much money as someone with 25 cents (weight, time, and length are also ratio-scale measures). • Since many variables in behavioral science are less precise, ratio scales are often not achieved. However, since statistical tests for interval and ratio variables are the same, the real question becomes whether you can achieve an interval scale of measurement for your study so that you can use (usually) more powerful statistical tests.
  • 150.
    Cramped Synchronized General Movements in Preterm Infants as an Early Marker for Cerebral Palsy. Ferrari, F. et al., Arch Pediatr Adolesc Med, 2002 • Objective: To ascertain whether specific abnormalities (ie, cramped synchronized general movements [GMs]) can predict cerebral palsy and the severity of later motor impairment in preterm infants affected by brain lesions. • Design: A traditional neurological examination was performed, and GMs were serially videotaped and blindly observed for 84 preterm infants with ultrasound abnormalities from birth until 56 to 60 weeks' postmenstrual age. The developmental course of GM abnormalities was compared with brain ultrasound findings alone and with findings from neurological examination, in relation to the patient's outcome at age 2 to 3 years.
  • 151.
    Cramped Synchronized General Movements in Preterm Infants as an Early Marker for Cerebral Palsy • An early prediction of cerebral palsy will lead to earlier enrollment in rehabilitation programs. Unfortunately, reliable identification of cerebral palsy in very young infants is extremely difficult.10 It is generally reported that cerebral palsy cannot be diagnosed before several months after birth11-15 or even before the age of 2 years.16 • A so-called silent period, lasting 4 to 5 months or more, and a period of uncertainty until the turning point at 8 months of corrected age have also been identified.12-13 The neurological symptoms observed in the first few months after birth in preterm infants who will develop cerebral palsy are neither sensitive nor specific enough to ensure reliable prognoses. • Irritability, abnormal finger posture, spontaneous Babinski reflex,17-18 weakness of the lower limbs,19 transient abnormality of tone,12-13,20-24 and delay in achieving motor milestones11 are some of the neurological signs that have been described in these high-risk preterm infants.
  • 152.
    Early Marker for Cerebral Palsy (continued) • Results: Infants with consistent or predominant cramped synchronized GMs (33 cases) developed cerebral palsy. The earlier cramped synchronized GMs were observed, the worse was the neurological outcome. Transient cramped synchronized GMs (8 cases) were followed by mild cerebral palsy (fidgety movements were absent) or normal development (fidgety movements were present). Consistently normal GMs (13 cases) and poor-repertoire GMs (30 cases) either led to normal outcomes (84%) or cerebral palsy with mild motor impairment (16%). Observation of GMs was 100% sensitive, and the specificity of the cramped synchronized GMs was 92.5% to 100% throughout the age range, which is much higher than the specificity of neurological examination. • Conclusions: Consistent and predominant cramped synchronized GMs specifically predict cerebral palsy. The earlier this characteristic appears, the worse is the later impairment.
  • 153.
    Observational Methods Chp 6 • Observational methods are generally either quantitative (focused on behaviors that can be quantified) or qualitative (focused on people behaving in natural settings; samples are usually smaller than for quantitative methods). • Naturalistic observation: individuals are observed in their natural environment (field work/field observation); researchers do not attempt to influence events. pg116 • The researcher is interested, first, in describing the people, setting, and events and, second, in analyzing what was observed. Naturalistic observation = qualitative.
  • 154.
    Observational Methods • The researcher decides whether to be a participant or nonparticipant observer. Field research is often very time-consuming and inconvenient, and it often takes place in unfamiliar environments. • Jane Goodall: instead of numbering the chimpanzees she observed, she gave them names. Claiming to see individuality and emotion in chimpanzees, she was accused of anthropomorphism. • Hunter Thompson and the Hell's Angels: he became converted to their motorcycle mystique and was so intrigued, as he puts it, that "I was no longer sure whether I was doing research on the Hell's Angels or being slowly absorbed by them." He remained close with the Angels for a year, but ultimately the relationship waned. It ended for good after several members of the gang gave him a savage beating, or "stomping," over a remark Thompson made to an Angel named Junkie George, who was beating his wife. Thompson said: "Only a punk beats his wife." The beating stopped only when senior members of the club ordered it.
  • 155.
    Methodological Issues in Observation • Coding: the researcher chooses a behavior and describes and measures that behavior with a coding system. pg119 In systematic observation, usually two or more raters are used to code behavior. pg120 • Sampling: event recording simply tallies the frequency of a given behavior during the observation period. Interval recording similarly captures frequency, but divides the observation period into segments and counts the number of segments in which the target behavior is displayed, either throughout the interval or at a particular time point in the interval. Duration recording measures the length of time a behavior lasts. • Functional behavior assessment, an observational strategy, assesses the antecedents, frequency, duration, and consequences of the aggressive behavior for the target child and others in the environment, to determine the functions that the aggressive behavior serves for the child. In spite of the obvious benefits of direct observation, the strategy can be limited by several problems.
  • 156.
    Methodological Issues in Observation • Behaviors must be clearly defined, and observers must be trained to fully understand the exact behaviors that are to be captured. Observer bias, the tendency to see what one expects to see, is especially troublesome in direct observation of aggression. • In a study conducted by Baron (1976), an accomplice failed to move his vehicle for 15 seconds after the traffic signal at preselected intersections turned green. The reactions of passing motorists to this unexpected delay were recorded by two observers seated in a second, parked car at the intersection, who used a tape recorder to determine the frequency, duration, and latency of motorists' horn honking. (Video recording has since become very popular.) • Reactivity: the possibility that the presence of the observer will affect behavior. It can be minimized by concealed observation with small cameras and microphones. pg120 What threat to validity does this represent?
  • 157.
    Methodological Issues in Observation • Case study: the observational method applied to an individual. It presents the individual's history, symptoms, characteristic behavior, and response to treatment. pg121 • Case studies may or may not include naturalistic observation. In psychology/psychiatry the case study is usually a description of the patient with a historical account of some event. pg121 • A case study is often done when an individual possesses a rare or unusual condition, especially one involving memory, language, or social function. • Mania after termination of epilepsy treatment: a case report (see file)
  • 158.
    Archival Research • Uses previously compiled information to answer research questions; the researcher does not collect original data. Uses public records, databases, or other written records (e.g., the Census Bureau). • Survey archives: stored surveys, from political surveys by polling organizations to National Science Foundation data. A researcher may not be able to afford collecting and tabulating all this data independently. • Two major problems with archival data: it may be difficult to obtain the desired records, and it is difficult to be certain how accurate the information collected by others is. pg124
  • 159.
    Survey Research Chp 7 • Survey research uses questionnaires and interviews to ask people to give information about themselves: attitudes, beliefs, demographic variables (age, gender, income, etc.). It assumes that people are willing and able to provide truthful and accurate answers. pg130 • Survey research can be a good complement to experimental research. • Some researchers ask questions without considering what useful information will be gained by such questions. • Response set: a tendency to respond to all questions from a particular point of view. "Faking good": social desirability leads a respondent to answer in the most socially acceptable way. • If the researcher communicates honestly, assures confidentiality, and promises feedback, participants can be expected to provide honest answers. pg131
  • 160.
    Survey Research • Attitude and belief surveys ask people to evaluate certain issues/situations/people. • Consumer Reports: "We conduct many surveys by selecting a random sample from the approximately 7 million readers who subscribe to Consumer Reports and/or to ConsumerReports.org, who are some of the most consumer-savvy people in the nation." • Some surveys focus on behavior (how many times did you exercise this week?). • Question wording: many of the problems in surveys stem from the wording, including 1) use of unfamiliar technical terms, 2) vague or imprecise terms, 3) ungrammatical sentences, 4) run-on sentences that overload memory, and 5) misleading information. • Subtle wording differences can produce great differences in results. "Could," "should," and "might" all sound about the same, but may produce big differences in agreement with a question. • Strong words such as "force" and "prohibit" represent control or action and can bias your results: "The government should force you to pay taxes." Different cultural groups may respond differently. One recent study found that while U.S. respondents skip sensitive questions, Asian respondents often discontinue the survey entirely. (Source: qualtrics.com)
  • 161.
    Survey Research • Questions need to be simple and easy to understand. "And," "or," or "but" within a question usually makes it overly complex. pg132-133 • Avoid 1) double-barreled questions, which ask two things at once; 2) loaded questions, which lead people to respond in a certain way: "Do you favor eliminating the wasteful excesses in the public school budget?" "Do you approve of the President's oppressive immigration policy?" A leading question suggests to the respondent that the researcher expects or desires a certain answer; the respondent should not be able to discern what type of answer the researcher wants to hear; 3) negative wording: "Do you feel the city should not approve the proposed women's shelter?" Agreeing with the question means disagreement with the proposal, which can confuse people; 4) yea-saying and nay-saying (response set): a tendency to agree or disagree with all questions. When respondents notice that they have answered several questions the same way, they assume the following questions can be answered that way too; wording can be reversed to counter this. pg133 http://www.surveymonkey.com/s.asp?u=952783415975
  • 162.
    Responses to Questions • Closed-ended questions have a limited number of responses and are more structured and easier to code; the possible written answers are the same for all respondents (yes-no, agree-disagree): a fixed number of response alternatives. • Open-ended questions are harder to categorize and code. Frequently the different types of questions give different response patterns and different conclusions. pg134-135 • In a poll conducted after the presidential election in 2008, people responded very differently to two versions of this question: "What one issue mattered most to you in deciding how you voted for president?" One was closed-ended and the other open-ended. In the closed-ended version, respondents were provided five options (and could volunteer an option not on the list). When explicitly offered the economy as a response, more than half of respondents (58%) chose this answer; only 35% of those who responded to the open-ended version volunteered the economy. Moreover, among those asked the closed-ended version, fewer than one in ten (8%) provided a response other than the five they were read; by contrast, fully 43% of those asked the open-ended version provided a response not listed in the closed-ended version of the question. (Pew Research Center) Researchers will sometimes conduct a pilot study using open-ended questions to discover which answers are most common. They then develop closed-ended questions that include the most common responses as answer choices.
  • 163.
    Responses to Questions • In addition to the number and choice of response options offered, the order of answer categories can influence how people respond to closed-ended questions. Research suggests that in telephone surveys respondents more frequently choose items heard later in a list (a "recency effect"). • In the example discussed above about what issue mattered most in people's vote (previous slide), the order of the five issues in the closed-ended version of the question was randomized so that no one issue appeared early or late in the list for all respondents. Randomization of response items does not eliminate order effects, but it does ensure that this type of bias is spread randomly. • Questions with ordinal response categories, those with an underlying order (e.g., excellent, good, only fair, poor, or very favorable, mostly favorable, mostly unfavorable, very unfavorable), are generally not randomized because the order of the categories conveys important information to help respondents answer the question. Generally, these types of scales should be presented in order so respondents can easily place their responses along the scale.
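The per-respondent randomization described above can be sketched in a few lines. This is a minimal illustration, not survey software; the list of issues is hypothetical, and the helper name is our own:

```python
import random

# Hypothetical non-ordinal response options for a closed-ended question.
issues = ["the economy", "health care", "terrorism", "energy", "other"]

# Return a freshly shuffled copy for each respondent, so no option
# systematically benefits from primacy or recency effects.
def randomized_options(options, rng=random):
    shuffled = options[:]   # copy, so the master list keeps its order
    rng.shuffle(shuffled)
    return shuffled
```

Note that, as the slide says, ordinal scales (excellent/good/fair/poor) would not be passed through such a shuffle, because their order is itself informative.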
  • 164.
    Wording and Order of Questions • One group was asked: "Thinking of your teachers in high school, would you say that the female teachers were more empathetic with regard to academic and personal problems than the male teachers, or were they less empathetic?" The other group responded to a question with the direction reversed: "Thinking of your teachers in high school, would you say that the male teachers were more empathetic with regard to academic and personal problems than the female teachers, or were they less empathetic?" Responses were measured on a nine-point scale ranging from "less empathetic" (1) to "more empathetic" (9). Not only were the mean ratings statistically different, but when female teachers were the subject, 41 percent of respondents felt that the female teachers were more empathetic than male teachers; when male teachers were the subject, only 9 percent of respondents felt that female teachers were more empathetic than the male teachers. The direction of comparison significantly affected the results obtained when the authors compared soccer with tennis and tennis with soccer on which was the more exciting sport. Wanke, Schwarz, and Noelle-Neumann (1995): the authors concluded that respondents generally "focus on the features that characterize the subject of comparison and make less use of the features that characterize the referent of the comparison."
  • 165.
    Wording and Order of Questions • A researcher wishing to increase the variability, and thereby make it harder for statistics to demonstrate significant differences among stimuli (e.g., comparing different brands of tissues), can accomplish this by using scales with too many points. A two-point scale, on the other hand, used with a stimulus that subjects can actually rate on many gradations, will result in very imprecise measurement. This will make it very difficult to find differences among means. For example, will there be a significant difference between the mean ratings for the presidencies of Abraham Lincoln and William Clinton if the scale consists of only two points, "good" and "bad"? • Waddell (1995) suggested that traditional customer satisfaction measurement scales ask the wrong question by focusing on "How am I doing?" rather than "How can I improve?" He claims that consumers usually rate products/services as being better when using performance or satisfaction scales and that these scales often produce high average scores. Neal (1999) posited that satisfaction measures cannot be used to predict loyalty, since loyalty is a behavior and satisfaction is an attitude. (Rating the Rating Scales, H. Friedman, Journal of Marketing Management, Vol. 9:3, Winter 1999)
  • 166.
    Rating Scales • Rating scales ask people to provide a quantity, or "how much." Rating scales provide a set of categories designed to elicit information about a quantitative or a qualitative attribute. pg135 • The simplest form presents people with five or seven response alternatives, with the endpoints on the scale labeled to define the extremes: • Am I the greatest professor ever? strongly agree __ __ __ __ __ __ __ strongly disagree • Graphic rating scale: requires a mark along a continuous 100-millimeter line that is anchored at either end with descriptors.
  • 168.
    Rating Scales • Semantic differential scale: respondents rate any concept on a series of bipolar adjectives using a 7-point scale. • Almost anything can be measured using this technique; concepts are measured along three basic dimensions: 1) evaluation (good-bad), 2) activity (fast-slow), and 3) potency (weak-strong). • Nonverbal scales can be used for children. • Labeling response alternatives: researchers may provide labels to more clearly define the meaning of each alternative; the middle alternative is a neutral point halfway between the endpoints.
  • 169.
    Rating Scales • There are instances in which you may not want a balanced scale. • Example pg137: In comparison with other graduates, how would you rate this student's potential?

    Lower 50% ___   Upper 50% ___   Upper 25% ___   Upper 10% ___   Upper 5% ___

    Most of the alternatives ask the rater to place someone within the upper 25%, as students in this group tend to be highly motivated and professors tend to rate them positively. • High-frequency vs. low-frequency scales: the alternatives indicate different frequencies of the variable. How often do you exercise?

    Less than once a month ___   About once a month ___   Once every two weeks ___   Once a week ___
  • 170.
    Questionnaires & Surveys • Questionnaires should be professional and neatly typed, with clear response alternatives. In sequencing the questions, it is best to ask the most interesting questions first, group questions on a particular topic together, and present demographic questions last. pg138 • Administer the questionnaire first to a small group of friends and colleagues for their feedback. • Questionnaires are in written form and may be given to groups or individuals, while surveys can be written or given as interviews.
  • 171.
    Questionnaires & Surveys • Questionnaires given to groups (classes, meetings, job orientations) have the advantage of 'captive audiences' who are likely to complete the questionnaire, and the researcher is usually present to answer questions. pg139 • Mail questionnaires/surveys: inexpensive, but often with a low return rate due to distractions, low interest, and no one being present to answer questions or provide clarification. • Internet questionnaires/surveys: responses are sent immediately to the researcher. Problems exist with 1) sampling: people interested in the topic can complete the form, and polling organizations sample from collected databases. Are the results similar to traditional methods? 2) Do people misrepresent themselves? (It seems unlikely, but there is no way to know.)
  • 172.
    Questionnaires & Surveys • Interviews: because an interview involves interaction between people, it is more likely that a person will agree to answer questions than with a mailed questionnaire. pg140 • The interviewer can answer questions and provide clarification. • Problems with interviewer bias: the interviewer may react positively or negatively to answers (inadvertently), might influence answers due to characteristics (age, sex, race, etc.), or bias could lead interviewers to see what they want to see.
  • 173.
    Types of Interviews • Face-to-face interviews: expensive and time-consuming; the interviewer may have to travel to the person's home, or the person to an office. Likely to be used when the sample size is small. • Telephone interviews: most large-scale surveys are done via telephone, which is less expensive than face-to-face interviews and allows data to be collected relatively quickly, as many interviewers can work on the same survey at once. In computer-assisted telephone interview (CATI) systems the questions appear on the computer screen and the data are entered directly for analysis. • Focus group interviews: 6-10 persons brought together for 2-3 hours, usually selected because they share a particular interest in or knowledge of a topic. They often receive an incentive to compensate for time and travel. Questions are often open-ended and asked of everyone, with the added advantage of group interaction. The interviewer must be skilled in dealing with individuals who wish to dominate the discussion or with hostility between members. Discussions are often recorded and later analyzed. Although focus groups provide a great deal of data, they are also costly and time-consuming. pg142
  • 174.
    Surveys to study changes over time • Surveys usually study one point in time, but because some questionnaires are given every year, they can track changes. (One can also use a panel study of the same group of people over time.)
  • 175.
    Autism rating items • Before age 3, did the child ever imitate another person? • 1. Yes, waved bye-bye • 2. Yes, played pat-a-cake • 3. Yes, other ( ___________________________ ) • 4. Two or more of above (which? 1____2____3____ ) • 5. No, or not sure_______________________________ • (Age 2-4) Does child hold his hands in strange postures? • 1. Yes, sometimes or often 2. No________________ • (Age 3-5) Does child sometimes line things up in precise evenly-spaced rows and insist they not be disturbed? • 1. No 2. Yes 3. Not sure
  • 176.
    CARS: Childhood Autism Rating Scale (sample item) 0 No evidence of difficulty or abnormality in relating to people. The child's behavior is appropriate for his or her age. Some shyness, fussiness, or annoyance at being told what to do may be observed, but not to an atypical degree. 1.5 (if between these points) 2 Mildly abnormal relationships. The child may avoid looking the adult in the eye, avoid the adult or become fussy if interaction is forced, be excessively shy, not be as responsive to the adult as is typical, or cling to parents somewhat more than most children of the same age. 2.5 (if between these points) 3 Moderately abnormal relationships. The child shows aloofness (seems unaware of adult) at times. Persistent and forceful attempts are necessary to get the child's attention at times. Minimal contact is initiated by the child. 3.5 (if between these points) 4 Severely abnormal relationships. The child is consistently aloof or unaware of what the adult is doing. He or she almost never responds to or initiates contact with the adult. Only the most persistent attempts to get the child's attention have any effect.
  • 177.
  • 179.
Sampling • One way to describe the amount of possible sampling error is to use interval estimation. Assuming that sampling errors are normally distributed, you can establish a range of values on either side of the point estimate (the sample value) and then determine the probability that the population parameter lies within this range. This probability is expressed as a percentage and is called the level of confidence • 95% of the total area under the curve lies within plus or minus two standard deviations, with less than 5% outside those values. If the point estimate were 30 and the standard error were 4, you could be 95% certain that the population value is within 22-38 (the 95% confidence interval)
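A minimal sketch of the interval estimate described above, using the slide's numbers: a point estimate of 30, a standard error of 4, and the plus-or-minus two standard errors rule of thumb for 95% confidence.

```python
# Sketch of interval estimation: point estimate +/- z standard errors.
def confidence_interval(point_estimate, standard_error, z=2):
    """Return (lower, upper) bounds of the confidence interval."""
    margin = z * standard_error
    return (point_estimate - margin, point_estimate + margin)

print(confidence_interval(30, 4))  # (22, 38)
```

With z = 2 (the rule-of-thumb value; 1.96 is the exact 95% multiplier), the interval matches the slide's 22-38 range.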
  • 180.
Sampling from a population • Since studying entire populations would be an enormous undertaking, we sample from the population and infer what the population is like based on the data obtained from the sample (using statistical significance) • Simple Random Sampling- Every member of the population has an equal probability of being selected; if there are 1,000 people in the population, everyone has a 1/1,000 chance of being selected. In conducting phone interviews, researchers use a computer-generated list of phone numbers
  • 181.
Random Number Generator- Assume we have a population of 500 subjects and we want a sample of 30. Select a column and row as a starting point in the random number table and read off 3 digits at a time to cover all possible subject numbers (001-500)
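The same draw can be sketched in software, with the random number generator standing in for reading 3-digit values off a printed table:

```python
import random

# Simple random sampling: draw 30 subjects from a population of 500,
# each with an equal probability of selection, without replacement.
population = list(range(1, 501))          # subject IDs 001-500
sample = random.sample(population, k=30)  # no subject can be drawn twice
print(sorted(sample))
```

`random.sample` guarantees 30 distinct IDs, which mirrors discarding repeated numbers when using a paper table.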
  • 183.
Sampling • Stratified random sampling- The population is divided into subgroups (strata) and members from each stratum are randomly selected. The subgroups should represent a dimension that is relevant to the research, e.g. if you are conducting a survey of sexual attitudes you may want to stratify on the basis of age, gender, and amount of education, as these factors are related to sexual attitudes (attributes such as height are not relevant to the research) pg146 • Stratified sampling also has the advantage of building in representation of all groups. If, out of 10,000 students on campus, 10% are foreign students on a student visa, then you will need at least 100 from this group in a sample of 1,000 students • Sometimes researchers will "oversample" from a small subgroup to ensure its representation in the sample
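A hedged sketch of proportional stratified sampling using the campus example above (10% foreign students, sample of 1,000); the ID lists are hypothetical stand-ins for real student rosters:

```python
import random

# Stratified random sampling: sample each stratum in proportion to its
# share of the population, so a 10% stratum gets 10% of the sample.
strata = {
    "foreign": [f"F{i}" for i in range(1000)],    # 10% of 10,000 students
    "domestic": [f"D{i}" for i in range(9000)],   # remaining 90%
}
total = sum(len(members) for members in strata.values())
sample_size = 1000
sample = {
    name: random.sample(members, k=round(sample_size * len(members) / total))
    for name, members in strata.items()
}
print({name: len(chosen) for name, chosen in sample.items()})
# {'foreign': 100, 'domestic': 900}
```

Oversampling a small stratum would simply mean passing a larger `k` for that group than its proportional share.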
  • 184.
Sampling distributions • If we have a very large population, we may draw a random sample of 30 from this population and determine some statistic (e.g. the mean). Then we repeat the process 1,000 times, producing 1,000 random samples of size 30 with the corresponding 1,000 sample statistics. A frequency distribution can be drawn up, similar to a frequency distribution of any type of score, resulting in a model called the (theoretical) sampling distribution of the statistic (in this case the sampling distribution of the mean) • The expected value of any statistic- the predicted value that would give the least error over many samples- is the mean of its sampling distribution. The standard error of any statistic is the standard deviation of its sampling distribution (Source: Roscoe chapter 19) • The standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean
  • 185.
Sampling Distribution • If you took all of these separate means and calculated an overall mean for the whole lot, you would end up with a value that was the same as the population mean (the mean you'd get if you could measure every member of the population) • The arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed- the Central Limit Theorem. In terms of the Central Limit Theorem, as the sample size increases, the variance of the sampling distribution decreases, producing an approximately normal distribution https://www.khanacademy.org/math/probability/statistics-inferential/sampling_distribution/v/central-limit-theorem
  • 186.
    Central Limit Theorem •The Central Limit Theorem (CLT for short) basically says that for non-normal data, the distribution of the sample means has an approximate normal distribution, no matter what the distribution of the original data looks like, as long as the sample size is large enough (usually at least 30) and all samples have the same size. • The use of an appropriate sample size and the central limit theorem help us to get around the problem of data from populations that are not normal. Thus, even though we might not know the shape of the distribution where our data comes from, the central limit theorem says that we can treat the sampling distribution as if it were normal
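The simulation described two slides back can demonstrate the CLT directly: draw 1,000 samples of size 30 from a clearly non-normal (uniform) population and look at the distribution of the sample means.

```python
import random
import statistics

# Simulate the sampling distribution of the mean from a uniform population.
random.seed(1)  # fixed seed so the run is reproducible
population = [random.uniform(0, 100) for _ in range(100_000)]
means = [statistics.mean(random.sample(population, 30)) for _ in range(1000)]

# The mean of the sampling distribution sits near the population mean,
# and its spread (the standard error) is far smaller than the
# population's standard deviation (roughly SD / sqrt(30)).
print(round(statistics.mean(means), 1))
print(round(statistics.stdev(means), 2))
```

Plotting `means` as a histogram would show the approximately normal shape the theorem predicts, even though the underlying data are uniform.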
  • 187.
Sampling • Cluster Sampling- a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. After the clusters are chosen, all observations/individuals in the selected clusters are included in the sample pg147 • Cluster sampling is typically used when the researcher cannot get a complete list of the members of a population they wish to study but can get a complete list of groups or 'clusters' of the population. It is also used when a random sample would produce a list of subjects so widely scattered that surveying them would prove far too expensive (for example, people who live in different postal districts in the UK) • You could get a list of all classes taught (each class is a cluster), take a random sample of classes from this list, and have all members (students) of the chosen classes complete your survey
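A minimal sketch of the class-list example: treat each class as a cluster, randomly select whole clusters, then survey every student in the chosen classes. The rosters here are hypothetical.

```python
import random

# Cluster sampling: randomly pick whole classes, keep every member.
classes = {f"class_{i}": [f"student_{i}_{j}" for j in range(25)]
           for i in range(40)}                          # 40 classes of 25
chosen = random.sample(list(classes), k=5)              # random sample of clusters
respondents = [s for c in chosen for s in classes[c]]   # all members of each cluster
print(len(respondents))  # 125
```

Note the contrast with simple random sampling: randomness operates at the cluster level only, so the researcher never needs a list of individual students, just a list of classes.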
  • 188.
Nonprobability Sampling • In probability sampling the probability of selecting every member is knowable; in nonprobability sampling the probability of being selected is not known- the techniques are arbitrary. A population may be defined, but little effort is expended to ensure the sample accurately represents it • Nonprobability sampling does not involve random selection • Nonprobability sampling is cheap and convenient • Three types: 1) Haphazard 2) Purposive 3) Quota
  • 189.
Nonprobability Sampling • Haphazard or Convenience Sampling (Accidental, Judgment) • Select a sample that is convenient, e.g. students walking into the campus café • Seen in the traditional "man (person) on the street" interviews conducted frequently by television news programs to get a quick (although nonrepresentative) reading of public opinion (the use of college students in much psychological research is primarily a matter of convenience) • In clinical practice, we might use clients who are available to us as our sample. In many research contexts, we sample simply by asking for volunteers. Clearly, the problem with all of these types of samples is that we have no evidence that they are representative of the populations we're interested in generalizing to, and in many cases we would clearly suspect that they are not. People sampled, such as viewers of particular TV channels (Fox News, MSNBC), may be different from the general population, and are often asked about controversial issues such as abortion, taxes, gun regulation, and wars, which draw responses from certain kinds of people
  • 190.
Nonprobability Sampling • In purposive sampling, we sample with a purpose in mind. We usually would have one or more specific predefined groups we are seeking. For instance, have you ever run into people in a mall or on the street who are carrying a clipboard and stopping various people to ask if they could interview them? Most likely they are conducting a purposive sample (and most likely they are engaged in market research). They might be looking for Caucasian females between 30-40 years old. They size up the people passing by, and anyone who looks to be in that category they stop to ask if they will participate. One of the first things they're likely to do is verify that the respondent does in fact meet the criteria for being in the sample • Purposive sampling can be very useful for situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern. With a purposive sample, you are likely to get the opinions of your target population, but you are also likely to overweight subgroups in your population that are more readily accessible
  • 191.
Nonprobability Sampling • A sample is chosen that reflects the numerical composition of various subgroups in the population (the technique is similar to stratified sampling but without random selection- you are collecting data in a haphazard way) pg 148 • Quota sampling is a method of sampling widely used in opinion polling and market research. Interviewers are each given a quota of subjects of specified types to attempt to recruit; for example, an interviewer might be told to go out and select 20 adult men and 20 adult women, 10 teenage girls and 10 teenage boys so that they could interview them about their television viewing • It suffers from a number of methodological flaws, the most basic of which is that the sample is not a random sample, and therefore the sampling distributions of any statistics are unknown
  • 193.
Evaluating Samples • Even using random sampling does not ensure the sample is representative. Error derives from two sources: 1) the sampling frame used 2) poor response rates • Sampling frame- the actual population of individuals (or clusters) from which a random sample will be drawn. Rarely will this perfectly coincide with the population of interest, so some biases will be introduced- if you compile a list of phone numbers to call during the day from the directory, you will exclude those with unlisted numbers, those without phones, and those who are not home during the day • Response rate- the percentage of people in the sample who respond (complete the phone or mail survey). Mail surveys have lower response rates than phone surveys. You can increase the response rate with an explanatory postcard before the survey arrives, a second mailing of the survey, or a stamped self-addressed envelope pg 150
  • 194.
Experimental Design- Chapter 8 • The researcher manipulates the independent variable (usually to create groups) and then compares the groups in terms of their scores on the dependent variable (outcome measure) while keeping all other variables constant through direct experimental control or randomization. If scores on the dependent variable are different, then the researcher can conclude that the difference was due to the difference between groups and no other cause (and the experiment will have internal validity) pg157-8 • A confounding variable varies along with the independent variable. Confounding occurs when the effects of the independent variable and an uncontrolled variable are intertwined, so you cannot determine which causes the effect
  • 195.
Basic Experiments • The simplest experimental design has two variables, the independent and dependent, with the independent variable having a minimum of two levels, an experimental and a control group. This type of experiment can take one of two possible forms: 1) posttest only design or 2) pretest-posttest design • Obtain two equivalent groups (through random assignment), introduce the independent variable, and then measure its effect on the dependent variable- either random assignment to groups or assigning the same subjects to both conditions (CIT study with cross-over design)
  • 196.
Posttest only vs Pretest-Posttest design • After the groups are formed (experimental and control), you must choose two levels of the independent variable (treatment for the experimental group and no treatment for the control group), e.g. the experimental group gets a treatment to stop smoking and the control group does not • Pretest-Posttest designs- the only difference between the posttest only and pretest-posttest design is that in the latter a pretest is given before the experimental manipulation is introduced
  • 197.
Posttest only vs Pretest-Posttest • The pretest-posttest design makes it easier to confirm the groups are equal at the beginning of the experiment. However, if you have randomly assigned subjects to the different groups using a sufficiently large sample, the groups should be equivalent without using a pretest • Generally need a minimum of 20-30 subjects pg160
  • 198.
Posttest only vs Pretest-Posttest advantages and disadvantages • Advantages of Pretest-Posttest • While randomization is expected to produce equivalent groups, this assumption may go unmet with small sample sizes, and a pretest can increase the likelihood of equivalency • A pretest may be necessary for assignment to groups, so that those who score low or high on the pretest can be randomly assigned to conditions • The comparison of pretest to posttest allows each subject to be evaluated in terms of change between the measures (with no pretest such a comparison is not possible)
  • 199.
Posttest only vs Pretest-Posttest advantages and disadvantages • Pretests help determine the effects of attrition (dropout)- you can examine the pretest scores of dropouts to determine if their scores differed from those completing the study • Disadvantages of Pretest • A pretest may be time consuming • A pretest may sensitize (alert) the subjects to the hypothesis, which can change a subject's behavior in the study (you can disguise the pretest as part of another study or embed the pretest in a series of irrelevant measures- also time consuming)
  • 200.
Posttest only vs Pretest-Posttest advantages and disadvantages • Solomon four group design- half the subjects receive only the posttest and the other half receive both pretest and posttest. If there is no impact of the pretest, the posttest scores will be the same in the two control groups (with and without pretest) see table 8.1 pg 162 • Repeated measures designs have the advantage of needing fewer subjects and decrease the effects of natural variation between individuals on the results. Repeated-measures designs are commonly used in longitudinal studies over the long term, in educational tests where it is important to ensure that variability is low, and in research on functions such as perception involving only a few subjects who often receive extensive training pg164
  • 201.
Between group design vs. Repeated Measures design • A between-group design is an experiment that has two or more groups of subjects, each tested under a different condition simultaneously- each subject is in either the treatment (experimental) group or the control group pg163 • A repeated-measures design is one in which multiple, or repeated, measurements are made on each subject, e.g. weekly blood pressures, or each subject measured after receiving each level of the independent variable
  • 202.
Between group design vs. Repeated Measures design • In the between groups design, subjects are assigned to each of the conditions using random assignment http://www.randomizer.org/form.htm • In repeated measures, the same individual participates in all of the conditions. These studies are more sensitive to finding statistically significant results. Even if you have randomly selected and assigned subjects to conditions in the between groups design, there is still individual variation (naturally occurring "random error"- differences between the subjects assigned to the different groups), which may make the effect of the independent variable unclear, but when testing the same person in different conditions (versus different persons in different conditions) this random error is eliminated
  • 203.
Between group design vs. Repeated Measures design • One limitation of repeated measures is that the conditions must be presented in a particular sequence, which could result in an order effect- the order of presenting the treatments affects the dependent (outcome) variable. Maybe a subject performs better in the second condition because of practice in the first condition (practice effect), or performs poorer in the second condition due to fatigue (fatigue effect), or the first treatment influences the second treatment (carryover effect) • A carryover effect occurs when the first condition produces a change that is still influencing the person when the second condition is introduced
  • 204.
Between group design vs. Repeated Measures design • Experiment- Subjects are presented with a list of words and asked to recall as many words as they can. In one condition, the words are presented one word per second; in the other condition, the words are presented two words per second. The question is whether or not having performed in one condition affects performance in the second condition. Perhaps learning the first list of words will interfere with learning the second list because it will be hard to remember which words were in each list. Or maybe the practice involved in learning one list will make it easier to learn a second list. In either case, there would be a carryover effect: performance on the second list would be affected by the experience of being given the first list • Such effects are dealt with through counterbalancing or extended time intervals between conditions presented serially
  • 205.
Repeated Measures- types of counterbalancing • Complete counterbalancing- all possible orders of presentation are included in the experiment pg165-166 • Latin Square- a Latin square is an n x n table filled with n different symbols in such a way that each symbol occurs exactly once in each row and exactly once in each column. Each condition appears at each ordinal position (1st, 2nd, 3rd etc.) exactly once • Using a Latin square controls for most order effects without having to include all possible orders (in a balanced Latin square, each condition also precedes and follows each other condition one time) • Time interval- longer rest periods counteract fatigue and practice effects but require a greater commitment to participate
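A sketch of a simple cyclic Latin square for counterbalancing: each condition appears exactly once in each row (one subject's presentation order) and once in each column (ordinal position). Note this cyclic construction controls ordinal position only; a balanced Latin square, where each condition also precedes and follows every other condition once, needs a different construction.

```python
# Cyclic Latin square: row r is the condition list rotated by r places.
def latin_square(conditions):
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

for order in latin_square(["A", "B", "C", "D"]):
    print(order)
# ['A', 'B', 'C', 'D']
# ['B', 'C', 'D', 'A']
# ['C', 'D', 'A', 'B']
# ['D', 'A', 'B', 'C']
```

With four conditions, complete counterbalancing would need 4! = 24 orders; the Latin square gets by with 4.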
  • 206.
Matched Pairs Design • Rather than using random assignment to groups, you can first match subjects on a variable (achieving equivalency in this manner rather than through randomization) and avoid repeated measures/counterbalanced designs pg169 • Example study: 1000 subjects each receive one of two treatments- a placebo or a cold vaccine. The 1000 subjects are grouped into 500 matched pairs. Each pair is matched on gender and age. For example, Pair 1 might be two women, both age 21. Pair 2 might be two men, both age 21. Pair 3 might be two women, both age 22
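A hedged sketch of the assignment step in the example above: subjects are paired on gender and age, then one member of each pair is randomly assigned to each condition. The subject records are hypothetical.

```python
import random

# Matched-pairs assignment: randomize within each matched pair.
pairs = [
    (("F", 21), ("F", 21)),  # Pair 1: two women, both age 21
    (("M", 21), ("M", 21)),  # Pair 2: two men, both age 21
    (("F", 22), ("F", 22)),  # Pair 3: two women, both age 22
]
assignments = []
for a, b in pairs:
    first, second = random.sample([a, b], k=2)  # random within-pair assignment
    assignments.append({"vaccine": first, "placebo": second})
print(len(assignments))  # 3
```

Because matching handles equivalency on the paired variables, the within-pair randomization only has to decide which member gets which treatment.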
  • 207.
Conducting Experiments Chp 9 • Selecting research participants- determining sample size. Sampling error is a function of sample size, and the error tends to be smaller for larger samples. The larger your sample size, the more sure you can be that the answers truly reflect the population; for a given confidence level, the larger your sample size, the smaller your confidence interval • http://www.raosoft.com/samplesize.html
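A hedged sketch of the standard sample-size formula for estimating a proportion, the kind of computation behind online calculators like the one linked above: n = z² p(1-p) / e², followed by a finite-population correction. The parameter defaults (5% margin, 95% confidence, p = 0.5) are the usual conservative assumptions, not values from the text.

```python
import math

# Sample size needed for a proportion, with finite-population correction.
def sample_size(population, margin_of_error=0.05, z=1.96, p=0.5):
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    n_adjusted = n / (1 + (n - 1) / population)  # correction for finite N
    return math.ceil(n_adjusted)

print(sample_size(10_000))  # 370 subjects for a 5% margin at 95% confidence
```

Shrinking the margin of error or raising the confidence level drives the required n up sharply, which is why precision is traded against cost.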
  • 208.
Manipulating the Independent variable • Straightforward manipulations- subjects are selected and assigned to conditions. The conditions are constructed to represent different levels (e.g. high versus low difficulty of material to be learned, high versus low levels of subject motivation, subjects categorized as 'experts' or 'naïve') • It is generally easier to interpret results when the manipulation is straightforward (without having to account for possible subtleties in staged manipulations- experimenter effects etc.) pg179 • Most research uses this type of manipulation pg177 • Cost of manipulation- straightforward manipulations involve simple presentation of verbal or written material while running the study with groups of subjects- this is less costly pg181
  • 209.
Staged Manipulations and Confederates • Staged manipulations are used to create some psychological state (frustration, anger etc.)- Zitek et al. and 'sense of entitlement': subjects playing a video game "lost" when the game crashed (unfair condition) or because the game was too difficult (fair condition); subjects in the unfair condition later claimed more money than other subjects when competing against others on a different task • Confederates are frequently used in staged manipulations- conformity experiments such as the Asch study, in which confederates gave incorrect judgments on line length before subjects responded pg178
  • 210.
Strength of the Manipulation • The simplest design has two levels of the independent variable. The stronger the manipulation, the greater the likely differences between the groups • Social psychology experiment in which subjects interact with similar or dissimilar confederates to determine the relationship between similarity and liking: if you have a 10-point scale of similarity, the strongest manipulation would be to assign subjects to interact with confederates of either level 1 similarity (group A) or level 10 (group B). When attempting to determine if a relationship exists, a strong manipulation may be the best choice. However, the strongest manipulation may not represent real-life situations and may therefore show low external validity. Also, a strong manipulation of variables such as fear or anxiety may raise ethical concerns (what is the threat to validity in strong manipulations?)
  • 211.
Measuring the Dependent variable • Types of measures • Self-report measures- used to measure attitudes, judgments, emotional states, attributions • Behavioral measures- direct observations of behaviors: rate of behavior, reaction time, duration pg181 • Physiological measures- recordings of bodily responses: EEG, EMG, GSR, MRI, fMRI • Multiple measures- most studies use more than one measure (what were they in the studies discussed in class?) In a study of health-related behaviors, multiple measures were taken: number of illness days, doctor visits, and medication (aspirin) taken pg183 • Multiple measures are a common everyday experience- people who are considering buying a house look at the house's age, condition, location, style, features, and construction, as well as the price of nearby homes. Doctors diagnosing an illness use multiple assessments: the patient's medical history, lab tests, and the patient's answers to questions
  • 212.
Multiple Measures • Sensitivity of the dependent variable- the dependent variable should be sensitive enough to detect differences between groups. Simple yes or no questions are much less sensitive than scaled question items (in forced-choice yes-no, people tend to say yes even if they have some negative feelings, and gradations of feeling are not detected) pg183-4 • Tasks can be made too difficult or too easy. Ceiling effect- the task is so easy that everyone does well and the independent variable seems to have no effect. Floor effect- the task is so difficult that almost nobody does well- in Freedman et al., crowding did not have an effect on cognitive performance, but in later research, when subjects were asked to perform more complex tasks, crowding did lower performance
  • 213.
Measures- Cost & Additional controls • Some measures are more costly than others • While self-report measures are generally inexpensive (paper and pencil, ready-made questionnaires), other measures are more costly- interrater observations require video equipment and at least two observers to view tapes and code behavior, and physiological measures often require expensive equipment • While a control group is considered the minimum requirement for a true experiment (RCT), other types of controls are often needed to address potentially confounding factors
  • 214.
Subject and Experimenter Effects • Demand characteristics- some aspect of the experiment that might convey the purpose of the study, leading the subject to act to confirm or disconfirm your hypothesis • This may be countered by deception/cover stories, use of unrelated filler items in a questionnaire, or use of field studies or observation. You can also question subjects about their perception of the study pg185
  • 215.
Experimental controls • Placebo groups- groups receiving a placebo rather than the treatment in the study • The placebo effect refers to the phenomenon in which some people experience some type of benefit after the administration of a placebo (a substance with no known medical effects) • In certain instances, when the benefits of a drug or treatment are evident, you must give the treatment to the control (placebo) group as soon as those subjects/patients have completed their part in the study. Has the placebo effect gotten stronger over time?
  • 216.
Placebos without Deception: A Randomized Controlled Trial in Irritable Bowel Syndrome- Kaptchuk, T., et al. PLoS One. 2010; 5(12) • Placebo treatment can significantly influence subjective symptoms. However, it is widely believed that response to placebo requires concealment or deception. We tested whether open-label placebo (non-deceptive and non-concealed administration) is superior to a no-treatment control with matched patient-provider interactions in the treatment of irritable bowel syndrome (IBS) • Open-label placebo produced significantly higher mean (±SD) global improvement scores (IBS-GIS) at both the 11-day midpoint (5.2±1.0 vs. 4.0±1.1, p<.001) and the 21-day endpoint (5.0±1.5 vs. 3.9±1.3, p=.002) • Placebos administered without deception may be an effective treatment for IBS. Further research is warranted in IBS, and perhaps other conditions, to elucidate whether physicians can benefit patients using placebos consistent with informed consent • http://www.cbsnews.com/news/treating-depression-is-there-a-placebo-effect/
  • 217.
Subject and Experimenter Effects • Experimenter bias, or experimenter effects, is a subjective bias toward a result expected by the human experimenter. These effects may occur when the experimenter knows which condition the subjects are in • The experimenter might unintentionally treat subjects in the different groups differently (verbally or non-verbally), or may record or interpret the data and results of the different groups differently (Rosenthal's study of 'bright' vs. 'dull' rats (1966); Langer & Abelson 1974- psychologists rated a person in a video as more disturbed when told he was a patient versus a job applicant) pg187 • The effect can be minimized by running all conditions simultaneously, automating procedures, or by making observations single-blind (the subject is unaware of which condition he/she is in) or double-blind (neither subject nor experimenter knows the condition of any subject)
  • 218.
Experimental controls- additional considerations • Writing a research proposal allows you to organize and plan a study (Introduction & Methods) pg189 • Pilot studies- a limited trial with a small number of subjects; you can ask subjects for feedback • Manipulation check- by using self-report, behavioral, or physiological measures you can measure the strength of the manipulation in the pilot study (while it might be distracting in the actual study) and determine whether non-significant results were due to a problem in defining or manipulating the independent variable pg190 • Debriefing also provides you with subject feedback
  • 219.
Complex Experimental Designs Chp 10 • Experimental designs with only two levels of the independent variable provide limited information about the relationship between the independent and dependent variables (review high (medium) low anxiety and test performance and curvilinear relationships) • If a curvilinear relationship is predicted, then at least three levels of the variable must be used, and many curvilinear relationships exist in psychology (example of fear and attitude change- increasing the amount of fear aroused by a persuasive message increases attitude change only up to a moderate level, after which further increases in fear arousal actually reduce attitude change) pg 198
  • 220.
Factorial Designs • Designs with multiple levels of the independent variable are more representative of actual events • Factorial designs are designs with more than one independent variable (factor). All levels of each independent variable are combined with all levels of the other independent variable(s) pg199 • A researcher might be interested in the effect of whether or not a stimulus person (shown in a photograph) is smiling on ratings of the friendliness of that person. The researcher might also be interested in whether or not the stimulus person looking directly at the camera makes a difference • In a factorial design, the two levels of the first independent variable (smiling and not smiling) would be combined with the two levels of the second (looking directly or not) to produce four distinct conditions: smiling and looking at the camera, smiling and not looking at the camera, not smiling and looking at the camera, and not smiling and not looking at the camera
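The crossing of factor levels described above can be sketched directly: every level of one factor is combined with every level of the other.

```python
from itertools import product

# Factorial design: the Cartesian product of the factor levels gives
# one condition per combination (here, the photograph example's 2 x 2).
smiling = ["smiling", "not smiling"]
gaze = ["looking at camera", "not looking at camera"]
conditions = list(product(smiling, gaze))
for condition in conditions:
    print(condition)
print(len(conditions))  # 4 conditions; a 2 x 3 design gives 6, a 3 x 3 gives 9
```

The condition count is always the product of the numbers of levels, which is exactly what the "2 x 2" notation records.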
  • 221.
Interpretation of Factorial Designs • Two types of effects are studied in a factorial design: main effects and interaction effects. If there are two independent variables, there is a main effect for each of them pg200 • Main effect- the overall effect of one independent variable on the dependent variable, ignoring the other independent variable(s). In the example of therapy type and therapy duration, there is a main effect for therapy type and a main effect for duration of therapy • Interaction effects occur when there is an interaction between the two independent variables, such that the effect of one independent variable depends on the level of the other independent variable
  • 222.
Factorial Designs- a 2 x 2 factorial design with four experimental conditions:

                                Type of Therapy (B)
                                Behavioral   Cognitive
Duration of       Short         n = 50       n = 50
Therapy (A)       Long          n = 50       n = 50

A design with two independent variables, one at two levels and the other at three, is a 2 x 3 factorial design with six conditions. A 3 x 3 design will have nine conditions
  • 223.
Factorial Designs

                                Type of Therapy (B)
                                Behavioral   Cognitive
Duration of       Short         n = 50       n = 50
Therapy (A)       Long          n = 50       n = 50

In the above experiment, the type of psychotherapy (cognitive vs. behavioral) provides one main effect (the first independent variable, therapy type), and the duration of psychotherapy (short vs. long) provides a second main effect (therapy duration)
  • 224.
Interpretation of Factorial Designs • In the experiment, the main effect of type (cognitive vs. behavioral) is the difference between the average score for the cognitive group and the average score for the behavioral group, ignoring duration. That is, short-duration subjects and long-duration subjects are combined together in computing these averages. The main effect of duration is the difference between the average score for the short-duration group and the average score for the long-duration group, this time ignoring type.
  • 225.
Interpretation of Factorial Designs • We see that the subjects in the cognitive conditions scored higher on average than the subjects in the behavioral conditions, indicating a main effect for therapy type • This 2 x 2 factorial design has four experimental conditions: short-duration behavioral therapy, long-duration behavioral therapy, short-duration cognitive therapy, and long-duration cognitive therapy
  • 226.
Interpretation of Factorial Designs • Interaction effect- whenever the effect of one independent variable depends on the level of the other pg201. If cognitive psychotherapy is better than behavioral psychotherapy when the therapy is short but not when the therapy is long, then there is an interaction between type and duration of therapy • When we say "it depends," we are indicating that some type of interaction is at work: you would like to go to Vegas if you have enough money and you have completed your assignments pg202
  • 227.
Interpretation of Factorial Designs • Effects are all independent of each other. A 2 x 2 factorial experiment might result in no main effects and no interaction, one main effect and no interaction, two main effects and no interaction, no main effects and an interaction, one main effect and an interaction, or two main effects and an interaction. In looking at results presented in a design table or (more importantly) a graph, you can interpret what happened in terms of main effects and interactions.
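The definitions above can be made concrete with a 2 x 2 table of cell means. The improvement scores below are hypothetical numbers for the therapy type x duration example, not data from the text; they are chosen so both main effects and an interaction appear.

```python
# Reading main effects and an interaction off a 2 x 2 table of cell means.
means = {
    ("behavioral", "short"): 4.0,
    ("behavioral", "long"):  6.0,
    ("cognitive",  "short"): 7.0,
    ("cognitive",  "long"):  6.0,
}

# Main effect of type: average over duration, then compare the averages.
cog = (means[("cognitive", "short")] + means[("cognitive", "long")]) / 2    # 6.5
beh = (means[("behavioral", "short")] + means[("behavioral", "long")]) / 2  # 5.0
main_effect_type = cog - beh  # 1.5

# Main effect of duration: average over type, then compare the averages.
long_ = (means[("behavioral", "long")] + means[("cognitive", "long")]) / 2    # 6.0
short = (means[("behavioral", "short")] + means[("cognitive", "short")]) / 2  # 5.5
main_effect_duration = long_ - short  # 0.5

# Interaction: the effect of type differs across durations (3.0 vs 0.0).
type_effect_short = means[("cognitive", "short")] - means[("behavioral", "short")]
type_effect_long = means[("cognitive", "long")] - means[("behavioral", "long")]
print(main_effect_type, main_effect_duration, type_effect_short != type_effect_long)
# 1.5 0.5 True
```

If `type_effect_short` equaled `type_effect_long`, the two simple effects would be parallel and there would be no interaction, which is exactly the "it depends" test described above.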
  • 228.
    Factorial Designs with Manipulated and Nonmanipulated Variables • One common type of factorial design includes both experimental (manipulated) and nonexperimental (nonmanipulated) variables. These designs investigate how different people respond to certain situations, that is, how the manipulated (independent) variable combines with personal characteristics or attributes (age, gender, personality type, etc.) • Person X Situation studies • Extroverts get excited about parties; introverts get anxious.
  • 229.
    Person X Situation Effects • Type D personality in patients with coronary artery disease. Vukovic et al., Psychiatria Danubina, 2014 Mar;26. BACKGROUND: During the past decade studies have shown that Type D personality is associated with increased risk of cardiac events, mortality, and poor quality of life. Some authors have suggested that depression and Type D personality have substantial phenomenological overlap. SUBJECTS AND METHODS: The sample consisted of a non-consecutive case series of seventy-nine patients with clinically stable and angiographically confirmed coronary artery disease (CAD), who had been admitted to the Clinic of Cardiology, University Clinical Centre, from May 2006 to September 2008. The patients were assessed by the Type-D scale (DS14) and the Beck Depression Inventory (BDI), and provided demographic information. Risk factors for CAD were obtained from cardiologists. (Type D (distressed): negative affect (worry, anxiety) and social inhibition.) RESULTS: The findings of our study have shown that 34.2% of patients with CAD could be classified as Type D personality. The univariate analysis has shown that the prevalence of Type D personality was significantly higher in individuals with unstable angina pectoris and myocardial infarction (MI) diagnoses (p=0.02). Furthermore, some components of metabolic syndrome were more prevalent in patients with Type D personality: hypercholesterolemia (p=0.00), hypertriglyceridemia (p=0.00), and hypertension (p=0.01). Additionally, the distributions of depression in patients with a Type D personality and a non-Type D personality were statistically significantly different (p=0.00). CONCLUSION: To our knowledge, this study is the first one to describe the prevalence and clinical characteristics of the Type D personality in patients with CAD in this region of Europe. We have found that the prevalence of Type D personality in patients with CAD is in concordance with the other studies.
  • 230.
    Person by Situation Interaction Effects • Furnham et al. examined the distracting effect of television on cognitive processing (studying) in introverts and extroverts. Both extroverts and introverts performed better in silence, but extroverts performed better than introverts in the presence of television distraction. Is there a main effect? Is there an interaction effect? Factorial designs with both manipulated independent variables and subject variables recognize that a better understanding of behavior requires knowledge of both situational variables and the personal attributes of people (pg 204).
  • 231.
    Interactions and Moderator Variables • Moderator variables influence the relationship between two other variables. A moderator is a variable (z) whereby x and y have a different relationship with each other at the various levels of z; note that this is essentially what is entailed in an interaction. A moderator variable is one that influences the strength of a relationship between two other variables, and a mediator variable is one that explains the relationship between the two other variables. • Whereas moderator variables specify when certain effects will hold, mediators speak to how or why such effects occur • (Baron & Kenny, 1986, p. 1176).
  • 232.
    Mediate vs. Moderate • Mediating variable: synonym for intervening variable. Example: Parents transmit their social status to their children directly, but they also do so indirectly, through education: parent’s status ➛ child’s education ➛ child’s status. Education is a mediating variable (mediators explain). • Moderating variable: a variable that influences, or moderates, the relation between two other variables and thus produces an interaction effect; a moderator is a third variable that affects the correlation of two variables. If we were to replicate the Asch experiment with a female subject and found that her answers (Y variable) were not affected by the confederates’ answers (X variable), then we could say that gender is a moderator (M) in this case. https://www.youtube.com/watch?v=3ymkfDBwel0
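The parent-status example can be caricatured in code. This is a deliberately simplified sketch (the `slope` helper and the data values are invented, and the direct path is set to zero): when the effect of X on Y runs entirely through the mediator M, the total effect equals the product of the two path slopes, a × b.

```python
# A minimal mediation sketch: X -> M -> Y with no direct X -> Y path.
# slope() is a hypothetical helper; the data values are invented.

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

parent_status = [1, 2, 3, 4, 5]                    # X
child_education = [2 * x for x in parent_status]   # M = 2X  (path a)
child_status = [3 * m for m in child_education]    # Y = 3M  (path b)

a = slope(parent_status, child_education)   # X -> M
b = slope(child_education, child_status)    # M -> Y
c = slope(parent_status, child_status)      # total effect of X on Y
print(a, b, c)   # a=2.0, b=3.0, c=6.0: the total effect equals a*b
```

A moderator, by contrast, would show up as a *different* X→Y slope at each level of a third variable, not as an intervening path.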
  • 233.
    Moderators vs. Confounders • Moderator: A moderator is a variable (z) whereby x and y have a different relationship with each other at the various levels of z; note that this is essentially what is entailed in an interaction. A moderator is a variable that influences, or moderates, the relation between two other variables and thus produces an interaction effect. • Confounder: A third variable that is related to x in a non-causal manner and is related to y either causally or correlationally; the third variable (z) is related to y even when x is not present. A confounding variable is an extraneous variable (i.e., a variable that is not a focus of the study) that is statistically related to (or correlated with) the independent variable: a variable that obscures the effects of another variable.
  • 234.
    Let’s Review: How to Control for Confounding Variables • Confounding variable (continued): This is bad because the point of an experiment is to create a situation in which the only difference between conditions is a difference in the independent variable. This is what allows us to conclude that the manipulation is the cause of differences in the dependent variable. But if there is some other variable that changes along with the independent variable, then this confounding variable could be the cause of any difference. • Controlling confounding variables: Essentially all person variables can be controlled by random assignment. If you randomly assign subjects to conditions, then on average they will be equally intelligent, equally outgoing, equally motivated, and so on. • https://www.youtube.com/watch?v=B7QdNYLp_E0 (confounding variables)
  • 235.
    Moderator Variables • A moderator variable changes the strength of an effect or relationship between two variables. Moderators indicate when or under what conditions a particular effect can be expected. A moderator may increase the strength of a relationship, decrease the strength of a relationship, or change the direction of a relationship. In the classic case, a relationship between two variables is significant (i.e., non-zero) under one level of the moderator and zero under the other level of the moderator. For example, work stress increases drinking problems for people with a highly avoidant (e.g., denial) coping style, but work stress is not related to drinking problems for people who score low on avoidant coping (Cooper, Russell, & Frone, 1990).
  • 236.
    Example of Moderation • (Schematic: Stress → Depression, with Social Support as moderator.) One of the clearest examples of moderation was presented by Cohen and Wills (1985). They argued that the social support literature (to that point in 1985) had neglected to consider the role of social support as a moderator of the stress-to-adjustment relationship. This moderation relationship is often depicted as in the schematic above. • The schematic suggests that the relationship between stress and depression may differ in strength at different levels of social support. In other words, stress may be more strongly associated with depression under conditions of low social support than under conditions of high social support.
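The buffering pattern can be made concrete by computing the stress-to-depression slope separately at each level of the moderator. This is a hedged sketch with entirely invented data (not Cohen and Wills's results); unequal slopes across the levels of social support are the signature of moderation.

```python
# Moderation sketch: compute the stress -> depression slope at each level
# of the moderator (social support).  All numbers below are invented.

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

stress = [1, 2, 3, 4, 5]
# Invented outcomes: depression tracks stress only when support is low.
depression_low_support = [2, 4, 6, 8, 10]   # slope 2 under low support
depression_high_support = [3, 3, 3, 3, 3]   # flat under high support

s_low = slope(stress, depression_low_support)
s_high = slope(stress, depression_high_support)
print(s_low, s_high)   # 2.0 vs 0.0: social support moderates the relationship
```

In a regression framework the same idea is tested with a stress × support interaction term; a nonzero interaction coefficient corresponds to the difference between these two slopes.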
  • 237.
    Outcomes of a 2 X 2 Factorial Design • There are two levels of each of two independent variables. We must determine whether there is a significant main effect for variable A, a significant main effect for variable B, and an interaction effect between the variables. • In the example to the right there is a main effect for both room temperature and test difficulty but no interaction effect.
  • 238.
    Main Effects and Interaction Effects • We see that the six subjects in the cognitive conditions scored three points higher on average than the six subjects in the behavioral conditions. This is the main effect of the type of psychotherapy. To see the main effect of the duration of psychotherapy, we compare the average score in the short condition with the average score in the long condition, now computing these averages across subjects in the cognitive and behavioral conditions. We see that the six subjects in the long conditions scored three points higher on average than the six subjects in the short conditions. This is the main effect of the duration of psychotherapy.
  • 239.
    Main Effects: Therapy Type X Duration • Below are the same results plotted in the form of a bar graph. The main effect of type is indicated by the fact that the two cognitive bars are higher on average than the two behavioral bars. The main effect of duration is indicated by the fact that the two long-duration (dark) bars are higher on average than the two short-duration (light) bars.
  • 240.
    Main Effects and Interaction Effects • Parallel lines in these types of graphs indicate that there are main effects in the results but no interaction; if the lines are not parallel, this is indicative of an interaction. “Do students do better on hard tests or easy tests?” “It depends: in a fifty-degree room there is no difference, but in a ninety-degree room they do much better on easy tests.” Interaction effect: students do best when the test is easy and the temperature is 90 degrees.
  • 241.
    Music is as distracting as noise: the differential distraction of background music and noise on the cognitive test performance of introverts and extraverts. Furnham, 2002 • Previous research has found that introverts' performance on complex cognitive tasks is more negatively affected by distracters, e.g. music and background television, than extraverts' performance. This study extended previous research by examining whether background noise would be as distracting as music. In the presence of silence, background garage music and office noise, 38 introverts and 38 extraverts carried out a reading comprehension task, a prose recall task and a mental arithmetic task. It was predicted that there would be an interaction between personality and background sound on all three tasks: introverts would do less well on all of the tasks than extraverts in the presence of music and noise but in silence performance would be the same. A significant interaction was found on the reading comprehension task only, although a trend for this effect was clearly present on the other two tasks. It was also predicted that there would be a main effect for background sound: performance would be worse in the presence of music and noise than silence. Results confirmed this prediction. These findings support the Eysenckian hypothesis of the difference in optimum cortical arousal in introverts and extraverts. • What was the subject variable? What was the manipulated variable? Was there a main effect? Was there an interaction effect?
  • 242.
    ANOVA • A procedure known as the Analysis of Variance (ANOVA) is used to assess the statistical significance of main effects and interactions in a factorial design (pg 207). • The ANOVA can be used for factorial designs (designs which employ more than one IV); note that, in this context, an IV is often referred to as a factor. The factorial design is very popular in the social sciences. It has a few advantages over single-variable designs, the most important of which is that it can provide some unique and relevant information about how variables interact or combine in the effect they have on the dependent variable.
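The quantities an ANOVA tests can be computed by hand for a balanced design. The sketch below (the `two_way_ss` helper and the scores are invented, not the textbook's procedure) partitions the total sum of squares into factor A, factor B, interaction, and within-cell pieces; an ANOVA then compares each effect's mean square against the within-cell mean square.

```python
# Sums-of-squares partition for a balanced two-way design (illustrative only).

def two_way_ss(data):
    """data[i][j] is the list of scores for level i of factor A and
    level j of factor B; every cell must have the same sample size."""
    a_levels, b_levels = len(data), len(data[0])
    n = len(data[0][0])                       # per-cell sample size
    all_scores = [y for row in data for cell in row for y in cell]
    grand = sum(all_scores) / len(all_scores)
    cell_mean = [[sum(cell) / n for cell in row] for row in data]
    a_mean = [sum(row) / b_levels for row in cell_mean]                # A marginals
    b_mean = [sum(cell_mean[i][j] for i in range(a_levels)) / a_levels
              for j in range(b_levels)]                               # B marginals
    ss_a = n * b_levels * sum((m - grand) ** 2 for m in a_mean)
    ss_b = n * a_levels * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = n * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                    for i in range(a_levels) for j in range(b_levels))
    ss_within = sum((y - cell_mean[i][j]) ** 2
                    for i in range(a_levels) for j in range(b_levels)
                    for y in data[i][j])
    return ss_a, ss_b, ss_ab, ss_within

# 2 x 2 example with two scores per cell (invented numbers).
scores = [[[1, 3], [3, 5]],
          [[5, 7], [7, 9]]]
print(two_way_ss(scores))   # (32.0, 8.0, 0.0, 8.0): two main effects, no interaction
```

Note that the four pieces add up to the total sum of squares, which is the sense in which the design's effects are independent of each other.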
  • 243.
    ANOVA example • The human literature had shown that children diagnosed with Fetal Alcohol Syndrome (FAS) were more active and impulsive than children not receiving this diagnosis. They also seemed to have a more difficult time controlling themselves (i.e., self-restraint). These problems typically become less severe as the child ages. Were the behavioral abnormalities observed in the children with FAS due to the fact that their mothers consumed alcohol while they were pregnant, or due to nutritional factors (since the diet of an alcoholic is typically not wholesome and well balanced)? Another possible causal factor of the abnormalities observed is spousal abuse. Offspring of rodents given alcohol when pregnant show similar morphological and behavioral changes to those observed in humans.
  • 244.
    Study of Alcohol on Learning • We will have two IVs or factors and each will have two levels (or possible values). The table below illustrates the design. Note that EDC refers to Ethanol Derived Calories.

                                        Age (factor B)
    Maternal Diet (factor A)            Adolescent   Adult
    Chocolate Milk (0% EDC)             n=5          n=5
    White Russian (35% EDC)             n=5          n=5
  • 245.
    • This is an example of a 2x2 factorial design with 4 groups (or cells), each of which has 5 subjects. This is the simplest possible factorial design. The Dependent Variable (DV) used was a Passive Avoidance (PA) task. Rats are nocturnal, burrowing creatures and thus prefer a dark area to one that is brightly lit. The PA task uses this preference to test their learning ability. The apparatus has two compartments separated by a door that can be lifted out. One of the compartments has a light bulb which is controlled by the experimenter. The floor can be electrified so that the rat receives a brief, mild electric shock.
  • 246.
    ANOVA example • The first trial: The rat is placed in the compartment with the light bulb. When the trial begins, three things happen: the door is raised, the light is turned on, and a stopwatch is started. Within a few seconds of the door being raised, the rat will typically sniff around and begin to move into the darker compartment (without the light). When the rat has completely entered the darker compartment, the door is closed and the brief, mild shock is administered. The goal is for the rat to learn not to move into the darker compartment. In other words, by remaining passive, the rat can avoid the shock, hence the term passive avoidance.
  • 247.
    ANOVA example • For our purposes, we will use a criterion of 180 seconds as our operational definition of learning PA. That is, when the rat remains in the brightly lit compartment for 3 minutes, we will say that it has learned the task, and what we measure is the number of trials it takes the rat to do this. (Note that a smart rat will take fewer trials to learn.) Thus, the PA task was chosen as the DV because it can be thought of as a measure of “self restraint.” The first possibility is that nothing is significant:

                                        Age (factor B)
    Maternal Diet (factor A)            Adolescent   Adult   A marginals
    (0% EDC)                            3            3       3
    (35% EDC)                           3            3       3
    B marginals                         3            3
  • 248.
    ANOVA example continued • The second possibility is that the main effect of factor A is significant. Here is one possible representation of this outcome:

                                        Age (factor B)
    Maternal Diet (factor A)            Adolescent   Adult   A marginals
    (0% EDC)                            2            2       2
    (35% EDC)                           4            4       4
    B marginals                         3            3

    Notice that the A marginals show a difference of two and thus the main effect of factor A is significant. The animals receiving alcohol in utero took more trials to learn PA than controls. The fact that the effect is consistent across both levels of factor B tells us that there is no interaction.
  • 249.
    ANOVA example continued • The next possibility is that the main effect of factor B is significant. Here is one possible representation of this outcome:

                                        Age (factor B)
    Maternal Diet (factor A)            Adolescent   Adult   A marginals
    (0% EDC)                            4            2       3
    (35% EDC)                           4            2       3
    B marginals                         4            2

    Notice that the B marginals show a difference of two and thus the main effect of factor B is significant. The older animals took fewer trials to learn PA than the younger animals. The fact that the effect is consistent across both levels of factor A tells us that there is no interaction.
  • 250.
    ANOVA example continued • The next possibility is that both main effects are significant. Here is one possible representation of this outcome:

                                        Age (factor B)
    Maternal Diet (factor A)            Adolescent   Adult   A marginals
    (0% EDC)                            3            1       2
    (35% EDC)                           5            3       4
    B marginals                         4            2

    Notice that both sets of marginals show a difference of two and thus both main effects are significant. The animals receiving alcohol in utero took more trials to learn PA than controls, and the older animals took fewer trials to learn PA than the younger animals. The fact that both of these main effects are consistent across the levels of the remaining factor tells us that there is no interaction.
  • 251.
    ANOVA example continued • The next possibility is that the interaction is significant. Here is one possible representation of this outcome:

                                        Age (factor B)
    Maternal Diet (factor A)            Adolescent   Adult   A marginals
    (0% EDC)                            2            4       3
    (35% EDC)                           4            2       3
    B marginals                         3            3

    Notice that both sets of marginals show no difference, thus neither main effect is significant. However, some of the cell means do differ by two. The animals receiving alcohol in utero took more trials to learn PA when young and fewer when older than controls. In other words, the effects of prenatal alcohol depended on the age of the animal when tested. Whenever the effect of one factor depends upon the levels of another, there is an interaction.
  • 252.
    ANOVA example continued • The next possibility is that the interaction and the main effect of factor A are significant, as shown below:

                                        Age (factor B)
    Maternal Diet (factor A)            Adolescent   Adult   A marginals
    (0% EDC)                            1            3       2
    (35% EDC)                           5            3       4
    B marginals                         3            3

    Notice that the B marginals show no difference, thus the main effect of B is not significant. The A marginals do show a difference of two, which demonstrates a main effect of factor A. This tells us that the animals that received alcohol in utero took longer to learn PA than the animals that didn't. However, the cell means tell the real story here: the effect depends on age. The animals receiving alcohol in utero took more trials to learn PA when young but were normal when older, compared to controls.
  • 253.
    Independent Groups, Repeated Measures and Mixed Factorial Designs • In a 2 x 2 factorial design with four conditions, an independent-groups (between-subjects) design assigns a different group of subjects to each of the four conditions. Following the example on pg 208, if you have a 2 x 2 design with 10 subjects in each condition you will need 40 subjects total.

    2 x 2 Independent Groups (Between-Subjects) Design
                        Var B Level 1    Var B Level 2
    Var A Level 1       S1-S10           S11-S20
    Var A Level 2       S21-S30          S31-S40
  • 254.
    Independent Groups, Repeated Measures and Mixed Factorial Designs • In a repeated measures (within-subjects) design the same subjects participate in ALL conditions.

    2 x 2 Repeated Measures (Within-Subjects) Design
                        Var B Level 1    Var B Level 2
    Var A Level 1       S1-S10           S1-S10
    Var A Level 2       S1-S10           S1-S10
  • 255.
    Independent Groups, Repeated Measures and Mixed Factorial Designs • In a 2 x 2 mixed factorial design, ten different subjects are assigned to each level of Variable A (between-subjects), but Variable B is repeated measures: the subjects assigned to each level of Variable A receive both levels of Variable B.

    2 x 2 Mixed Factorial Design
                        Var B Level 1    Var B Level 2
    Var A Level 1       S1-S10           S1-S10
    Var A Level 2       S11-S20          S11-S20
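The three allocation schemes can be contrasted in a few lines of code. This is a small illustration with invented labels ("A1", "B1", etc.); the distinguishing feature is simply how many distinct subjects each design requires for the same 2 x 2 study.

```python
# Subject allocation under the three 2 x 2 designs described above.
subjects = [f"S{i}" for i in range(1, 41)]

# Between-subjects: 10 different subjects per cell (40 total).
between = {("A1", "B1"): subjects[0:10],  ("A1", "B2"): subjects[10:20],
           ("A2", "B1"): subjects[20:30], ("A2", "B2"): subjects[30:40]}

# Within-subjects (repeated measures): the same 10 subjects in every cell.
within = {cell: subjects[0:10] for cell in between}

# Mixed: A is between-subjects (S1-S10 vs. S11-S20), B is within-subjects.
mixed = {("A1", "B1"): subjects[0:10],  ("A1", "B2"): subjects[0:10],
         ("A2", "B1"): subjects[10:20], ("A2", "B2"): subjects[10:20]}

print(len({s for cell in between.values() for s in cell}))  # 40 distinct subjects
print(len({s for cell in within.values() for s in cell}))   # 10 distinct subjects
print(len({s for cell in mixed.values() for s in cell}))    # 20 distinct subjects
```

The trade-off is the usual one: fewer subjects in the within and mixed designs, at the cost of possible order and carryover effects on the repeated factor.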
  • 256.
    Increasing the Number of Levels of an Independent Variable • You can increase the complexity of the basic 2 x 2 factorial design by increasing the number of levels of one or more of the independent variables (pg 209).
  • 257.
    Example of a 2 x 3 Factorial Design • Dr. Sy Cottick investigated driver frustration under low, medium, and high density traffic conditions and under traffic flow controlled by a police officer or a traffic signal (2 conditions of Traffic Control X 3 conditions of Traffic Density). The measure of frustration was the number of horns honked by drivers before receiving the right-of-way at a controlled intersection.
  • 258.
    2 X 3 Factorial Example • Is there a main effect for traffic density? • Yes: the average number of horn honks increases as traffic density increases. • Is there a main effect of type of controlled intersection? • Yes: people honk more often at signal-controlled intersections than at officer-controlled intersections.

                        Type of controlled intersection
    Traffic Density     Officer      Signal       Mean
    Low                 2            4            3
    Medium              4            6            5
    High                8            10           9
    Mean                4.67         6.67
  • 259.
    2 X 3 Factorial Example • Is there an interaction between traffic density and type of controlled intersection? • No: the same difference in horn honks between officer and signal exists at each level of traffic density, so there is no interaction. (Graph: number of horn honks, 0-12, plotted against traffic density for the Officer and Signal conditions; the two lines are parallel.)
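The horn-honking analysis can be reproduced directly from the slide's cell means. The sketch below computes both sets of marginal means (the two main effects) and the officer-signal difference at each density level; a constant difference is the numeric counterpart of parallel lines, i.e. no interaction.

```python
# Cell means from the 2 x 3 horn-honking example on the slide.
honks = {"Low":    {"Officer": 2, "Signal": 4},
         "Medium": {"Officer": 4, "Signal": 6},
         "High":   {"Officer": 8, "Signal": 10}}

# Marginal means for each factor (the two main effects).
density_means = {d: sum(v.values()) / 2 for d, v in honks.items()}
control_means = {c: sum(honks[d][c] for d in honks) / 3
                 for c in ("Officer", "Signal")}

# Officer-signal difference at each density level.
differences = [honks[d]["Signal"] - honks[d]["Officer"] for d in honks]

print(density_means)   # honking rises with density: main effect of density
print(control_means)   # Officer ~4.67 vs Signal ~6.67: main effect of control type
print(differences)     # [2, 2, 2]: constant difference, so no interaction
```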
  • 260.
    It is not always possible or practical to do an RCT (randomized clinical trial). It may not be ethical to do an RCT in some cases (for example, tobacco use), and it may be too expensive, especially for early or exploratory studies.
  • 261.
    Single-Case, Quasi-Experimental and Developmental Research Chapter 11 • While the classic experimental design randomly assigns subjects to independent-variable conditions, with a dependent-variable (outcome) measure and all other variables held constant, three types of special research situations exist: • 1) Single-Case 2) Quasi-Experimental and 3) Developmental Research
  • 262.
    Single-Subject, N=1 Designs • Single-subject research is experimental rather than correlational or descriptive, and its purpose is to document causal, or functional, relationships between independent and dependent variables. Single-subject research employs within- and between-subjects comparisons to control for major threats to internal validity and requires systematic replication to enhance external validity (Martella, Nelson, & Marchand-Martella, 1999). • (Each participant serves as his or her own control.) • Single-subject research requires operational descriptions of the participants, the setting, and the process by which participants were selected (Wolery & Ezell, 1993).
  • 263.
    Single Case Experimental Designs • Early work in single-subject designs is credited to B.F. Skinner, with many case studies or single-case designs in clinical, counseling, and educational settings. • Single-case studies begin with a baseline measure (control) followed by a manipulation. • To determine whether the treatment was effective, a reversal (A-B-A) design is used (pg 216).
  • 264.
    Single Case Designs: ABA Designs • A: baseline and observation • B: treatment or intervention • A: withdrawal of treatment • The ABA design can be further improved as an ABAB design and can be extended out even further (ABABAB), as a single reversal may not be powerful enough.
  • 265.
    Single Case Designs • A single reversal may not be enough; in addition, the observed effect may have been due to a random fluctuation in behavior, which would justify multiple withdrawals and treatments (pg 217-218). Unlike group studies, single-case designs frequently involve multiple repeated observations of the subject(s). • Multiple Baseline Designs • In certain instances it is unethical to reverse a treatment that reduces dangerous or illegal behaviors such as drug/alcohol abuse or sexual deviancy. In such cases it may be necessary to demonstrate the effectiveness of treatment with a multiple baseline design.
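The logic of repeated reversals can be shown with a toy simulation (all numbers invented): if behavior drops during each treatment (B) phase and recovers at each withdrawal (A), random fluctuation becomes an implausible explanation, which is what the extra reversals buy.

```python
# Toy A-B-A-B reversal data: observations per phase are invented.
phases = [("A", [8, 9, 8]),   # baseline
          ("B", [4, 3, 4]),   # treatment
          ("A", [8, 8, 9]),   # withdrawal: behavior returns to baseline
          ("B", [3, 4, 3])]   # reintroduction: effect replicates

phase_means = [(label, sum(obs) / len(obs)) for label, obs in phases]
a_mean = sum(m for lab, m in phase_means if lab == "A") / 2
b_mean = sum(m for lab, m in phase_means if lab == "B") / 2
print(phase_means)
print(a_mean - b_mean)   # a baseline-treatment gap that repeats across reversals
```

A single A-B-A sequence would show the gap only once; the second reversal demonstrates that the effect is reproducible within the same subject.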
  • 266.
    Multiple Baseline Designs • One variation of multiple baseline designs is across subjects, in which the behavior of several subjects is measured over time and the treatment is introduced at a different time for each subject. Change takes place across various subjects, ruling out random effects. Another version is a multiple baseline across behaviors: several different behaviors of a single subject are measured over time, and at different times the same manipulation is applied to each of the behaviors.
  • 267.
    Multiple Baseline Designs • Multiple baselines across behaviors: a reward or token system could be applied to different behaviors of the same subject/patient, with different ones for grooming, socialization, and appropriate speech (pg 219). • A third variation of the multiple baseline is across situations, in which the same behavior is measured in different settings or situations.
  • 268.
    Single-Case Designs • Procedures with any one subject can be replicated with other subjects, enhancing generalizability or external validity (or replicated across settings); this is often done in research. • Sidman (1960) suggests presenting data from each single-case design separately rather than grouping the means of all the individuals, as such means may be misleading (e.g., the treatment may have been effective in changing the behavior of some individuals but not others). • Within education, single-subject research has been used not only to identify basic principles of behavior (e.g., theory), but also to document interventions (independent variables) that are functionally related to change in socially important outcomes (dependent variables; Wolf, 1978).
  • 269.
    Help Line Evaluation • An evaluation was conducted of the impact of different methods of agency outreach on the number of phone calls received by a help line (information and referral). The baseline period represented a time in which there was no outreach; rather, knowledge about the help line seemed to spread by word of mouth. The B phase represented the number of calls after the agency had sent notices about its availability to agencies serving older adults and families. During the C phase, the agency ran advertisements using radio, TV, and print media. Finally, during the D phase, agency staff went to a variety of different gatherings, such as community meetings or programs run by different agencies, and described the help line.
  • 270.
    Evaluation of Help Line (Glatthorn) • Exhibit 7-14: Multiple Treatment Design. (Graph: number of calls, 0-60, plotted by week over 20 weeks across the four phases.)
  • 271.
    Phone calls did not increase appreciably after notices were sent to other professionals or after media efforts, but they did increase dramatically in the final phase of the study. This graph demonstrates how tricky the interpretation of single-subject data can be. A difficulty in coming to a conclusion with such data is that only adjacent phases can be compared, so the effect for nonadjacent phases cannot be determined. One plausible explanation for the findings is that sending notices to professionals and media efforts at outreach were a waste of resources, in that the notices produced no increase in the number of calls relative to doing nothing, and advertising produced no increase relative to the notices. Only the meetings with community groups and agency-based presentations were effective, at least relative to the advertising. An alternative interpretation of the findings is that the order of the activities was essential. There might have been a carryover effect from the first two efforts that added legitimacy to the third effort. In other words, the final phase was effective only because it had been preceded by the first two efforts. If the order had been reversed, the impact of the outreach efforts would have been negligible. A third alternative is that history or some other event occurred that might have increased the number of phone calls.
  • 272.
    ASSESSMENT OF DEVIANT AROUSAL IN ADULT MALE SEX OFFENDERS WITH DEVELOPMENTAL DISABILITIES, Reyes et al., Journal of Applied Behavior Analysis, 2006, 39, 173-188 • Some statistics regarding very broad characteristics of sex offenders are available (i.e., age, race, etc.), but are limited due to the wide variability in this population. In general, the demographic characteristics of sex offenders seem to match those of nonoffenders. • Ten individuals, residing in a treatment facility specializing in the rehabilitation of sex offenders with developmental disabilities, participated in an arousal assessment involving the use of the penile plethysmograph. All of these individuals had been accused of committing one or more sexual offenses and had been found incompetent to stand trial. The arousal assessments involved measuring change in penile circumference to various categories of stimuli, both appropriate (adult men and women) and inappropriate (e.g., 8- to 9-year-old boys and girls). Before each session, the technician was required to calibrate the penile strain gauge to ensure accurate measurement. The video clips were presented one at a time in one of three predetermined orders.
  • 273.
    ASSESSMENT OF DEVIANT AROUSAL • Differentiated deviant arousal was characterized as showing arousal in the presence of a particular age and gender category that was higher than the arousal to other categories and to the neutral stimulus; differentiated arousal patterns were also consistently higher than arousal levels to the neutral stimulus. Undifferentiated deviant arousal was characterized as showing similar arousal levels to deviant and nondeviant stimuli that were higher than the arousal in the presence of the neutral stimulus. The arousal assessments showed that not all of the participants were differentially aroused by the deviant stimuli.
  • 274.
    ASSESSMENT OF DEVIANT AROUSAL • First, specific targets for teaching are identified; thus, skills training can be conducted to teach avoidance of high-risk situations (e.g., being in situations with children of a certain age group). • Second, the assessment results could be used to evaluate the effects of commonly used, but poorly validated, treatments. For example, classical conditioning, which typically involves pairing unpleasant odors with deviant arousal, has been commonly used but has not been validated. • Third, the effects of presession masturbation could be tested to determine whether ejaculation serves as an establishing operation or an abolishing operation for sexual stimuli as reinforcing (or at least as arousing) stimuli.
  • 275.
    Program Evaluation • Program evaluation is a method for collecting, analyzing, and using information to answer questions about projects, policies, and programs, particularly about their effectiveness and efficiency. • The question that needs to be answered is whether or not the programs people are funding, implementing, voting for, receiving, or objecting to are producing the intended effect. The main focus is outcome evaluation, which determines if the program was effective (pg 221).
  • 276.
    Program Evaluation • Evaluation is the systematic application of scientific methods to assess the design, implementation, improvement, or outcomes of a program (Rossi & Freeman, 1993; Short, Hennessy, & Campbell, 1996). The term "program" may include any organized action, such as media campaigns, service provision, educational services, public policies, or research projects. • Rossi et al. (2004) identified five types of evaluations, each attempting to answer different questions: 1) Needs Assessment 2) Program Theory Assessment 3) Process Evaluation 4) Outcome Evaluation 5) Efficiency Assessment
  • 277.
    Needs Assessment • A needs assessment is part of the planning process, determining whether there are problems that need to be addressed in a target population (Is adolescent drug abuse a problem in the community?). A general 12-step process; data may come from surveys, interviews, and statistical data provided by various agencies (pg 221). • Confirm the issue and audiences • Establish the planning team • Establish the goals and objectives • Characterize the audience • Conduct information and literature search • Select data collection methods • Determine the sampling scheme • Design and pilot the collection instrument • Gather and report data; Analyze data; Manage data • Synthesize data and create report
  • 279.
    Program Theory • Program evaluation often involves collaboration among researchers, service providers and prospective clients of the program to determine that the proposed program actually addresses the needs of the target population in appropriate ways. • In a cited example assessing the needs of homeless men and women in NYC, men needed help with drinking or drug problems, handling money and social skills, while women needed help with health problems. Any designed program must take these factors into account and provide a rationale for how homeless individuals will benefit from the program
  • 280.
    Process Evaluation • When the program is under way, the evaluation researcher monitors it to determine whether it is operating as intended. Is the program doing what it is supposed to do? The types of questions asked when designing a process evaluation are different from those asked in outcome evaluation. The questions underlying process evaluation focus on how well interventions are being implemented. Typical questions asked include, but are not limited to: • What intervention activities are taking place? • Who is conducting the intervention activities? • Who is being reached through the intervention activities? • What inputs or resources have been allocated or mobilized for program implementation? • What are possible program strengths, weaknesses, and areas that need improvement?
  • 281.
    Outcome Evaluation (Impact Assessment) • Outcome evaluations measure to what degree program objectives have been achieved (i.e., short-term, intermediate, and long-term objectives). This form of evaluation assesses what has occurred because of the program, and whether the program has achieved its outcome objectives pg223. An outcome evaluation focused on tobacco prevention activities can measure the following elements: changes in intended and actual tobacco-related behaviors; changes in people's attitudes toward, and beliefs about, tobacco; changes in people's awareness of, and support for, interventions and policy or advocacy efforts. True experimental designs may not always be possible in these conditions, and quasi-experimental designs and single-case designs may offer good alternatives
  • 282.
    Program Evaluation: Efficiency Assessment • The final program evaluation question addresses efficiency assessment pg222. Once a program is shown to have its intended effect, the researcher must determine whether it is worth the resources that must be dedicated to it: costs vs. benefits
  • 283.
    When Bad Things Happen to Good Intentions • The Drug Abuse Resistance Education (DARE) program reviewed • When it became known that the prestigious American Journal of Public Health planned to publish the study, DARE strongly objected and tried to prevent publication. "DARE has tried to interfere with the publication of this. They tried to intimidate us," the publication director reported (also see pg230 text) • The U.S. Department of Education prohibits schools from spending its funding on DARE because the program is completely ineffective in reducing alcohol and drug use. DARE was declared ineffective by the U.S. General Accounting Office, the U.S. Surgeon General, the National Academy of Sciences, and the U.S. Department of Education. -David J. Hanson, Ph.D. http://www.alcoholfacts.org/DARE.html
  • 284.
    An outcome evaluation of Project DARE. Christopher Ringwalt, Susan T. Ennett and Kathleen D. Holt. Health Educ. Res. (1991) 6 (3): 327-337 • This paper presents the results of an evaluation of the effects of the Drug Abuse Resistance Education (DARE) Project, a school-based drug use prevention program, in a sample of fifth and sixth graders in North Carolina. DARE is distinguished by its use of specially trained, uniformed police officers to deliver 17 weekly lessons in the classroom. The evaluation used an experimental design employing random assignment of 20 schools to either a DARE or no-DARE condition, pre- and post-testing of both groups, attrition assessment, adjustments for school effects, and control for non-equivalency between comparison groups. • DARE demonstrated no effect on adolescents' use of alcohol, cigarettes or inhalants, or on their future intentions to use these substances. However, DARE did make a positive impact on adolescents' awareness of the costs of using alcohol and cigarettes, perceptions of the media's portrayal of these substances, general and specific attitudes towards drugs, perceived peer attitudes toward drug use, and assertiveness.
  • 285.
    How effective is drug abuse resistance education? A meta-analysis of Project DARE outcome evaluations. S. T. Ennett et al. Am J Public Health. 1994 September; 84(9): 1394–1401. This study used meta-analytic techniques to review eight methodologically rigorous DARE evaluations. INTRODUCTION: Project DARE (Drug Abuse Resistance Education) is the most widely used school-based drug use prevention program in the United States, but the findings of rigorous evaluations of its effectiveness have not been considered collectively. METHODS: We used meta-analytic techniques to review eight methodologically rigorous DARE evaluations. Weighted effect size means for several short-term outcomes were also compared with means reported for other drug use prevention programs. RESULTS: The DARE effect size for drug use behavior ranged from .00 to .11 across the eight studies; the weighted mean for drug use across studies was .06. For all outcomes considered, the DARE effect size means were substantially smaller than those of programs emphasizing social and general competencies and using interactive teaching strategies. CONCLUSIONS: DARE's short-term effectiveness for reducing or preventing drug use appears limited
  • 287.
    Effect Size • Consider an experiment conducted by Dowson (2000) to investigate time-of-day effects on learning: do children learn better in the morning or afternoon? A group of 38 children was included in the experiment. Half were randomly allocated to listen to a story and answer questions about it (on tape) at 9am, the other half to hear exactly the same story and answer the same questions at 3pm. Their comprehension was measured by the number of questions answered correctly out of 20. • The average score was 15.2 for the morning group and 17.9 for the afternoon group: a difference of 2.7. But how big a difference is this? If the outcome were measured on a familiar scale, such as GCSE grades, interpreting the difference would not be a problem. If the average difference were, say, half a grade, most people would have a fair idea of the educational significance of the effect of reading a story at different times of day. However, in many experiments there is no familiar scale available on which to record the outcomes. The experimenter often has to invent a scale or to use (or adapt) an already existing one - but generally not one whose interpretation will be familiar to most people
  • 288.
    Effect Size • One way to get around this problem is to use the amount of variation in scores to contextualize the difference. If there were no overlap at all and every single person in the afternoon group had done better on the test than everyone in the morning group, then this would seem like a very substantial difference. On the other hand, if the spread of scores were large and the overlap much bigger than the difference between the groups, then the effect might seem less significant. Because we have an idea of the amount of variation found within a group, we can use this as a yardstick against which to compare the difference. This idea is quantified in the calculation of the effect size: effect size is a measure of the strength of a phenomenon. • The concept of effect size already appears in everyday language. For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program. CALCULATE EFFECT SIZE: http://www.uccs.edu/~lbecker/ Robert Coe, University of Durham: http://www.leeds.ac.uk/educol/documents/00002182.htm Conventional benchmarks for the effect size r: small = 0.10, medium = 0.30, large = 0.50
  • 289.
    Effect Size • The concept is illustrated in Figure 1, which shows two possible ways the difference might vary in relation to the overlap. If the difference were as in graph (a) it would be very significant; in graph (b), on the other hand, the difference might hardly be noticeable. In Dowson's time-of-day effects experiment, the standard deviation (SD) = 3.3, so the effect size was (17.9 - 15.2)/3.3 = 0.8. An effect size is exactly equivalent to a 'z-score' of a standard normal distribution. For example, an effect size of 0.8 means that the score of the average person in the experimental group is 0.8 standard deviations above the average person in the control group, and hence exceeds the scores of 79% of the control group. With the two groups of 19 in the time-of-day effects experiment, the average person in the 'afternoon' group (i.e. the one who would have been ranked 10th in the group) would have scored about the same as the 4th highest person in the 'morning' group. The basic formula for the effect size is to subtract the mean of the control group from that of the experimental group and then divide by the standard deviation of the scores for the control group
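The effect-size arithmetic above can be sketched in a few lines of Python, using Dowson's numbers. The helper name `cohens_d` is my own label, and the percentile step uses the standard normal CDF exactly as the slide describes:

```python
from statistics import NormalDist

def cohens_d(mean_experimental, mean_control, sd):
    """Effect size: difference between group means divided by the standard deviation."""
    return (mean_experimental - mean_control) / sd

# Dowson's time-of-day data: afternoon mean 17.9, morning mean 15.2, SD 3.3
d = cohens_d(17.9, 15.2, 3.3)
print(round(d, 1))  # → 0.8

# Treat the effect size as a z-score: what fraction of the control group does
# the average experimental-group member exceed?
print(round(NormalDist().cdf(d) * 100))  # → 79 (percent of the control group)
```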
  • 290.
    Quasi-Experimental Designs • The experimental method received a big boost in the 1920s from a young Englishman named Ronald Fisher. Fisher's modern experimental methods were applied in agricultural research for 20 years or so before they began to be applied in psychology and eventually in education. • In the early 1960s, a psychologist, Donald Campbell, and an educational researcher, Julian Stanley (Campbell & Stanley, 1963), published a paper that was quickly acknowledged to be a classic. They drew important distinctions between experiments of the type Fisher devised and the many other designs and methods being employed by researchers who aspired to experiments but failed to satisfy all of Fisher's conditions. Campbell and Stanley called the experiments that Fisher devised "true experiments." The methods that fell short of satisfying the conditions of true experiments they called "quasi-experiments," quasi meaning seemingly or apparently but not genuinely so.
  • 291.
    Quasi-Experimental Designs • Quasi-experimental designs address the need to study the effect of an independent variable in settings in which the controls of true experimental designs cannot be achieved pg222 • A quasi-experiment is an empirical study used to estimate the causal impact of an intervention on its target population. Quasi-experimental research shares similarities with the traditional experimental design or randomized controlled trial, but specifically lacks the element of random assignment to treatment or control. Instead, quasi-experimental designs typically allow the researcher to control assignment to the treatment condition, using some criterion other than random assignment (e.g. an eligibility cutoff mark)
  • 292.
    Quasi-Experimental Designs • A shorthand proposed by Cook and Campbell and adopted by many others uses the following code to describe quasi-experimental designs (not used in the text but very common): • R = randomization; On = observation at time n; X = intervention (i.e. surgery or giving a drug) • The One-Shot Case Study (one-group posttest-only design): no control group. This design has virtually no internal or external validity; there is no means for determining whether change occurred as a result of the treatment or program. Example: a training program for employees has only one group, with one intervention and one observation (after the fact). Treatment, then post-test: X O
  • 293.
    Quasi-Experimental Designs • For example, you want to determine whether praising primary school children makes them do better in arithmetic. You measure mathematics achievement with a test. To test this idea, you choose a class of 2nd grade pupils, increase praising of the children, and find that their mathematics scores did increase. You conclude that praising children increases their mathematics scores. X O (praise) (math scores) • What are the weaknesses of this design? • 1) Selection: It is possible that the students you selected as subjects were already good in mathematics. 2) History: If the school had organized a motivation course on mathematics for these students, it might influence their performance
  • 294.
    Quasi-Experimental Designs • One-Group Pretest-Posttest Design • Minimal control. There is somewhat more structure: there is a single selected group under observation, with a careful measurement taken before applying the experimental treatment and again after. This design has minimal internal validity, controlling only for selection of subjects and experimental mortality. It has no external validity. O1 X O2 (pretest) (praise) (posttest) • In the previous study on praise and math scores, to ensure that there was no pre-existing difference among the children, a pretest may be administered. If the children's scores improved after praising compared to the pretest, then you can attribute the improvement to the practice of praising
  • 295.
    Quasi-Experimental Designs • O1 X O2 (pretest) (praise) (posttest) • What are the weaknesses of this design? • 1) Maturation: If the time between the pretest and posttest is long, it is possible that the subjects may have matured because of developmental changes. • 2) Testing: Sometimes the period between the pretest and the posttest is too short and there is the possibility that subjects can remember the questions and answers (carryover effect) • It may not be ethical to do an RCT (e.g. tobacco use) • Although Campbell and Stanley used the term control group, others prefer the term comparison group to emphasize the difference between this and an RCT
  • 296.
    Quasi-Experimental Designs • Nonequivalent Control Groups: uses a control group, but one selected from existing natural groups • Example: one group is given a medicine, whereas the control (comparison) group is given none. If different dosages of a medicine are tested, the design can be based around multiple groups. Such a design is limited in scope and contains many threats to validity. It is very poor at guarding against assignment bias, since it does not use random assignment, and it is also subject to selection bias. Because it is often likely that the groups are not equivalent, this design was named the nonequivalent groups design to remind us of that
  • 297.
    Nonequivalent Control Group Pretest-Posttest Design • In general, however, nonequivalent groups are usually chosen to be as similar as possible to each other, which helps to control extraneous variables. For example, if we are comparing cooperative learning to standard lecture classroom techniques, we probably would not use a daytime class as our cooperative learning group and an evening class as our standard lecture group pg228. However, if we add a pretest we can improve this design. The Nonequivalent Control Group Pretest-Posttest design gives us the advantage of comparing the control group to the experimental group, but this is still not a true RCT, as assignment to groups is not random
  • 298.
    Nonequivalent Control Group Pretest-Posttest Design • The nonequivalent control group design still lacks random assignment but can be improved by matching subjects (similar to matched-pairs designs). If we match subjects on multiple variables and combine the scores, we produce a propensity score (propensity score matching) • Matching attempts to mimic randomization by making the groups receiving treatment and no treatment more comparable pg229
  • 299.
    A Story of Nonequivalence • Two heart surgeons walk into a room. • − The first surgeon says, "Man, I just finished my 100th heart surgery!" − The second surgeon replies, "Oh yeah? I finished my 100th heart surgery last week. I bet I'm a better surgeon than you. How many of your patients died within 3 months of surgery? Only 10 of my patients died." − The first surgeon smugly responds, "Only 5 of mine died, so I must be the better surgeon." − The second surgeon says, "My patients were probably older and had a higher risk than your patients."
  • 300.
    Propensity Score • In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not. The technique was first published by Paul Rosenbaum and Donald Rubin in 1983
  • 301.
    2007Jan05 GCRC Research-Skills Workshop • [Figure: number of publications in PubMed containing the phrase "Propensity Score", by year, 1983–2006]
  • 302.
    Propensity Score Example • Consider an HIV database: – E+: patients receiving a new antiretroviral drug (N=500) — experimental group – E-: patients not receiving the drug (N=10,000) — control group – D+: mortality — dependent variable • Need to manually measure CD4 (CD4 T-helper cells send signals to other types of immune cells, including CD8 killer cells: CD4 cells send the signal and CD8 cells destroy the infectious particle). There may be potential confounding by other HIV drugs as well as other prognostic factors • Limitations: propensity score methods work better in larger samples to attain distributional balance of observed covariates. – In small studies, imbalances may be unavoidable. Including irrelevant covariates in the propensity model may reduce efficiency; bias may occur; the treatment effect may be non-uniform
  • 303.
    Propensity Score Example • Option 1: – Collect blood samples from all 10,500 patients. – Costly & impractical. • Option 2: – For all patients, estimate Pr(E+ | other HIV drugs & prognostic factors). – For each E+ patient, find the E- patient with the closest propensity score. – Continue until all E+ patients are matched with an E- patient. – Collect blood samples from the 500 propensity-matched pairs. • A panel of 7 specialists in critical care specified variables related to the treatment decision: age, sex, years of education, medical insurance, primary & secondary disease category, admission dx • Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am
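The matching loop in Option 2 can be sketched in Python. This is a minimal greedy 1:1 nearest-neighbor matcher over already-estimated propensity scores; the function name, patient IDs and toy scores are my own illustration, not the procedure from Cepeda et al.:

```python
def greedy_match(treated, controls):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, controls: dicts mapping patient id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs; each control is used once.
    """
    available = dict(controls)
    pairs = []
    # Process treated patients in score order (a simple deterministic heuristic;
    # real implementations often also impose a caliper on the score distance).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs

# Toy example: 2 treated (E+) and 4 control (E-) patients
treated = {"T1": 0.80, "T2": 0.30}
controls = {"C1": 0.25, "C2": 0.78, "C3": 0.55, "C4": 0.10}
print(greedy_match(treated, controls))  # → [('T2', 'C1'), ('T1', 'C2')]
```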
  • 304.
    Interrupted Time Series • A time series is simply a set of measurements of a variable taken at various points in time • In an interrupted time-series design, a time series (the dependent variable) is interrupted (usually near the middle) by the manipulation of the independent variable • This design uses several waves of observation before and after the introduction of the independent (treatment) variable X • O1 O2 O3 O4 X O5 O6 O7 O8
  • 305.
    Interrupted Time Series: Control Series Design • Control Series Design pg230 • The addition of a second time series for a comparison group helps to provide a check on some of the threats to validity of the single interrupted time-series design (previous slide), especially history • Group A: O1 O2 O3 O4 X O5 O6 O7 O8 Group B: O1 O2 O3 O4 - O5 O6 O7 O8 • This design is like a pretest-posttest design but with multiple pretests and multiple posttests. The advantage of this approach is that it provides greater confidence that the change in the dependent variable was caused by the manipulation and is not just a random fluctuation.
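The before/after comparison behind the O1…O4 X O5…O8 diagram can be sketched minimally. The function name and observation counts are hypothetical, and a real interrupted time-series analysis would also model trend and autocorrelation (e.g. segmented regression); this sketch only compares pre- and post-intervention mean levels:

```python
def interruption_effect(series, interruption_index):
    """Naive interrupted-time-series summary: change in mean level after
    the intervention (post mean minus pre mean)."""
    pre = series[:interruption_index]
    post = series[interruption_index:]
    return sum(post) / len(post) - sum(pre) / len(pre)

# O1..O4 before X, O5..O8 after X (hypothetical monthly counts)
observations = [12, 13, 11, 12, 18, 19, 17, 18]
print(interruption_effect(observations, 4))  # → 6.0
```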
  • 306.
    Developmental Research Designs • Developmental research studies how individuals change as a function of age. Researchers can adopt two general approaches to studying individuals of different ages, selecting groups of people who are similar in most respects but differ in age • Cross-sectional studies are designed to look at a variable at a particular point in time. Longitudinal studies involve taking multiple measures over an extended period of time, while cross-sectional research focuses on variables at a specific point in time. Cross-sectional designs are more common, as they cost less and provide immediate results, allowing comparisons across various groups
  • 307.
    Developmental Research Designs • Disadvantages of cross-sectional research: the researcher must infer that the differences among age groups are due to development, but this variable (development) is not directly observed; it is based on comparisons of different cohorts of individuals • A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are born on a certain day or in a certain period, are exposed to a drug, vaccine or pollutant, or undergo a certain medical procedure)
  • 309.
    Cross-Sectional, Longitudinal & Sequential Studies • Longitudinal studies are the best way to study changes as people grow older, and also the best way to study how scores on a variable at one age are related to another variable at a later age, although attrition (loss of subjects) from the study is often problematic • The sequential method combines the cross-sectional and longitudinal methods. In a study by Orth et al., different age groups were formed and compared (e.g. 25-34; 35-44; 45-54; etc.) (cross-sectional), but then each person was measured a second time (longitudinal)
  • 310.
    Self-Esteem Development From Young Adulthood to Old Age: A Cohort-Sequential Longitudinal Study. Orth, Trzesniewski & Robins, JPSP, 2010, Vol. 98, No. 4, 645–658 • The authors examined the development of self-esteem from young adulthood to old age. Data came from the Americans' Changing Lives study, which includes 4 assessments across a 16-year period of a nationally representative sample of 3,617 individuals aged 25 years to 104 years. Latent growth curve analyses indicated that self-esteem follows a quadratic trajectory across the adult life span, increasing during young and middle adulthood, reaching a peak at about age 60 years, and then declining in old age. No cohort differences in the self-esteem trajectory were found. Women had lower self-esteem than did men in young adulthood, but their trajectories converged in old age. Whites and Blacks had similar trajectories in young and middle adulthood, but the self-esteem of Blacks declined more sharply in old age than did the self-esteem of Whites. More educated individuals had higher self-esteem than did less educated individuals, but their trajectories were similar. Moreover, the results suggested that changes in socioeconomic status and physical health account for the decline in self-esteem that occurs in old age
  • 311.
  • 312.
    Controlling for Threats to Validity pg224-227 • 1) History: did some other current event affect the change in the dependent variable? • 2) Maturation: were changes in the dependent variable due to normal developmental processes? • 3) Statistical regression: did subjects come from very low- or high-performing groups? • 4) Selection: were the subjects self-selected or nonrandomly assigned to experimental and control groups, which could affect the dependent variable? • 5) Experimental mortality: did some subjects drop out, and did this affect the results? • 6) Testing: did the pre-test affect the scores on the post-test? • 7) Instrumentation: did the measurement method change during the research? • 8) Design contamination: did the control group find out about the experimental treatment? Did either group have a reason to want to make the research succeed or fail?
  • 313.
    Odds Ratio • In statistics, the odds ratio (usually abbreviated "OR") is one of three main ways to quantify how strongly the presence or absence of property A is associated with the presence or absence of property B in a given population. If each individual in a population either does or does not have a property "A" (e.g. "high blood pressure"), and also either does or does not have a property "B" (e.g. "moderate alcohol consumption"), where both properties are appropriately defined, then a ratio can be formed which quantitatively describes the association between the presence/absence of "A" (high blood pressure) and the presence/absence of "B" (moderate alcohol consumption) for individuals in the population. This ratio is the odds ratio (OR) and can be computed in three steps: • 1) Among individuals that have "B", compute the odds of having "A" • 2) Among individuals that do not have "B", compute the odds of having "A" • 3) Divide the odds from step 1 by the odds from step 2 to obtain the odds ratio (OR) • If the OR is greater than 1, then having "A" is considered to be "associated" with having "B", in the sense that having "B" raises (relative to not having "B") the odds of having "A". Note that this is not enough to establish that B is a contributing cause of "A": the association could be due to a third property, "C", which is a contributing cause of both "A" and "B"
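The three steps can be sketched directly from a 2x2 table of counts. The cell labels and the blood-pressure counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from 2x2 counts:
        a = have A and have B        b = lack A and have B
        c = have A and lack B        d = lack A and lack B
    Step 1: odds of A given B     = a / b
    Step 2: odds of A given not-B = c / d
    Step 3: OR = (a/b) / (c/d) = (a*d) / (b*c)
    """
    return (a * d) / (b * c)

# Hypothetical counts: among moderate drinkers (B), 10 have high BP and 90 do not;
# among non-drinkers (not-B), 5 have high BP and 95 do not.
print(round(odds_ratio(10, 90, 5, 95), 2))  # → 2.11, so OR > 1: A associated with B
```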
  • 314.
    Understanding Research Results Chp 12 • Because experimenters must calculate the size of differences that chance is likely to produce and compare them with the differences they actually observe, they necessarily become involved with probability theory and its application to statistics • True experiments satisfy three conditions: the experimenter sets up two or more conditions whose effects are to be evaluated subsequently; persons or groups of persons are then assigned strictly at random, that is, by chance, to the conditions; and the eventual differences between the conditions on the measure of effect (for example, the pupils' achievement in each of two or more learning conditions) are compared with differences of chance or random magnitude. -G. Glass, Arizona State University
  • 315.
    Understanding Research Results • Statistics are used in two ways to understand and interpret research: • 1) Statistics are used to describe data • 2) Statistics are used to draw inferences • Review scales of measurement (which have important implications for the way data are described and analyzed): • Nominal scales: categorical; do not imply any ordering among the responses • Ordinal scales: rank-order the levels of a variable (category) being studied; nothing is specified about the magnitude of the interval between two measures • Interval scales: intervals have the same interpretation throughout, in that the intervals between the numbers are equal in size; however, there is no absolute zero on the scale • Ratio scales: the most informative scale; an interval scale with the additional property that its zero position indicates the absence of the quantity being measured
  • 316.
    Understanding Research Results • Three basic ways to describe the results of variables studied: • 1) Comparing group percentages (e.g. percent of males vs. females who like to travel) • 2) Correlating scores of individuals on two variables (e.g. do students sitting in the front of the class receive better grades?) • 3) Comparing group means (e.g. mean number of aggressive acts by children who witnessed an adult model aggression compared to the mean number of aggressive acts by children who did not) • Frequency distributions indicate the number of individuals who receive each possible score on a variable (pg243); often these distributions are graphed. Raw data: data collected in original form. Frequency: the number of times a certain value or class of values occurs. Frequency distribution: the organization of raw data in table form with classes and frequencies
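Turning raw data into a frequency distribution is a one-liner with Python's standard library; the scores below are made up for illustration:

```python
from collections import Counter

# Raw data -> frequency distribution (value -> number of occurrences)
raw_scores = [3, 1, 2, 3, 3, 2, 1, 3]
freq = Counter(raw_scores)

# Print the distribution as a simple table, lowest value first
for value in sorted(freq):
    print(value, freq[value])
```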
  • 317.
    Graphing Frequency Distributions • Pie charts: the frequency determines the size of the slice
  • 319.
    Graphing Frequency Distributions • Bar graphs: a separate bar for each piece of information; the x-axis is horizontal, the y-axis vertical; bar graphs are used when the x-axis variable is nominal • Frequency polygons: a line is used to represent the distribution of frequency scores; line graphs are used when the x-axis values are numeric pg247
  • 320.
    Graphing Frequency Distributions • Histogram: uses bars to display a frequency distribution where the values are continuous (versus a bar graph), with the bars drawn next to each other
  • 321.
    Descriptive Statistics • Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data, or the quantitative description itself. Descriptive statistics are distinguished from inferential statistics (or inductive statistics) in that descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample is thought to represent. At least two statistics (characteristics of a sample) are needed to describe a data set: 1) a measure of central tendency and 2) a measure of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance) pg245-6
  • 322.
    Descriptive Statistics • The mean is an appropriate indicator of central tendency only when scores are measured on an interval or ratio scale, because the actual values of the numbers are used in calculating the statistic
  • 323.
    Common Symbols (Greek) • μ (mu) refers to a population mean; x̄, to a sample mean • σ (lowercase sigma) refers to the standard deviation of a population; s, to the standard deviation of a sample • N is the number of elements in a population; n is the number of elements in a sample • Σ is the summation symbol, used to compute sums over a range of values. Σx or Σxi refers to the sum of a set of n observations: Σxi = Σx = x1 + x2 + . . . + xn
  • 324.
    Common Symbols (Greek) • Letter and name: • Α α alpha • Β β beta • Γ γ gamma • Δ δ delta • Ε ε epsilon • Ζ ζ zeta • Θ θ theta • Κ κ kappa • Λ λ lambda • Μ μ mu • Π π pi • Ρ ρ rho • Σ σ sigma • Φ φ phi • Χ χ chi • Ψ ψ psi • Ω ω omega
  • 325.
    Central Tendency and Variability (Dispersion) • A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. The mean (or average) is the most popular and well-known measure of central tendency pg245; it is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have n values in a data set with values x1, x2, ..., xn, the sample mean, usually denoted by x̄, is x̄ = (x1 + x2 + ... + xn) / n
  • 326.
    Median and Mode • Table 1: 37 33 33 32 29 28 28 23 22 22 22 21 21 21 20 20 19 19 18 18 18 18 16 15 14 14 14 12 12 9 6 • The median is the midpoint of a distribution: the same number of scores is above the median as below it. For the data in Table 1, there are 31 scores. The 16th highest score (which equals 20) is the median, because there are 15 scores below the 16th score and 15 scores above it. The median can also be thought of as the 50th percentile. • The mode is the most frequently occurring value. For the data in Table 1, the mode is 18
  • 327.
    Variability (Dispersion) • The terms variability, spread, and dispersion are synonyms and refer to how spread out a distribution is • Range: the difference between the highest and lowest score • Variance: variability can also be defined in terms of how close the scores in the distribution are to the middle of the distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as the average squared difference of the scores from the mean pg246. The standard deviation is simply the square root of the variance
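These measures can be checked against the Table 1 scores from the Median and Mode slide using Python's standard `statistics` module:

```python
from statistics import mean, median, mode, pstdev

# Table 1 scores (31 values)
scores = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
          19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]

print(median(scores))            # → 20 (the 16th of 31 ordered scores)
print(mode(scores))              # → 18 (occurs four times, more than any other value)
print(round(mean(scores), 2))    # → 20.45
print(round(pstdev(scores), 2))  # population SD: square root of the variance
```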
  • 328.
    Correlation and Prediction • Correlation refers to the degree of relationship between two variables • Regression: (multiple) regression is a statistical tool used to derive the value of a criterion from several other independent, or predictor, variables. It is the simultaneous combination of multiple factors to assess how and to what extent they affect a certain outcome (Y = X1 + X2 + X3 ...) • “The terms correlation, regression and prediction are so closely related in statistics that they are often used interchangeably” - J. Roscoe • How would you test the hypothesis that “enhanced interrogation” results in useful intelligence? What model would you use? RCT? Correlation? Regression?
  • 329.
    Correlation and strength of relationships • A correlation coefficient is a statistic that describes how strongly variables are related to one another. The most familiar correlation coefficient is the Pearson product-moment coefficient. Pearson's r is a measure of the linear correlation (dependence) between two variables X and Y (pg 248); it does not describe curvilinear relationships
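Pearson's r can be computed directly from its definition (the covariance of X and Y divided by the product of their standard deviations). A minimal stdlib sketch with illustrative data:

```python
# Pearson product-moment correlation from its definition.
import math

x = [1, 2, 3, 4, 5]   # illustrative data, not from the slides
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / (math.sqrt(sum((a - mx) ** 2 for a in x)) *
           math.sqrt(sum((b - my) ** 2 for b in y)))
```

r always falls between -1 and +1, matching the scatter-plot interpretation on the next slide.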
  • 330.
    Correlation-Scatter Plot • +1 is a perfect positive correlation • 0 is no correlation (the values don't seem linked at all) • -1 is a perfect negative correlation • The value shows how strong the correlation is and whether it is positive or negative
  • 331.
    Correlation and strength of relationships • Restriction of range: one issue is that one variable or the other is sampled over too narrow a range. This restriction of range, as it is called, makes the relationship seem weaker than it is. Suppose we want to know the correlation between a test such as the SAT and freshman GPA. We collect SAT test scores from applicants and compute GPA at the end of the freshman year. If we use the SAT in admissions and reject applicants with low scores, we will have range restriction because there will be nobody in the sample with low test scores. If individuals in your sample are very similar, you will have a restriction of range: trying to understand the correlates of intelligence will be difficult if everyone in your sample is very similar in intelligence
  • 332.
    Effect Size • Effect size refers to the strength of association between variables. The Pearson r correlation coefficient is one indicator of effect size; it indicates the strength of the linear association between two variables (pg 252, Cozby & Bates) • The concept of effect size already appears in everyday language. For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program. These are both examples of "absolute effect sizes", meaning that they convey the average difference between two groups without any discussion of the variability within the groups. For example, if the weight loss program results in an average loss of 30 pounds, it is possible that every participant loses exactly 30 pounds, or half the participants lose 60 pounds and half lose no weight at all
  • 333.
    Effect Size • Effect size: a measure of the strength of a phenomenon (for example, the change in an outcome after experimental intervention). The Pearson r correlation coefficient is one indicator of effect size; it indicates the strength of the linear association between two variables. Correlation coefficients indicating small effects range from .10 to .20, medium effects ~.30, and large effects above .40 (others say .50). Sometimes the squared value of r is reported, which transforms the value into a percentage (this is also referred to as the percent of shared variance between the two variables). The correlation between gender and weight is about .70 (males weighing more than females); squaring the value of .70 results in .49. Therefore 49% of the difference in weight between males and females is accounted for by gender
  • 334.
    Effect Size • An effect size is a measure that describes the magnitude of the difference between two groups. Effect sizes are particularly valuable in best-practices research because they represent a standard measure by which all outcomes can be assessed • An effect size is typically calculated by taking the difference in means between two groups and dividing that number by their combined (pooled) standard deviation. Intuitively, this tells us how many standard deviations' difference there is between the means of the intervention (treatment) and comparison conditions; for example, an effect size of .25 indicates that the treatment group outperformed the comparison group by a quarter of a standard deviation.
  • 335.
    Effect Size continued • An effect size of 0.33 denotes that a treatment led to a one-third of a standard deviation improvement in outcome. Similarly, an effect size of 0.5 denotes a one-half of a standard deviation increase in outcome. Because effect sizes are based upon these mean and standard deviation scores, they allow direct comparisons across studies • Cohen's d is an effect size used to indicate the standardized difference between two means • http://www.uccs.edu/~lbecker/
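Cohen's d, as described above, is the difference between the two group means divided by their pooled standard deviation. A sketch with illustrative (hypothetical) group scores:

```python
# Cohen's d: standardized difference between two group means.
import math

group1 = [6, 7, 8, 7, 7]   # hypothetical treatment scores
group2 = [5, 6, 6, 5, 8]   # hypothetical comparison scores

n1, n2 = len(group1), len(group2)
m1, m2 = sum(group1) / n1, sum(group2) / n2
ss1 = sum((x - m1) ** 2 for x in group1)
ss2 = sum((x - m2) ** 2 for x in group2)
pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))  # pooled standard deviation
d = (m1 - m2) / pooled_sd
```

Here d = 1.0, i.e. the treatment group mean sits one full pooled standard deviation above the comparison mean.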
  • 336.
    Regression Equations • The terms correlation, regression and prediction are so closely related in statistics that they are often used interchangeably - J. Roscoe • Regression equations are calculations used to predict a person's score on one variable when that person's score on another variable is already known (Cozby & Bates, pg 253) • Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model. A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0). https://www.youtube.com/watch?v=ocGEhiLwDVc
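The least-squares fit of Y = a + bX can be sketched from the definitions above: b = cov(X, Y) / var(X) and a = mean(Y) − b · mean(X). The data here are illustrative and chosen to be exactly linear:

```python
# Least-squares fit of Y = a + bX (simple linear regression).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]     # exactly linear: y = 1 + 2x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)   # slope
a = my - b * mx                      # intercept (value of Y when X = 0)

def predict(x):
    return a + b * x
```

With a and b in hand, the regression equation predicts a Y score from any new X score, which is exactly the "prediction" sense described in the slide.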
  • 337.
    Linear Regression • Intercept: the value at which the fitted line crosses the y-axis
  • 338.
    Multiple Correlation/Regression • Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. The dependent variable is affected by more than one independent variable • In simple linear regression, a criterion variable (Y) is predicted from one predictor variable (X). In multiple regression, the criterion is predicted by two or more variables: Y = a + b1X1 + b2X2 + b3X3 • Example: Y = health rating of chosen city; X1 = death rate per 1,000 residents; X2 = doctor availability per 100,000 residents; X3 = hospital availability per 100,000 residents; X4 = annual per capita income in thousands of dollars; X5 = population density in people per square mile
  • 339.
    Multiple Correlation/Regression • Pew Research Center survey on Happiness (Y) - results of multiple regression: married people are happier than unmarried people. People who worship frequently are happier than those who don't. Republicans are happier than Democrats. Rich people are happier than poor people. Whites and Hispanics are happier than blacks. Sunbelt residents are happier than those who live in the rest of the country. • The survey also found some interesting non-correlations: people who have children are no happier than those who don't, after controlling for marital status. Retirees are no happier than workers. Pet owners are no happier than those without pets
  • 340.
    Correlation/Regression Path Diagrams • Simple correlation: Parental Support → Happiness, r = .38 (R = .38) • Multiple regression: Parental Support (r = .38) and Self-esteem (r = .30) → Happiness, R = .45
  • 341.
    Partial Correlation • Extraneous or confounding variables are controlled in experimental research by keeping them constant or through randomization. This is harder to do in non-experimental research (pg 256) • One technique to control for such variables in non-experimental research is to use partial correlation • A partial correlation is a correlation between the two variables of interest with the influence of the third variable removed from, or "partialed out of," the original correlation - which tells you what the correlation between the primary variables would be if the third variable were held constant (pg 256)
  • 342.
    Partial Correlation • In simple correlation, we measure the strength of the linear relationship between two variables, without taking into consideration the fact that both of these variables may be influenced by a third variable. • The calculation of the partial correlation coefficient is based on the simple correlation coefficient. However, the simple correlation coefficient assumes a linear relationship. Generally this assumption is not valid, especially in the social sciences, as linear relationships rarely exist in such phenomena • It may be of interest to know if there is any correlation between X and Y that is NOT due to their both being correlated with Z. To do this you calculate a partial correlation.
  • 343.
    Partial Correlation • If you calculate the correlation for subjects on each of three variables, X, Y, and Z, and obtain the following: • X versus Y: rXY = +.50, r²XY = .25 • X versus Z: rXZ = +.50, r²XZ = .25 • Y versus Z: rYZ = +.50, r²YZ = .25 • For each pair of variables (XY, XZ, and YZ) the variance overlap is 25% • Partial correlation is a procedure that allows us to measure the region of three-way overlap precisely, and then to remove it from the picture in order to determine what the correlation between any two of the variables would be (hypothetically) if they were not each correlated with the third variable. Alternatively, you can say that partial correlation allows us to determine what the correlation between any two of the variables would be (hypothetically) if the third variable were held constant.
  • 344.
    Partial Correlation • rXY·Z = [rXY − (rXZ)(rYZ)] / (√(1 − r²XZ) × √(1 − r²YZ)) • rXY·Z = [.50 − (.50)(.50)] / (√(1 − .25) × √(1 − .25)) = .25 / .75 • rXY·Z = +.33 (therefore r²XY·Z = .11)
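The partial-correlation formula from the slide can be applied directly to the worked example (all three pairwise correlations equal to +.50):

```python
# Partial correlation of X and Y with Z "partialed out".
import math

def partial_r(rxy, rxz, ryz):
    # rXY.Z = [rXY - rXZ*rYZ] / (sqrt(1 - rXZ^2) * sqrt(1 - rYZ^2))
    return (rxy - rxz * ryz) / (math.sqrt(1 - rxz ** 2) *
                                math.sqrt(1 - ryz ** 2))

r_xy_given_z = partial_r(0.50, 0.50, 0.50)   # the slide's worked example
```

This reproduces the slide's result: rXY·Z = +.33 and r²XY·Z = .11, i.e. the XY correlation drops from .50 to .33 once the shared variance with Z is removed.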
  • 345.
    Structural Equation Modeling (SEM) • SEMs are suited to both theory testing and theory development. Measurement is recognized as difficult and error-prone. Compared to regression and factor analysis, SEM is a relatively young field, having its roots in papers that appeared only in the late 1960s. As such, the methodology is still developing, and even fundamental concepts are subject to challenge and revision. This rapid change is a source of excitement for some researchers and a source of frustration for others. • Researchers typically construct path diagrams to represent the model being tested. Path diagrams play a fundamental role in structural modeling. Path diagrams are like flowcharts: they show variables interconnected with lines (arrows) that are used to indicate causal flow
  • 346.
    Structural Equation Modeling (SEM) • Structural equation models go beyond ordinary regression models to incorporate multiple independent and dependent variables as well as hypothetical latent constructs that clusters of observed variables might represent http://www.youtube.com/watch?v=ZuX_QzZGjf0 (start at 4:23, end at 11:30)
  • 347.
    Structural Equation Modeling (SEM) • Interpretation of path coefficients: first of all, they are not correlation coefficients. X and Y are converted to z-scores before conducting a simple regression analysis (path coefficients are regression coefficients computed on standardized z-scores). • Interpreting path coefficients: suppose we have a network with a path connecting region A to region B. The meaning of the path coefficient (e.g., 0.81) is this: if region A increases by one standard deviation from its mean, region B would be expected to increase by 0.81 of its own standard deviations from its own mean, holding all other relevant regional connections constant. With a path coefficient of -0.16, when region A increases by one standard deviation from its mean, region B would be expected to decrease by 0.16 of its own standard deviations from its own mean, holding all other relevant regional connections constant • One of the nice things about SPSS is that it will allow you to start with a correlation matrix (you don't need the raw data)
  • 348.
    Score Transformations • A score has meaning only as it is related to other scores. • Often it is necessary to transform data from one measurement scale to another. For example, you might want to convert height measured in feet into height measured in inches. The table shows the heights of four people measured in both feet and inches: 5.00 ft = 60 in; 6.25 ft = 75 in; 5.50 ft = 66 in; 5.75 ft = 69 in. To transform feet to inches, you simply multiply by 12 (similarly, to transform inches to feet, you divide by 12). Some conversions require that you multiply by a number and then add a second number. A good example of this is the transformation between degrees Centigrade and degrees Fahrenheit: F = C × 9/5 + 32, or equivalently C = (F − 32) × 5/9. The table below converts the Fahrenheit temperatures of 4 US cities to Centigrade: Houston 54°F = 12.22°C; Chicago 37°F = 2.78°C; Minneapolis 31°F = −0.56°C; Miami 78°F = 25.56°C
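The Fahrenheit-to-Centigrade conversion above is a "multiply, then add" (linear) transformation, and the four city values in the table can be checked in a few lines:

```python
# Linear transformation C = (F - 32) * 5/9, checked against the slide's table.
def f_to_c(f):
    return (f - 32) * 5 / 9

cities = {"Houston": 54, "Chicago": 37, "Minneapolis": 31, "Miami": 78}
celsius = {name: round(f_to_c(f), 2) for name, f in cities.items()}
```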
  • 349.
    Score Transformations • The figure below shows a plot of degrees Centigrade as a function of degrees Fahrenheit. Notice that the points form a straight line. Such transformations are therefore called linear transformations. Many transformations are not linear. With nonlinear transformations, the points in a plot of the transformed variable against the original variable would not fall on a straight line. Examples of nonlinear transformations are: square root, raising to a power, or logarithm. • Question: transforming distance in miles into distance in feet is a linear transformation. True or False? True - this is a linear transformation because you multiply the distance in miles by 5,280 feet/mile
  • 350.
    Linear vs Nonlinear Score Transformations • Transforming a variable involves using a mathematical operation to change its measurement scale. • Linear transformation: a linear transformation preserves linear relationships between variables. Therefore, the correlation between x and y would be unchanged after a linear transformation. Examples of a linear transformation to variable x would be multiplying x by a constant, dividing x by a constant, or adding a constant to x. • Nonlinear transformation: a nonlinear transformation changes (increases or decreases) linear relationships between variables and, thus, changes the correlation between variables. Examples of a nonlinear transformation of variable x would be taking the square root of x or the reciprocal of x. A logarithmic scale is a scale of measurement that displays the value of a physical quantity using intervals corresponding to orders of magnitude, rather than a standard linear scale
  • 351.
    Linear vs Nonlinear Score Transformations • The Richter magnitude scale (often shortened to Richter scale) was developed to assign a single number to quantify the energy that is released during an earthquake. The scale is a base-10 logarithmic scale. An earthquake that measures 5.0 on the Richter scale has a shaking amplitude 10 times larger than one that measures 4.0, and corresponds to a 31.6 times larger release of energy http://www.matter.org.uk/schools/Content/Seismology/richterscale.html
  • 353.
    Linear vs Nonlinear Score Transformations • Transforming raw scores into transformed scores has two purposes: 1) it gives meaning to the scores and allows some kind of interpretation of the scores; 2) it allows direct comparison of two scores • Linear transformation: as one value changes, the other changes in equal proportion • Converting scores into percentile ranks is one way of transforming scores. The scale of the percentile rank is a nonlinear transformation of that of the raw score, meaning that at different regions on the raw score scale, a gain of 1 point may not correspond to a gain of one unit or the same magnitude on the percentile rank scale
  • 354.
    Percentile Rank Transformation • PR = (100/N)(cf − f/2); PR of a score of 17 = (100/150)(64 − 21/2) ≈ 36
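The percentile-rank formula on the slide can be sketched directly: here cf is taken as the cumulative frequency up to and including the score and f as the frequency of the score itself, with the slide's values N = 150, cf = 64, f = 21.

```python
# Percentile rank: PR = (100 / N) * (cf - f / 2).
def percentile_rank(n, cf, f):
    return 100 / n * (cf - f / 2)

pr_17 = percentile_rank(n=150, cf=64, f=21)   # the slide's worked example
```

The exact value is 35.67, which the slide rounds to 36.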
  • 355.
    Linear Score Transformations • By itself, a raw score or X value provides very little information about how that particular score compares with other values in the distribution. • A score of X = 53, for example, may be a relatively low score, or an average score, or an extremely high score depending on the mean and standard deviation for the distribution from which the score was obtained. • If the raw score is transformed into a z-score, however, the value of the z-score tells exactly where the score is located relative to all the other scores in the distribution. The formula for computing the z-score for any value of X is z = (X − μ) / σ
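The slide's point that X = 53 means nothing without μ and σ is easy to demonstrate; the two distributions below are illustrative:

```python
# z-score transformation: z = (X - mu) / sigma.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

# The same raw score of 53 in two different (illustrative) distributions:
z_a = z_score(53, mu=40, sigma=5)    # well above this distribution's mean
z_b = z_score(53, mu=60, sigma=10)   # below this distribution's mean
```

Here the same raw score is z = +2.6 in one distribution and z = −0.7 in the other.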
  • 356.
    Linear Score Transformations - Z Scores • z = 0 is in the center (at the mean), and the extreme tails correspond to z-scores of approximately -2.00 on the left and +2.00 on the right. • Although more extreme z-score values are possible, most of the distribution is contained between z = -2.00 and z = +2.00. • M = 0, SD = 1
  • 357.
    z-Scores as a Standardized Distribution • The advantage of standardizing distributions is that two (or more) different distributions can be made the same. – For example, one distribution has μ = 100 and σ = 10, and another distribution has μ = 40 and σ = 6. – When these distributions are transformed to z-scores, both will have μ = 0 and σ = 1. – A z-score of +1.00 specifies the same location in all z-score distributions.
  • 358.
    Understanding Research Results - Statistical Inference (Chp 13) • Inferential statistics allow researchers to assess 1) how their results reflect the larger population (do the differences observed in the sample means reflect the difference in the population means?) and 2) the likelihood that their results are repeatable (replicable) • Even in establishing the equivalence between groups (via controlling certain variables and randomization), the difference between the sample means is almost never zero (equivalence is not perfect)
  • 359.
    Statistical Inference • In using statistical inference we begin with a null and a research hypothesis • Null hypothesis H0: there is no relationship between two measured phenomena (it is assumed true until evidence indicates otherwise). H0: μ1 = μ2 • Research or alternative hypothesis H1: μ1 ≠ μ2, which can be just the negation of the null hypothesis • If we can determine that the null hypothesis is incorrect, then we can accept the alternative (research) hypothesis, which is that the independent variable did have an effect on the dependent variable
  • 360.
    Statistical significance, probability and sampling distributions • A significant result is one that has a very low probability of occurring by chance if the population means are equal • Using probability theory and the normal curve, we can estimate the probability of being wrong • Probability is the likelihood of the occurrence of some event. The probability required for significance is called the alpha level, with the most common alpha probability being set at .05 (the outcome of the study is considered significant when there is a probability of .05 or less that the results were due to chance; statistical significance is based on probability distributions)
  • 361.
    Statistical significance, probability and sampling distributions • The sampling distribution is the probability distribution of a given statistic based on a random sample • The more observations sampled, the more likely you are to obtain an accurate estimate of the true population value • http://onlinestatbook.com/stat_sim/sampling_dist/
  • 362.
    Statistical Tests: t-Test and F Test • The t-distribution is a family of continuous probability distributions that arise when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown • The t-Test assumes continuous data (interval or ratio)
  • 364.
    Statistical Tests: t-Test and F Test • The t-value is calculated using the formula as shown; the t-value equals the mean difference divided by the standard error of the difference between the means • Degrees of freedom: the number of degrees of freedom is equal to the number of observations minus the number of algebraically independent linear restrictions placed on them • In an array of four scores 2, 3, 5, and 6, knowing the mean (M = 4), only the first three scores are free to vary while the last score drawn is not free to vary. Therefore df = 3 (df = n − 1) • http://web.mst.edu/~psyworld/texample.htm (best) • http://faculty.clintoncc.suny.edu/faculty/michael.gregory/files/shared%20files/Statistics/Examples_t_Test.htm (use #3)
  • 365.
    Statistical Tests: t-Test and F Test • One-tailed versus two-tailed tests: if the test statistic is always positive (or zero), only the one-tailed test is generally applicable, while if the test statistic can assume positive and negative values, both the one-tailed and two-tailed tests are of use. If you are hypothesizing a difference but not predicting its direction, then it will be a two-tailed test • An example of when one would want to use a two-tailed test is at a candy production/packaging plant. Let's say the candy plant wants to make sure that the number of candies per bag is around 50. The factory is willing to accept between 45 and 55 candies per bag. It would be too costly to have someone check every bag, so the factory selects random samples of the bags, and tests whether the average number of candies exceeds 55 or is less than 45
  • 366.
    Example of t-Test • Hypothesis: people who are allowed to sleep for only four hours will score significantly lower than people who are allowed to sleep for eight hours on a cognitive skills test. Sixteen subjects are recruited in the sleep lab and randomly assigned to one of two groups. In one group subjects sleep for eight hours and in the other group subjects sleep for four hours; all are given a cognitive test the next day. • df = (n − 1) + (n − 1) = 14 • 8-hours-sleep group (X): 5 7 5 3 5 3 3 9 • 4-hours-sleep group (Y): 8 1 4 6 6 4 1 2 • Mx = 5, My = 4
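The independent-samples t for the sleep example can be worked through with the pooled-variance formula (mean difference divided by the standard error of the difference); the critical value 2.1448 is taken from the t table for df = 14, two-tailed α = .05:

```python
# Independent-samples t-test (pooled variance) for the sleep example.
import math

x = [5, 7, 5, 3, 5, 3, 3, 9]   # 8-hours-sleep group, Mx = 5
y = [8, 1, 4, 6, 6, 4, 1, 2]   # 4-hours-sleep group, My = 4

n1, n2 = len(x), len(y)
mx, my = sum(x) / n1, sum(y) / n2
ss_x = sum((v - mx) ** 2 for v in x)
ss_y = sum((v - my) ** 2 for v in y)
pooled_var = (ss_x + ss_y) / (n1 + n2 - 2)          # pooled variance
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference
t = (mx - my) / se_diff

df = n1 + n2 - 2
critical_t = 2.1448    # from the t table: df = 14, two-tailed alpha = .05
significant = abs(t) > critical_t
```

The observed t is about 0.85, well below 2.1448, so this (illustrative) experiment fails to reject the null hypothesis.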
  • 368.
    t Distribution Critical Values
    α (1 tail): 0.05 | 0.025 | 0.01 | 0.005 | 0.0025 | 0.001 | 0.0005
    α (2 tail): 0.1 | 0.05 | 0.02 | 0.01 | 0.005 | 0.002 | 0.001
    df 1: 6.3138 | 12.7065 | 31.8193 | 63.6551 | 127.3447 | 318.4930 | 636.0450
    df 2: 2.9200 | 4.3026 | 6.9646 | 9.9247 | 14.0887 | 22.3276 | 31.5989
    df 3: 2.3534 | 3.1824 | 4.5407 | 5.8408 | 7.4534 | 10.2145 | 12.9242
    df 4: 2.1319 | 2.7764 | 3.7470 | 4.6041 | 5.5976 | 7.1732 | 8.6103
    df 5: 2.0150 | 2.5706 | 3.3650 | 4.0322 | 4.7734 | 5.8934 | 6.8688
    df 6: 1.9432 | 2.4469 | 3.1426 | 3.7074 | 4.3168 | 5.2076 | 5.9589
    df 7: 1.8946 | 2.3646 | 2.9980 | 3.4995 | 4.0294 | 4.7852 | 5.4079
    df 8: 1.8595 | 2.3060 | 2.8965 | 3.3554 | 3.8325 | 4.5008 | 5.0414
    df 9: 1.8331 | 2.2621 | 2.8214 | 3.2498 | 3.6896 | 4.2969 | 4.7809
    df 10: 1.8124 | 2.2282 | 2.7638 | 3.1693 | 3.5814 | 4.1437 | 4.5869
    df 11: 1.7959 | 2.2010 | 2.7181 | 3.1058 | 3.4966 | 4.0247 | 4.4369
    df 12: 1.7823 | 2.1788 | 2.6810 | 3.0545 | 3.4284 | 3.9296 | 4.3178
    df 13: 1.7709 | 2.1604 | 2.6503 | 3.0123 | 3.3725 | 3.8520 | 4.2208
    df 14: 1.7613 | 2.1448 | 2.6245 | 2.9768 | 3.3257 | 3.7874 | 4.1404
    df 15: 1.7530 | 2.1314 | 2.6025 | 2.9467 | 3.2860 | 3.7328 | 4.0728
    df 16: 1.7459 | 2.1199 | 2.5835 | 2.9208 | 3.2520 | 3.6861 | 4.0150
    df 17: 1.7396 | 2.1098 | 2.5669 | 2.8983 | 3.2224 | 3.6458 | 3.9651
    df 18: 1.7341 | 2.1009 | 2.5524 | 2.8784 | 3.1966 | 3.6105 | 3.9216
    df 19: 1.7291 | 2.0930 | 2.5395 | 2.8609 | 3.1737 | 3.5794 | 3.8834
    df 20: 1.7247 | 2.0860 | 2.5280 | 2.8454 | 3.1534 | 3.5518 | 3.8495
    df 21: 1.7207 | 2.0796 | 2.5176 | 2.8314 | 3.1352 | 3.5272 | 3.8193
  • 369.
    Statistical Tests: t-Test and F Test • The F test is an extension of the t test. If a study has only one independent variable with two groups, then F and t are basically identical. The F test is used when there are more than two levels of an independent variable, and when there are two or more independent variables in a factorial design. Similar to the t, the larger the F ratio, the more likely it is that the results are significant • The F-test is designed to test if two population variances are equal. It does this by comparing the ratio of two variances (Analysis of Variance - ANOVA). Each Mean Square = SS/df • http://www.chem.utoronto.ca/coursenotes/analsci/StatsTutorial/ftest.html
  • 370.
    Zebras Taking Flight (z, t, and F) • A z-test is used for testing the mean of a population versus a standard, or comparing the means of two populations, with large (n ≥ 30) samples, whether you know the population standard deviation or not • A t-test is used for testing the mean of one population against a standard or comparing the means of two populations • An F-test is used to compare two populations' variances. The samples can be any size. The F-test is designed to test if two population variances are equal, and it is the basis of ANOVA, where it plays an important role in the analysis of variance
  • 371.
    Chi-square test • The chi-square test is intended to test how likely it is that an observed distribution is due to chance • The t test and the F test are called parametric tests. They assume certain conditions about the parameters of the population from which the samples are drawn (interval or ratio data) • Parametric and nonparametric statistical procedures test hypotheses involving different assumptions • Parametric statistics test hypotheses based on the assumption that the samples come from populations that are normally distributed. Nonparametric tests make fewer and less stringent assumptions than their parametric counterparts. Nonparametric tests usually result in a loss of efficiency
  • 373.
    Chi-Square example • Suppose that the ratio of male to female students in the Science Faculty is exactly 1:1, but in the Pharmacology Honors class over the past ten years there have been 80 females and 40 males. Is this a significant departure from expectation? We must compare our X² value with a critical chi-squared value in the X² table with n − 1 degrees of freedom (where n is the number of categories, i.e. 2 in our case: males and females). If our calculated value of X² exceeds the critical value, then we have a significant difference. • Female / Male / Total • Observed numbers (O): 80 / 40 / 120 • Expected numbers (E): 60 / 60 / 120 • O − E: 20 / −20 / 0 • (O − E)²: 400 / 400 • (O − E)² / E: 6.67 / 6.67 / 13.34 = X²
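The chi-square computation above can be reproduced in a few lines. Note the exact statistic is 13.33; the slide's 13.34 comes from summing the rounded 6.67 cell values. The critical value 3.84 is read from the chi-square table (df = 1, p = .05):

```python
# Chi-square goodness-of-fit for the 80-female / 40-male example.
observed = [80, 40]    # females, males
expected = [60, 60]    # a 1:1 ratio across 120 students

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical = 3.84        # chi-square table: df = 1, p = .05
significant = chi_sq > critical
```

Since 13.33 far exceeds 3.84, the departure from the expected 1:1 ratio is significant.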
  • 374.
    Chi-Square Critical Values
    Degrees of Freedom: p = 0.99 | 0.95 | 0.05 | 0.01 | 0.001
    df 1: 0.000 | 0.004 | 3.84 | 6.64 | 10.83
    df 2: 0.020 | 0.103 | 5.99 | 9.21 | 13.82
    df 3: 0.115 | 0.352 | 7.82 | 11.35 | 16.27
    df 4: 0.297 | 0.711 | 9.49 | 13.28 | 18.47
    df 5: 0.554 | 1.145 | 11.07 | 15.09 | 20.52
    df 6: 0.872 | 1.635 | 12.59 | 16.81 | 22.46
    df 7: 1.239 | 2.167 | 14.07 | 18.48 | 24.32
    df 8: 1.646 | 2.733 | 15.51 | 20.09 | 26.13
    df 9: 2.088 | 3.325 | 16.92 | 21.67 | 27.88
    df 10: 2.558 | 3.940 | 18.31 | 23.21 | 29.59
    df 11: 3.05 | 4.58 | 19.68 | 24.73 | 31.26
    df 12: 3.57 | 5.23 | 21.03 | 26.22 | 32.91
    df 13: 4.11 | 5.89 | 22.36 | 27.69 | 34.53
    df 14: 4.66 | 6.57 | 23.69 | 29.14 | 36.12
    df 15: 5.23 | 7.26 | 25.00 | 30.58 | 37.70
    df 16: 5.81 | 7.96 | 26.30 | 32.00 | 39.25
  • 375.
    Statistical Significance • The goal of a test is to allow you to make a decision about your results. Significance levels show you how likely a result is due to chance. The most common level, used to mean something is good enough to be believed, is .95 (α = .05). This means that the finding has a 95% chance of being true. When you have a large sample size, very small differences will be detected as significant (.05 is the traditional level chosen). • The more analyses you perform on a data set, the more results will meet the conventional significance level "by chance." For example, if you calculate many correlations between different variables, then you should expect to find by chance that one in every 20 correlation coefficients is significant at the p < .05 level, even if the values of the variables were totally random and those variables do not correlate in the population
  • 376.
    Type I and Type II Errors • The decision to reject the null hypothesis is based on probabilities rather than certainties. In reviewing the decision matrix below, there are two possible decisions (reject or accept the null hypothesis) and two possible truths (the null hypothesis is true or false). There are also two correct decisions (correctly accepting the H0 when it is true and correctly rejecting the H0 when it is false) and two errors • Type I error: we reject the H0 when it is true; Type II error: we accept the H0 when it is false • Decision matrix
  • 377.
    Type I and Type II Errors • A test's probability of making a Type I error is denoted by α. A test's probability of making a Type II error is denoted by β • Type I errors occur when we obtain a large value (t or F) by chance and we incorrectly decide that the independent variable had an effect. When the significance level set to reject the H0 is .05, the probability of a Type I error is .05 (α). The rate of the Type II error is denoted by the Greek letter β (beta) and is related to the power of a test (which equals 1 − β). The power of a statistical test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false (i.e. the probability of not committing a Type II error). For example, blood tests for a disease will falsely detect the disease in some proportion of people who don't have it, and will fail to detect the disease in some proportion of people who do have it
  • 378.
    Type I and Type II errors • If a jury in a criminal trial must decide guilt or innocence, the error types remain the same (pg 274-5). H0 = person is innocent • Type I error: reject the null when it is true. We may obtain a large t or F value by chance. The Type I error rate is determined by the choice of significance level (α): with α = .05, then 5 out of 100 times (1 out of 20) we may make this mistake. We can change α to .01 to lessen this error • Type II error: occurs when we accept the null but the null is incorrect. The probability of a Type II error is β. If we lower the significance level (e.g. to .001), it becomes more difficult to reject the null hypothesis, decreasing the chances of a Type I error but increasing the chances of a Type II error. (Use a decision grid for marriage: which error is worse?) • Decision matrix (H0 = innocent): reject H0 when guilty = correct decision (1 − β); reject H0 when innocent = Type I error (α); accept H0 when innocent = correct decision (1 − α); accept H0 when guilty = Type II error (β)
  • 379.
    Choosing a Significance Level • Researchers traditionally use either a .05 or a .01 significance level. For a juror, which type of error is more serious, Type I or Type II? For a physician, Type I or Type II? • Juror (H0 = not guilty): finding an innocent person guilty is a Type I error (false positive); finding a guilty person innocent is a Type II error (false negative) • Physician (H0 = no operation needed): operating when no operation was needed is a Type I error (false positive); failing to operate when an operation was needed is a Type II error (false negative)
  • 380.
    Significance • Research is designed to demonstrate that there is a relationship between variables, not to say that the variables are unrelated (i.e. accepting the null hypothesis) • A study may come up with nonsignificant results when there is an effect (Type II error) due to inadequate explanation to subjects, a weak manipulation, or a measure of the dependent variable that is not reliable, etc. (see threats to validity). A meaningful result is more likely to be overlooked when the significance level is very low (.001): Type II error (pg 278) • Type II errors may result from too-small sample sizes and effect sizes. However, while nonsignificant results do not necessarily indicate that the null hypothesis is correct, significant results do not necessarily indicate a meaningful relationship. As your sample size increases, so does the likelihood that even very small differences will be detected as statistically significant
  • 381.
    Long-term psychosocial consequences of false-positive screening mammography Brodersen J & Siersma VD, Ann Fam Med. 2013 Mar-Apr;11(2):106-15 • PURPOSE: Cancer screening programs have the potential of intended beneficial effects, but they also inevitably have unintended harmful effects. In the case of screening mammography, the most frequent harm is a false-positive result. Prior efforts to measure their psychosocial consequences have been limited by short-term follow-up, the use of generic survey instruments, and the lack of a relevant benchmark: women with breast cancer. • METHODS: In this cohort study with a 3-year follow-up, we recruited 454 women with abnormal findings in screening mammography over a 1-year period. For each woman with an abnormal finding on a screening mammogram (false and true positives), we recruited another 2 women with normal screening results who were screened the same day at the same clinic. These participants were asked to complete the Consequences of Screening in Breast Cancer, a validated questionnaire encompassing 12 psychosocial outcomes, at baseline, 1, 6, 18, and 36 months. • RESULTS: Six months after final diagnosis, women with false-positive findings reported changes in existential values and inner calmness as great as those reported by women with a diagnosis of breast cancer (Δ = 1.15; P = .015; and Δ = 0.13; P = .423, respectively). Three years after being declared free of cancer, women with false-positive results consistently reported greater negative psychosocial consequences compared with women who had normal findings in all 12 psychosocial outcomes (Δ > 0 for 12 of 12 outcomes; P < .01 for 4 of 12 outcomes) • CONCLUSION: False-positive findings on screening mammography cause long-term psychosocial harm: 3 years after a false-positive finding, women experience psychosocial consequences that range between those experienced by women with a normal mammogram and those with a diagnosis of breast cancer
  • 382.
    Choosing a sample size: Power analysis • We can select a sample size on the basis of the desired probability of correctly rejecting the null hypothesis. This probability is called the power of the statistical test: Power = 1 − p(Type II error) • Power refers to the probability that your test will find a statistically significant difference when such a difference actually exists. In other words, power is the probability that you will reject the null hypothesis when you should (and thus avoid a Type II error). It is generally accepted that power should be .8 or greater; that is, you should have an 80% or greater chance of finding a statistically significant difference when there is one • http://meera.snre.umich.edu/plan-an-evaluation/related-topics/power-analysis- statistical-significance-effect-size • http://www.surveysystem.com/sscalc.htm#one
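The relationship between power, effect size, and sample size can be checked by simulation. The following sketch (my own illustration, not from the textbook) estimates power for a two-sample z-test by Monte Carlo; the function name and parameters are invented for this example:

```python
import random
import statistics

def estimated_power(effect_size, n_per_group, n_sims=2000, seed=42):
    """Monte Carlo estimate of the power of a two-sample z-test at alpha = .05.

    Repeatedly draws two groups from normal distributions whose means differ
    by `effect_size` standard deviations, then counts how often the observed
    difference is declared significant.
    """
    rng = random.Random(seed)
    z_crit = 1.96                          # two-sided critical value, alpha = .05
    se = (2 / n_per_group) ** 0.5          # SE of the mean difference (sigma = 1)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (statistics.mean(b) - statistics.mean(a)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sims

# A medium effect (d = 0.5) with 64 subjects per group should land near
# the conventional .80 power target discussed above.
print(estimated_power(0.5, 64))
```

Increasing `n_per_group` (or the effect size) raises the estimated power, which is exactly the trade-off a formal power analysis formalizes.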
  • 383.
    Replications • Scientists do not attach too much importance to the results of a single study. Better understanding comes from integrating the results of numerous studies of the same variable(s) pg 280 • Replicating Milgram: Would People Still Obey Today? Jerry M. Burger, Santa Clara University • Seventy adults participated in a replication of Milgram’s Experiment 5 up to the point at which they first heard the learner’s verbal protest (150 volts). Because 79% of Milgram’s participants who went past this point continued to the end of the shock generator’s range, reasonable estimates could be made about what the present participants would have done if allowed to continue. • Obedience rates in the 2006 replication were only slightly lower than those Milgram found 45 years earlier. Contrary to expectation, participants who saw a confederate refuse the experimenter’s instructions obeyed as often as those who saw no model. Men and women did not differ in their rates of obedience, but there was some evidence that individual differences in empathic concern and desire for control affected participants’ responses.
  • 384.
    Replicating Milgram • 79% of the people who continued past 150 volts (26 of 33) went all the way to the end of the shock generator’s range. In short, the 150-volt switch is something of a point of no return. Nearly four out of five participants who followed the experimenter’s instructions at this point continued up the shock generator’s range all the way to 450 volts. This observation suggests a solution to the ethical concerns about replicating Milgram’s research. Knowing how people respond up to and including the 150-volt point in the procedure allows one to make a reasonable estimate of what they would do if allowed to continue to the end. Stopping the study within seconds after participants decide what to do at this juncture would also avoid exposing them to the intense stress Milgram’s participants often experienced in the subsequent parts of the procedure.
  • 385.
    Replicating Milgram • Burger screened out any potential subjects who had taken more than two psychology courses in college or who indicated familiarity with Milgram’s research. A clinical psychologist also interviewed potential subjects and eliminated anyone who might have a negative reaction to the study procedure. • In Burger’s study, participants were told at least three times that they could withdraw from the study at any time and still receive the $50 payment. Also, these participants were given a lower-voltage sample shock to show the generator was real – 15 volts, as compared to the 45 volts administered by Milgram. • Several of the psychologists writing in the same issue of American Psychologist questioned whether Burger’s study is truly comparable to Milgram’s, although they acknowledge its usefulness.
  • 386.
    Computer Analysis of Data • Most analysis is carried out via computer programs such as SPSS, SAS, SYSTAT, and others, although the general procedures are very similar in all of the programs
  • 387.
    Selecting the appropriate Statistical Test • Parametric statistical procedures rely on assumptions about the shape of the distribution (i.e., assume a normal distribution) in the underlying population and about the form or parameters (i.e., means and standard deviations) of the assumed distribution. Nonparametric statistical procedures rely on no or few assumptions about the shape or parameters of the population distribution from which the sample was drawn • http://www.ats.ucla.edu/stat/mult_pkg/whatstat/choosestat.html
  • 388.
    Parametric vs. Nonparametric tests • Parametric and nonparametric are two broad classifications of statistical procedures. • Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken. The most common parametric assumption is that data are approximately normally distributed. • Nonparametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution. If the data deviate strongly from the assumptions of a parametric procedure, using the parametric procedure could lead to incorrect conclusions. If you determine that the assumptions of the parametric procedure are not valid, use an analogous nonparametric procedure instead. • The parametric assumption of normality is particularly worrisome for small sample sizes (n < 30). Nonparametric tests are often a good option for these data. • Nonparametric procedures generally have less power for the same sample size than the corresponding parametric procedure if the data truly are normal. Interpretation of nonparametric procedures can also be more difficult than for parametric procedures.
  • 389.
    Review of Scales of Measurement • A categorical variable, also called a nominal variable, is for mutually exclusive, but not ordered, categories. For example, your study might compare five different genotypes. You can code the five genotypes with numbers if you want, but the order is arbitrary and any calculations (for example, computing an average) would be meaningless. • An ordinal variable is one where the order matters but not the difference between values. For example, you might ask patients to express the amount of pain they are feeling on a scale of 1 to 10. A score of 7 means more pain than a score of 5, and that is more than a score of 3. But the difference between the 7 and the 5 may not be the same as that between 5 and 3. The values simply express an order. • An interval variable is a measurement where the difference between two values is meaningful. The difference between a temperature of 100 degrees and 90 degrees is the same difference as between 90 degrees and 80 degrees. • A ratio variable has all the properties of an interval variable, and also has a clear definition of 0.0. When the variable equals 0.0, there is none of that variable. Variables like height, weight, and enzyme activity are ratio variables. Temperature, expressed in F or C, is not a ratio variable. A temperature of 0.0 on either of those scales does not mean 'no temperature'. A temperature of 100 degrees C is not twice as hot as 50 degrees C, because temperature in C is not a ratio variable. A pH of 3 is not twice as acidic as a pH of 6, because pH is not a ratio variable
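The temperature example can be made concrete with a quick calculation: Kelvin, unlike Celsius, has a true zero, so only Kelvin ratios are physically meaningful. A minimal sketch (the function name is my own):

```python
def c_to_k(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return celsius + 273.15

# 100 C looks like "twice as hot" as 50 C, but on the ratio (Kelvin)
# scale the actual ratio of the two temperatures is only about 1.15.
naive_ratio = 100 / 50
true_ratio = c_to_k(100) / c_to_k(50)
print(naive_ratio, round(true_ratio, 2))
```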
  • 391.
    Nonparametric vs Parametric Tests • Nonparametric statistical tests are used instead of the parametric tests we have considered thus far (e.g. t-test; F-test) when: • The data are nominal or ordinal (rather than interval or ratio). • The data are not normally distributed, or have heterogeneous variance (despite being interval or ratio). • The following are some common nonparametric tests: • Chi square: 1. used to analyze nominal data 2. compares observed frequencies to frequencies that would be expected under the null hypothesis • Mann-Whitney U: 1. compares two independent groups on a DV measured with rank-ordered (ordinal) data 2. nonparametric equivalent to a t-test • Wilcoxon matched-pairs test: 1. used to compare two correlated groups on a DV measured with rank-ordered (ordinal) data 2. nonparametric equivalent to a t-test for correlated samples • Kruskal-Wallis test: 1. used to compare two or more independent groups on a DV with rank-ordered (ordinal) data 2. nonparametric alternative to one-way ANOVA
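The first test in the list is easy to compute by hand. A minimal sketch of a chi-square goodness-of-fit statistic (the handedness categories and counts below are invented for illustration):

```python
def chi_square(observed, expected):
    """Chi-square statistic: the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical handedness counts in a sample of 100 people, tested against
# a null hypothesis of 90% right-handed / 10% left-handed.
observed = [84, 16]
expected = [90, 10]
stat = chi_square(observed, expected)
# (84-90)^2/90 + (16-10)^2/10 = 0.4 + 3.6 = 4.0
# With 1 df the .05 critical value is 3.84, so this result would be significant.
print(stat)
```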
  • 392.
    Generalizing Results Chp 14 • External Validity is the extent to which findings may be generalized. Even though a researcher randomly assigns participants to experimental conditions, rarely are those subjects randomly selected from the general population; subjects are selected because they are available (e.g., college freshmen and sophomores who must fulfill course requirements). Such subjects represent a very restricted population: as older adolescents, they usually have a developing sense of identity, social and political attitudes that are still forming, and a high need for peer approval. These student subjects are rather homogeneous as a group but different from older adults, so what we know about general principles of psychological functioning may be limited to a select and unusual group. Although the use of rats is convenient, many research findings have been applied to humans, particularly in the fields of memory, sexuality, drugs, brain function, etc.
  • 393.
    Generalizing Research results • While college students represent a ready group of volunteers, those researchers using different populations are even more dependent on volunteers than university researchers. Volunteers may be a unique population • However, college student populations are increasingly diverse and representative of society. Studies with certain college populations are replicated at other colleges using different mixes of students, and many studies are later replicated with other populations • Rosenthal and Rosnow (1975) stated that volunteers tend to be more highly educated, higher in SES, and more social • Different kinds of people volunteer for different kinds of experiments. Titles of experiments may change who volunteers (e.g. “problem solving” vs. “interaction in small groups”) pg 289 • Internet surveys also solicit volunteers, drawing on those individuals who use the internet more frequently. Higher internet use is associated with living in an urban area, being younger and college educated, and having a higher income
  • 394.
    Gender and subgroups • A study published in July 2006 in Genome Research compared the levels of gene expression in male and female mice and found that 72 percent of active genes in the liver, 68 percent of those in fat, 55.4 percent of the ones in muscle, and 13.6 percent of genes in the brain were expressed in different amounts in the sexes. • In an analysis of 163 new drug applications submitted to the Food and Drug Administration between 1995 and 2000 that included a sex analysis, drug concentrations in blood and tissues varied between men and women by as much as 40 percent for 11 of the drugs. However, the applications included no sex-based dosing recommendations. Source: Melinda Wenner Moyer, Slate Magazine
  • 395.
    Gender and subgroups • Nature 465, 665 (10 June 2010) editorial • Admittedly, there can be legitimate reasons to skew the ratios. For instance, researchers may use male models to minimize the variability due to the estrous cycle, or because males allow them to study the Y chromosome as well as the X. And in studies of conditions such as heart disease, from which female mice are thought to be somewhat protected by their hormones, scientists may choose to concentrate on male mice to maximize the outcome under study • However justifiable these imbalances may be on a case-by-case basis, their cumulative effect is pernicious: medicine as it is currently applied to women is less evidence-based than that being applied to men. Moreover, hormones made by the ovaries are known to influence symptoms in human diseases ranging from multiple sclerosis to epilepsy. Apart from a few large, all-female projects, such as the Women's Health Study on how aspirin and vitamin E affect cardiovascular disease and cancer, women subjects remain seriously underrepresented
  • 396.
    Gender and subgroups • Journals can insist that authors document the sex of animals in published papers — the Nature journals are at present considering whether to require the inclusion of such information. Funding agencies should demand that researchers justify sex inequities in grant proposals and, other factors being equal, should favor studies that are more equitable. • Drug regulators should ensure that physicians and the public alike are aware of sex-based differences in drug reactions and dosages. And medical-school accrediting bodies should impress on their member institutions the importance of training twenty-first-century physicians in how disease symptoms and drug responses can differ by sex.
  • 397.
    Hypothetical study on aggression and crowding for males and females pg 291 • [Four-panel figure (A–D) plotting aggression against crowding for males and females; only the panel summaries are recoverable:] Figure A: males and females essentially equal; no interaction. Figure B: a main effect for crowding but also for gender. Figure C: an interaction; crowding affects males but has no effect for females. Figure D: an interaction; a positive relationship between crowding and aggression for males with a negative relationship for females. In C and D, results for males cannot be generalized to females
  • 398.
    Cultural Considerations • Arnett et al. (2008) state that psychology is built on the study of WEIRD (Western, Educated, Industrialized, Rich, Democratic) people pg 293 • Traditional theories of self-concept are built upon Western concepts of the self as separate or individualistic, while in some other cultures self-esteem is derived more from relationships to others. “Asian-Americans are more likely to benefit from support that does not involve the sort of intense disclosure of personal stressful events and feelings that is the hallmark of support in many European American groups” pg 293 • However, many studies find similarities across cultures
  • 399.
    Generalizing from Laboratory Settings • Laboratory research has the advantage of studying the effect of an independent variable under highly controlled conditions, but does the ‘artificiality’ of the laboratory limit its external validity? • Anderson, Lindsay and Bushman (1999) compared 38 pairs of studies for which there were similar laboratory and field studies in areas including aggression, helping, memory, and depression, and found that the effect size of the independent variable on the dependent variable was very similar in the two types of studies (which raises confidence in the external validity of the studies) pg 296
  • 400.
    Replications • Replications are a way of compensating for limitations in generalizing from any single study • An exact replication is an attempt to precisely follow the procedures of a study to determine if the same results will be obtained. An exact replication may be conducted when a researcher is attempting to build on a previous study and wants to be confident in the external validity of the study before proceeding with his/her own follow-up • Review the findings of the “Mozart Effect,” in which students who listened to 10 minutes of a Mozart sonata showed higher performance on a spatial reasoning task (S-B IQ scale) (Rauscher, Shaw and Ky, 1993), a result that was followed by many failures to replicate. An alternative explanation may be that the effect is limited to music that also increases arousal, or that the original study made a Type I error (incorrect rejection of the null hypothesis), or that the results occur only under special conditions pg 297
  • 401.
    Conceptual Replications • In a conceptual replication researchers attempt to understand the relationships between the underlying variables • One way this is accomplished is to redefine the operationalized definition of a variable. While the original definition of exposure to music was 10 minutes of the Mozart Sonata for Two Pianos in D major, new operationalized definitions may include a different selection of Mozart or a different composer • When conceptual replications produce similar results, this increases our confidence in the external validity of the original findings and demonstrates that the relationship between the theoretical variables holds
  • 402.
    Generalizations: Literature Reviews and Meta-Analyses • You can evaluate the external validity of a study by conducting a literature review, which summarizes and evaluates a particular research area. The literature review synthesizes and provides information which 1) summarizes what has been found to date, 2) tells the reader what findings are strongly supported or not in the literature, 3) points out inconsistencies in the findings, and 4) discusses future directions for this area of research • Meta-analysis gives a thorough summary of several studies that have been done on the same topic, and provides the reader with extensive information on whether an effect exists and what size that effect has. The analysis combines the results of a number of studies (e.g. by use of effect size). Traditional reviews do not usually calculate effect sizes or attempt to integrate information from the different experimental designs used across the studies cited; the traditional review is a more qualitative approach, while a meta-analysis is more quantitative pg 299
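A rough sketch of the quantitative step a meta-analysis adds: combining per-study effect sizes with inverse-variance weights (a fixed-effect model). The effect sizes and variances below are made up for illustration:

```python
def fixed_effect_summary(effects, variances):
    """Inverse-variance weighted mean effect size (fixed-effect model).

    Each study's weight is 1/variance, so precise studies (large samples,
    small variances) count for more in the pooled estimate.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)  # variance of the summary effect
    return pooled, pooled_var

# Three hypothetical studies of the same effect (expressed as Cohen's d).
effects = [0.30, 0.55, 0.40]
variances = [0.04, 0.08, 0.02]
d, var = fixed_effect_summary(effects, variances)
print(round(d, 3))
```

The third (most precise) study pulls the pooled estimate toward its own value, which a simple unweighted average of the three effects would not do.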
  • 403.
    Generalization and Variation • Variations in the service quality of medical practices, Ly DP & Glied SA, Am J Manag Care. 2013 Nov 1;19(11) • There was substantial variation in the service quality of physician visits across the country. For example, in 2003, the average wait time to see a doctor was 16 minutes in Milwaukee but more than 41 minutes in Miami; the average appointment lag for a sick visit in 2003 was 1.2 days in west-central Alabama but almost 6 days in Northwestern Washington. Service quality was not associated with the primary care physician-to-population ratio and had varying associations with the organization of practices. • CONCLUSIONS: Cross-site variation in service quality of care in primary care has been large, persistent, and associated with the organization of practices. Areas with higher primary care physician-to-population ratios had longer, not shorter, appointment lags.
  • 404.
    Regional Differences in Prescribing Quality Among Elder Veterans and the Impact of Rural Residence, Brian C. Lund, Journal of Rural Health 29 (2013) 172–179 • Regional variation often reflects discrepancies in the implementation of best practices, and comparisons of high- versus low-performing sites may identify mechanisms for improving performance. A recent analysis of national Medicare data revealed significant regional variation, with the highest concentration of potentially inappropriate prescribing found in the Southern United States and the lowest rates in the Northeast and upper Midwest. Similar geographic distributions of prescribing quality have been previously reported among older adults in both outpatient and inpatient settings. The most direct interpretation of these findings is differences in provider-level characteristics, where different approaches to pharmacotherapy lead to patients in low-performing regions being exposed to riskier medication regimens. However, prescribing is also influenced by system-level factors such as differences in health system organization, access to prescription drug benefits, and higher copayments for newer (and potentially safer) medications.
  • 405.
    "Real World" Atypical Antipsychotic Prescribing Practices in Public Child and Adolescent Inpatient Settings, Elizabeth Pappadopulos, et al., Schizophrenia Bulletin, Vol. 28, No. 1, 2002 • The widespread use of atypical antipsychotics for youth treated in inpatient settings has been the focus of increasing attention, concern, and controversy. Atypical antipsychotic medications have supplanted traditional neuroleptics as first line treatments for schizophrenia and other psychotic disorders in adult populations. A similar trend has also been observed in the treatment of child and adolescent psychiatric patients, although data on the safety and efficacy of atypical agents in youth are scarce. • Among child and adolescent inpatients, atypical antipsychotics are mainly prescribed for aggression rather than for psychosis. Current debates revolve around whether these agents are appropriately monitored and managed. In an effort to address these concerns, a survey was developed and administered to physicians at four facilities and to a group of 43 expert clinicians and researchers.
  • 406.
    "Real World" Atypical Antipsychotic Prescribing Practices in Public Child and Adolescent Inpatient Settings • Taken together, these studies show that as many as 98 percent of children and adolescents in psychiatric hospitals are treated with psychotropic medications during their inpatient stay, and approximately 45 percent to 85 percent of these patients receive multiple medications simultaneously. Antipsychotics are the most commonly prescribed agents across most inpatient settings for the treatment of aggression • While overall rates of psychotropic prescribing (ranging from 68% to 79% of patients) did not differ across inpatient units, preferences for particular classes of medications varied by facility. In addition, a higher percentage of patients were given antipsychotics in the county-university hospital (74%) than in the State hospital (57%) or the private hospital (35%). While these trends may be due to differences in the patient populations treated at each facility, Kaplan and Busner note that the use of antipsychotics for nonpsychotic disorders was statistically equivalent across settings.
  • 407.
    "Real World" Atypical Antipsychotic Prescribing Practices in Public Child and Adolescent Inpatient Settings • Atypical antipsychotics represent a major advance in the treatment of schizophrenia and psychosis among adults because of their superior efficacy and side effect profile in comparison to conventional antipsychotics. However, because these benefits have not been reliably established in children (Sikich 2001), antipsychotic prescribing practices for child and adolescent psychiatric inpatients have largely developed from clinical experience rather than from scientific evidence. • A recent literature review shows that published data on treatments for aggression are primarily from open studies and case reports. Much of the research conducted involves aggressive youth with compromised intelligence and is not easily applied to the general population of youngsters with aggressive behavior problems.
  • 408.
    "Real World" Atypical Antipsychotic Prescribing Practices in Public Child and Adolescent Inpatient Settings • Concerns about side effects, such as weight gain, elevated prolactin levels, and abnormal electrocardiograms, especially in children, have yet to be resolved by research. In the face of limited data from clinical trials, intensive study is needed on factors that influence physicians' antipsychotic prescribing preferences and that result in unnecessary treatment variability. • Taken together, the audit of patient charts reveals much-needed real-world information about the administration of antipsychotics and other psychotropic medications in this set of public inpatient facilities for children and adolescents. The children and adolescents treated in these settings represent a particularly severe and comorbid patient population. Despite the fact that inpatient youth diagnosed with psychosis accounted for only a fraction (20%) of the population, antipsychotics were commonly prescribed in this sample and were often used in combination with other agents.
  • 409.
    "Real World" Atypical Antipsychotic prescribing practices • Antipsychotics are administered to children and adolescents in public inpatient settings in high proportions for complex comorbid conditions involving aggression. Ironically, this real-world patient population is excluded from clinical research, leaving clinicians to rely on clinical experience rather than empirical evidence. The data reveal that there are great disparities in the use of antipsychotics across facilities, and this may be due in part to the lack of available data to guide these practices. Several findings regarding the administration of psychotropic medications surprised us and raised important areas of concern. The number and proportion of medications on admission were very similar to medication regimens at discharge. One would expect that after an average stay of more than 3 months, more adjustments would be made to the medication regimen. The rationales for this lack of change in treatment regimen are unclear, and this situation makes it difficult to determine whether and how changes in medication might affect outcomes
  • 410.
    Prescription practices • The administration of two or more psychotropic medications (polypharmacy) is also an area of concern. In our chart review, because the number of medications given to patients tended not to change over the course of treatment, it is possible that polypharmacy in these facilities represents treatment inertia. In other words, physicians at these facilities tend to sustain, rather than initiate, the use of polypharmacy. Patients' charts did not provide enough information regarding the rationale for physicians' medication strategies, and given that cases are often seen by a number of physicians, there is little evidence of continuity in medication use. For example, one study found that nearly half of patients given risperidone in a State hospital were taken off their medication within 15 days after discharge by their outpatient physician • A clear rationale for medication strategy was often missing from medication progress notes. This is particularly important given the great concern over antipsychotics' side effects, a concern that was repeatedly raised during focus groups. In these ways, physicians' actual practices did not match experts' agreed-upon best practices. Many current practices
  • 411.
    Prenatal exposure to ultrasound waves impacts neuronal migration in mice, PNAS, Ang et al., August 22, 2006, vol. 103, no. 34 • Neurons of the cerebral neocortex in mammals, including humans, are generated during fetal life in the proliferative zones and then migrate to their final destinations by following an inside-to-outside sequence. The present study examined the effect of ultrasound waves (USW) on neuronal position within the embryonic cerebral cortex in mice. We used a single BrdU (bromodeoxyuridine, commonly used in the detection of proliferating cells in living tissues) injection to label neurons generated at embryonic day 16 and destined for the superficial cortical layers. • Our analysis of over 335 animals reveals that, when exposed to USW for a total of 30 min or longer during the period of their migration, a small but statistically significant number of neurons fail to acquire their proper position and remain scattered within inappropriate cortical layers and/or in the subjacent white matter. The magnitude of dispersion of labeled neurons was variable but systematically increased with duration of exposure to USW. These results call for further investigation in the larger and slower-developing brains of non-human primates and continued scrutiny of unnecessarily long prenatal ultrasound exposure.
  • 412.
    Prenatal Exposure to Ultrasound • Schematic representation of the progression of neuronal migration to the superficial cortical layers in the normal mouse. (A–D) Most cells labeled with BrdU at E16 arrive in the cortex by E18, and, by P1, those cells become surpassed by subsequently generated neurons. Eventually, these cells will settle predominantly in layers 2 and 3 of the cerebrum. (E–H) Model of the USW effect. When cells generated at E16 are exposed to USW, they slow down on E17, and some remain in the white matter or are stacked in the deeper cortical layers.
  • 413.
    Effect Size • Effect size refers to the strength of association between variables. The Pearson r correlation coefficient is one indicator of effect size; it indicates the strength of the linear association between two variables pg 252 Cozby & Bates • The concept of effect size already appears in everyday language. For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program. These are both examples of "absolute effect sizes", meaning that they convey the average difference between two groups without any discussion of the variability within the groups. For example, if the weight loss program results in an average loss of 30 pounds, it is possible that every participant loses exactly 30 pounds, or half the participants lose 60 pounds and half lose no weight at all
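The two standardized effect-size measures mentioned here can be computed in a few lines. A minimal sketch; the weight-loss scores and the correlation data are invented for illustration:

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical weight-loss data: pounds lost in treatment vs. control.
treatment = [32, 28, 35, 30, 25]
control = [5, 0, 8, 2, 10]
print(round(cohens_d(treatment, control), 2))
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))
```

Unlike the "absolute" 30-pound figure, d expresses the group difference relative to the variability within the groups, which is what makes effect sizes comparable across studies.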
  • 414.
    Socioeconomic Inequality in the Prevalence of Autism Spectrum Disorder, Durkin MS et al., PLoS One. 2010 Jul 12;5(7) • The prevalence of ASD increased in a dose-response manner with increasing SES, a pattern seen for all three SES indicators used to define SES categories • Figure: Prevalence of ASD per 1,000 by three SES indicators based on census block group of residence. Thin bars indicate 95% confidence intervals. Within each SES indicator, both the trend test and χ² tests were significant at p < 0.0001. MHI refers to median household income.
  • 415.
    • The main results of this study were consistent with the only study larger than this one to examine the association between ASD risk and an indicator of SES. That study, published in 2002 by Croen and colleagues, looked at more than 5,000 children with autism receiving services coordinated by the California Department of Developmental Services and found a stepwise increase in autism risk with increasing maternal education • Epidemiologists have long suspected that associations between autism and SES are a result of ascertainment bias, on the assumption that as parental education and wealth increase, the chance that a child with autism will receive an accurate diagnosis also increases
  • 416.
    • Paranormal phenomena: Signal-to-noise ratio
  • 418.
    Path Analysis • Path analysis is a straightforward extension of multiple regression. Its aim is to provide estimates of the magnitude and significance of hypothesized causal connections between sets of variables. This is best explained by considering a path diagram. • To construct a path diagram we simply write the names of the variables and draw an arrow from each variable to any other variable we believe that it affects. We can distinguish between input and output path diagrams. An input path diagram is one that is drawn beforehand to help plan the analysis and represents the causal connections that are predicted by our hypothesis. An output path diagram represents the results of a statistical analysis, and shows what was actually found.
  • 419.
  • 420.
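For a simple three-variable input diagram X → M → Y (with a direct X → Y arrow as well), the path coefficients are just standardized regression weights, and they can be recovered from the pairwise correlations. A minimal sketch with made-up correlations (the function and variable names are my own):

```python
def path_coefficients(r_xm, r_xy, r_my):
    """Standardized path coefficients for the model X -> M -> Y plus X -> Y.

    a  : effect of X on M (simply their correlation, since X is the only input)
    b  : effect of M on Y, controlling for X
    c_ : direct effect of X on Y, controlling for M
    """
    a = r_xm
    b = (r_my - r_xy * r_xm) / (1 - r_xm ** 2)
    c_ = (r_xy - r_my * r_xm) / (1 - r_xm ** 2)
    return a, b, c_

# Hypothetical correlations among X, M, and Y.
a, b, c_direct = path_coefficients(r_xm=0.5, r_xy=0.4, r_my=0.6)
indirect = a * b             # effect of X on Y transmitted through M
total = c_direct + indirect  # decomposition reproduces r_xy for this model
print(round(total, 3))
```

The output diagram would label the X → M arrow with a, the M → Y arrow with b, and the X → Y arrow with the direct effect, decomposing the overall X–Y correlation into direct and indirect parts.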
  • 421.
  • 422.
    Dispersion: Sum of Squares • Columns: Subject, Score (X), X²
  • 423.
    Dispersion: Sum of Squares
    • Subject   X    X²    x = X − X̄    x²
    • 1         0     0       −5        25
    • 2         1     1       −4        16
    • 3         2     4       −3         9
    • 4         4    16       −1         1
    • 5         5    25        0         0
    • 6         6    36        1         1
    • 7         7    49        2         4
    • 8         8    64        3         9
    • 9         8    64        3         9
    • 10        9    81        4        16
    • N = 10   T = 50   ∑X² = 340   ∑x = 0   ∑x² = 90
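The table's arithmetic can be sketched in code: the deviation method ∑(X − X̄)² and the computational formula ∑X² − T²/N give the same sum of squares for these ten scores.

```python
scores = [0, 1, 2, 4, 5, 6, 7, 8, 8, 9]  # the ten scores from the table

n = len(scores)
total = sum(scores)  # T = 50
mean = total / n     # X-bar = 5.0

# Deviation method: sum of the squared deviations (the x^2 column).
ss_deviation = sum((x - mean) ** 2 for x in scores)

# Computational formula: sum(X^2) - T^2 / N = 340 - 2500/10.
ss_computational = sum(x ** 2 for x in scores) - total ** 2 / n

print(ss_deviation, ss_computational)  # both equal 90.0
```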
  • 425.
    A Modified Constraint-Induced Therapy Program • Answer the following questions about the article: • 1) A constraint-induced movement therapy (CIT) program is what kind of intervention? (pg 1, abstract) • 2) Describe the subjects: (how many) children with (what disorder) were placed in (what kind) of design? (pg 1, under Methods in abstract) • 3) What were the two procedures being compared? _________ vs. __________ • 4) What were the two specifically designed tests? Name them: __________ and ________ • 5) How many times were the tests administered? _____ At what points in the study were they administered? ________ • Was there a significant difference between the groups (yes or no)? • Which of the two groups or procedures was more effective? __________ • Type out the above questions on a separate sheet, fill in the blanks, and turn in the paper with your name, class & title at the top. Each blank is worth 2 points; 12 blanks = 25 points (24 + 1 bonus point).
    Organization of Report/Article, Appendix A • The body of the paper will have the following sections: Introduction, Methods, Results, and Discussion. • Introduction includes 1) the problem under study, 2) a literature review, and 3) the rationale and hypothesis of the study. The Introduction progresses from broad theories and research findings to specific current details. • Methods provides the reader with detailed information about how the study was conducted. Often there are subsections describing the subjects, apparatus, materials, and the procedure(s) used. The number and relevant characteristics of subjects are stated. Any equipment used is described, and the procedure section states how the study was conducted, step by step in temporal order. Methods also describes how extraneous variables were controlled and how randomization was used.
    Organization of Report/Article, Appendix A • Results: In this section you offer the reader a straightforward description of your analyses, with no explanation of the findings. Present your results in the same order as the predictions were made in the Introduction. State which statistical test was used and at what level alpha was set. In APA style, tables and figures are not presented in the main body of the manuscript but are placed at the end of the paper. Avoid duplicating tables and figures, as well as statements in the text. • Discussion: In this section the interpretations of the results are described, considering the relationship between your results and past research and theory. Explain how the study either did or did not obtain the results expected, what flaws and limitations were present in the methods used, whether you can generalize your results, and the implications for future research.
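As a concrete illustration of the reporting convention above (state the test used and the alpha level), here is a minimal sketch of an independent-samples t test; the group names and scores are invented for illustration only, not drawn from any study in these slides.

```python
import statistics as st

def independent_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance form)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (st.mean(a) - st.mean(b)) / se, na + nb - 2

# Hypothetical scores for two independent groups (illustrative numbers only)
group_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
group_b = [10.2, 10.9, 11.1, 10.5, 11.4, 10.8]

alpha = 0.05  # set in advance and reported in the Results section
t_stat, df = independent_t(group_a, group_b)
print(f"Independent-samples t test: t({df}) = {t_stat:.2f}, alpha = {alpha}")
```

A Results-style sentence can then be assembled directly from these values, e.g. "an independent-samples t test was used with alpha set at .05," followed by the t statistic and its degrees of freedom.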
    Organization of Report/Article, Appendix A • Introduction: 1) What is known; 2) What is not known that this study addresses • Methods: Subjects: Who are they? Where did you get them? What did you do with them (how were they assigned to groups, conditions, etc.)? • Results: What happened? Did the result match the prediction or not? • Discussion: What do the results mean (interpret them) for this study, for the field in general, and for the future?
    • Stephan Cowans, a Boston man who spent six years in prison for the shooting of a police sergeant, was released in 2004 after the discovery that the fingerprint used to convict him was not his. • That same year, the FBI mistakenly linked Brandon Mayfield, an Oregon lawyer, to a fingerprint lifted off a plastic bag of explosive detonators found in Madrid after commuter train bombings there killed 191 people. Two weeks after Mayfield's arrest, Spanish investigators traced the fingerprint to an Algerian man.
    Diabetes and Cognitive Systems in Older Black and White Persons • Introduction • Diabetes has long been associated with impaired cognition in white individuals, and although the prevalence of diabetes is increasing, this association with cognition has not been fully tested in black individuals. • Methods • Subjects were older community-dwelling persons recruited from senior and private residential housing in the Chicagoland area. All subjects were enrolled in one of two studies of aging and cognition (the Minority Aging Research Study and the Memory and Aging Project, with 336 and 1,187 subjects respectively). After 80 subjects were eliminated due to a diagnosis of dementia, the remaining subjects (mean age 73.1 and 79.9 years; mean education 14.8 and 14.3 years; 92.8% white and 6.3% black in the second study, and all black in the first) underwent clinical, neurological, and neuropsychological evaluation, including tests of semantic memory, episodic memory, working memory, perceptual speed, and visuospatial abilities.