Correlation to the nth Degree:
Does Sample Size Matter?

Michael Guice
17670556
Psych. 4000
Dr. McGahan
October 20, 2015
Abstract
The age-old debate between free will and determinism still stands today. Everything boils down to essentially one question: who or what is in control? Some individuals see themselves as completely in control of their lives, while others see themselves as being controlled by something else. For free will to exist there cannot be a predetermined causal chain. Using Karl Pearson's product-moment correlation coefficient, we try to determine whether randomness exists. Following these lines, if randomness does exist, does it vary as a function of sample size? This idea permeates the study as its central theme. Looking back at the Central Limit Theorem, we see that normal distributions tend to arise when an object is subjected to a large number of independent disturbances. With this in mind, a study has been designed to measure the effect of sample size on randomness. Four conditions exist within this study. Each condition has a sample size associated with it, and each condition was subjected to 30 trials apiece. When comparing the conditions we used aggregate means to allow for greater understanding. The means showed that as sample size increased, the standard deviations decreased. This means the scores grouped more tightly around the predicted mean of 50. Randomness showed its existence with a sample size of 100. It was predicted that as sample size increased, the outcomes would fall more within the bounds of the normal distribution. Although this was generally true, condition 4 showed signs of chance. Throughout the study only 7 Type I errors were committed, meaning seven null hypotheses were incorrectly rejected.
Correlation to the nth Degree:
Does Sample Size Matter?
In one of Anthony Burgess's more famous works, A Clockwork Orange, the protagonist is put through a classical conditioning scenario. After the scenario is over, Alex, the protagonist, complains that he has lost his free will. What Alex was referring to was his inability to have control over his actions. Merriam-Webster defines free will as the ability to make choices that are not controlled by fate or God. Some believe that free will is a function of many "inputs, including genetic and environmental factors" [Bradley, 2012]. This type of free will is known as incompatibilist free will. Bradley referred to a coin-flipping machine when mentioning this issue. He said that if the machine had incompatibilist free will, "the exact moment and the manner of (a) release is inherently unpredictable". This unpredictability breaks a metaphorical causal chain of events. The causal chain that locked Alex and his sickness together was the same chain that removed his ability to steer his way through life. Without being able to steer, Alex was more or less dragged through life by something other than free will. But what was doing the dragging? Alex believed it to be determinism.
Determinism
The most basic meaning of determinism is that only one course of events is possible. James Bradley's work, "Randomness and God's Nature", elaborates on the various types of determinism. In his work Bradley defines determinism as "the philosophical position that ontological randomness does not exist in the physical world". Ontological randomness assumes that randomness is a property of the very nature of things; as a side note, Bradley also mentions the idea of epistemic randomness (apparent randomness, a function of human perception of things and not their nature). In comparison, ontological randomness is "true randomness" and epistemic randomness is only random to the perceiver. With this being said, determinism would hold that free will is imaginary from an epistemic viewpoint. Applied to Darwin's theory of evolution, this would suggest that evolution is a causal chain of predetermined events. If anything or anyone could have complete control over any specific moment in life, the whole layout could change.

The idea of free will is a basic theme of Darwin's theory of evolution. In this theory Darwin stresses the idea that "the most suited part will exist to reproduce". This idea deviates from determinism in that there is the factor of "most suited". Going back to Alex, we see that any time he thought of doing something "bad", a sickness fell over his entire body and forced him to stop. This sickness created a disadvantage for Alex. Without control, Alex believed he could not be held responsible for any action he did, or did not, commit. At this point he began to believe his fate was sealed. This deterministic mindset forced Alex into a depression and later an attempted suicide. The very people who "helped" the protagonist later tried to dispose of him. **Spoiler alert** As it happens, luck was on his side and he lived through it. The question is: was the luck predetermined, or was it a matter of "blind, purposeless chance" [Bradley, 2012]?
Randomness
Chance is usually mentioned in the study of randomness and proportionality. A popular conception of randomness is "not having a governing design, method, or purpose; unsystematic" [Bradley, 2012]. A fair die has six sides and, when thrown, has the probability of landing on one of those six sides. Much like the results in Neuringer's study, where "humans failed to produce random like-behavior", dice tend to fail to produce random-like behavior. For instance, when an individual plays a game of chance and rolls a die, they expect one of six possibilities. What happens when the die balances on a corner? What if the die rolls off the table and disappears forever? These questions give way to randomness. The question proposed now is: does randomness vary as sample size fluctuates? Will the die produce any response other than one of the six possibilities mentioned before? Seeing as chance is a function of randomness, this study was aimed at determining whether randomness exists, and if so, whether it varies as a function of sample size.
Correlation and Prediction
The Pearson product-moment correlation allows researchers the ability to detect linear relationships. In order to use the method, three assumptions must be met:
1.) "The sample is independently and randomly selected from the population of interest."
2.) "The population distributions of X and Y are such that their joint distribution (that is, their scatterplot) represents a bivariate normal distribution. This is called the assumption of bivariate normality and requires that the distribution of Y scores be normal in the population at every value of X."
3.) "The variances of the Y scores are equal at every value of X in the population."
What does a correlation coefficient do? A correlation coefficient measures the strength of the association between two variables, if there is one. "The Pearson product-moment correlation coefficient measures the strength of the linear association between variables" [StatTrek]. The coefficient is bounded between negative one and positive one. This scale allows the researcher to depict how a change in one variable may affect the other. A correlation of negative one represents a perfect negative relationship between the two variables: variable one moves incrementally away from variable two when variable two is manipulated, and vice versa. When two variables move in the same direction as the other is adjusted, we consider this a positive relationship. A correlation coefficient equal to zero does not represent "zero relationship between two variables; rather, it means zero linear relationship" [StatTrek]. What is a linear relationship?
A linear relationship is a relationship of proportionality. Generally, when this relationship is plotted on a graph, the points create a straight line. Any change in one variable, the independent variable, will produce a corresponding change in another variable, the dependent variable. Practice makes perfect: as an individual spends more time practicing, their ability within that activity should increase. Consequently, by reducing the time spent practicing, one's ability should diminish as well.
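As a concrete illustration (my own sketch, not part of the original study; the practice/ability framing and the numbers are hypothetical), Pearson's r can be computed directly from its definition or with a standard library call:

```python
# Sketch: computing Pearson's r two ways for a toy "practice vs. ability" dataset.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hours of practice (hypothetical)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # skill score (hypothetical)

# By definition: sum of cross-deviations over the product of deviation norms.
r_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)

r_scipy, p_value = stats.pearsonr(x, y)    # library version, with a two-tailed p
print(r_manual, r_scipy, p_value)          # r is close to +1: strong positive linear relationship
```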
Galton:
The concept of correlation was first discovered by Francis Galton in 1888. This came after his intensified research on heredity. Galton wanted to "reconcile an empirical test with a mathematical theorem" [Stigler, 1989]. Galton used a quincunx, an ingenious analogue computer, to help him with his correlation formulations. Although Galton used median deviations from the median, today we "express measurements in terms of the number of standard deviation units from the mean" [Stigler, 1989]. Although Galton is credited with the discovery of correlation, Karl Pearson is credited with the discovery of the Pearson product-moment correlation.
Pearson:
Before Pearson's discovery, Venn diagrams were used to determine linear relationships. Karl Pearson was quoted as saying that the "recess deserves a commemorative tablet as the birth place of the true conception of correlation" [Stigler, 1989]. It's believed that Pearson meant no disrespect towards Galton. More simply stated, the idea of correlation did not "click" for Galton until he took time to reflect.

Pearson was raised as a Quaker until the age of 24. While Pearson studied under Edward Routh at Cambridge he began to lose his religious faith. It's been said he began to adhere to agnosticism or "freethought" [Britannica]. Freethought holds that positions regarding truth should be formed on the basis of logic, reason, and empiricism. This idea runs directly counter to the idea of authority, tradition, revelation, or other dogma. Once the correlational method became an integral part of science, it opened the door for a new zeitgeist.
Intelligence Testing and General Model of Reliability
Intelligence Testing:
Widespread intelligence testing was first introduced by Lewis Terman in 1916. Terman was a psychologist at Stanford University. At the time, the test was administered to participants two years old and older. These tests were individually administered and consisted of "an age-graded series of problems whose solutions involved arithmetical, memory, and vocabulary skills" [Britannica].
The most popular intelligence tests are the Stanford-Binet Intelligence Scale and the Wechsler Scales. The Stanford-Binet is the American adaptation of the original French Binet-Simon intelligence test. An IQ, or intelligence quotient, is a concept first suggested by William Stern. It was originally computed as the "ratio of a person's mental age to his chronological (physical) age, multiplied by 100" [Britannica]. Today the mental age has fallen by the wayside. Test results still yield an IQ, but the concept is now configured on the basis of the statistical percentage of people who are expected to have a certain IQ.
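As a worked example of Stern's original ratio (my own, not from the source): IQ = (mental age / chronological age) x 100, so a child with a mental age of 10 and a chronological age of 8 would have scored (10 / 8) x 100 = 125.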
Intelligence test scores follow an approximately "normal" distribution. This normal distribution suggests most people score near the middle of the distribution curve. As one moves away from the mean, scores tend to drop off fairly rapidly in frequency. Even though intelligence testing seems simple, there is still room for error.
Errors:
There are two types of errors that plague the scientific community: Type I and Type II. A Type I error is incorrectly rejecting the null hypothesis. The scientific community is protected against overhasty rejection of the null hypothesis by statistical testing. Although statistics are never 100% accurate, this allows "some policing powers to members who would rather live by the law of small numbers" [Kahneman and Tversky, 1971]. Kahneman and Tversky went on to state that "there are no comparable safeguards against the risk of failing to confirm a valid research hypothesis (Type II error)".
Replication
For this study, samples of 30 will be collected. This idea goes back to the article "Probable error of a correlation coefficient", published in 1908. At the time, William Gosset was publishing under the pseudonym Student. William, head brewer of Guinness at the time, proposed the idea that "with samples of 30 ... the mean value (of the correlation coefficient) approaches the real value (of the population) comparatively rapidly". Why should we replicate in the first place?

Kahneman and Tversky (1971) stated that "the decision to replicate a once obtained finding often express(es) a great fondness for that finding and a desire to see it accepted by a skeptical community". In this same research, Kahneman and Tversky showed that 88% of this skeptical community believed the results of a single significant study are likely due to chance. This idea falls directly in line with the Central Limit Theorem (CLT). Stigler (1989) referenced the CLT, saying "that the normal distribution arises when an object is subjected to a larger number of independent disturbances, no few of them dominant".
Design to Test
One study with four conditions has been developed. The study requires the use of "true random numbers". These numbers will be retrieved from a random number generator, Random.org. Unlike pseudorandom numbers, created from an algorithm, Random.org generates numbers from atmospheric noise. Conditions were separated by sample size:

Condition 1: sample size of 3
Condition 2: sample size of 7
Condition 3: sample size of 30
Condition 4: sample size of 100
As the conditions change, so does the sample size. This is due to Kahneman and Tversky's idea that "a replication sample should often be larger than the original".

Variable X and Variable Y were pooled from the environment. Due to the nature of randomness, Variable X should not predict Variable Y, and Variable Y should not predict Variable X. This being the case, Variable X and Variable Y should not be highly correlated. When correlated, the Pearson product-moment correlation should be non-significant at an alpha level of .05. This alpha level was selected because it produces a confidence level of 95%. It is also the most prominent alpha within the scientific community. Also, the aggregate mean of the Pearson product-moment correlations between the samples should not exceed the critical values. Critical values for the samples are listed below.
Sample Size    Critical Value (two-tailed)    Degrees of Freedom
N=3            0.997                          1 (3-2=1)
N=7            0.754                          5 (7-2=5)
N=30           0.361                          28 (30-2=28)
N=100          0.197                          98 (100-2=98)
Degrees of freedom are essential to this research, and thus they were added to the table of critical values.
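These critical values can be reproduced from the t distribution: for a two-tailed test at alpha = .05 with df = n - 2, the critical r equals t / sqrt(t^2 + df). A minimal sketch (my own check, not part of the study's procedure):

```python
# Sketch: reproducing the two-tailed critical values of r at alpha = .05.
from math import sqrt
from scipy import stats

alpha = 0.05
for n in (3, 7, 30, 100):
    df = n - 2                                # degrees of freedom for Pearson r
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical t
    r_crit = t_crit / sqrt(t_crit ** 2 + df)  # convert critical t to critical r
    print(f"n={n:3d}  df={df:2d}  r_crit={r_crit:.3f}")
# Prints 0.997, 0.754, 0.361, and 0.197, matching the table above.
```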
Degrees of freedom and accuracy of estimates tend to be positively correlated. As the sample size increases, the degrees of freedom increase. With this, the accuracy of the estimate, the aggregate Pearson product-moment correlation mean, should be more closely related to the population mean. As the sample sizes increase, the aggregate means should group more tightly around zero, with zero understood as representing no linear relationship. This in turn should be reflected in smaller standard deviations. Each condition's standard deviation "represents an average deviation from the mean" [Jaccard, J., 2010]. In theory, as sample size increases one should be better able to predict the effect size of the variable in question. For this study, the question posed is: does randomness vary as a function of sample size? The hypothesis for this study is that randomness does vary as a function of sample size.
To test this hypothesis, random numbers will be generated by Random.org. Two variables will be drawn for each subject within the conditions and formatted into two columns. Column 1 will thus be denoted as Variable 1 and Column 2 as Variable 2. Each condition should use a multiplier of 2 for each participant. This means there will be a total of six values for a sample size of three, fourteen for a sample size of 7, sixty for a sample size of 30, and two hundred for a sample size of 100.

Each condition should have thirty trials: the initial trial and twenty-nine replications. At the end of each trial, a test of correlation will be run to determine the relationship between the two variables. While testing for the correlation, standard deviations and means for each variable should be collected for later comparison. Once all 30 trials have been run for each condition, the Pearson correlation results should be averaged to determine the grand mean of the samples. "The mean of a sampling distribution of the mean will always be equal to the population mean (of the raw scores)" [Jaccard, J., 2010]. This process should also be completed for the standard deviations, as well as the means, of Variables 1 and 2 for all conditions.
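The whole design can also be simulated in a few lines. The sketch below substitutes numpy's pseudorandom integers for Random.org's atmospheric noise (an assumption worth flagging: it exercises pseudorandomness rather than the "true randomness" the study targets), then computes the per-trial correlations and the aggregate statistics described above:

```python
# Sketch of the design: 4 conditions x 30 trials, integers 0-100, two variables.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2015)       # pseudorandom stand-in for Random.org

for n in (3, 7, 30, 100):                    # the four conditions
    rs, sds = [], []
    for trial in range(30):                  # initial trial + 29 replications
        var1 = rng.integers(0, 101, size=n)  # Variable 1
        var2 = rng.integers(0, 101, size=n)  # Variable 2
        r, p = stats.pearsonr(var1, var2)
        rs.append(r)
        sds.append(var1.std(ddof=1))
    print(f"n={n:3d}  mean r={np.mean(rs):+.4f}  "
          f"SD of r={np.std(rs, ddof=1):.4f}  mean SD of V1={np.mean(sds):.2f}")
# As n grows, the trial correlations cluster more tightly around 0,
# mirroring the pattern the study reports.
```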
Sample Size
As mentioned before, sample size is a key variable for this study. In fact, it is the only manipulated variable, and thus it is labeled the independent variable (IV).
N=3
Ever heard either of the following sayings: "Third time's the charm" or "All good/bad things come in threes"? Three seems to be a special number for people in general, but especially in religion. For instance, Christians tend to symbolize the Holy Trinity in terms of three (Father, Son, and Holy Spirit). Even God's attributes tend to be grouped in threes (omniscience, omnipresence, and omnipotence). The Christian faith is not the only faith where this trend of threes appears. Taoists believe in the Great Triad, which includes heaven, human, and Earth. Throughout the research, threes and religion go hand in hand. Even in everyday life the number three pervades our minds (past, present, and future). Due to its pervasive presence in the natural, and supernatural, world, condition 1 will include a sample size of three. It only seems appropriate to study such a pervasive number when researching randomness. Also, this particular sample size will produce one degree of freedom. If the accuracy of a prediction varies as a function of degrees of freedom, then condition 1 should have aggregate means much different than the other three conditions. This idea is supported by the Law of Large Numbers: the more samples you use, the closer your estimates will be to the true population values. Only when a self-correcting tendency is in place should a small sample be highly representative of, and similar to, a larger sample [Kahneman and Tversky, 1971]. Due to the nature of our data, self-correcting tendencies are not, and should not be, in place. Thus condition 1 should show the largest variance in comparison to the population, so long as the other conditions represent larger sample sizes.
N=7
The number 3 is not the only significant number that appears throughout religion and our daily lives. It took seven days for God to create the Earth. "Seven is the second perfect number", only losing to three [Scripture]. In the book of Revelation, seven makes thirty-five (5 x 7 = 35) appearances. Similarly, "seven is a significant number in the natural world: mammals and birds have a gestation of multiples of 7" [Scripture]. These points alone make seven seem like an attractive number to include within a study of randomness. Then there's Miller's "Magical Number Seven, Plus or Minus Two". Miller's research concluded that a person's memory span is limited to seven chunks, plus or minus two. For these reasons, condition 2 has been assigned the sample size of seven.
N=30
A sample size of thirty was particularly interesting for this study. In "God Counts", W.E. Filmer translated Bible verses into their native tongue, Hebrew. After doing so, he associated a numerical value with each idea. This is known as Bible numerics. "As each idea is introduced the associated number echoes throughout all manner of divisions and classifications in a way which cannot be put down to mere chance" [Filmer]. This research became relevant with further investigation of the value of thirty. Filmer shows us that 30 seems to be representative of the idea of "Blood of Christ; Dedication". Especially striking were the degrees of freedom for a sample size of thirty: a sample size of thirty has twenty-eight degrees of freedom, and the idea associated with twenty-eight is "Eternal Life", according to Filmer. A sample size of thirty was a must-have for condition 3.
N=100
For the fourth condition, the study required a sample size significantly larger than any of the other three. Looking back to the Law of Large Numbers, Kahneman and Tversky stated that "(the law) guarantees that very large samples will indeed be highly representative of the population from which they are drawn" [1971]. What's larger than 100%? Although we can never really achieve 100% in almost anything in real life, the idea is attractive. In fact, scientists have acknowledged that we are unable to obtain a 100% accurate finding due to minimal levels of error. Another strange factoid: 100 is 3.33 times the size of condition 3 (all those threes!). For these reasons, condition 4 is assigned a sample size of 100.
Method
Rather than running the correlations by hand, I chose to test the data with statistical software. So long as the data was entered correctly and the correct boxes were checked, statistical software has a better chance of acquiring the correct outcome. Secondly, the software saves valuable time.
Generating random numbers.
As mentioned before, Random.org generates random numbers through the use of "true randomness" [random.org]. Using this website's features allows the ability to test whether "true randomness" exists. By typing the web address, "random.org", into the URL bar and initiating the search, a webpage should appear. Welcome to Random.org! Now for the numbers that you will be using for data.
Near the top of the webpage there should be a strand of blue hyperlinked words. The one we are most concerned with today is the hyperlink entitled "Numbers". Hover over this hyperlink and a drop-down box should appear. The drop-down box has a list of choices in the following order: integers, sequences, integer sets, Gaussian numbers, decimal fractions, and raw bytes. The choice most relevant to this study is "integers", the reason being that you want plain data. Now, click "integers".

When the new page has finished loading, you will notice you are now looking at the "Random Integer Generator". This generator will "allow you to generate random integers" that are produced by atmospheric noise [random.org]. Before you begin, you should become familiar with the layout of the webpage. There are two very important sections to this site: Part 1: The Integers and Part 2: Go! Part 1 is where you inform the generator how many integers you want it to produce, the range from which you wish the numbers to be drawn, and how many columns you want the generator to use to format your data.
For the first condition, sample size of three, the generator should produce 6 random integers with values between 0 and 100. This scale has been used because students tend to understand it, seeing as most grading rubrics have a range of 101 values, 0-100. Also, the data should be formatted into two columns. By doing so, we produce three samples with two variables each. Column one should be recognized as variable one and column two as variable two. Once the form has been filled out correctly, the generating can begin! Divert your attention to Part 2: Go! At this point three options are available: get numbers, reset form, and switch to advanced mode. For this experiment the first option, Get Numbers, will suffice. Option 2 will clear all your hard work thus far, and option 3 is beyond the scope of this particular topic. Now generate.
As your generated numbers appear, you will see two columns with three rows each. For instance, if you generated a set of numbers that looks like Table 2.1, you did it correctly. What this table tells its audience is that Subject 1 had a Variable 1 value of 27. Likewise, Subject 3 has a response of 14 for its Variable 2. The pattern continues as you follow the flow of the table. In its entirety, Table 2.1 represents trial 1 for a sample size of 3. For each condition in this study, 30 trials will be run.

Table 2.1
             Variable 1    Variable 2
Subject 1        27            37
Subject 2        92             5
Subject 3        16            14

Returning to the website, near the bottom left of the webpage there are two options: option 1 (Again!) and option 2 (Go Back). The quota for the condition has not been met; 29 trials still remain, making option 2 an unlikely candidate for selection. With that being said, option 1 should be selected. This action will prompt the generator to produce six more integers in the same format. Now for the other three conditions.

The other three conditions require sample sizes of 7, 30, and 100, respectively. The same process listed above for a sample size of 3 will be used, with a few exceptions. The range and the number of variables (columns) are controlled within the respective confines of this experiment. These controls allow the researcher to pinpoint the relationship between variability and sample size. For a sample size of 7, fourteen integers will be generated. These fourteen integers will be divided into two separate columns, providing two variable responses for seven subjects. This method is continued for both sample sizes of 30 and 100. For the sample size of 30, sixty responses will be required. You guessed it! Two hundred responses will be required for the sample size of 100. Remember: each condition requires thirty trials. Now what to do with this generated data?
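For readers who would rather script the collection step, Random.org also offers a plain-text HTTP interface that returns the same atmospheric-noise integers. The URL and query parameters below reflect my understanding of that interface and should be verified against random.org's own documentation:

```python
# Sketch: fetching one trial's integers from Random.org's plain-text interface.
# The endpoint and parameters are assumptions; check random.org's documentation.
import urllib.request

def fetch_trial(n, low=0, high=100):
    """Fetch 2*n integers in [low, high] formatted as two columns (one trial)."""
    url = ("https://www.random.org/integers/"
           f"?num={2 * n}&min={low}&max={high}"
           "&col=2&base=10&format=plain&rnd=new")
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("ascii")
    rows = [line.split() for line in text.strip().splitlines()]
    var1 = [int(a) for a, b in rows]   # column 1 -> Variable 1
    var2 = [int(b) for a, b in rows]   # column 2 -> Variable 2
    return var1, var2

# Example: var1, var2 = fetch_trial(3)   # one trial of the n=3 condition
```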
Each data set should be placed in a safe place for keeping and further study. For this study we have chosen to insert our data into SPSS (Statistical Package for the Social Sciences) software. This program has been chosen due to the nature of the data and the high utility of the software. It's worth mentioning that version 22 of SPSS was used for this particular study. Upon opening the SPSS software, the user is met with two windows. The first inquires about the user's needs, and the second should be the dataset viewer page. On the first window, near the top left, you should double-click "New Dataset" underneath the New Files tab. By doing so, the software will present a document usually entitled "Output 1". This is where all of your formulations and equations will send their answers. For now, let's put our attention on the other window now available, the "Data Editor".

Once inside this window you should notice numerous columns and rows. The rows should be numbered in an ascending manner, and the columns should be labeled "var". For instance, the first column will be listed as "VAR1". Also, near the bottom left-hand side of the window there should be two tabs, labeled "Data View" and "Variable View". Variable View allows the user to determine the scale of measurement for each variable. If there are any concerns on this matter, please see S.S. Stevens's article entitled "On the Theory of Scales of Measurement". In this article Stevens refers to the four scales of measurement, NOIR (nominal, ordinal, interval, ratio), and how each is used. For the moment, any further elaboration is beyond the scope of this study. The "Data View" is where the user will spend most of their time. As mentioned before, there are two variables for each trial, and this theme permeates the entire experiment.
Inserting the Data:
After generating the numbers in the random generator, one must transpose these numbers into the SPSS software. Column 1 created by the random generator will be inserted into the first available odd-numbered column inside SPSS. For instance, if VAR1 and VAR5 were both empty, VAR1 would be the home for the first column of the first round of data. VAR2 would be the home for the second column of generated data. Each column of generated data should be entered consecutively. The goal is to create sixty VARs within each condition. This decision is based on the idea that each condition requires 30 trials, and each trial requires 2 sets of variables. As the sample size increases, the number of variables (VARs) does not increase; only the number of participants under each column heading does. These steps have been taken to ensure accuracy within the calculations, as well as to avoid any confusing results.
Churning the Data:
All four conditions, with sample sizes of 3, 7, 30, and 100, will be analyzed through SPSS. Although each condition must be analyzed in the same manner, each must be analyzed separately. This protects against the misrepresentation of data and later confusion as to the results.
Once all the data has been inserted and saved into its respective files, the data can then be "churned", or analyzed. Near the top-left corner of the SPSS window there is an option called "File". This particular item is unimportant for the time being, but its location is pivotal, in a manner of speaking. To the right of "File" should be "Edit". Continue to look further right and you will come across an item referred to as "Analyze". Remember the location of this selection; it is a key pathway to analyzing your data. Now select "Analyze". A drop-down menu should appear, and the first item for selection should be "Reports". Continue to look further down until you see the selection "Correlate". Upon placing the cursor over the "Correlate" function, another drop-down box appears. Select the option "Bivariate". This option has been selected because our data involves two variables: variable 1 and variable 2. After selecting "Bivariate", a new window entitled "Bivariate Correlations" should pop up. This is where you will select the variables you want to "churn".
The left side of the new window contains all the variables you have entered. Again, for each condition there should be 60 variables listed. By highlighting a variable in the left column and either double-clicking it or manually inserting it with the arrow, you can turn your available variables into variables of interest. Once the two variables, VAR1 and 2 (or VAR9 and 10), have been successfully moved to the right side, you can begin to inform the program what it is you need. The "Options..." tab allows the user to add other useful data to the output. This data may be beneficial if selected and depicted in the proper manner. For this experiment, "means and standard deviations" underneath the Statistics heading should be selected. Once this has been selected, you can select "Continue" near the bottom of the page. This will send you back to the "Bivariate Correlations" page. While on this page, be sure to select the "Pearson" box under Correlation Coefficients. This lets the program know you are looking for the Pearson product-moment correlation, as mentioned before. The test of significance should also have the option "two-tailed" selected. Lastly, be sure to check the box "Flag significant correlations". This will make your work much easier while looking at the outputs. Once all the necessary steps have been completed, select "OK" at the bottom of the page. This process will need to be completed 30 times for each condition. After the first correlation is run, there is no need to go through the options selection again; SPSS saves your selection choices so long as you do not exit the program. The key thing to remember when running these analyses is that the variables must be replaced with each replication. For instance, VAR1 and 2 must be removed from the right side so VAR3 and 4 can be analyzed.
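The same per-trial "churn" can be reproduced outside SPSS. A rough equivalent of the Bivariate dialog with "means and standard deviations" checked might look like the following sketch (my own, using the Table 2.1 numbers; not the study's actual procedure):

```python
# Sketch: the bivariate-correlation output for one trial, recomputed in Python.
import numpy as np
from scipy import stats

var1 = np.array([27, 92, 16])          # Table 2.1, Variable 1
var2 = np.array([37, 5, 14])           # Table 2.1, Variable 2

r, p = stats.pearsonr(var1, var2)      # Pearson r with a two-tailed p value
print(f"r = {r:+.3f}, two-tailed p = {p:.3f}")
print(f"V1: mean = {var1.mean():.2f}, SD = {var1.std(ddof=1):.2f}")
print(f"V2: mean = {var2.mean():.2f}, SD = {var2.std(ddof=1):.2f}")
print("flagged significant" if p < 0.05 else "not significant at alpha = .05")
```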
SPSS: "Output"
After selecting "OK" on the Bivariate Correlations page, the program should direct attention to the output window. This is where the churned data can be analyzed and viewed. Some numbers of interest on this page include the mean of both VAR1 and VAR2, the standard deviation of both variables, and the Pearson correlation for VAR1 as compared to VAR2. The Pearson correlation for VAR2 as compared to VAR1 should be identical due to the mirror effect of the matrix. These numbers should be entered and saved into a Microsoft Excel workbook page. Data labels for each condition should be five columns wide. Underneath these labels there should be five column labels listed horizontally as follows: "r, Mean V1, Std. Dev V1, Mean V2, and Std. Dev V2". Along the vertical axis of the workbook should be the word "Trial" (with its respective number, 1-30). The data inserted into this workbook will be derived from the previous application in SPSS.
Excel: Tables and Charts
Visual representations have the potential to help viewers follow the flow of the data and allow them to observe its physical nature. Once the new data is plugged into Excel, it can be used to create visual representations of the data. Particular visuals of interest for this study include Pearson r scatter plots, Pearson r line charts, and the sample means line charts and scatter plots for each condition.

A Pearson r scatter plot can be created by highlighting the correlation data for all four conditions and selecting the option "Insert". This option can be located at the top of the Excel window to the right of "Home". Once the tab has adjusted, items such as "Recommended Charts", "PivotChart", and "Tables" should be visible. There should also be a diagram alluding to a scatterplot: the diagram appears to have an X and Y axis with small dots spatially organized. If this option does not appear, click any image of a chart and redefine your selection on the right side of the pop-up window; if it does, click it. This new window allows you to manipulate the visual aspects of the chart itself. Pick the way of representing your data that is not misleading or uninformative.
Pearson r line charts have the potential to be confusing. This may be true because all four of the conditions' correlations are laid on top of one another. If there is a lot of variation between the conditions, this chart has the potential to be confusing. Nonetheless, the chart may allow the user to differentiate between the conditions. This type of chart can be created by highlighting all the variables of interest and inserting them the same way the scatter plot was completed, but by selecting a line chart instead of a scatter plot.
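An equivalent of this Excel line chart can also be drawn programmatically. The sketch below generates placeholder correlations (hypothetical stand-ins; substitute the 30 recorded values per condition) and plots them the way the "Correlations for All Samples" chart does:

```python
# Sketch: recreating the "Correlations for All Samples" line chart.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Placeholder data: 30 correlations per condition; replace with recorded values.
correlations = {n: [np.corrcoef(rng.integers(0, 101, n),
                                rng.integers(0, 101, n))[0, 1]
                    for _ in range(30)]
                for n in (3, 7, 30, 100)}

trials = range(1, 31)
for n, rs in correlations.items():
    plt.plot(trials, rs, marker=".", label=f"n={n}")
plt.axhline(0, color="gray", linewidth=0.5)   # r = 0: no linear relationship
plt.ylim(-1, 1)
plt.xlabel("Trial")
plt.ylabel("Pearson r")
plt.title("Correlations for All Samples")
plt.legend()
plt.show()
```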
Results
For each condition, one chart has been selected to represent both variables. Var1 for each condition will be used as a visual reference point.
[Figure: "Correlations for All Samples": Pearson r (from -1 to 1) across 30 trials for n=3, n=7, n=30, and n=100.]
The results of this study show that for condition 1, n=3, the aggregate means for Var1 and Var2 are 47.4332 and 49.9444, respectively. Similarly, the aggregate mean of the standard deviation of Var1 is 20.16363, and of Var2 is 19.26425. The grand means of the ranges for Var1 and Var2 are 83.33 and 71.00, respectively. The chart entitled "VAR00001" resembles the general sample of condition one. Notice how the x-axis ranges from 0-100. Also notice the shape of a "normal bell curve" and how the histogram does not exactly "fit".
For condition 2, n=7, the aggregate means of Var1 and Var2 are 50.0097 and 49.7048, respectively. The aggregate mean of the standard deviation for Var1 was 10.39435, while for Var2 it was 9.06379. The two grand means of the ranges were 38.43 and 35.86, respectively. The chart entitled "VAR00003" represents Var1 from condition 2. Be sure to notice the x-axis is no longer bound to 0-100, but rather 20-70. This reveals the tightening of the bounds. Also, be aware of the tighter grouping as compared to the graph illustrated in condition 1. As the standard deviation begins to decrease, the means begin to fall under the normal distribution.
For condition 3, n=30, the aggregate means of Var1 and Var2 were 49.9722 and 49.8778. The aggregate means of the standard deviations were 5.11790 and 5.32132. The grand means of the ranges equaled 19.77 and 20.17. The graph entitled "VAR00005" represents Var1 from condition 3. Notice once again how the x-axis bounds have decreased; the bounds are now restricted to 40-65. As the sample size increases, more of the means tend to fall under the normal distribution.
Lastly, for condition 4, n=100, the aggregate means were 48.9833 and 50.0453. The aggregate means of the standard deviations were 2.66304 and 2.58452, respectively. The grand means of the ranges equaled 9.71 and 10.09. The graph entitled "VAR00007" represents Var1 from condition 4. Yet again, as sample size increases, the bounds of the x-axis have decreased. However, unlike the other graphs to this point, this graph's means tend to be more spread out from the curve. The pattern until now was a tighter, more compressed grouping. It would appear as though this sample size was affected by chance.
Correlation Results
                Critical Value    Mean      Standard Deviation    Range
Condition 1         0.997        -.1000          .60864           1.93
Condition 2         0.754         .0248          .49867           1.59
Condition 3         0.361        -.0344          .18345            .72
Condition 4         0.197         .0009          .12054            .43
The table entitled "Correlation Results" shows the correlation results for this study. The columns entitled "Mean", "Standard Deviation", and "Range" should all be read as aggregate means: the values were taken from the 30 trials within each condition and collapsed into one average. By doing so, we have saved valuable time and space for both the reader and researcher alike. This table shows that, on average, none of the conditions' grand means were greater than the critical values associated with each sample size. Thus no significant results were found to disprove that randomness exists. Similarly, as predicted, as the sample size increased throughout the conditions, the aggregate mean of the standard deviations shrank incrementally. Likewise, the range followed suit. On the other hand, it's worth mentioning that although the aggregate means of the correlations did not exceed the critical values, some individual trials did.
For the entire study, only seven Type I errors were made. These errors are known to be errors due to the nature of the data: the data is random and should not portray a significant correlation, as mentioned before. The table below lists the number of Type I errors per condition:

Condition:               Number of Type I Errors:
Condition 1                         1
Condition 2                         4
Condition 3                         1
Condition 4                         1
Total Type I Errors                 7
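Seven false positives out of 120 total trials (4 conditions x 30 trials) is close to what alpha = .05 predicts (6 expected). A quick check of how surprising the count is (my own sketch, not part of the study):

```python
# Sketch: how surprising are 7 Type I errors in 120 independent trials?
from scipy import stats

trials, alpha = 120, 0.05
expected = trials * alpha                          # 6.0 expected false positives
p_at_least_7 = stats.binom.sf(6, trials, alpha)    # P(X >= 7) under the binomial
print(f"expected = {expected}, P(7 or more) = {p_at_least_7:.2f}")
# Roughly a 4-in-10 chance of seeing 7 or more by chance alone,
# so 7 Type I errors is unremarkable at this alpha.
```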
Scatterplots for the Pearson r product-moment correlations for this study are shown and described below.

[Figure: "Correlations for All Samples": the 30 trial correlations (Pearson r, from -1 to 1) for n=3, n=7, n=30, and n=100.]

This image has made an appearance multiple times throughout this study. Now it's time to dissect it. The blue line, n=3, shows the variation in the correlations. Notice how it seems to take greater leaps "of faith" as it crosses the flat horizontal line. The horizontal line can be seen as 0, or no linear relationship. In all the chaos, it seems as though only two lines know exactly where they are headed: the grey and yellow lines. These lines represent conditions 3 and 4, respectively. Although both seem to be navigating quite well through the chaos, condition 4 hugs the horizontal line a little tighter. This falls right in line with the idea that two random variables should not significantly correlate with one another. As sample size increases, the degrees of freedom increase. When this happens, it becomes easier for statisticians, researchers, and students alike to predict linear relationships.
Conclusion:
The age-old debate between free will and determinism is not a simple cut-and-dried scenario. If it were, it probably would have been resolved long ago. The monkey wrench thrown into the mix is something called randomness. Randomness behaves in a spurious way; it can be said that randomness is a function of chance. In this study we used the Pearson product-moment correlation to determine the effect of sample size on randomness. Four conditions were created with various sample sizes. Sample size was the independent variable for this study. It should be noted that condition 1 had a sample size of 3, condition 2 a sample size of 7, condition 3 a sample size of 30, and condition 4 a sample size of 100. Another key to this research was the number of replications. Kahneman and Tversky's advice from the law of small numbers was used to help predict our outcomes. We predicted that as sample size increased, we would be better able to predict the nature of randomness. This turned out to be false. As sample size increased we were able to predict to a certain degree, plus or minus a certain degree, but we were never able to predict the next number to be generated. Ideas for future research include, but are not limited to, a sample size of 1000. With a sample size of 1000, it is believed that the aggregate means of the standard deviations will be relatively small in comparison to our data.
References:
Bradley, J. (2012). Randomness and God's nature. Perspectives on Science and Christian Faith, 64(2), 75-89.
Britannica. Karl Pearson. http://www.britannica.com/biography/Karl-Pearson
Britannica. Intelligence test. http://www.britannica.com/science/intelligence-test
Filmer, W. E. (1984). God Counts: Amazing Discoveries in Bible Numbers.
Jaccard, J., & Becker, M. (2010). Statistics for the Behavioral Sciences. Belmont, CA: Cengage Learning.
Neuringer, A. (2002). Operant variability and the power of reinforcement. The Behavior Analyst Today, 10(2), 319-343.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677-680.
Stigler, S. (1989). Francis Galton's account of the invention of correlation. Statistical Science, 4(2), 73-86.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110.
"Scripture": The Significance of Threes (1988). Agape Bible Study.