Wk1 statnotes


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Wk1 statnotes

  1. 1. http://www.slideshare.net/statcave/week8-finalexamlivelecture-2010june http://www.facebook.com/statcave Data Types General speaking, statistical techniques are determined by the type of data. A basic understanding about the data types is helpful for choosing statistical procedures. In SPSS, a column is for a variable and a row is for a case. There are, generally speaking, two major types of data: Qualitative variables: The data values are non-numeric categories. Examples: Blood type, Gender. Quantitative variables: The data values are counts or numerical measurements. A quantitative variable can be either discrete such as # of students receiving an 'A' in a class, or continuous such as GPA, salary and so on. Another way of classifying data is by the measurement scales. In statistics, there are four generally used measurement scales: Nominal data: data values are non-numeric group labels. For example, Gender variable can be defined as male = 0 and female =1. Ordinal data (we sometimes call 'Discrete Data'): data values are categorical and may be ranked in some numerically meaningful way. For example, strongly disagree to strong agree may be defined as 1 to 5. Continuous data: Interval data : data values are ranged in a real interval, which can be as large as from negative infinity to positive infinity. The difference between two values are meaningful, however, the ratio of two interval data is not meaningful. For example temperature, IQ. Today is 1.2 times hotter than yesterday is not much useful nor meaningful. Ratio data: Both difference and ratio of two values are meaningful. For example, salary, weight. NOTE: The statistical procedures mentioned below are demonstrated using movie clips in the Statistical Procedures Page. http://www.cst.cmich.edu/users/lee1c/spss/datatype.htm
  2. 2. Nominal Nominal data are items which are differentiated by a simple naming system. The only thing a nominal scale does is to say that items being measured have something in common, although this may not be described. Nominal items may have numbers assigned to them. This may appear ordinal but is not -- these are used to simplify capture and referencing. Nominal items are usually categorical, in that they belong to a definable category, such as 'employees'. Example: The number pinned on a sports person. A set of countries. Ordinal Items on an ordinal scale are set into some kind of order by their position on the scale. This may indicate such as temporal position, superiority, etc. The order of items is often defined by assigning numbers to them to show their relative position. Letters or other sequential symbols may also be used as appropriate. Ordinal items are usually categorical, in that they belong to a definable category, such as '1956 marathon runners'. You cannot do arithmetic with ordinal numbers -- they show sequence only. Example The first, third and fifth person in a race. Pay bands in an organization, as denoted by A, B, C and D. Interval Interval data (also sometimes called integer) is measured along a scale in which each position is equidistant from one another. This allows for the distance between two pairs to be equivalent in some way. This is often used in psychological experiments that measure attributes along an arbitrary scale between two extremes. Interval data cannot be multiplied or divided. Example My level of happiness, rated from 1 to 10. Temperature, in degrees Fahrenheit.
  3. 3. Ratio In a ratio scale, numbers can be compared as multiples of one another. Thus one person can be twice as tall as another person. Important also, the number zero has meaning. Thus the difference between a person of 35 and a person 38 is the same as the difference between people who are 12 and 15. A person can also have an age of zero. Ratio data can be multiplied and divided because not only is the difference between 1 and 2 the same as between 3 and 4, but also that 4 is twice as much as 2. Interval and ratio data measure quantities and hence are quantitative. Because they can be measured on a scale, they are also called scale data. Example A person's weight The number of pizzas I can eat before fainting Mode: The mode of a set of data is the value in the set that occurs most often. Problem: The number of points scored in a series of football games is listed below. Which score occurred most often? 7, 13, 18, 24, 9, 3, 18 Solution: Ordering the scores from least to greatest, we get: 3, 7, 9, 13, 18, 18, 24 Answer: The score which occurs most often is 18. This problem really asked us to find the mode of a set of 7 numbers. Mode: The mode of a set of data is the value in the set that occurs most often. Biomodal: a setof data is bimodal if it has 2 modes(i.e.,twonumbersthatoccurmost often,andthe same numberof times). No Mode: When each value occurs only once in the data set, there is no mode for this set of data. Zero Mode, when 0 occurs most often in the set. 0 mode and no mode are two different data sets.
  4. 4. In a blindexperiment,the subjectsdonotknow whethertheyare inthe treatmentgroup or the control group.In orderto have a blindexperimentwithhumansubjects,itisusuallynecessarytoadministera placebotothe control group. Blindingisabasic tool to preventconsciousaswell assubconsciousbiasinresearch.Forexample,in opentaste testscomparingdifferentproductbrands,consumersusuallychoose theirregularbrand. However,inblindtaste tests,wherethe brandidentitiesare concealed,consumersmayfavoradifferent brand. In a double-blindexperiment,neitherthe subjects northe people evaluatingthe subjectsknowswhois inthe treatmentgroup andwhois inthe control group.Thismoderates the placeboeffectandguards againstconsciousandunconsciousprejudicefororagainstthe treatmentonthe part of the evaluators. Double-blindmethodscanbe appliedtoanyexperimental situationwhere there isthe possibilitythat the resultswill be affectedbyconsciousorunconscious biasonthe part of the experimenter. Random assignmentof the subjecttothe experimentalorcontrol groupis a critical part of double-blindresearch design.The keythatidentifiesthe subjectsandwhichgrouptheybelongedtoiskeptbya thirdparty and notgivento the researchersuntil the studyisover Computer-controlledexperimentsare sometimesalsoreferredtoasdouble-blindexperiments,since software shouldnotcause anybias. Ananalogyto the above,the part of the software thatprovides interactionwiththe humanisthe blindedresearcher,while the partof the software thatdefinesthe key isthe thirdparty.An example isthe ABXtest,where the humansubjecthasto identifyanunknown stimulusXas beingeitherA or B. http://www.worldlingo.com/ma/enwiki/en/Blind_experiment http://www.worldlingo.com/ma/enwiki/en/Blind_experiment Randomsamplingensuresthateverymemberof the populationhasanequal chance of beingselected. Simple randomsample isasamplingmethod Randomizationisthe processof makingsomethingrandom. Randomization-basedinference isespeciallyimportantinexperimental designandinsurveysampling. Randomizationinvolvesrandomlyallocatingthe experimental unitsacrossthe treatmentgroups.For example,if anexperimentcomparesanew drugagainsta standard drug,thenthe patientsshouldbe allocatedtoeitherthe newdrugor to the standarddrug control usingrandomization.
  5. 5. Simple randomsamplingreferstoasamplingmethodthathasthe followingproperties.  The populationconsistsof N objects.  The sample consistsof n objects.  All possible samplesof nobjectsare equallylikelytooccur. The main benefitof simple randomsamplingisthatitguaranteesthatthe sample chosenis representative of the population.Thisensuresthatthe statistical conclusionswillbe valid. There are manywaysto obtaina simple randomsample.One waywouldbe the lotterymethod.Eachof the N populationmembersisassignedaunique number.The numbersare placedinabowl