BASIC CONCEPTS OF STATISTICS by : DR. T.K. JAIN AFTERSCHO ☺ OL centre for social entrepreneurship sivakamu veterinary hospital road bikaner 334001 rajasthan, india FOR – PGPSE / CSE PARTICIPANTS [email_address] mobile : 91+9414430763
My words..... My purpose here is to give a few questions on fundamentals of statistics. I welcome your suggestions. I also request you to help me in spreading social entrepreneurship across the globe – for which I need support of you people – not of any VIP. With your help, I can spread the ideas – for which we stand....
What were the root words of statistics ? Latin = status German = Statistik Italian = statista French = statistique
Who carried out the first census in the world ? Pharaoh (over 1000 years before Christ)
What are the subjects where statistics has application ? Every subject including the following : business management economics commerce industry etc.
What are 2 major sources of data? 1. primary data : collected for the first time during the research 2. secondary data : which are already available published data – they were collected for some other purpose, but they can be used for the present research
From where can we get secondary data ? 1. industry report 2. previous researches 3. published data 4 annual reports 5. statistical department 6. directories / reports / data bases
From where can we get primary data ? 1. interview 2. survey 3. schedule / questionnaire 4. observation 5. experimentation
What do we do after collection of data ? Scrutinise = remove data which are defective. Then we arrange them and tabulate them; for this we have to fix the classification of data. Then we can prepare graphs / tables / charts from the data, and then we can analyse the data
Why do we classify data ? After classification, data can easily be analysed. We can easily interpret data
How can we classify data ? 1. chronologically (date wise / year wise) 2. geographically (north v/s south zone) 3. qualitatively (by attribute or order = like first, second, third) 4. quantitatively (by numerical value, using tools for quantitative analysis)
Name a few international bodies that publish data (which we can use as secondary source of data)? IBRD, IMF, ADB, ILO, UNO, WTO, WHO etc.
What is the difference between primary and secondary data ? Primary data are first-hand and original in nature, whereas secondary data are a compilation of existing or already published data. The collection of primary data involves large resources in terms of money, time and energy, whereas secondary data are relatively less costly. Primary data are collected with the purpose of the present research in mind, so they are usually more suitable than secondary data
What is the difference between census and sample survey ? Under the census or complete enumeration method, data are collected for each and every unit of the population or universe, which is the complete set of items of interest in a particular situation. In a sample survey, we pick up only a few items and collect data from them, so reliability is comparatively less
What are the steps in presentation of data ? 1. Classification of data (put data in classes) 2. Tabulation of data (prepare table from data) 3. Frequency distribution of data (identify frequencies) 4. Diagrammatic presentation of data (prepare diagrams) 5. Graphic representation of data (prepare graphs).
What do you understand from tabulation ? Tabulation is a systematic and logical arrangement of data in columns and rows in accordance with some salient features and characteristics.
What are the parts of a table ? 1. Table number 2. Title of the table 3. Sub-title or head note 4. Captions and stub 5. Body 6. Footnotes 7. Source note
What is class limit ? The end numbers or the highest and lowest values that can be included in a class interval are known as the class limits of that class. For example, in above table 40-50 and 80-100 are the lower and upper class limits.
What is class interval ? It is the difference between the upper limit and lower limit of the same class. The lower limit of a class is usually represented by symbol L1 and upper limit by L2.
What is class frequency ? The number of observations included in a particular class is known as the frequency of that class.
What is inclusive method ? It refers to that classification where both the class limits are included in the class itself while determining the class intervals.
What are the 3 methods of data presentation ? 1. textual presentation = present data in the form of text, write reports etc. 2. graphical presentation = prepare graphs, pie chart, bar chart, histogram etc. 3. tabular presentation = prepare tables of data for better analysis
POPULATION ?? All the elements of set, which are of the interest of researcher
Statistical inference The process of using data obtained from a sample to make estimate or test hypothesis about the characteristics of the population
Qualitative data ? Data that are labels or names used to identify categories of items
Quantitative data ? The data that indicate how much and how many ?
Frequency distribution ?? A tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes
What is median ? Measure of central location: the middle value when data are arranged in ascending order
What is percentile ? The p-th percentile is a value such that at least p% of the values are at or below it. 50th percentile = median = 2nd quartile
Quartile ? Quartiles divide the data into four parts of 25% each. We have 3 quartiles: 1st quartile = 25th percentile, 2nd quartile = 50th percentile, 3rd quartile = 75th percentile
Range ? Measure of variability largest - smallest value
Interquartile range (3 rd quartile - 1 st quartile)
Variance ? Average of the squared deviations of the data from the mean
Standard deviation Positive square root of variance
Coefficient of variation ? Standard deviation / mean * 100
Z score (Xi - mean) / standard deviation. It is a standardised value showing how many standard deviations a value lies from the mean. In a normal distribution: mean ± 1 standard deviation = 68.27% of data, mean ± 2 standard deviations = 95.45%, mean ± 3 standard deviations = 99.73%
Empirical rule ? In a bell shaped distribution (normal distribution), about 68% of the data lie within 1 standard deviation of the mean, about 95% within 2 and about 99.7% within 3
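The z score can be sketched in a few lines of Python. This is only an illustration: the function name `z_scores` and the data values are assumptions chosen for the example, and the population standard deviation (divide by n) is used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / standard deviation for each value (population SD)."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation: divides by n
    return [(x - m) / s for x in values]

data = [2, 3, 5, 6]  # illustrative values only
print([round(z, 2) for z in z_scores(data)])
```

A z score of 0 means the value equals the mean; the sign shows whether the value is below or above the mean.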
Outlier ? Unusually small or unusually large data
Tree diagram A graphical representation helpful in identifying the sample points of an experiment involving multiple steps
What is permutation & combination? Permutation = it denotes order / Sequence but combination = it only denotes that some objects are together example : ABC can have only one combination taking all of them together. But permutations are many : - ABC,ACB,BCA,BAC,CBA,CAB
What is relative frequency method ? Method of assigning probability on the basis of historical data
Subjective method of probability Method of assigning probability on the basis of judgement
Venn diagram Graphical representation showing sample space and operations involving events sample space = rectangle event = circle within sample space
What is formula of permutation ? nPr = n! / (n-r)! P = permutation, n = total number of objects, r = how many objects you are taking at a time, ! = factorial (multiply with reducing numbers till it reaches 1). Example : 5P5 = 5! / (5-5)! where 5! = 5*4*3*2*1 = 120 and 0! = 1, thus answer = 120/1 = 120 answer
How many different 4-letter arrangements can you make out of A,B,C,D,E? n = 5 (A,B,C,D,E), r = 4. Formula = nPr = n! / (n-r)! = 5!/(5-4)! = 120 answer
How many different 4 digit numbers can you make out of 1,2,3,4,0? n = 5 (1,2,3,4,0), r = 4, but 0 cannot come in the first digit. For first digit we have 4 options (1,2,3,4); for next digits, we can use 0. Thus we have 4*4*3*2 = 96 options. OR formula = nPr = n! / (n-r)! = 5!/(5-4)! but this contains all those numbers which start with 0. So let us keep 0 as fixed for 1st digit and count those cases. Now we have to pick up 3 digits out of 4 contd.
contd..... Ignoring the restriction on 0, permutations will be : nPr = n! / (n-r)! = 5!/(5-4)! = 120. With zero fixed for 1st position, we have these options : nPr = n! / (n-r)! with n=4, r=3 : 4!/(4-3)! = 24. Deduct this 24 from 120 : 120 - 24 = 96 answer. You can use any method (out of these 2), you get the same answer
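The permutation formula can be checked with a short Python sketch. The helper name `n_p_r` is an assumption made for this illustration.

```python
from math import factorial

def n_p_r(n, r):
    """Number of ordered arrangements of r objects taken from n distinct objects."""
    return factorial(n) // factorial(n - r)

print(n_p_r(5, 5))  # all 5 objects arranged
print(n_p_r(5, 4))  # 4 letters out of A,B,C,D,E
```

Both calls give 120, because 5!/0! and 5!/1! are equal.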
How many different 4 digit numbers can you make out of 1,2,3,4,0 which are divisible by 2? Start with 96 of the last question. Now remove all those which end with 1 : 3*3*2 = 18 (3 options for the first digit, since it cannot be 0 or 1, then 3 and 2 options for the middle digits). Similarly those which end with 3 : 3*3*2 = 18. Thus 96 - (18+18) = 60 answer
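The two counts for the digits 1,2,3,4,0 can be verified by brute force. The sketch below lists every arrangement and filters it; the variable names are assumptions for illustration.

```python
from itertools import permutations

digits = "12340"

# Every ordered choice of 4 different digits, skipping those that start with 0.
numbers = [int("".join(p)) for p in permutations(digits, 4) if p[0] != "0"]

# Keep only the numbers divisible by 2.
even_numbers = [n for n in numbers if n % 2 == 0]

print(len(numbers), len(even_numbers))
```

Listing is practical here because there are only 120 arrangements in all; for larger problems the counting formulas are the better tool.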
In how many ways can Raj invite any 3 of his 7 friends? This is a question of combination. Here order (sequence) is not important, his friends can come in any order. Thus this is a case of combination. Formula : N! / ((n-r)!*r!) you can calculate combination by dividing permutation by r! =7! / ((7-3)!*3!) =(7*6*5)/(3*2*1) = 35 answer
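A minimal sketch of the combination count, using Python's standard library. The friend labels are placeholders invented for the example.

```python
from itertools import combinations
from math import comb

friends = ["F1", "F2", "F3", "F4", "F5", "F6", "F7"]  # placeholder names

# Formula: n! / ((n-r)! * r!)
by_formula = comb(len(friends), 3)

# Direct listing of every unordered group of 3 friends.
by_listing = len(list(combinations(friends, 3)))

print(by_formula, by_listing)
```

Both approaches agree because `combinations` ignores order, just as the formula does.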
How many different words can you frame from FUTURE ? Here we have two U; in total we have 6 letters. Formula : N! / L! where N = total number of letters, L = number of times a letter is repeated. Answer = 6! / 2! = 360 answer
How many different words can you frame from DALDA ? Here we have two D and two A; in total we have 5 letters. Formula : N! / (L1! * L2!) where N = total number of letters, L1 and L2 = number of times each repeated letter occurs. Answer = 5! / (2!*2!) = 30 answer
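The rule for repeated letters can be sketched as a small function. The name `distinct_arrangements` is an assumption for this illustration.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(word):
    """N! divided by the factorial of each letter's repeat count."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

print(distinct_arrangements("FUTURE"))
print(distinct_arrangements("DALDA"))
```

Letters that occur once contribute 1! = 1, so only the repeated letters reduce the count.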
In how many ways can 8 person sit around a round table ? For questions relating to round table , we have to use the following formula : (n-1)! So here answer = (8-1)! = 7! =5040 answer
How many 4 digit numbers can be formed out of 1,2,3,5,7,8,9 if no digit is repeated. Total number ofdigits = 7 formula = Npr n =7 r 4 7p4 = 7! / 3! =7*6*5*4 = 840
How many numbers greater than 2000 can be formed from 1,2,3,4,5. No repetition is allowed. 5 digit numbers = 5! = 120. 4 digit numbers : we cannot take 1 in the beginning. We have 4 options for 1st digit, 4 for 2nd digit, 3 for 3rd digit and 2 for 4th digit : 4*4*3*2 = 96. Total = 120 + 96 = 216 answer
There are 6 books on english, 3 on maths, 2 on GK. In how many ways can they be placed in shelf, if books of 1 subject are together? We have 3 subjects so 3! books of same subjects can be interchanged. So answer : 3!*6!*3!*2! =6*720*6*2 = 51840 answer
How many words can we make out of DRAUGHT, the vowels are never separated? Number of vowels = 2 other digits = 5 we will treat vowels as 1 word so we have 6!. Vowels can be interchanged so 2! so answer = 6!*2! = 1440 answer
In how many ways can 8 pearls be used to form a necklace ? In questions of necklace, we use the following formula : ½ (N-1)! Here the necklace can be turned over, so the order left to right is the same as right to left; therefore we multiply by ½. = ½ (8-1)! = 2520
In how many number of ways can 7 boys form a ring ? (7-1) ! = 6! = 720 answer
50 different jewels can be set to form necklace in how many ways ? ½ ( n -1) ! = ½ (50 -1)! =1/2 (49)!
How many different numbers between 10 and 1000 can be formed from the digits 0,2,3,4,8,9? Let us assume that repetition is not allowed. 2 digit numbers : for first digit we have 5 options (0 is excluded), for 2nd digit also we have 5 options (including 0) = 25. 3 digit numbers : 5*5*4 = 100. Total = 125. If repetition is allowed : for 2 digit : 5*6 = 30, for 3 digit : 5*6*6 = 180, total = 210 answer
What is the number of permutations of 10 different things taking 4 at a time in which one thing never comes ? = 9 p 4 = (9*8*7*6) =3024
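There are 5 speakers (A,B,C,D,E). In how many ways can we arrange their speeches so that A always speaks before B ? For A and then B without gap : let us take A and B as one, so 4! = 24. For A and then B with a gap : there are 6 possible pairs of places (for example A at 1st place and B at 3rd place), and for each pair the other 3 speakers can be arranged in 3! = 6 ways, so we have 6*6 = 36. Total possibilities = 24 + 36 = 60 answer (check : A comes before B in half of all 5! = 120 orders, and 120/2 = 60)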
5 persons are sitting in a round table in such a way that the tallest person always sits next to the smallest person? Keep tallest and smallest person as 1. we have (4-1)! = 6 the tallest and the smallest person can be interchanged = 2 =12
How many words can be formed from MOBILE so that consonants always occupy odd places ? There are 3 odd and 3 even places. The 3 consonants fill the odd places in 3! ways and the 3 vowels fill the even places in 3! ways. We have 3! * 3! = 36 answer
In how many ways can we arrange 6 + and 4 - signs so that no two - signs are together? Write the 6 + signs first : + + + + + +. There are 5 places between the + signs, one on the extreme left and one on the extreme right, so we have 7 positions for the - signs, out of which we choose 4 : 7C4 = 35. The 6 + signs fill their 6 places in only one way (6C6 = 1). Total = 35*1 = 35 answer
There are 10 buses between Bikaner and Jaipur. In how many ways can Gajendra go to Jaipur and come back without using the same bus in return journey? There are 10 options while going there are 9 options while returning (one bus used earlier will not be used) 10*9 = 90 answer
In how many ways can yamini distribute 8 sweets to 8 persons provided the largest sweet is served to Jigyasha? 1 sweet is fixed so we have 7! = 5040 answer
Yamini & Jigyasha go to a train and they find 6 vacant seats. In how many ways can they sit? Yamini has 6 options but Jigyasha has only 5 options left = 6*5 = 30 answer
How many words can you make from DOGMATIC? 8! 40320 answer
Gajendra has 12 friends out of whom 8 are relatives. In how many ways can he invite 7 in such a way that 5 are relatives? 8c5 * 4c2 =56*6 =336 answer
There are 8 points on a plane. No 3 points are on a straight line. How many triangles can be made out of these ? 8C3 = 56 answer
In how many ways can you form a committee of 3 persons out of 12 persons ? 12c3 =220 answer
How many different factors are possible from 75600 ? The prime factorisation is : 2^4 * 3^3 * 5^2 * 7. Formula = (exponent + 1)(exponent + 1) .... = (4+1)(3+1)(2+1)(1+1) = 120 factors in all, including 1 and 75600 itself. If we leave out 1, we have 120 - 1 = 119 answer
A box contains 7 red, 5 white and 4 blue balls. How many selections can be made such that we pick up 3 balls and all are red? It is a question of combination. Total possibilities = 7C3 = (7*6*5) / (3*2*1) = 35. Thus there are 35 ways of getting 3 red balls
A box contains 7 red, 5 white and 4 blue balls. What is the probability that in our selection we pick up 3 balls and all are red? Total possibilities for red = 7C3 = (7*6*5) / (3*2*1) = 35. Total possibilities of 3 balls : 16C3 = (16*15*14) / (3*2*1) = 560. Probability = 35/560 = 1/16 of getting red in all the three selections
What is the probability of getting 3 heads when I toss a coin 5 times? This is a case of binomial probability (where there are only 2 outcomes possible, we can use this theory). Here we can use this formula : nCr * (p)^r * (q)^(n-r) with n = 5, p = ½, q = (1-p) = ½, r = 3. 5C3 * (1/2)^3 * (1/2)^2 = 10/32 = 5/16 answer
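As a cross-check, the (exponent + 1) rule can be sketched in Python. The function name `count_divisors` is an assumption; it counts every divisor, including 1 and the number itself.

```python
def count_divisors(n):
    """Multiply (exponent + 1) over the prime factorisation of n."""
    count = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        count *= exponent + 1
        p += 1
    if n > 1:  # one prime factor left over, with exponent 1
        count *= 2
    return count

print(count_divisors(75600))
```

For 75600 this gives (4+1)(3+1)(2+1)(1+1) = 120; subtracting 1 leaves out the trivial factor 1.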
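The binomial formula can be evaluated exactly with fractions. This is a sketch; the function name `binomial_probability` is an assumption for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, r, p):
    """nCr * p^r * q^(n-r), with q = 1 - p, computed as an exact fraction."""
    p = Fraction(p)
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

# 3 heads in 5 tosses of a fair coin: 10 * (1/2)^5 = 10/32 = 5/16
print(binomial_probability(5, 3, Fraction(1, 2)))
```

Using `Fraction` avoids rounding, so the result can be compared directly with the hand calculation.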
In how many ways can Gajendra invite some or all of his 5 friends in party hosted by him? (at least 1) Formula of combination of 1 to all = 2^n - 1 = 2^5 - 1 = 32 - 1 = 31 answer
How many words can be formed by using all the letters of the word DRAUGHT so that a. vowels always come together & b. vowels are never together? A There are 2 vowels. We treat them as 1. solution : 6!*2! = 1440 answer b. total possibilities = 7! = 5040 number of cases when vowels are not together = 5040-1440 = 3600 answer
In how many ways can a cricket eleven be chosen out of a batch of 15 players. 15c11 =15! / ((15-11)!*11!) =15!/(4!*11!) =(15*14*13*12)/(4*3*2*1) 1365 answer
In how many a committee of 5 members can be selected from 6 men 5 ladies consisting of 3 men and 2 ladies 6c3 *5c2 =[(6*5*4)/(3*2*1)] [(5*4)/(2*1)] =20*10 =200 answer
How many 4-letter word with or without meaning can be formed out of the letters of the word 'LOGARITHMS' if repetition of letters is not allowed 10p4 =(10*9*8*7) =5040 answer
how many ways can the letter of word 'LEADER' be arranged We have two e, so divide 6p6 by 2 6!/2! =720 / 2 =360 answer
How many arrangements can be made out of the letters of the word 'MATHEMATICS' so that the vowels always come together ? Let us treat all 4 vowels (A, E, A, I) as 1 unit. Total letters are 11, so we have 11 - 4 + 1 = 8 units, in which M and T are each repeated twice : 8!/(2!*2!) = 10080. The vowels can be arranged among themselves in 4!/2! = 12 ways (A is repeated). Answer = 10080 * 12 = 120960 answer
In how many different ways can the letter of the word 'DETAIL' be arranged in such a way that the vowels occupy only the odd positions We have 3 odd and 3 even positions =3! *3! =36 answer
How many 3 digit numbers can be formed from the digits 2,3,5,6,7 and 9 which are divisible by 5 and none of the digits is repeated? Last digit must be 5 now we have 5 options for 1 st and 4 options for 2 nd digit =5*4 = 20 answer
In how many ways can 21 books on English and 19 books on Hindi be placed in a row on a shelf so that two books on Hindi may not be together? The 21 English books leave 22 places for Hindi books (20 gaps and the 2 ends). 22P19 * 21!
Out of 7 consonants and 4 vowels how many words of 3 consonants and 2 vowels can be formed? Selection of 5 letters = 7C3 * 4C2 = 35*6 = 210. 5 letters can be arranged in 5! ways = 120. Total options : 210*120 = 25200 answer
What is effective rate of interest ? In the case of compound interest questions, the effective rate is generally higher than the stated rate. For example: if rate is 20% compounded quarterly (4 times in a year), it will be equal to : (1+20/400)^4 = 1.2155 so effective interest here is 21.55% answer
What is present value ? When you are trying to find the present worth of some money which is due after some time, it is called present value. Due to factors like inflation, risk and uncertainty, present value is always less. Suppose you have to get 1100 after 1 year; at a discount rate of 10% its present value is 1000. (you can see here that there is a discount of 100) Money due - discount for time factor = present value
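The effective rate calculation can be sketched as a small function. The name `effective_rate` and its parameters are assumptions for this illustration.

```python
def effective_rate(nominal_rate_percent, periods_per_year):
    """Effective annual rate (in %) for a nominal rate compounded several times a year."""
    periodic = nominal_rate_percent / (100 * periods_per_year)
    return ((1 + periodic) ** periods_per_year - 1) * 100

# 20% compounded quarterly: (1 + 20/400)^4 - 1
print(round(effective_rate(20, 4), 2))
```

With one compounding period per year the effective rate equals the stated rate; more frequent compounding raises it.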
What is future value ? Future value takes up interest and therefore it is more than the sum invested. If I invest 1000 today, with an interest rate of 10%, it will become 1100 after 1 year.
Formula for present value ? Amount / (1+rate) ^ number of years suppose 1221 is due after 3 years and rate of interest is 10%, present value is : 1221 / (1+10/100)^3 =917.35 answer
What is the formula for future value ? Amount *(1+rate) ^ number of years suppose 1000 is invested for 3 years and rate of interest is 10% annually compounding, future value is : 1000 * (1+10/100)^3 =1331 answer
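Present value and future value are mirror images of each other, which a short sketch makes visible. The function names are assumptions for illustration, and annual compounding is assumed.

```python
def future_value(amount, rate_percent, years):
    """Amount * (1 + rate) ^ number of years."""
    return amount * (1 + rate_percent / 100) ** years

def present_value(amount, rate_percent, years):
    """Amount / (1 + rate) ^ number of years."""
    return amount / (1 + rate_percent / 100) ** years

print(round(future_value(1000, 10, 3), 2))   # 1000 invested for 3 years at 10%
print(round(present_value(1221, 10, 3), 2))  # 1221 due after 3 years at 10%
```

Discounting a future value back at the same rate returns the original sum.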
How to calculate EMI? You may use the formula for present value of annuity. Here you need a factor. Formula = ((1+rate)^n - 1) / (rate*(1+rate)^n) here n = number of instalments, rate = rate % / (number of instalments in a year * 100). EMI = amount to pay / factor of annuity (calculated from above formula)
What will be EMI for Rs. 5 lakh rate of interest = 10%, payable in 20 annual instalments = ((1+rate)^n - 1) / (rate*(1+rate)^n) = ((1+10/100)^20 - 1) / (10/100 * (1+10/100)^20) = 5.7275 / 0.67275 = 8.5136 EMI = 500000 / 8.5136 = 58730 (approx.) ANSWER
What will be EMI for Rs. 5 lakh rate of interest = 10%, payable in Monthly instalments in 20 years. = ((1+rate)^n -1) / (rate(1+rate)^n) ((1+10/1200)^240 - 1)/(10/1200* (1+10/1200)^240) 6.328 / .061 =103.624 EMI = 500000 / 103.624 =4825 ANSWER
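The EMI calculation can be sketched as follows. The function names `annuity_factor` and `emi` and their parameters are assumptions made for this illustration, not a standard library API.

```python
def annuity_factor(rate_per_period, n_periods):
    """Present value of an annuity of 1 per period: ((1+r)^n - 1) / (r * (1+r)^n)."""
    growth = (1 + rate_per_period) ** n_periods
    return (growth - 1) / (rate_per_period * growth)

def emi(principal, annual_rate_percent, years, payments_per_year):
    """Equal instalment that repays the principal over the given period."""
    r = annual_rate_percent / (100 * payments_per_year)
    n = years * payments_per_year
    return principal / annuity_factor(r, n)

print(round(emi(500000, 10, 20, 12), 2))  # monthly instalments
print(round(emi(500000, 10, 20, 1), 2))   # annual instalments
```

Without intermediate rounding, the monthly case comes to about 4825 and the annual case to about 58730; rounding the factor early can shift the annual figure by a few hundred rupees.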
What is sinking fund ? If you deposit a sum of money every year so that you have a large amount after some time, this is a sinking fund. You create a sinking fund to purchase new machinery / building etc. It is just the reverse of the EMI (where you were looking at present value of annuities), because here you are taking future value of annuities.
How to calculate sinking fund contribution? For calculation of sinking fund contribution, we have to use the following formula : = ((1+rate)^n -1 )/(rate) here n = number of instalment rate = rate / number of instalments in a year*100.
Jigyasa has to collect 1 million after 5 years to start a new factory. How much should she save every month? Rate = 12% = ((1+rate)^n - 1) / (rate) = ((1+12/1200)^60 - 1) / (12/1200) = 0.8167 / 0.01, dividing factor = 81.67, monthly savings = 1000000 / 81.67 = 12244.44 per month answer
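A minimal sketch of the sinking fund contribution, assuming deposits are made at the end of each period. The function name and parameters are assumptions for illustration.

```python
def sinking_fund_payment(target, annual_rate_percent, years, payments_per_year):
    """Periodic deposit needed to reach the target: target / (((1+r)^n - 1) / r)."""
    r = annual_rate_percent / (100 * payments_per_year)
    n = years * payments_per_year
    factor = ((1 + r) ** n - 1) / r  # future value of an annuity of 1 per period
    return target / factor

# 1 million needed after 5 years, 12% a year, monthly deposits
print(round(sinking_fund_payment(1_000_000, 12, 5, 12), 2))
```

The factor here grows with the number of deposits, so a longer saving period means a smaller monthly contribution.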
What is a sample ? Instead of contacting every person, we may contact only a few persons, this is called sample. Suppose we go to check the quality of wheat to purchase. Instead of checking all the bags, we pick up one bag randomly and pick out a few grains, this is also a sample.
What are the methods of sampling ? 1. random sampling = purely by chance, just like a lottery 2. judgement sampling = here we are using some basis for judgement; the basis of judgement is related to our purpose of research 3. quota sampling = taking some number of persons from each group 4. cluster sampling = here we divide population in clusters (based on their geography / demography / location etc.) and then pick up a few clusters (groups) of people and study them all
contd... Stratified random sampling : here we divide population in different strata (strata = population divided on some logical criteria), then we randomly take a few % of persons from each stratum. Convenience sampling = taking sample on the basis of your convenience
What is confidence level ? It is the confidence associated with an interval estimate. If we are using a confidence level of 95%, it means that if we repeated the sampling many times, about 95% of the interval estimates so constructed would contain the population parameter (e.g. the mean).
What is the difference between population parameters and sample statistics ? Population = actual population – but it is not possible to collect all the information about population due to our own resource constraints we dont have time or resources to collect data about population. Therefore we go for sample. When we use sample, we are using sample statistics. We try to estimate population parameters from sample statistics.
What is population parameter? If you go for census study (you contact each element in the population and take their data), you can calculate population parameter. There are different parameters which are of use like : mean, mode, median, standard deviation, etc. But we actually take sample so we estimate population parameters from sample statistics.
What is sample statistics? Sample characteristics like mean, mode, median, standard deviation etc. Which are used to estimate population parameter
What is sampling error? The difference in the value identified by sample and the population parameter is called sampling error. For example, population mean is 20 but sample mean is 18, so sampling error = 2
What is quantitative data and qualitative data ? quantitative data = data which tell about what and how much qualitative data=data which only contain nominal scale – just name / labels etc.
What are the various types of scales of data ? 1. nominal scale = only names are there, like ram, shyam 2. ordinal scale = they give order or ranks 3. interval scale = they have equal, identifiable gaps between values, but they don't have a true zero 4. ratio scale = they have a true zero, so ratios can also be calculated; they are the best in numerical analysis
What are the various methods to present data ? Scatter chart / diagrams bar chart Histogram Ogive Dot plot etc.
What is statistical inference ? When we try to estimate or test hypothesis using sample data, it is called statistical inference (here we use sample data, not the population parameters).
What is a variable ? It is a characteristic of some interest relating to some element. It can take different values. Variables are denoted by X,Y,Z etc. Examples of variables are : for people = their education, for cars = their fuel efficiency etc.
What is cross sectional data ? Data collected at the same point of time from different segments
What is cross tabulation? There are two variables and their data are presented in one table, with one variable along the rows and the other variable along the columns. For example : Age and Height or Marks and Attendance
Can we take up same element again in sampling ? Yes, it is possible (by chance) there are two types of sampling : 1. sampling with replacement 2. sampling without replacement in sampling with replacement, it is possible that by chance we may pick up same element again (we should avoid).
What is normal distribution ? There are many types of probability distributions; normal distribution is used most widely. It assumes that the data are bell shaped and mean = mode = median. Normal distribution assumes that most of the data are near mean and extreme data are very few.
How do you calculate mode ? Mode is that element which has highest frequency. If there is continuous data, you may use the following formula : Mode = L1 + (D1 / (D1+D2)) * class interval. L1 = lower limit of the modal class, D1 = highest frequency - frequency in preceding class, D2 = highest frequency - frequency in succeeding class
Example of mode : 2,3,5,6,7,8,9,11,13,13,14,14,14,15,17,21,22,34,43 out of these mode is 14 (because its frequency is 3)
Example of mode ?
Class      Frequency
10 to 20   4
20 to 30   8
30 to 40   12
40 to 50   4
Apply the formula : modal class = 30 to 40. Mode = 30 + ((12-8) / ((12-8)+(12-4))) * class interval = 30 + 4/12 * 10 = 30 + 3.3 = 33.3 answer
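The grouped-data mode formula can be sketched as a small function. The name `grouped_mode` and its parameter names are assumptions for illustration.

```python
def grouped_mode(lower_limit, class_interval, f_modal, f_previous, f_next):
    """Mode = L1 + D1 / (D1 + D2) * class interval."""
    d1 = f_modal - f_previous  # highest frequency - frequency in preceding class
    d2 = f_modal - f_next      # highest frequency - frequency in succeeding class
    return lower_limit + d1 / (d1 + d2) * class_interval

# Modal class 30 to 40 with frequency 12, neighbours 8 and 4
print(round(grouped_mode(30, 10, 12, 8, 4), 2))
```

The mode is pulled towards the neighbouring class with the higher frequency, here the preceding one.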
What is median ? Median = exact mid point in the data. Position of the median = n/2 or (n+1)/2. Example : 1,3,5,7,9. There are 5 values, so n = 5, (5+1)/2 = 3, so 3rd value is median. Median = 5 answer
Formula for median ? L1 + ((M-C) / F) * class interval. L1 = lower limit of the median class, F = frequency of the median class, M = position of the median = n/2, C = cumulative frequency of the previous class
Example of median ?
Class      Frequency   C.F.
10 to 20   4           4
20 to 30   8           12
30 to 40   12          24
40 to 50   4           28
L1 + ((M-C) / F) * class interval. M = 28/2 = 14, so median class is 30 to 40. Median = 30 + ((14-12)/12) * 10 = 30 + 1.67 = 31.67 answer
What is cumulative frequency ? When you add up frequencies, it is called cumulative frequency. In the previous example, the frequency of 10 to 20 is 4 and of 20 to 30 is 8, so the cumulative frequency of 20 to 30 is shown as 12 (4 of 10 to 20 is added in it)
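A sketch of the grouped-data median, building the cumulative frequency as it goes. The function name `grouped_median` and the tuple layout (lower limit, upper limit, frequency) are assumptions for illustration.

```python
def grouped_median(classes):
    """classes: list of (lower limit, upper limit, frequency) in ascending order."""
    n = sum(frequency for _, _, frequency in classes)
    m = n / 2       # position of the median
    cumulative = 0  # cumulative frequency of the previous classes
    for lower, upper, frequency in classes:
        if cumulative + frequency >= m:
            # L1 + ((M - C) / F) * class interval
            return lower + (m - cumulative) / frequency * (upper - lower)
        cumulative += frequency

table = [(10, 20, 4), (20, 30, 8), (30, 40, 12), (40, 50, 4)]
print(round(grouped_median(table), 2))
```

The loop stops at the first class whose cumulative frequency reaches n/2, which is the median class (30 to 40 for this table).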
What is relative frequency ? Formula = frequency of a class / number of items
Find mean, mode and median on following data ?
Class      Freq.   C.F.   x*f
10 to 20   5       5      75
20 to 30   12      17     300
30 to 40   12      29     420
40 to 50   5       34     225
Total      34             1020
Mean = 1020/34 = 30 (x = mid-point of each class)
solution... Median = 20 + ((17-5)/12) * 10 = 30. Mode cannot be calculated directly because there are two equal modal frequencies, so we use the following formula : Mode = 3 median - 2 mean = 3*30 - 2*30 = 30 answer
Calculate rank correlation using the following data ?
X    Y
2    11
4    8
6    3
8    1
Solution Calculate their ranks (rank 1 = largest value)
X    Y    Rx   Ry   D^2
2    11   4    1    9
4    8    3    2    1
6    3    2    3    1
8    1    1    4    9
D = Rx - Ry so D^2 = (Rx - Ry)^2, total of D^2 = 20
What is formula of quartile deviation ? (q3 – q1)/ 2
What is formula of coefficient of quartile deviation ? (q3-q1) / (q3+q1)
What is formula of coefficient of mean deviation ? Mean deviation / Median or mean deviation / mean
Calculate combined standard deviation. Means : A = 8, B = 3; std. deviations : A = 2, B = 1; n1 of A = 20, n2 of B = 30. Formula = sqrt((n1*s1^2 + n2*s2^2 + n1*d1^2 + n2*d2^2) / (n1+n2)) where d1 = mean of A - combined mean, d2 = mean of B - combined mean. Combined mean = (160+90)/50 = 5, d1 = 3, d2 = -2. sqrt((20*4 + 30*1 + 20*9 + 30*4) / (20+30)) = sqrt(410/50) = sqrt(8.2) = 2.86 answer
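The combined standard deviation can be sketched with the standard formula, in which the standard deviations and the differences of means are all squared. The function name `combined_sd` is an assumption for illustration.

```python
from math import sqrt

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """sqrt((n1*s1^2 + n2*s2^2 + n1*d1^2 + n2*d2^2) / (n1 + n2))."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    variance = (n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / (n1 + n2)
    return sqrt(variance)

# Group A: n = 20, mean 8, SD 2; group B: n = 30, mean 3, SD 1
print(round(combined_sd(20, 8, 2, 30, 3, 1), 2))
```

Because the two group means differ a lot (8 and 3), the combined standard deviation of about 2.86 is larger than either group's own standard deviation.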
FORMULA OF RANK CORRELATION = 1- (6 ∑ D^2) / (N^3 -N) = 1 – (6*20)/(64 -4) =1 - 120/60 =1-2 =-1 Thus two series have perfectly negative correlation
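The rank correlation formula can be sketched in Python. The helper names `ranks` and `rank_correlation` are assumptions for illustration, and the sketch assumes there are no tied values.

```python
def ranks(values):
    """Rank 1 = largest value; assumes all values are different."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def rank_correlation(x, y):
    """1 - (6 * sum of D^2) / (N^3 - N), where D = Rx - Ry."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n**3 - n)

x = [2, 4, 6, 8]
y = [11, 8, 3, 1]
print(rank_correlation(x, y))
```

The result of -1 reflects that the largest X is paired with the smallest Y, the second largest X with the second smallest Y, and so on.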
What is sample space? A set of all experimental outcomes is called sample space
What is experiment ? In research, when we deliberately change (manipulate) some variables and observe the effect, that is called an experiment.
What is experimental group? There are generally two types of groups – one on which you undertake experiment (experimental group) and one on which you dont do any experiment, just do observation.(control group) Example – if you have two plants, on one plant you pour fertilisers and on the other you dont put any fertilizer, then the former is experimental group and 2 nd is control group.
What is standard deviation? Deviation = difference. Here we find the difference of each value from the mean, and from these differences we calculate the standard deviation. Formula = square root of ((sum of squares of difference of each element from mean) / number of elements)
Example of standard deviation :
X    dx^2
2    4
3    1
5    1
6    4
Average = 16/4 = 4, dx = x - average (for example 2 - 4 = -2). Average of dx^2 = variance = 10/4 = 2.5. Standard deviation = square root of variance = sqrt(2.5) = 1.58 answer
Steps in calculation of standard deviation ? 1. calculate average. For this total all the values of X and then divide it by n (in our example, we have divided 16/4, where 16 is total of all values and 4 is number of elements. 2. find dx (difference of x from mean) 3. square the dx to get dx^2 4 . find average of dx^2 this is called variance. 5. find square root of variance. This is called standard deviation.
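The five steps can be followed one by one in a short sketch. The function name `population_sd` is an assumption; it divides by n, as in the steps above.

```python
from math import sqrt

def population_sd(values):
    n = len(values)
    average = sum(values) / n                   # step 1: calculate average
    dx = [x - average for x in values]          # step 2: difference from mean
    dx_squared = [d**2 for d in dx]             # step 3: square each difference
    variance = sum(dx_squared) / n              # step 4: average of dx^2
    return sqrt(variance)                       # step 5: square root of variance

print(round(population_sd([2, 3, 5, 6]), 2))
```

Python's built-in `statistics.pstdev` gives the same population figure, while `statistics.stdev` divides by n - 1 instead.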
What is covariance ? If there are two data series, let us say X and Y, and we want to find their relation, we need covariance. Co = together, variance = difference. Formula of covariance = total of dx*dy / number of elements
Example of covariance :
X    Y    dx   dy   dx*dy
2    6    -2   2    -4
3    5    -1   1    -1
5    3    1    -1   -1
6    2    2    -2   -4
Average of X = 16/4 = 4, average of Y = 16/4 = 4. dx = difference of each element of X from the average of X, dy = difference of each element of Y from the average of Y. Total of dx*dy = -10, covariance = -10/4 = -2.5 answer
What is correlation and regression Correlation just tells you that there is a relation between two variable. It doesnt tell you which is the dependent and which is independent variable. If you want to predict / forecast, you have to use regression. In regression, we have two variables – one dependent and one independent. Regression tells you about relation of these two variables. Based on regression, you can predict / forecast.
How to calculate correlation? There are many methods to calculate correlation, but Karl Pearson's method is the most popular method. Formula of correlation = covariance / (standard deviation of X * standard deviation of Y). Suppose covariance of X and Y is -4 and standard deviation of X is 2 and standard deviation of Y is also 2, then correlation = -4 / (2*2) = -1
What is the maximum and minimum value in correlation ? Maximum correlation = 1 (perfectly positive relation : both rise or fall together), minimum correlation = -1 (perfectly negative relation : when one rises, the other falls), no relation = 0
Example of correlation?
X    Y    dx   dy   dx*dy
2    6    -2   2    -4
3    5    -1   1    -1
5    3    1    -1   -1
6    2    2    -2   -4
Average of X = 16/4 = 4, average of Y = 16/4 = 4. Total of dx*dy = -10, total of dx^2 = 10, standard deviation of X = sqrt(2.5) and standard deviation of Y = sqrt(2.5). Covariance = -10/4 = -2.5, correlation = -2.5 / (sqrt(2.5) * sqrt(2.5)) = -1 answer
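The correlation calculation can be sketched as covariance divided by the product of the two standard deviations. The function name `correlation` is an assumption for illustration; population formulas (divide by n) are used throughout.

```python
from math import sqrt

def correlation(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy)) / n
    sd_x = sqrt(sum(a**2 for a in dx) / n)
    sd_y = sqrt(sum(b**2 for b in dy) / n)
    return covariance / (sd_x * sd_y)

xs = [2, 3, 5, 6]
ys = [6, 5, 3, 2]
print(round(correlation(xs, ys), 4))
```

Every point in this data lies on the line y = 8 - x, which is why the correlation is perfectly negative.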
What is regression ? The basic model of linear regression (one dependent and one independent variable) is as under : y = a+ bx+e a = intercept b=slope e=error since error is random and moves in either direction, so we generally write as y=a+bx
What is regression? It is a simple tool to predict data. Regression assumes that there are at least two data sets, one is dependent on another. Example : if you say that demand is based on price, then we can have regression between price and demand. Price will be independent variable (called X), and demand will be dependent variable (called Y)
What is slope and intercept ? Simplest form of regression is linear regression (a straight line between dependent and independent variable). Here we need two things : slope and intercept. Slope is denoted by B and intercept is denoted by A. Formula of regression is : Y = A +BX 1. A is the point (value) of Y when X = 0 2. B denotes the rate of change in Y in response to change in X.
How to calculate slope? In the formula y = a + bx, we use b to denote slope. It denotes change in y with reference to change in x. Slope can be calculated with the following formula : b = covariance / (variance of x). Once we calculate b, we can easily calculate a by putting the averages of x and y in the formula y = a + bx. Thus we can get both a and b, then we can calculate y hat (ŷ) = a + bx (because a and b are known, and with the help of x we can predict y)
Example of regression
X    Y    dx   dy   dx*dy
2    6    -2   2    -4
3    5    -1   1    -1
5    3    1    -1   -1
6    2    2    -2   -4
Average of X = 16/4 = 4, average of Y = 16/4 = 4, variance of x = 2.5, covariance = -10/4 = -2.5. b = covariance / variance of x = -2.5 / 2.5 = -1. Now put it in the formula y = a + bx to get a : take y = 4, x = 4, b = -1, so 4 = a + (-1)(4), or a = 8. Thus a = 8, b = -1, so y = 8 - x and we can now predict y
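The slope and intercept can be computed with a short sketch that follows the same route: slope from covariance and variance, then intercept from the averages. The function name `fit_line` is an assumption for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x, with b = covariance / variance of x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    variance_x = sum((x - mean_x) ** 2 for x in xs) / n
    b = covariance / variance_x
    a = mean_y - b * mean_x  # the line passes through (average of x, average of y)
    return a, b

a, b = fit_line([2, 3, 5, 6], [6, 5, 3, 2])
print(a, b)
print(a + b * 4)  # predicted y when x = 4
```

With a = 8 and b = -1 the prediction for x = 4 is 4, which matches the average of Y.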
What is coefficient of determination ? 1. the percent of the variation that can be explained by the regression equation 2. the explained variation divided by the total variation 3. the square of r (r denotes correlation); it is also called r squared. We calculate it by taking difference of estimated y and average of y
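Example of coefficient of determination Suppose for one observation estimated Y = 4, actual Y = 3, average of Y = 7. Total deviation = 3 - 7 = -4, explained deviation = 4 - 7 = -3, unexplained deviation (error) = 3 - 4 = -1, so -4 = -3 + (-1). The coefficient of determination is then calculated over all observations as (sum of squares of explained deviations) / (sum of squares of total deviations) * 100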
What is coefficient of variation ? = Standard deviation / mean * 100 suppose standard deviation = 2 suppose mean = 4 =2/4 *100 = 50% coefficient of variation = 50%
What is skewness ? When the data are not symmetrically distributed, they are skewed: the longer tail is either towards the left or towards the right side. If the data are not skewed, the distribution can look like a bell-shaped curve. If it is skewed, it looks like a slope or like a see-saw tilted to one side. Formula (Karl Pearson's coefficient) = (mean - mode) / standard deviation
What is bar chart ? It is a chart which uses thick lines (bars) to denote frequencies of X variables (on X axis). The length of each bar should be equal to the frequency. It is similar to a histogram (but there we use connected rectangles).
What is ogive ? It is a chart. It indicates data on cumulative basis. Here you first calculate cumulative frequency and then find its %. Data may be expressed using a single line. You can display the total at any given time. The relative slopes from point to point will indicate greater or lesser increases. Ogive can be from left to right or from right to left
Example of ogive (here data are absolute in cumulative frequency – not in %)
What is class interval ? It denotes the width of class for example : 10 to 20 here class interval is 20-10 =10 class interval is calculated by following formula : (highest – least)/ number of classes desired
What is the difference between continuous and discrete data? Continuous data can take any value, like 10.00073. It is written like: 10 to 20 (so any value between 10 and 20 can come). Discrete data can take only certain numerical values, like 3, 4, 5, 6 etc.
Which of the following are linear equations? a) y = 4x − 5 b) 2x − 3y + 8 = 0 c) y = x² − 2x + 1 d) 3x + 1 = 0 e) y = 6x + x^3 f) y = 2 answer : all those equations which result in a straight line are linear equations. C and E contain powers of x higher than 1, so they don't make a straight line. All the rest are linear equations.
Which of these ordered pairs solves the equation y = 5x − 6 ? A (1, −2) b (1, −1) c (2, 3) d (2, 4) answer : b & d
There are two lines : 2x+3y+5=0 and 4x-5y+2 = 0, find the point of their intersection? Multiply the first equation by 2 (4x+6y+10=0) and then subtract the second equation. You will get : 11Y + 8 = 0, or Y = -8/11. Putting this value in the first equation: 2X = -5 + 24/11 = -31/11, so X = -31/22. The point of intersection is (-31/22, -8/11).
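The elimination above can be checked with a short Python sketch. It uses Cramer's rule rather than the step-by-step elimination of the slide, and the function name `intersect` is only illustrative. Exact fractions are used so that no rounding hides an error.

```python
from fractions import Fraction

def intersect(a1, b1, c1, a2, b2, c2):
    """Intersection of a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or identical")
    # Cramer's rule applied to a*x + b*y = -c
    x = Fraction(b1 * c2 - b2 * c1, det)
    y = Fraction(a2 * c1 - a1 * c2, det)
    return x, y

x, y = intersect(2, 3, 5, 4, -5, 2)
print(x, y)  # -31/22 -8/11
```

Putting both values back into either equation gives 0, which confirms the point lies on both lines.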
Are these points collinear ? A = (2,3), B = (4,1), C = (-2,7). The points are collinear if they are on one line. They are on one line if they satisfy the following formula : Xa(Yb-Yc) + Xb(Yc-Ya) + Xc(Ya-Yb) = 0. Here: 2(1-7) + 4(7-3) + (-2)(3-1) = -12 + 16 - 4 = 0, so these points are collinear.
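The collinearity formula translates directly into code. This is a minimal sketch; the helper name `collinear` is an assumption made for illustration.

```python
def collinear(a, b, c):
    """True when the three (x, y) points lie on one straight line."""
    (xa, ya), (xb, yb), (xc, yc) = a, b, c
    return xa * (yb - yc) + xb * (yc - ya) + xc * (ya - yb) == 0

print(collinear((2, 3), (4, 1), (-2, 7)))    # True
print(collinear((3, 1), (5, -5), (-1, 13)))  # True
```

With integer coordinates the comparison with 0 is exact; for decimal coordinates a small tolerance would be safer.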
Find the equation of the line which is parallel to 4x+7y+5=0, and passes through 5, -4. In case of parallel lines, the slope remains same thus only constant changes. Here constant is 5. 4(5)+7(-4)+k=0 k=8 4x+7y+8=0 answer
Are these points collinear? Make an equation from them: (3,1), (5,-5), (-1,13). 3(-5 - 13) + 5(13 - 1) + (-1)(1 - (-5)) = -54 + 60 - 6 = 0, so these points are collinear. Two-point form: (Y - Y1)/(Y2 - Y1) = (X - X1)/(X2 - X1). (Y - 1)/(-6) = (X - 3)/2, so 2Y - 2 = -6X + 18, or Y + 3X = 10. answer
Find the equation of the line parallel to the line joining (7,5) and (2,9) and passing through (3,4) ? (Y - Y1)/(Y2 - Y1) = (X - X1)/(X2 - X1). (Y - 5)/(9 - 5) = (X - 7)/(2 - 7), so -5Y + 25 = 4X - 28, or 4X + 5Y - 53 = 0. For a parallel line the slope is the same and only the constant changes to k: 4(3) + 5(4) + k = 0, k = -32. So the equation is 4x + 5y - 32 = 0. answer
What is a variable ? It can take different values. Generally variable is denoted by X,Y,Z, and constant is denoted by a,b,c variable can be of two types : 1. discrete – it takes only integer values example: number of houses 2.continuous – it can take any values example : height of a person
What is a function? It shows relation between two variable – one is dependent and one independent dependent variable is dependent on independent variable example : price = f(demand) here we want to show that price is dependent on demand, so price is a function of demand. Dependent variable = price, independent variable = demand
What are the various types of functions ? 1. linear function, example : Y = a + bX. Here there is a straight line on a graph paper, and there is a direct linear relation between the two variables. 2. polynomial function : Y depends on higher powers of X, example : Y = a + bX + cX^2. (When there are multiple independent variables, as in Y = a + bX1 + cX2 ...., it is a multiple linear function.) 3. absolute value function : Y = |X|, so the sign of a negative value has no impact.
What are the measures of central tendency ? Mean = arithmetic average (sum / number). Mode = the value which has the highest frequency. Median = the exact mid point of the data. For example : 2,3,8,11,11. Here Median = 8, Mean = 7, Mode = 11
Formula of mean ? Add all the values and divide by number in the previous example : add all the values of 2,3,8,11,11 = 35 there are 5 values so divide 35 by 5 = 7 mean is denoted by Xbar
What is the relation between mean, mode and median? For moderately skewed data there is an approximate relation: mode = 3 median - 2 mean. In our example it gives 3*8 - 2*7 = 10, whereas the mode we found by counting is 11. The formula gives only an estimate, so with a small data set it need not match the observed mode exactly.
Calculate the 1st quartile from the following data.
X           Freq.   C.F.
10 to 20    4       4
20 to 30    6       10
30 to 40    8       18
40 to 50    7       25
50 to 60    5       30
Solution: Formula: Q1 = L1 + ((q1 - c) / f) * (class interval), where L1 = lower limit of the quartile class, q1 = position of the first quartile, c = cumulative frequency of the class before it, f = frequency of the quartile class. q1 = n/4 = 30/4 = 7.5. The 7.5th item falls in 20 to 30. Q1 = 20 + ((7.5 - 4) / 6) * 10 = 20 + ((3.5/6) * 10) = 20 + 5.8 = 25.8 ANSWER
Calculate the 31st percentile from the following data.
X           Freq.   C.F.
10 to 20    4       4
20 to 30    6       10
30 to 40    8       18
40 to 50    7       25
50 to 60    5       30
Solution: Formula: P31 = L1 + ((p31 - c) / f) * (class interval), with the same meaning of L1, c and f as for quartiles. p31 = (n/100) * 31 = (30/100) * 31 = 9.3. The 9.3th item falls in 20 to 30. P31 = 20 + ((9.3 - 4) / 6) * 10 = 20 + ((5.3/6) * 10) = 20 + 8.8 = 28.8 ANSWER
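The quartile and percentile calculations use the same interpolation, so one helper can cover both. The sketch below assumes the classes are given as (lower, upper) pairs in ascending order; the name `grouped_quantile` is illustrative only.

```python
def grouped_quantile(classes, freqs, fraction):
    """Interpolated quantile for grouped data: L1 + (position - c) / f * width."""
    n = sum(freqs)
    position = n * fraction          # e.g. n/4 for the first quartile
    cumulative = 0                   # c: cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= position:
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [4, 6, 8, 7, 5]
q1 = grouped_quantile(classes, freqs, 0.25)
p31 = grouped_quantile(classes, freqs, 0.31)
print(round(q1, 1), round(p31, 1))  # 25.8 28.8
```

Passing 0.5 as the fraction gives the median of the grouped data by the same rule.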
Can mean, mode and median be equal? Yes. In a normal distribution, mean, mode and median are all equal. A normal distribution has 3 characteristics : 1. data are symmetrical 2. data are more in central values and less as we move apart 3. mean = mode = median. Many statistical formulas assume a normal distribution.
How to calculate median in bigger data : Formula : the median is the value at position (N+1)/2 when the data are arranged in order; N = number of values. For example : 1,2,3,4,5,6,6,7. Here we have 8 values, so the position is (8+1)/2 = 4.5. So we should take the mid value between the 4th and 5th values (4 and 5), which is 4.5 answer
What types of data series are there ? There are many types of data series : individual data discrete series continuous series in continuous data series, there is no value which is not possible. (for example : 0 to 10, 10 to 20, 20 to 30) ....
What are the measures of dispersion? Dispersion = how the data is looking in comparison to mean. If data is wide apart from mean, there is high dispersion. If the data is just close to mean, there is very less dispersion. If data has more dispersion, there is less uniformity in the data. We have many tools to measure dispersion like range, variance etc.
Example of high and less dispersion of data : Low dispersion : 6,6,7,7,7,7,8,8,9, high dispersion : 1,4, 8, 19,20,50,60,80,100 you can see, the first data set has far more consistency and dispersion is less. Tools to measure dispersion are : range, standard deviation, variance, mean deviation etc. Range = highest – least value. In the first case range = 9-6 = 3, in 2 nd case range = 100-1 = 99
What is standard deviation? Here we find the difference between each value and mean. Then we square the difference and find the average. This is called variance. Square root of variance is called standard deviation. This gives us an estimate of dispersion of data.
Example of standard deviation? X has 5 values : 1,2,3,4,5. Its total is 15, so average = 15/5 = 3. Now we take the difference of each value from the average : (1-3) = -2, (2-3) = -1, (3-3) = 0 ... so we get : -2, -1, 0, 1, 2. Now square them = 4, 1, 0, 1, 4; total = 10. Now find the mean = 10/5 = 2 (this is the variance). The square root of 2 = 1.41 is the standard deviation.
Example of intercept and slope ? If X changes by 10 units and Y changes by 20 units, slope = 20/10 = 2. Suppose the X,Y points are : (0,2), (2,4), (4,6), (6,8) ... (first number is X and second number is Y). Here you can see that there is a linear relation between X and Y. Intercept is 2, because when X is 0, Y is 2. Slope is 1, because Y rises by 2 whenever X rises by 2.
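The same steps (differences from the mean, squares, average, square root) can be written as a small Python sketch. It divides by n, as the slide does, so it gives the population variance; the function name is illustrative.

```python
import math

def population_variance(values):
    """Average of squared differences from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

values = [1, 2, 3, 4, 5]
variance = population_variance(values)
sd = math.sqrt(variance)
print(variance, round(sd, 2))  # 2.0 1.41
```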
Find slope in the following example?
X    Y
2    11
4    8
6    3
8    1
Solution Slope (b ) = covariance / variance of X so first we shall calculate covariance
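The slide stops before the arithmetic, so the following Python sketch carries the stated method (slope = covariance / variance of X) through for this data. It is only an illustration of that method; the function name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Return (a, b) for Y = a + bX using b = covariance / variance of X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    variance_x = sum((x - mean_x) ** 2 for x in xs) / n
    b = covariance / variance_x
    a = mean_y - b * mean_x   # the line passes through the two averages
    return a, b

a, b = fit_line([2, 4, 6, 8], [11, 8, 3, 1])
print(a, b)  # 14.5 -1.75
```

For this data the covariance is -35/4 = -8.75 and the variance of X is 20/4 = 5, so the slope is -1.75 and the intercept is 14.5.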
What are the types of data ? 1. primary (which you collect yourself) 2. secondary (which is already collected for some other purpose, but you can also use it).
What are the various types of statistical analysis? 1. descriptive statistics : here you collect data and present it (for example data on market share) 2. inductive statistics : here you undertake statistical inferences and estimate for future 3. statistical decision theory : here you have to take decision about a situation based on statistics
What are the basic tools for statistical analysis ? 1. be clear about problem 2. formulate hypothesis 3. set significance level (how much accuracy do you want) 4. set sampling frame, research design & collect data 5. analyse data and draw inferences
What is hypothesis ? It is what you want to test. We frame at least 2 hypotheses. One of them is the null hypothesis and one is the alternate hypothesis. Based on literature review & our own experiences, we frame some understanding on the subject. We have to frame a null hypothesis which is the opposite of this idea. Then we have to frame the alternate hypothesis. We test the null hypothesis.
What is type I and type II error ? If we reject a null hypothesis which is actually true, we are having type I error. If we accept (fail to reject) a null hypothesis which is actually false, we are having type II error. We have to set standards for both these errors. If you become liberal for type I error, then type II error will decrease and vice versa; if you become strict about one, the other will increase.
What is alpha (α) ? The probability of a type I error is called alpha.
How do we test using alpha ? We calculate the P value. If the P value is less than alpha, we reject the null hypothesis. If the P value is more than alpha, then we can't reject the null hypothesis.
What is p value ? It is the calculated probability of getting a result at least as extreme as the one observed if the null hypothesis is true. It is calculated to be compared with alpha. Alpha is determined in advance, but the P value comes from the actual observation.
How does statistics & econometrics help you in business decisions? You can test your decisions using data. You can also build models. There are various types of model : 1. physical, 2. geographic 3. schematic 4. analog 5. mathematical / statistical / econometrics based statistics and econometrics can help you in building the last types of models (5 th type)
What types of statistical analysis are possible ? 1. univariate (there is a single set of data) 2. bivariate (there are two sets of data) 3. multivariate (there are many sets of data)
What are univariate tools? Mean, mode, median, time series analysis, moving average analysis etc.
What are bivariate tools ? Correlation, regression, etc.
What are multivariate tools ? There are many like : conjoint analysis, multivariate regression etc. Here we have many variables : example : demand is dependent on
An unbiased die is tossed. Find the probability of getting a multiple of 3? The possible outcomes are : 1 to 6. There are only 2 multiples of 3 : 3, 6. So probability is (number of favourable outcomes) / (total number of possibilities) = 2/6 = 1/3 answer
In a simultaneous throw of a pair of dice,find the probability of getting a total more than 7? We can have 36 possibilities (6*6) however, we need only those cases where the total is 8 or more. These are : (6,2),(6,3),(6,4),(6,5),(6,6),(5,3),(5,4),(5,5),(5,6),(4,4),(4,5),(4,6),(3,5),(3,6),(2,6) =15 answer = 15/36 = 5/12 answer
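Counting the favourable pairs by hand is easy to get wrong, so a short enumeration can serve as a check. The sketch below lists all 36 outcomes and counts those that satisfy a condition; the helper name `dice_probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

def dice_probability(event):
    """Probability of `event(a, b)` over the 36 outcomes of two fair dice."""
    outcomes = list(product(range(1, 7), repeat=2))
    favourable = [o for o in outcomes if event(*o)]
    return Fraction(len(favourable), len(outcomes))

p_more_than_7 = dice_probability(lambda a, b: a + b > 7)
print(p_more_than_7)  # 5/12
```

The same helper works for any other two-dice question by changing the condition, for example `lambda a, b: (a * b) % 2 == 0` for an even product.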
A bag contains 6 white and 4 black balls. Two balls are drawn at random. Find the probability that they are of the same colour? Both are white : 6/10 * 5/9 = 30/90; both are black : 4/10 * 3/9 = 12/90; add them : 42/90 or 7/15. Or, using combinations : 6C2/10C2 + 4C2/10C2 = 15/45 + 6/45 = 21/45 = 7/15 answer
Two dice are thrown together.What is the probability that the sum of the number on the two faces is divisible by 4 or 6? The possibilities are : (1,3)(1,5) (2,2) (2,4),(2,6),(3,1),(3,3),(3,5),(4,2),(4,4),(5,1),(5,3),(6,2),(6,6) thus we are able to get 14 out of 36. so answer = 7/18 answer
Two cards are drawn at random from a pack of 52 cards What is the probability that either both are black or both are queens? Both are black = 26/52 * 25/51=25/102 both are queens : 4/52 * 3/51=3/663 both are black queens : 2/52*1/51 = 1/1326 now add them : (25/102 + 3/663 – 1/1326) =(325+6-1)/1326 =330/1326 or .25 answer
Two dice are tossed. What is the probability that the total score is a prime number? Prime totals possible with two dice are : 2, 3, 5, 7, 11 (1 is not a prime number). The pairs are : (1,1),(1,2),(2,1),(1,4),(2,3),(3,2),(4,1),(1,6),(2,5),(3,4),(4,3),(5,2),(6,1),(5,6),(6,5) = 15 cases. Answer = 15/36 = 5/12
Two dice are thrown simultaneously. What is the probability of getting two numbers whose product is even? If any one of the two numbers is an even number, the product will be an even number. So the product is odd only when both the numbers are odd : (1,1),(1,3),(1,5),(3,1),(3,3),(3,5),(5,1),(5,3),(5,5). There are only 9 such cases. Remove them from 36 and we get 27 cases. Answer : 27/36 = 3/4
In a lottery ,there are 10 prizes and 25 blanks.A lottery is drawn at random. what is the probability of getting a prize ? 10/(10+25) =10/35 or 2/7 answer
In a class, 30% of the students offered English, 20% offered Hindi and 10% offered both. If a student is selected at random, what is the probability that he has offered English or Hindi? 30 + 20 - 10 = 40% or .4 answer
Two cards are drawn from a pack of 52 cards. What is the probability that either both are red or both are kings? Both are red = 26/52 * 25/51 = 25/102; both are kings = 4/52 * 3/51 = 1/221; both are red kings = 2/52 * 1/51 = 1/1326 (this case is counted twice, so subtract it once). 25/102 + 1/221 - 1/1326 = 330/1326 = 55/221 answer
one card is drawn at random from a pack of 52 cards.What is the probability that the card drawn is a face card? Face cards are : Jack, queen, king total = 12 12/52 answer
A man and his wife appear in an interview for two vacancies in the same post.The probability of husband's selection is 1/7 and the probabililty of wife's selection is 1/5.What is the probabililty that only one of them is selected? Husband + not wife =1/7 * 4/5 = 4/35 wife + not husband =1/5 * 6/7 = 6/35 add = 10/35 answer
From a pack of 52 cards,one card is drawn at random.What is the probability that the card is a 10 or a spade? 4/52 + 13/52 – 1/52 =16/52 answer
A bag contains 4 white, 5 red and 6 blue balls. Three balls are drawn at random from the bag. What is the probability that all of them are red ? 5/15 * 4/14 * 3/13 or 5C3/15C3 = 10/455 = 2/91
A box contains 10 black and 10 white balls. Two balls are drawn. What is the probability of drawing two balls of the same colour? Both are black : 10/20 * 9/19 = 9/38, plus both are white : 10/20 * 9/19 = 9/38, total = 18/38 = 9/19. Or black : 10C2 / 20C2 plus white : 10C2 / 20C2 = 90/190 = 9/19
A box contains 20 electric bulbs, out of which 4 are defective. Two bulbs are chosen at random from this box. What is the probability that at least one of these is defective ? In such questions (at least one type), it is better to reverse the question, solve it and deduct the answer from 1. So here we shall first calculate the probability of getting no defective bulb. Probability that neither bulb is defective : 16/20 * 15/19 = 12/19. At least one is defective = 1 - 12/19 = 7/19 answer
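The "reverse the question" approach can be sketched with combinations. The function name and parameters below are illustrative; `math.comb` needs Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def prob_at_least_one_defective(total, defective, drawn):
    """1 minus the probability that every drawn item is good."""
    none_defective = Fraction(comb(total - defective, drawn), comb(total, drawn))
    return 1 - none_defective

p = prob_at_least_one_defective(20, 4, 2)
print(p)  # 7/19
```

Here 16C2 / 20C2 = 120/190 = 12/19 is the chance of no defective bulb, the same value as 16/20 * 15/19.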
Two cards are drawn together from apack of 52 cards.What is the probability that one is a spade and one is a heart ? First is spade and 2 nd heart : 13/52 * 13/51 = 13/204 First is heart and 2 nd spade : 13/52 * 13/51 = 13/204 add them : 13/102 answer
The probability that a card drawn from a pack of 52 cards will be a diamond or a king? 13/52 + 4/52 – 1/52 =16/52
What is hypothesis ? What you think or what you want to check out or what you want to study is called hypothesis. We prepare two types of hypothesis : 1 null hypothesis (just opposite of what we think or what we are testing out) 2. alternate hypothesis (what we want to check out). We study and check null hypothesis only.
What is systematic sampling ? If you pick up the first unit by random sampling and thereafter pick up each unit systematically, it is called systematic sampling. Suppose you pick up the first unit randomly and it is 12; now you take every 4th element after it : 12, 16, 20, 24, 28 and so on. This type of sampling saves time and keeps much of the virtue of random sampling.
What are continuous and discrete distributions ? Continuous distributions are : 1. normal 2. exponential Discrete distributions are : 1. pascal 2.poisson 3. binomial
What is theoretical distribution ? If you pick up a sample it will not give you exactly the same value as population. If you pick up a large number of samples out of population, and plot the values of these samples, you will get what we call as theoretical distribution. If the sample size is large, the theoretical distribution will approximate the real population.
What is normal distribution ? It is also called the Gaussian distribution. It is defined by two parameters mean ("average" m) and standard deviation (σ). A theoretical frequency distribution for a set of variable data, usually represented by a bell-shaped curve symmetrical about the mean. MEAN=MEDIAN=MODE & data symmetrical bell shaped
What is Z, if mean = 100 and standard deviation (σ) = 6? Find P(X < 106).
Step 1: For the given value X = 106, formula of Z = (value - mean) / standard deviation. Z = (106 - 100)/6 = 1
Step 2: Find the value for Z = 1 in the Z table: 0.3413 (the area between the mean and Z = 1)
Step 3: Here the X value is greater than the mean. The bell-shaped curve has half its area (.5) on each side; .3413 is on the right side of the curve, and the whole left side (.5) is also included. P(X < 106) = 0.5 + 0.3413 = 0.8413
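When a Z table is not at hand, the same probability can be computed from the error function in Python's standard library. This is a sketch of an alternative to the table lookup, not the slide's own method; the function name is illustrative.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X < x) for a normal distribution, via the error function."""
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p = normal_cdf(106, 100, 6)
print(round(p, 4))  # 0.8413
```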
What is probability density function? PDF of a continuous random variable is a function which can be integrated to obtain the probability that the random variable takes a value in a given interval. PDF is used to find the point of Normal Distribution curve. Continuous Probability Density Function of the Normal Distribution is called the Gaussian Function.
Formula of PDF ? f(x) = (1 / (σ * sqrt(2π))) * e^(-(x-m)^2 / (2σ^2)), where m = mean and σ = standard deviation
What is type I error ? When you reject null hypothesis when it is actually correct, it is called type I error it is also called alpha
What is type II error ? When you accept null hypothesis when it is actually incorrect, it is called type II error
What is type III error ? Rejecting a null hypothesis for wrong reason is called type III error it is rarely used.
What is binomial distribution ? The Binomial Distribution is one of the discrete probability distribution. It is used when there are exactly two mutually exclusive outcomes of a trial. These outcomes are appropriately labeled Success and Failure. The Binomial Distribution is used to obtain the probability of observing r successes in n trials, with the probability of success on a single trial denoted by p.
Example of binomial distribution Take up the case of coins. What is the probability of getting 3 heads on 4 trials (coin has only two outcomes = head, tail) (what is probability of observing 3 successes in 4 trials, with the probability of success on a single trial denoted by p = .5) formula : P(X = r) = nCr p^r (1-p)^(n-r). C = combination
Solution .... N = 4, r = 3 p = .5 formula of combination = N! / ((N-r)! * r!) C = 4! / ((4-3)! * 3!) =24 / (1*6) = 4 4! = 4*3*2*1 = 24 Ncr = 4, p^r = (.5)^3 = .125 (1-p)^(n-r). = (1-.5)^(4-3) = .5 solution = 4*.125*.5 =.25 answer
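The binomial formula is short enough to check in code. A minimal sketch, with an illustrative function name:

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials: nCr * p^r * (1-p)^(n-r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

print(binomial_pmf(3, 4, 0.5))  # 0.25
```

Adding the probabilities for r = 0 to n always gives 1, which is a quick sanity check on any binomial calculation.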
What is poisson distribution? It is one of the discrete probability distribution. This distribution is used for calculating the possibilities for an event with the given average rate of value(λ). A poisson random variable(x) refers to the number of success in a poisson experiment.
Formula of poisson distribution : f(x) = ((e^-λ) * (λ^x)) / x! where λ is the average rate, x is a poisson random variable, and e is the base of the natural logarithm (e = 2.718)
In an office 2 customers arrived today (take it as the average). Calculate the probability that exactly 3 customers arrive tomorrow. Here λ (lambda, mean arrival) = 2, & x (value to calculate) = 3.
Step 1: Find e^-λ, where λ = 2 and e = 2.71828. e^-λ = (2.718)^-2 = 0.135
Step 2: Find λ^x, where λ = 2 and x = 3. λ^x = 2^3 = 8
Step 3: Find f(x). f(x) = (e^-λ * λ^x) / x!, so f(3) = (0.135)(8) / 3! = 0.18
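The three steps combine into one line of Python. A minimal sketch, with an illustrative function name:

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x events when the average rate is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

print(round(poisson_pmf(3, 2), 2))  # 0.18
```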
How to carry out hypothesis testing ? First of all fix the level of significance (alpha). If you keep alpha at 5%, the level of significance is 5% and the confidence level is 95%. That means that there would be a 5% chance of error (which you are willing to tolerate). When we calculate Z (in normal distribution), we see whether it falls in the rejection region or not and what the p value is. If the p value is less than our acceptable error (alpha), we reject the null hypothesis.
What is alpha in statistics ? It is the error that you are willing to tolerate. Alpha is the probability of a type I error.
What is the variance of binomial distribution ? N * p * q n = number of units p = probability q = (1-p)
Calculate coefficient of concurrent deviation (a type of correlation).
X    Y
4    8
5    4
6    2
8    1
Solution
Formula = +/- sqrt(+/- ((2c - m) / m)), where c = number of concurrent deviations (positive signs in the dx*dy column) and m = total number of pairs of deviations (one less than the number of observations). The sign used inside and outside the root is the sign of (2c - m).
X    Y    dx   dy   dx*dy
4    8
5    4    +    -    -
6    2    +    -    -
8    1    +    -    -
Here m = 3, c = 0 (c is the number of + signs in dx*dy), so (2c - m)/m = (0 - 3)/3 = -1 and the coefficient = -sqrt(-(-1)) = -1. So there is a correlation of -1. answer
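The sign-counting procedure can be sketched in Python. One assumption is made beyond the slide: a pair of changes counts as concurrent only when both move in the same direction, so a zero change is treated as not concurrent. The function name is illustrative.

```python
import math

def concurrent_deviation(xs, ys):
    """Coefficient of concurrent deviations: +/- sqrt(+/- (2c - m) / m)."""
    def sign(v):
        return (v > 0) - (v < 0)

    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    m = len(dx)                                   # pairs of deviations
    c = sum(1 for sx, sy in zip(dx, dy) if sx * sy > 0)
    ratio = (2 * c - m) / m
    # the sign of (2c - m) is applied both inside and outside the root
    return math.copysign(math.sqrt(abs(ratio)), ratio)

print(concurrent_deviation([4, 5, 6, 8], [8, 4, 2, 1]))  # -1.0
```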
What is finite population multiplier ? When we are taking samples from a finite population without replacement, the properties of normal distribution get distorted, because the probability of the 2nd item depends on the 1st item and so on. Therefore we have to use the finite population multiplier with all our formulas : sqrt((N-n) / (N-1)), N = population size; n = size of sample
What is standard error mean ? Error = fluctuations standard deviation of mean is also called standard error of mean its formula = standard deviation / sqrt(n) n = number of items in sample
Formula for standard error of mean ? Standard deviation / sqrt(n), n = size of sample. If the population is finite, multiply by the finite population multiplier.
Why do we do sampling ? When we are collecting any data – there are two options – 1. contact each unit and collect data from this – called census 2. pick up only a few and on the basis of their response try to infer the response of the entire population – called sampling sampling saves time, resources, but there is little bit possibility of error – which can be minimised by systematic research process.
What are the main types of sampling ? Probability or non-probability sampling. Probability sampling = where each element has a known (often equal) probability of selection. Non-probability sampling = where, due to some reason or other, the probability of selection of an item is unequal or unknown. Example : in a fair contest or a lottery, every one has an equal chance; these are examples of probability sampling. Examples of non-probability sampling : reservation / selection of your own friends / nepotism
What are probability sampling methods ? 1. simple random sampling (just like lottery) 2. systematic sampling (select first item randomly thereafter pick up each item on fixed gap) 3. stratified sampling (divide population in different strata / group / classification and then pick up randomly some persons from each strata) 4. cluster sampling (here pick up one or a few cluster out of a large number of clusters)
When should we use which types of sampling ? It depends on our purpose, population, type of universe and the situation. Suppose, you are able to get clusters, which have elements representing the entire population, you may go for cluster sampling. If you want to really use a good method, have random sampling in that method.
What are the methods of nonprobability sampling ? Here we undertake sampling on the basis of some criteria / convenience : 1. convenience sampling (example : you pick up people from your friends / relations ) 2. quota sampling (example our reservation system) 3. judgemental sampling (example : select sample on some criteria)
What is multi-stage sampling ? When we undertake sampling in different stages, it is called multistage sampling. Example : suppose you want to study rural development in the world - First you pick up nation to study, then you pick up state, then you pick up district, then you pick up village and finally the sample
What is cluster sampling ? Suppose that population is so divided that there are many clusters and each cluster is a mini representation of the entire population, then we can go for cluster sampling.
What is sampling distribution ? Sampling distribution is the distribution of all the means of the various samples that are possible from a population. Example : suppose our population is 1,2,3,4,5,6 and we are picking up samples of 3 units out of this. Population mean = 3.5 sample means could be : (1,2,3) = 2, (2,3,4) = 3, (3,4,5) = 4 and so on. So we can have sample means like 2,3,4,5, etc. If we plot these sample means, it will give us distribution which is similar to the population itself. Larger the sample, more accurate will be the estimation.
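The example population is small enough to list every possible sample. The sketch below assumes sampling without replacement where order does not matter, which gives 20 samples of 3 units.

```python
from itertools import combinations

population = [1, 2, 3, 4, 5, 6]

# mean of every possible sample of 3 units
sample_means = [sum(s) / len(s) for s in combinations(population, 3)]

mean_of_means = sum(sample_means) / len(sample_means)
print(len(sample_means), min(sample_means), max(sample_means))  # 20 2.0 5.0
print(round(mean_of_means, 2))  # 3.5
```

The sample means range from 2 to 5, and their average equals the population mean of 3.5.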
What is sampling distribution of mean? The distribution (on graph paper) of the sample means is called sampling distribution of mean
What is central limit theorem and Z ? Normal distribution is when we have mean = median = mode and all these are in the centre of the data (data is bell shaped). According to the central limit theorem, the means of large samples are approximately normally distributed around the population mean. Z = (sample mean - population mean) / (standard deviation / sqrt(n)). Based on Z we can calculate the probability of the sample mean taking some value on the graph.
Example of central limit theorem : Population mean = 100, variance = 36 (so standard deviation = 6), sample size = 25. What is the Z if the sample has a mean of 90 ? Z = (90 - 100) / (6 / 5) = -10 / 1.2 = -8.33. Thus Z = -8.33, and from this we can make an inference. Since the sample size is below 30, the t distribution is normally preferred when the standard deviation is estimated from the sample.
What is the difference between Z and t ? Z gives the limits as per the normal distribution. t plays the same role, but when the sample size is less than 30 (and the population standard deviation is not known) we have to use t instead of z. As the sample size increases, t approaches z, so if the sample size is more than 30 we can use z instead of t.
What is the z for proportion ? When we are taking population data in % or in proportion, we use the following formula : Z = (sample proportion - p) / sqrt((p * q) / n), where p = possibility / probability / proportion being tested, q = 1 - p, n = sample size. The denominator sqrt((p * q) / n) is the standard error of the proportion.
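A quick way to check a Z value for a sample mean is to divide the difference by the standard error, that is the standard deviation divided by sqrt(n). A minimal sketch, with an illustrative function name:

```python
import math

def z_for_sample_mean(sample_mean, population_mean, population_sd, n):
    """Z = (sample mean - population mean) / (sd / sqrt(n))."""
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

z = z_for_sample_mean(90, 100, 6, 25)
print(round(z, 2))  # -8.33
```

Here the standard error is 6 / 5 = 1.2, so a sample mean of 90 lies more than 8 standard errors below the population mean.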
What is one tailed or two tailed test ? When we are comparing both the sides (increase or decrease) it is two tailed test when we are comparing only one side, it is one tailed test.
What is the procedure in hypothesis testing ? 1. frame two hypothesis : null and alternate 2. fix level of significance 3. define critical region 4. compare actual values with desired values 5 conclude
Govt data say that 65% of Indian students rent out their bikes. In a sample of 200, only 80 claimed to have rented out their bikes. Prepare a null hypothesis and test it. Here we compare the sample with 65%, so Ho (null hypothesis) : p = .65 and H1 : p < .65. Sample proportion = 80/200 = .4. Z = (.4 - .65) / sqrt((.65 * .35) / 200) = -.25 / .0337 = -7.41. This is far below the one-tailed critical value of -1.645 at the 5% level, so we reject the null hypothesis.
What is chi square ? It compares actual (observed) and expected values.
         x     y     total
a        11    12    23
b        9     8     17
total    20    20    40
In order to calculate an expected value we use the following formula : (row total * column total) / grand total. For the first value of 11, we have : (23 * 20) / 40 = 11.5
Table of expected values
         x      y      total
a        11.5   11.5   23
b        8.5    8.5    17
total    20     20     40
Find the square of the difference (observed - expected) : 1st = .25, 2nd = .25, 3rd = .25, 4th = .25
Divide each value by its expected value : 1st = .25/11.5 = .02, 2nd = .25/11.5 = .02, 3rd = .25/8.5 = .029, 4th = .25/8.5 = .029. Total these values = .1; this is the chi-square value. At the 5% significance level (1 degree of freedom), the standard chi-square value is 3.84, which is more than our value, so we can't reject the null hypothesis and we conclude that both the groups are similar.
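The test statistic for a proportion divides the gap between the sample proportion and the claimed proportion by the standard error sqrt(p * q / n). The sketch below applies that to this question; the function name is illustrative and the -1.645 cut-off is the usual one-tailed 5% value from the Z table.

```python
import math

def z_for_proportion(successes, n, p0):
    """Z = (sample proportion - p0) / sqrt(p0 * (1 - p0) / n)."""
    sample_proportion = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (sample_proportion - p0) / standard_error

z = z_for_proportion(80, 200, 0.65)
print(round(z, 2))   # -7.41
print(z < -1.645)    # True, so the null hypothesis is rejected at 5%
```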
Question : 600 rich and 400 poor students take a test. Use chi square test to find whether their marks are significantly different or not ?
        H      L      TOT.
R       460    140    600
P       240    160    400
TOT.    700    300    1000
Start = frame hypothesis and significance level. Null hypothesis = both groups are similar. Alternate hypothesis = both groups are not similar. Significance level : α = 5% (there is a 5% chance that a correct null hypothesis is rejected)
CHI SQUARE, STEP 1 : find the expected value for each of these values. The formula is : (row total * column total) / grand total. Expected values are as under :
        H      L      TOT.
R       420    180    600
P       280    120    400
TOT.    700    300    1000
Step 2 : find the difference between observed and expected values and square it. Formula = (o - e)^2
        H       L
R       1600    1600
P       1600    1600
Step 3 : divide (o - e)^2 by expected values. Formula = (o - e)^2 / expected value
        H           L
R       1600/420    1600/180
P       1600/280    1600/120
Step 4 : add them all : this is the chi square value = 3.81 + 8.9 + 5.71 + 13.33, total = 31.75
Step 5 : compare this value with the standard value. The standard value can be calculated by a formula or you can also see the chi-square table to find it. The table has two dimensions : one shows degrees of freedom and one denotes level of significance. Degrees of freedom = (rows - 1) * (columns - 1) = (2-1) * (2-1) = 1. At 1 degree of freedom and α = 5% we find the chi square table value is 3.84, so compare the calculated value with 3.84
Step 6 : derive conclusion If the calculated value of chi square is more than the table value, then reject the null hypothesis. If the calculated value of chi square is less than the table value, then accept the null hypothesis. In this case, our calculated value of chi square is 31.75, which is higher than table value of chi square (3.84) so we can reject the null hypothesis
Conclusion We can conclude that null hypothesis is rejected and there seems to be significant difference between the two groups.
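Steps 1 to 4 can be collected into one small Python function. This is a sketch for a table of observed counts given as a list of rows; the function name is illustrative and the table value 3.84 is taken from the text rather than computed.

```python
def chi_square(observed):
    """Return (chi-square value, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected value
            statistic += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

statistic, dof = chi_square([[460, 140], [240, 160]])
print(round(statistic, 2), dof)  # 31.75 1
print(statistic > 3.84)          # True, so the null hypothesis is rejected
```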
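Graphical presentation : [Figure: chi-square axis starting at 0, with the acceptance zone up to the table value 3.84 and the rejection zone beyond it; the calculated value 31.75 falls in the rejection zone]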
10% of the tools produced turn out to be defective. What is the probability that out of 10 tools chosen randomly, exactly 2 are defective ? Here we can use binomial distribution or poisson distribution to solve this problem. Let us solve using binomial distribution : formula : nCr * p^r * q^(n-r), n = 10, r = 2, p = 10% or .1, q = (1-p) = (1-.1) = .9, C = combination formula
Step 1 : combinations : 10C2 = 10! / (8! * 2!) = 45. Step 2 : p^r * q^(n-r) = (.1)^2 * (.9)^8 = .01 * .43 = .0043. Step 3 : multiply both : 45 * .0043 = .19. It means that there is a 19% chance that exactly 2 tools are defective.
Solve this question using poisson distribution ... Formula : (e^-lambda * lambda^x) / x! where e = 2.71828, lambda = average number of defectives in our sample = 10 * .1 = 1, and x = 2 (because we want to know the probability that exactly 2 are defective)
Step 1 : apply formula, first part : e^-lambda = 2.71828^(-1) = .368
Step 2 : solve second part of formula : lambda^x = 1^2 = 1
Step 3 : solve 3rd part of formula : x! = 2! = 2
Step 4 : combine all these calculations : (.368 * 1) / 2 = .184. Here we can see that there is about an 18% probability that exactly 2 tools are defective, close to the 19% from the binomial distribution. Answer
What is the probable error of coefficient of correlation for r = .6 and N = 64 ? Also set limits. PE = .6745 * (1 - r^2) / sqrt(N) = .6745 * (1 - .36) / 8 = .054. Limits : .6 + .054 = .654 and .6 - .054 = .546 answer