Quality of data



A case study explaining why data quality is much better in online surveys, with guidelines on how sampling and non-sampling errors are eliminated.

Published in: Marketing, Technology, Business

  1. Quality of data is definitely better in online surveys
  2. Types of errors: Two kinds of errors can creep in during a survey: sampling errors and non-sampling (human) errors.
  3. 3. Sampling errors  Sampling errors are those that occur when the statistical characteristics of a population are estimated from a sample of that population.  A way to lower this error is to have randomized sampling. Now, in online surveys, the number of contacts is really high, and with low incidence rates and low completion rates, the level of randomness that is achieved is really not possible in an offline study. 3
  4. 4. Sampling errors  Also, if required, we do a process known as “weighting”. .  Every year, we conduct a baseline study covering 109 urban centres, 196 villages, 80 out of 88 NSSO regions, covering 30,066 households and 1,21,311 individuals, covering 28 states and 4 UTs. Using this baseline study “Juxt India Consumer Landscape”, we create a matrix of unique weights for each age-gender-location combination.  Using this matrix, we can project the data for any survey to a nationwide population and remove the sampling error and the selfselection bias also in this weighting process. 4
  5. Non-sampling (human/system) errors: In an offline study, the questionnaire is administered by an interviewer who reads it out according to his or her own interpretation, which can introduce bias and errors. In an online study, only the respondent's own interpretation comes into play, which is why we use extremely simple English; the survey can even be conducted in local languages, thus removing this non-sampling error.
  6. Non-sampling (human/system) errors (continued): There can also be "bad respondents". To clean the data, we clear out junk respondents: we do not believe in 'response cleaning', so we delete the entire case/respondent instead. We remove all "straight-liners", respondents who fill in surveys in a pattern. We also apply "mode time cleaning": the completion times of most responses fall between 2/3 and 4/3 of the mode (most common) completion time, and this band can be adjusted depending on the type of questionnaire. Outliers outside this band are discarded. A sample of mode time cleaning is shown in the next slide.
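The slide does not define exactly what counts as a "pattern". As a minimal sketch, assuming a grid question answered on a numeric scale, a respondent can be flagged when their answers repeat with a short period: period 1 is a classic straight-liner, period 2 catches zig-zag answers. The `max_period` parameter and function names are hypothetical, and, following the slide, a flagged respondent's whole case is deleted rather than individual answers being edited.

```python
def is_patterned(answers: list[int], max_period: int = 1) -> bool:
    """True if the answers repeat with a period of at most `max_period`.
    Period 1 means every answer is identical (a classic straight-liner);
    period 2 catches alternating answers such as 1, 5, 1, 5, ..."""
    n = len(answers)
    for period in range(1, max_period + 1):
        if n >= 2 * period and all(answers[i] == answers[i - period]
                                   for i in range(period, n)):
            return True
    return False


def drop_patterned(respondents: dict[str, list[int]],
                   max_period: int = 1) -> dict[str, list[int]]:
    """Delete the whole case for any patterned respondent, rather than
    editing individual answers."""
    return {rid: answers for rid, answers in respondents.items()
            if not is_patterned(answers, max_period)}


if __name__ == "__main__":
    grid = {
        "r1": [4, 4, 4, 4, 4, 4],  # straight-liner
        "r2": [1, 5, 1, 5, 1, 5],  # zig-zag pattern
        "r3": [2, 4, 3, 5, 4, 2],  # plausible answers
    }
    print(sorted(drop_patterned(grid)))                # ['r2', 'r3']
    print(sorted(drop_patterned(grid, max_period=2)))  # ['r3']
```

Larger `max_period` values catch more patterns but also risk flagging genuine respondents on short grids, so the threshold is a judgment call.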
  7. [Figure: Typical scatter plot of survey response times. Mode time (most commonly occurring completion time): 13 minutes. Most responses occur between 2/3 and 4/3 of the mode time; outliers lying below 2/3 or above 4/3 of the mode time are cleaned.]
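With the 13-minute mode from the figure, the 2/3 to 4/3 band keeps completion times between about 8.7 and 17.3 minutes. The sketch below applies that rule under one stated assumption: because completion times are continuous, they are rounded to a bin width (`resolution`, a hypothetical parameter, 1 minute by default) before the mode is taken. The example times are made up.

```python
import statistics


def mode_time(times_min: list[float], resolution: float = 1.0) -> float:
    """Most common completion time after rounding to `resolution` minutes.
    Completion times are continuous, so they are binned before taking the
    mode; statistics.mode returns the first mode encountered on ties."""
    binned = [round(t / resolution) * resolution for t in times_min]
    return statistics.mode(binned)


def mode_time_clean(times_min: list[float], low: float = 2 / 3,
                    high: float = 4 / 3, resolution: float = 1.0):
    """Return (mode, indices of respondents kept), keeping only completion
    times within [low * mode, high * mode]."""
    m = mode_time(times_min, resolution)
    lower, upper = low * m, high * m
    kept = [i for i, t in enumerate(times_min) if lower <= t <= upper]
    return m, kept


if __name__ == "__main__":
    times = [13, 13, 12.6, 14, 9, 17, 5, 25, 13.2]  # made-up minutes
    mode, kept = mode_time_clean(times)
    print(mode, kept)  # 13.0 [0, 1, 2, 3, 4, 5, 8]
```

The `low` and `high` arguments reflect the slide's note that the band can be flexible depending on the type of questionnaire.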
  8. Normality, reliability and validity tests: Some additional tests can be run at the client's request to ensure the statistical validity of the data. Let us look at them one by one.
  9. Normality test: The objective of sample normality tests is to ensure that the sample is normally distributed and randomly selected. The normality of the sample should be confirmed before it is subjected to inferential and differential analyses. Let us take the example of a normality test on the age of respondents.
  10. Histogram (graphical method): An initial impression of the normality of the distribution can be gained by examining the histogram. From the histogram of respondent age, it is evident that the collected data are very close to a normal curve.
  11. Normal Q-Q plot of age: In a normal Q-Q plot, if the variable were normally distributed, the points would lie very close to the line. In this case, the points in the upper right of the chart indicate some skew caused by extremely large values; otherwise, the data appear to be normally distributed.
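The points in a normal Q-Q plot pair each sorted observation with the quantile a normal distribution would place there. The sketch below computes those pairs with Python's standard library and adds a moment-based skewness figure, which turns the "skewing caused by extremely large values" into a number. The plotting positions (i - 0.5) / n are one common convention, and the ages in the example are made up.

```python
import statistics
from statistics import NormalDist


def qq_points(data: list[float]) -> list[tuple[float, float]]:
    """(theoretical, observed) pairs for a normal Q-Q plot. Theoretical
    quantiles come from a normal distribution with the sample's mean and
    standard deviation, at plotting positions (i - 0.5) / n for i = 1..n."""
    observed = sorted(data)
    n = len(observed)
    fitted = NormalDist(statistics.fmean(observed), statistics.stdev(observed))
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(observed)]


def skewness(data: list[float]) -> float:
    """Moment-based sample skewness g1: about 0 for symmetric data, positive
    when a long right tail (a few very large values) is present."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    ages = [19, 22, 24, 25, 27, 28, 29, 31, 33, 36, 41, 58]  # made-up ages
    for theoretical, observed in qq_points(ages):
        print(f"{theoretical:6.1f} {observed:4d}")
    print(round(skewness(ages), 2))
```

A single large value such as 58 shows up both as a point above the line at the upper right and as positive skewness. For a formal test rather than a visual check, libraries such as SciPy offer tests like Shapiro-Wilk.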
  12. 12. Reliability test  It is the extent to which a measuring procedure yields consistent results on repeated administrations of the scale.  The objective of the reliability test is to ensure that the measurable items of each variable were measuring the same underlying construct.  The reliability test of this instrument will be examined through Cronbach‟s Alpha Coefficient. 12
  13. 13. Cronbach alpha (α)  The average of all possible split-half‟ correlation coefficients resulting from different ways of splitting the scale items  It‟s value varies from 0 to 1  α < 0.6 indicates unsatisfactory internal consistency reliability (see Malhotra & Birks, 2007, p.358)  Note: alpha tends to increase with an increase in the number of items in scale  The Cronbach alpha reliability coefficient for the choice factors scale (in our sample questionnaire) as a whole was 0.78071, indicating that the scale as a whole has acceptable internal consistency and reliability and no items were deleted. 13
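In practice alpha is usually computed from item and total-score variances, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), rather than by enumerating split halves. The sketch below implements that variance form, plus an "alpha if item deleted" helper of the kind typically checked before deciding whether any item should be dropped. The ratings in the example are made up, and the helper names are hypothetical.

```python
import statistics


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items, each a list of n respondent scores:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one score per respondent")
    totals = [sum(item[r] for item in items) for r in range(n)]
    item_variance_sum = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))


def alpha_if_item_deleted(items: list[list[float]]) -> list[float]:
    """Alpha recomputed with each item left out in turn (needs k >= 3)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


if __name__ == "__main__":
    # Made-up 1-5 importance ratings: 3 items x 6 respondents.
    items = [
        [4, 5, 3, 4, 2, 5],
        [4, 4, 3, 5, 2, 4],
        [5, 5, 2, 4, 3, 4],
    ]
    print(round(cronbach_alpha(items), 3))
    print([round(a, 3) for a in alpha_if_item_deleted(items)])
```

If no item's removal raises alpha noticeably, there is no reliability-based reason to delete it. Note that the variance formula can return a negative value when items are negatively correlated, which is why the 0-to-1 range holds only in typical cases.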
  14. 14. Validity test  While the reliability test is necessary, it is not sufficient  The objective of the validity test is to identify whether the proposed items in a study are valid for measuring the underlying concept, how accurately the concept corresponds to the real world  In a test case, the concept referred to the respondents‟ perceived importance of factors influencing their intention to study at X 14
  15. Sample validity test: correlations between the importance ratings of aspects related to the content and structure of the course offered

             a12_7  a12_1  a12_4  a12_2  a12_5  a12_6  a12_3
      a12_7   1.00  -0.07  -0.06   0.00  -0.09  -0.17  -0.12
      a12_1  -0.07   1.00  -0.05  -0.18  -0.13   0.04  -0.21
      a12_4  -0.06  -0.05   1.00  -0.17  -0.12  -0.33  -0.16
      a12_2   0.00  -0.18  -0.17   1.00   0.01  -0.11  -0.28
      a12_5  -0.09  -0.13  -0.12   0.01   1.00  -0.25  -0.26
      a12_6  -0.17   0.04  -0.33  -0.11  -0.25   1.00  -0.06
      a12_3  -0.12  -0.21  -0.16  -0.28  -0.26  -0.06   1.00

      Items: a12_7 = Adaptability to professional environment; a12_1 = Reasonableness of the minimum qualification requirement; a12_4 = Specialized programs in the offing; a12_2 = Range of courses offered; a12_5 = Reasonableness of the course duration; a12_6 = Topicality of course content; a12_3 = Flexibility in selection of course.
  16. 16. Validity test  The questionnaire for the test study was developed using choice factors from similar studies as a point of reference, which was then adapted to the Indian context and in fact correlation between the factors was minimum  Thus, the content validity of the questionnaire was addressed 16
  17. Thank you. www.juxtconsult.com www.getcounted.net