
Data collection & management


  1. Medicine & Society II: Collecting & Managing Data. Dr Azmi Mohd Tamil, Dept. of Community Health, Faculty of Medicine, UKM. Notes partially based on a lecture by Assoc. Prof. Dr. Roslina Abd. Manap.
  2. Sampling: Choosing a relatively small subset that can adequately represent the entire spectrum of population subjects. The aim is to extrapolate results back to a substantially larger population, saving time and money and improving efficiency and safety.
  3. Sampling
     • Probability sampling (equal chance of being selected): simple random, systematic, stratified, multistage, cluster.
     • Non-probability sampling: convenience, quota, purposive.
  4. Sampling & Type of Population: Is the selection representative of the population? Sampling methods:
     - simple random sampling (may not be practical in a national study)
     - stratified random sampling (in a heterogeneous population, by stratum)
     - multistage sampling (national, state, district, sub-district, village)
     - cluster sampling
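As a rough illustration of three of the probability methods listed above, the sketch below draws a simple random, a systematic and a stratified sample from a small sampling frame. The frame, the function names and the 10% sampling fraction are all hypothetical and chosen only for the example; the functions assume the requested sample size is no larger than the frame.

```python
import random

def simple_random_sample(frame, n, seed=0):
    """Draw n subjects so that every subject has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=0):
    """Pick a random start, then take every k-th subject (k = sampling interval)."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for subject in frame:
        strata.setdefault(stratum_of(subject), []).append(subject)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical sampling frame: 100 subjects, 70 urban and 30 rural.
frame = [(i, "urban" if i < 70 else "rural") for i in range(100)]

srs = simple_random_sample(frame, 10)
sys_s = systematic_sample(frame, 10)
strat = stratified_sample(frame, lambda s: s[1], 0.1)
```

With a 10% fraction the stratified draw keeps the 70:30 urban-to-rural split of the frame, which a simple random sample of the same size does not guarantee.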
  5. Data Collection: Data collection begins after deciding on the design of the study and the sampling strategy.
  6. Data Collection: Sample subjects are identified and the required individual information is obtained in an item-wise and structured manner.
  7. Data Collection: Information is collected on certain characteristics, attributes and qualities of interest from the samples. These data may be quantitative or qualitative in nature.
  8. Types of Variables: Qualitative - categorised based on characteristics which differentiate them, e.g. ethnic group - Malay, Chinese, Indian etc. Qualitative variables can be classed as nominal or ordinal. Quantitative - numerical values collected by observation, by measurement or by counting. Can be either discrete or continuous.
  9. Variable Classification
     • Quantitative: discrete - from counting, e.g. number of children/wives; continuous - can be in fractions, from measurement, e.g. blood pressure, haemoglobin level.
     • Qualitative: nominal - no rank nor specific order, e.g. ethnic group: M, C, I & O; ordinal - has rank/order between categories, but the difference cannot be measured.
  10. Types of Data. Table 1.1 Examples of types of data (source: http://www.bmj.com/collections/statsbk/)
      • Quantitative, continuous: blood pressure, height, weight, age
      • Quantitative, discrete: number of children; number of attacks of asthma per week
      • Categorical, ordinal (ordered categories): grade of breast cancer; better, same, worse; disagree, neutral, agree
      • Categorical, nominal (unordered categories): sex (male/female); alive or dead; blood group O, A, B, AB
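To make the four data types concrete, the sketch below stores one hypothetical variable of each type and summarises it with a measure that respects the type: the most frequent category for nominal data, the middle category for ordinal data, the median for a count and the mean for a continuous measurement. The records and the choice of summary are illustrative assumptions, not taken from the slides, and the ordinal median as written assumes an odd number of responses.

```python
from statistics import mean, median, mode

# Hypothetical records for five subjects.
blood_group = ["O", "A", "O", "B", "O"]                  # nominal
response = ["worse", "same", "better", "same", "same"]   # ordinal
children = [0, 2, 1, 3, 2]                               # discrete
height_cm = [160.5, 172.0, 158.2, 181.3, 169.0]          # continuous

# Ordinal categories have an order, so map them to ranks before taking a median.
ORDER = ["worse", "same", "better"]
ranks = [ORDER.index(r) for r in response]

summary = {
    "blood_group": mode(blood_group),        # most frequent category
    "response": ORDER[int(median(ranks))],   # middle category (odd n assumed)
    "children": median(children),            # middle count
    "height_cm": round(mean(height_cm), 1),  # arithmetic mean
}
```

Taking a mean of the blood group codes would be meaningless, which is one practical reason the type of each variable has to be settled before analysis.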
  11. So what! So what’s the big deal about data types?
  12. Statistical Tests - Qualitative
  13. Type of Data Dictates Type of Analysis - Quantitative
  14. Data Collection Techniques: use available information; observation; interviews; questionnaires; focus group discussion.
  15. Using Available Information: Existing Records • Hospital records - case notes • National registry of births & deaths • Census data • Data from other surveys
  16. Disadvantages of using existing records: incomplete records; cause of death may not be verified by a physician/MD; missing vital information; difficult to decipher; may not be representative of the target group - only severe cases go to hospital.
  17. Disadvantages of using existing records: delayed publication - obsolete data; different methods of data recording between institutions, states and countries make the data incompatible for comparison and pooling; comparisons across time are difficult due to differences in classification, diagnostic tools etc.
  18. Advantages of using existing records: cheap; convenient; in some situations it is the only data source, e.g. accidents & suicides.
  19. Observation: Involves systematically selecting, watching & recording behaviour and characteristics of living beings, objects or phenomena. Done using defined scales. Participant observation, e.g. PEF and asthma symptom diary. Non-participant observation, e.g. cholesterol levels.
  20. Interviews: Oral questioning of respondents, either individually or as a group. Can be loosely or highly structured, using a questionnaire.
  21. Administering Written Questionnaires: self-administered via mail; by gathering respondents in one place and getting them to fill it in; by hand-delivering questionnaires and collecting them later. Large non-response can distort results.
  22. Questionnaires: Influenced by education & attitude of respondent, esp. when self-administered. Interviewers need to be trained. Open-ended vs closed-ended questions. The need for pre-testing or a pilot study.
  23. Issues at stake: content validity; construct validity; criterion validity.
  24. Content Validity
  25. Construct Validity
  26. Criterion Validity
  27. Focus group discussion: Selecting parties relevant to the research questions at hand and discussing with them in focus groups. Examples in your own field of interest?
  28. Sources of bias during data collection: Defective instruments • closed-ended questions with poor choice of options • open-ended questions with no guidelines • vaguely phrased questions • illogical sequence of questions • weighing scales that are not standardised
  29. Sources of bias during data collection: Observer bias • reporting of radiographs. Effect of interview on respondent. Attitude of respondent • cough may be ignored by a smoker • stigmatised diseases may not be disclosed
  30. Plan for data collection: permission to proceed; logistics - who will collect what, when and with what resources; quality control.
  31. Quality of Data: How well do the variables designed for the study represent the phenomena of interest? E.g. how well does FBS represent control of diabetes?
  32. Accuracy & Reliability: Accuracy - the degree to which a measurement actually measures the characteristic it is supposed to measure. Reliability - the consistency of replicate measures.
  33. Reliability
  34. Reliability & Accuracy
  35. Accuracy & Reliability: Both are reduced by random error and systematic error from the same sources of variability: • the data collectors • the respondents • the instrument
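One simple way to see the two kinds of error separately is to measure a known reference several times. In the sketch below the average offset from the true value stands in for systematic error and the spread of the replicates stands in for random error. The function name, the 70.0 kg reference weight and the readings are hypothetical.

```python
from statistics import mean, stdev

def accuracy_and_reliability(replicates, true_value):
    """Return (bias, spread) for replicate measurements of one known quantity.

    bias   = mean of the replicates minus the true value (systematic error)
    spread = sample standard deviation of the replicates (random error)
    """
    bias = mean(replicates) - true_value
    spread = stdev(replicates)
    return bias, spread

# Hypothetical: a 70.0 kg reference weight measured five times on two scales.
scale_a = [70.1, 69.9, 70.0, 70.2, 69.8]  # correct on average, readings scatter
scale_b = [72.0, 72.1, 71.9, 72.0, 72.0]  # consistent, but reads about 2 kg high

bias_a, spread_a = accuracy_and_reliability(scale_a, 70.0)
bias_b, spread_b = accuracy_and_reliability(scale_b, 70.0)
```

Here scale B is the more reliable instrument (smaller spread) yet the less accurate one (larger bias), so consistent replicate readings on their own do not show that a measurement is accurate.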
  36. Strategies to enhance accuracy & reliability: standardise procedures and measurement methods; train & certify the data collectors; repetition; blinding.
  37. Data handling: Check the data gathered. Storing of data - backup, backup & backup some more!
  38. Data Management: Data processing • Categorising • Coding • Data entry • Verification/validation
  39. Labels & Coding
  40. Variable Labels • Unique • Not more than 8 characters • Consist of letters and numbers only • Begin with a letter instead of a number • Try to give a label that means something
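The labelling rules above are mechanical enough to check automatically before data entry starts. The following is a minimal sketch; the function name and the sample labels are hypothetical, and treating "Age" and "age" as the same label is an assumption about the target package rather than something stated in the slides.

```python
import re

# Letters and digits only, starting with a letter, at most 8 characters in total.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")

def check_labels(labels):
    """Return a list of (label, problem) pairs; an empty list means all labels pass."""
    problems = []
    seen = set()
    for label in labels:
        if not LABEL_PATTERN.fullmatch(label):
            problems.append((label, "must be 1-8 letters/digits and start with a letter"))
        # Compare case-insensitively (assumption: the package ignores case in names).
        key = label.lower()
        if key in seen:
            problems.append((label, "duplicate label"))
        seen.add(key)
    return problems

problems = check_labels(["age", "sex", "bp_sys", "1stvisit", "haemoglobin", "Age"])
```

In this example `bp_sys` fails because of the underscore, `1stvisit` because it starts with a digit, `haemoglobin` because it is longer than 8 characters, and `Age` because it duplicates `age`.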
  41. Coding • Determine the coding to be used for each variable. • For qualitative variables, it is recommended to use numerical codes to represent the groups, e.g. 1 = male and 2 = female; this will also simplify the data entry process. The “danger” of using string/text is that a lowercase “male” is different from a capitalised “Male” • see Table I.
  42. Coding for Dichotomous Variables: It is advisable to use 1 = present, 0 = absent, or 1 = higher risk, 0 = lower risk.
  43. Coding for Missing Values or Blank Responses: Usually required only for qualitative variables. Conventionally coded using a value that is not part of a valid response. For example: • Gender; M=1, F=2, MV=9 • Ethnic group in East Malaysia; codes 1 to 14 for races, MV=99
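The coding conventions on the preceding slides can be sketched as a small lookup. The code below uses the gender example (1 = male, 2 = female, 9 = missing) and shows why numeric codes sidestep the “male” versus “Male” problem. The function name and the raw responses are hypothetical, and sending unrecognised text to the missing-value code is an assumption made for the example.

```python
# Hypothetical coding manual entry, following the conventions described above.
SEX_CODES = {"male": 1, "female": 2}
SEX_MISSING = 9  # a value that is not part of any valid response

def code_sex(raw):
    """Map a free-text response to its numeric code, ignoring case and stray spaces."""
    if raw is None:
        return SEX_MISSING
    # Assumption: blank or unrecognised text is treated as a missing value.
    return SEX_CODES.get(raw.strip().lower(), SEX_MISSING)

raw_responses = ["male", "Male", " FEMALE ", "", None, "female"]
coded = [code_sex(r) for r in raw_responses]

# Exclude the missing-value code before analysis so it is not counted as a group.
valid = [c for c in coded if c != SEX_MISSING]
```

The missing-value code has to be declared as such in the analysis software as well; otherwise 9 would be tabulated as a third gender category.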
  44. Advantages of Coding: Reduces time for “data entry”. Makes analysis possible, e.g. SPSS won’t analyse string responses of more than 8 characters. Needs a proper coding manual. How to define variables and coding for applications such as SPSS and Excel is available at the dept website: http://161.142.92.104/spss/ http://161.142.92.104/excel/
  45. Data Entry
  46. “Data Entry”
  47. Data Entry: Independent operator verification. Random check of data entered against the original; <5% error by convention. Some checks are built in by the software, e.g. EpiInfo.
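Both checks named above can be sketched in a few lines: comparing two independent entries of the same forms, and re-checking a random subset of entered records against the originals. The function names and records are hypothetical, and counting errors per record (rather than per field) for the 5% convention is an assumption, since the slide does not say which is meant.

```python
import random

def double_entry_mismatches(first_pass, second_pass):
    """Compare two independent entries of the same forms; return (row, field) mismatches."""
    mismatches = []
    for row, (a, b) in enumerate(zip(first_pass, second_pass)):
        for field in a:
            if a[field] != b.get(field):
                mismatches.append((row, field))
    return mismatches

def random_check_error_rate(entered, originals, n, seed=0):
    """Re-check n randomly chosen records against the originals; return the fraction wrong."""
    rng = random.Random(seed)
    rows = rng.sample(range(len(entered)), n)
    wrong = sum(1 for r in rows if entered[r] != originals[r])
    return wrong / n

# Hypothetical records: the second operator transposed the digits of one age.
operator_1 = [{"id": 1, "sex": 1, "age": 34}, {"id": 2, "sex": 2, "age": 51}]
operator_2 = [{"id": 1, "sex": 1, "age": 34}, {"id": 2, "sex": 2, "age": 15}]

mismatches = double_entry_mismatches(operator_1, operator_2)
error_rate = random_check_error_rate(operator_2, operator_1, n=2)
acceptable = error_rate < 0.05  # the "<5% error" convention from the slide
```

Each mismatch points to a form and field that should be looked up on the original paper record; the comparison alone cannot tell which of the two entries is the correct one.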
  48. Thank you! Gracias!