Data collection & management


  1. Medicine & Society II: Collecting & Managing Data. Dr Azmi Mohd Tamil, Dept. of Community Health, Faculty of Medicine, UKM. Notes partially based on a lecture by Assoc. Prof. Dr. Roslina Abd. Manap.
  2. Sampling: Choosing a relatively small subset that can adequately represent the entire spectrum of population subjects. The aim is to extrapolate results back to a substantially larger population, which saves time and money and improves efficiency and safety.
  3. Sampling. Probability sampling (equal chance of being selected): • simple random • systematic • stratified • multistage • cluster. Non-probability sampling: • convenience • quota • purposive
  4. Sampling & Type of Population: Is the selection representative of the population? Sampling methods: • simple random sampling (may not be practical in a national study) • stratified random sampling (in a heterogeneous population, by stratum) • multistage sampling (national, state, district, sub-district, village) • cluster sampling
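
Three of the probability sampling methods listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original lecture: the sampling frame, the urban/rural strata and the function names are all hypothetical, and it assumes the sample size is no larger than the population.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n subjects; every subject has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every k-th subject (k = N // n)."""
    k = len(population) // n          # sampling interval, assumes n <= N
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start within the first interval
    return population[start::k][:n]

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Sample the same fraction independently within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical sampling frame: 100 subject IDs, odd IDs urban, even IDs rural.
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10)
sys_sample = systematic_sample(frame, 10)
strat = stratified_sample(frame, lambda i: "urban" if i % 2 else "rural", 0.1)
```

Stratifying guarantees that each stratum contributes to the sample, which a simple random draw does not.
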
  5. Data Collection: Data collection begins after deciding on the design of the study and the sampling strategy.
  6. Data Collection: Sample subjects are identified and the required individual information is obtained in an item-wise and structured manner.
  7. Data Collection: Information is collected on certain characteristics, attributes and qualities of interest from the samples. These data may be quantitative or qualitative in nature.
  8. Types of Variables. Qualitative: categorised based on characteristics which differentiate it, e.g. ethnicity (Malay, Chinese, Indian etc.). Qualitative variables can be classed as nominal or ordinal. Quantitative: numerical values collected by observation, by measurement or by counting. Can be either discrete or continuous.
  9. Variable Classification. Quantitative: • discrete, from counting, e.g. number of children/wives • continuous, can be in fractions, from measurement, e.g. blood pressure, haemoglobin level. Qualitative: • nominal, no rank nor specific order, e.g. ethnicity (M, C, I & O) • ordinal, has rank/order between categories but the difference cannot be measured
  10. Types of Data. Table 1.1 Examples of types of data. Quantitative, continuous: blood pressure, height, weight, age. Quantitative, discrete: number of children; number of attacks of asthma per week. Categorical, ordinal (ordered categories): grade of breast cancer; better, same, worse; disagree, neutral, agree. Categorical, nominal (unordered categories): sex (male/female); alive or dead; blood group O, A, B, AB. Source: http://www.bmj.com/collections/statsbk/
  11. So what! So what's the big deal about data types?
  12. Statistical Tests - Qualitative
  13. Type of Data Dictates Type of Analysis - Quantitative
  14. Data Collection Techniques: • Use available information • Observation • Interviews • Questionnaires • Focus group discussion
  15. Using Available Information. Existing records: • Hospital records (case notes) • National registry of births & deaths • Census data • Data from other surveys
  16. Disadvantages of using existing records: • Incomplete records • Cause of death may not be verified by a physician/MD • Missing vital information • Difficult to decipher • May not be representative of the target group, e.g. only severe cases go to hospital
  17. Disadvantages of using existing records: • Delayed publication, so the data may be obsolete • Different methods of data recording between institutions, states and countries make comparison and pooling of data difficult • Comparisons across time are difficult due to differences in classification, diagnostic tools etc.
  18. Advantages of using existing records: • Cheap • Convenient • In some situations it is the only data source, e.g. accidents & suicides
  19. Observation: • Involves systematically selecting, watching and recording the behaviour and characteristics of living beings, objects or phenomena • Done using defined scales • Participant observation, e.g. PEF and asthma symptom diary • Non-participant observation, e.g. cholesterol levels
  20. Interviews: Oral questioning of respondents, either individually or as a group. Can be loosely structured, or highly structured using a questionnaire.
  21. Administering Written Questionnaires. Self-administered: • via mail • by gathering respondents in one place and getting them to fill in the questionnaire • by hand-delivering the questionnaires and collecting them later. Large non-response can distort results.
  22. Questionnaires: • Influenced by the education and attitude of the respondent, especially when self-administered • Interviewers need to be trained • Open-ended vs closed-ended questions • The need for pre-testing or a pilot study
  23. Issues at stake: • Content validity • Construct validity • Criterion validity
  24. Content Validity
  25. Construct Validity
  26. Criterion Validity
  27. Focus group discussion: Selecting parties relevant to the research questions at hand and discussing with them in focus groups. Examples in your own field of interest?
  28. Sources of bias during data collection. Defective instruments: • closed-ended questions with a poor choice of options • open-ended questions with no guidelines • vaguely phrased questions • illogical sequence of questions • weighing scales that are not standardised
  29. Sources of bias during data collection. Observer bias: • reporting of radiographs. Effect of interview on respondent. Attitude of respondent: • cough may be ignored by a smoker • stigmatised diseases may not be disclosed
  30. Plan for data collection: • Permission to proceed • Logistics: who will collect what, when and with what resources • Quality control
  31. Quality of Data: How well do the variables designed for the study represent the phenomena of interest? E.g. how well does FBS (fasting blood sugar) represent control of diabetes?
  32. Accuracy & Reliability. Accuracy: the degree to which a measurement actually measures the characteristic it is supposed to measure. Reliability: the consistency of replicate measures.
  33. Reliability
  34. Reliability & Accuracy
  35. Accuracy & Reliability: Both are reduced by random error and systematic error from the same sources of variability: • the data collectors • the respondents • the instrument
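
One common way to put numbers on these two ideas, offered here only as a sketch and not as the lecture's own method, is to measure a known reference several times. The mean deviation from the true value (bias) then reflects systematic error, and the standard deviation of the replicates reflects random error. The two weighing scales and their readings below are hypothetical.

```python
from statistics import mean, stdev

def accuracy_and_reliability(replicates, true_value):
    """Return (bias, spread) for replicate measurements of a known value.

    bias   : mean measurement minus true value (systematic error)
    spread : sample standard deviation of replicates (random error)
    """
    bias = mean(replicates) - true_value
    spread = stdev(replicates)
    return bias, spread

# Hypothetical data: a 70.0 kg reference weight measured five times per scale.
scale_a = [70.1, 69.9, 70.0, 70.2, 69.8]   # centred on the true value
scale_b = [72.0, 72.1, 71.9, 72.0, 72.0]   # consistent, but reads about 2 kg high

bias_a, spread_a = accuracy_and_reliability(scale_a, 70.0)
bias_b, spread_b = accuracy_and_reliability(scale_b, 70.0)
```

Scale B is the more reliable of the two (smaller spread) yet the less accurate (larger bias), which is why the two properties have to be assessed separately.
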
  36. Strategies to enhance accuracy & reliability: • Standardise procedures and measurement methods • Train and certify the data collectors • Repetition • Blinding
  37. Data handling: • Check the data gathered • Storing of data: backup, backup & backup some more!
  38. Data Management. Data processing: • Categorising • Coding • Data entry • Verification/validation
  39. Labels & Coding
  40. Variable Labels: • Unique • Not more than 8 characters • Consist of letters and numbers only • Begin with a letter instead of a number • Try to give a label that means something
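
The labelling rules above are mechanical enough to check automatically before data entry starts. The following is a minimal sketch; the function name and the example labels are hypothetical, and treating labels that differ only in letter case as duplicates is an assumption, not something the slide states.

```python
import re

# One letter followed by up to seven letters or digits (8 characters max).
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")

def check_labels(labels):
    """Return {label: problem} for labels that break the rules."""
    problems = {}
    seen = set()
    for label in labels:
        if not LABEL_PATTERN.fullmatch(label):
            problems[label] = "must be 1-8 letters/digits and start with a letter"
        elif label.lower() in seen:
            # Assumption: uniqueness is checked case-insensitively.
            problems[label] = "duplicate label"
        seen.add(label.lower())
    return problems

# Hypothetical labels from a draft data dictionary.
problems = check_labels(["age", "sex", "2ndvisit", "bloodpressure", "age"])
```
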
  41. Coding: • Determine the coding to be used for each variable. • For qualitative variables, it is recommended to use numerical codes to represent the groups, e.g. 1 = male and 2 = female; this also simplifies the data entry process. The "danger" of using string/text values is that lowercase "male" is treated as different from capitalised "Male". • See Table I.
  42. Coding for Dichotomous Variables: It is advisable to use 1 = present, 0 = absent, or 1 = higher risk, 0 = lower risk.
  43. Coding for Missing Values or Blank Responses: Usually required only for qualitative variables. Conventionally coded using a value that is not part of a valid response. For example: • Gender: M = 1, F = 2, MV = 9 • Ethnicity in East Malaysia: codes 1 to 14 for races, MV = 99
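
The coding scheme for gender described above (1 = male, 2 = female, 9 = missing) can be sketched as a small lookup. The function name and sample responses are hypothetical, and coding unrecognised text as missing is an assumption made for brevity; in practice such values may deserve a manual check instead.

```python
GENDER_CODES = {"male": 1, "female": 2}
MISSING = 9  # a value that is not part of any valid response

def code_gender(raw):
    """Convert a free-text gender response to its numeric code."""
    if raw is None:
        return MISSING
    # Normalise case and whitespace so "Male" and "male" get the same code.
    return GENDER_CODES.get(raw.strip().lower(), MISSING)

# Hypothetical raw responses, including a blank and an absent answer.
responses = ["male", "Male", " FEMALE ", "", None]
coded = [code_gender(r) for r in responses]
```

Normalising before lookup removes exactly the "male" versus "Male" problem that makes raw string values risky.
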
  44. Advantages of Coding: • Reduces time for data entry • Makes analysis possible, e.g. SPSS won't analyse string responses of more than 8 characters • Needs a proper coding manual • Instructions on how to define variables and coding for applications such as SPSS and Excel are available at the dept website: http://161.142.92.104/spss/ http://161.142.92.104/excel/
  45. Data Entry
  46. "Data Entry"
  47. Data Entry: • Independent operator verification • Random check of data entered against the original • <5% error by convention • Some checks are built in by the software, e.g. EpiInfo
  48. Thank you! Gracias!
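
The random check against the original forms can be sketched as follows. This is an illustrative sketch only: the record layout, the function name and the simulated keying error are hypothetical, and counting a record as wrong if any field differs is an assumption about how the 5% convention is applied.

```python
import random

def entry_error_rate(entered, original, sample_size, seed=0):
    """Compare a random sample of entered records against the originals.

    Returns (error rate, list of record IDs that do not match).
    """
    rng = random.Random(seed)
    ids = rng.sample(sorted(original), sample_size)
    mismatches = [i for i in ids if entered.get(i) != original[i]]
    return len(mismatches) / sample_size, mismatches

# Hypothetical data set: 100 paper forms and their keyed-in copies.
original = {i: {"gender": 1 + i % 2, "age": 20 + i} for i in range(1, 101)}
entered = {i: dict(rec) for i, rec in original.items()}
entered[7]["age"] = 72  # simulated keying error (the form says 27)

# Checking every record here so the result is deterministic;
# a real check would typically use a smaller random sample.
rate, bad_ids = entry_error_rate(entered, original, sample_size=100)
acceptable = rate < 0.05  # the <5% convention from the slide
```
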
