Data collection & management

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Medicine & Society II: Collecting & Managing Data. Dr Azmi Mohd Tamil, Dept. of Community Health, Faculty of Medicine, UKM. Notes partially based on a lecture by Assoc. Prof. Dr. Roslina Abd. Manap.
    • Sampling: choosing a relatively small subset that can adequately represent the entire spectrum of population subjects. The aim is to extrapolate results back to a substantially larger population, for reasons of time, money, efficiency and safety.
    • SAMPLING. PROBABILITY SAMPLING (equal chance of being selected): • simple random, • systematic, • stratified, • multistage, • cluster. NON-PROBABILITY SAMPLING: • convenience, • quota, • purposive.
    • SAMPLING & TYPE OF POPULATION. Is the selection representative of the population? Sampling methods: - simple random sampling (may not be practical in a national study) - stratified random sampling (in a heterogeneous population/stratum) - multistage sampling (national-state-district-sub district-village) - cluster sampling
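
Three of the probability sampling methods named above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: the sampling frame, the `area` stratum and the function names are all hypothetical, and the stratified version assumes proportionate allocation (the same fraction drawn from each stratum).

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every k-th unit, with k = N // n."""
    rng = random.Random(seed)
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from each stratum (assumed design)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical sampling frame: 100 subjects, 80 urban and 20 rural.
frame = [{"id": i, "area": "urban" if i < 80 else "rural"} for i in range(100)]

srs = simple_random_sample(frame, 10)
sys_sample = systematic_sample(frame, 10)
strat = stratified_sample(frame, lambda u: u["area"], 0.1)
```

With a 10% fraction, the stratified draw always contains 8 urban and 2 rural subjects, whereas a simple random sample of 10 may by chance contain no rural subjects at all.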
    • Data Collection Data collection begins after deciding on design of study and the sampling strategy
    • Data Collection Sample subjects are identified and the required individual information is obtained in an item-wise and structured manner.
    • Data Collection Information is collected on certain characteristics, attributes and the qualities of interest from the samples These data may be quantitative or qualitative in nature.
    • Types of Variables Qualitative - categorised based on characteristics which differentiate it e.g. ethnic - Malay, Chinese, Indian etc. Qualitative variables can be classed into nominal & ordinal. Quantitative - numerical values collected by observation, by measurement or by counting. Can either be discrete or continuous.
    • Variable Classification. Quantitative: discrete - from counting, e.g. number of children/wives; continuous - can be in fractions, from measurement, e.g. blood pressure, haemoglobin level. Qualitative: nominal - no rank nor specific order, e.g. ethnicity: M, C, I & O; ordinal - has rank/order between categories but the difference cannot be measured.
    • Types of Data. Table 1.1 Examples of types of data (http://www.bmj.com/collections/statsbk/)
      Quantitative, continuous: blood pressure, height, weight, age
      Quantitative, discrete: number of children; number of attacks of asthma per week
      Categorical, ordinal (ordered categories): grade of breast cancer; better, same, worse; disagree, neutral, agree
      Categorical, nominal (unordered categories): sex (male/female); alive or dead; blood group O, A, B, AB
    • SO WHAT! So what’s the big deal about data types?
    • Statistical Tests - Qualitative
    • Type of Data Dictates Type of Analysis - Quantitative
    • Data Collection Techniques: use available information; observation; interviews; questionnaires; focus group discussion.
    • Using Available Information. Existing records: • Hospital records - case notes • National registry of births & deaths • Census data • Data from other surveys
    • Disadvantages of using existing records: incomplete records; cause of death may not be verified by a physician/MD; missing vital information; difficult to decipher; may not be representative of the target group - only severe cases go to hospital.
    • Disadvantages of using existing records: delayed publication - obsolete data; different methods of data recording between institutions, states and countries, making comparison & pooling of data difficult; comparisons across time are difficult due to differences in classification, diagnostic tools etc.
    • Advantages of using existing records: cheap; convenient; in some situations, it is the only data source, e.g. accidents & suicides.
    • Observation Involves systematically selecting, watching & recording behaviour and characteristics of living beings, objects or phenomena Done using defined scales Participant observation e.g. PEF and asthma symptom diary Non-participant observation e.g. cholesterol levels
    • Interviews Oral questioning of respondents either individually or as a group. Can be done loosely or highly structured using a questionnaire
    • Administering Written Questionnaires Self-administered via mail by gathering them in one place and getting them to fill it up hand-delivering and collecting them later Large non-response can distort results
    • Questionnaires Influenced by education & attitude of respondent esp. for self-administered Interviewers need to be trained open ended vs close ended the need for pre-testing or pilot study
    • Issues at stake: content validity; construct validity; criterion validity.
    • Content Validity
    • Construct Validity
    • Criterion Validity
    • Focus group discussion: selecting parties relevant to the research questions at hand and discussing with them in focus groups. Examples in your own field of interest?
    • Source of biases during data collection. Defective instruments: • closed-ended questions with a poor choice of options • open-ended questions with no guidelines • vaguely phrased questions • illogical sequences of questions • weighing scales that are not standardised
    • Source of biases during data collection. Observer bias: • reporting of radiographs. Effect of interview on respondent. Attitude of respondent: • cough may be ignored by a smoker • stigmatised diseases may not be disclosed
    • Plan for data collection: permission to proceed; logistics - who will collect what, when and with what resources; quality control.
    • Quality of Data: how well do the variables designed for the study represent the phenomena of interest? E.g. how well does FBS represent control of diabetes?
    • Accuracy & Reliability Accuracy - the degree which a measurement actually measures the measures the characteristic it is supposed to measure Reliability is the consistency of replicate measures
    • Reliability
    • Reliability & Accuracy
    • Accuracy & Reliability Both are reduced by random error and systematic error from the same sources of variability; • the data collectors • the respondents • the instrument
    • Strategies to enhance accuracy & reliability Standardise procedures and measurement methods training & certifying the data collectors Repetition Blinding
    • Data handling Check the data gathered storing of data - backup, backup & backup some more!
    • Data Management Data processing • Categorising • Coding • Data entry • Verification/validation
    • Labels & Coding
    • Variable Labels • Unique • Not more than 8 characters • Consists of letters and numbers only • Begins with a letter, not a number • Try to give a label that means something
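
The labelling rules above are mechanical enough to check automatically before data entry starts. The following is a minimal sketch with a hypothetical `check_labels` helper; treating labels that differ only in letter case as duplicates is an assumption, not something stated in the slides.

```python
import re

# Starts with a letter, then letters/digits only, 8 characters at most.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def check_labels(labels):
    """Return a dict mapping each problematic label to the reason."""
    problems = {}
    seen = set()
    for label in labels:
        if not NAME_PATTERN.fullmatch(label):
            problems[label] = "must be 1-8 letters/digits, starting with a letter"
        elif label.lower() in seen:
            # Assumption: uniqueness is checked case-insensitively.
            problems[label] = "duplicate label"
        seen.add(label.lower())
    return problems


# Hypothetical labels from a draft coding manual.
labels = ["age", "sex", "bp_sys", "2ndvisit", "haemoglobin", "hb", "Age"]
print(check_labels(labels))
```

Here `bp_sys` fails because of the underscore, `2ndvisit` because it starts with a number, `haemoglobin` because it is longer than 8 characters, and `Age` because it repeats `age`.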
    • Coding • Determine the coding to be used for each variable. • For qualitative variables, it is recommended to use numerical codes to represent the groups, e.g. 1 = male and 2 = female; this will also simplify the data entry process. The “danger” of using string/text is that a lower-case “male” is treated as different from a capitalised “Male”. • See Table I.
    • Coding for Dichotomous Variables: it is advisable to use 1 = present, 0 = absent, or 1 = higher risk, 0 = lower risk.
    • Coding for Missing Value @ blank responses Usually required only for qualitative variables Conventionally coded using a value that is not part of a valid response. For example; • Gender; M=1, F=2, MV=9 • Ethnic in East Malaysia; Codes 1 till 14 for races, MV=99
    • Advantage of Coding: reduces time for “data entry”. Makes analysis possible, e.g. SPSS won't analyse string responses of more than 8 characters. Needs a proper coding manual. How to define variables and coding for applications such as SPSS and Excel is available at the dept website.
    • Data Entry
    • “Data Entry”
    • Data Entry Independent operator verification Random check of data entered against the original <5% error by convention Some checks are built-in by the software i.e. EpiInfo
    • Thank you! Gracias!