Topic 5 quality datafile_management
Transcript

  • 1. Srinivasulu Rajendran, Centre for the Study of Regional Development (CSRD), Jawaharlal Nehru University (JNU), New Delhi, India. r.srinivasulu@gmail.com
  • 2. Objective of the session: to understand data file management, quality checking of a dataset, and the handling of missing values using software packages.
  • 3. 1. What procedures should one follow before proceeding to statistical analysis in a software package?
       2. How do we check the quality of data?
       3. How do we organize the dataset in a software package?
  • 4. Data sources:
     - International Food Policy Research Institute (IFPRI), 2006-07
     - Bangladesh Bureau of Statistics, Household Income and Expenditure Survey (HIES), 2004/2005
     - Bangladesh Demographic and Health Survey (BDHS), 2007
  • 5. IFPRI dataset: Chronic Poverty Study (a resurvey of 3 earlier studies)
     1. Micronutrients Gender/Agricultural Technology (1996-97), 5 thanas
     2. Food for Education/Cash for Education (2000, 10 thanas; 2003, 8 thanas)
     3. Microfinance (1994), 5 thanas
     Institutions involved: IFPRI, Chronic Poverty Research Center, Data Analysis and Technical Assistance
  • 6. In the 2006-07 resurvey, all thanas from the 1994, 1996-97 and 2003 rounds were revisited.
  • 7. Micronutrients Gender/Agricultural Technology: hereafter we refer to this as the MCG study, also known as the Agricultural Technology or Ag Tech study. “A census of households was conducted in villages where the NGO had introduced the agricultural technology and comparable villages where NGO was operating, but where the new technologies had not yet been introduced”.
  • 8. Two major types of households were selected from the census:
     1. NGO members adopting the agricultural technology
     2. NGO members who are likely adopters, in villages where the technology had not yet been introduced
  • 9. Sample design: 330 households (1304 HHs in the resurvey for AgriTech)
     - AgriTech introduced ("A" type villages):
         110 NGO-member adopter HHs ("A" HHs)
         55 non-adopters: non-NGO members and NGO members unlikely to adopt ("C1" HHs)
     - AgriTech not introduced ("B" type villages):
         110 NGO-member likely adopter HHs ("B" HHs)
         55 non-likely adopters: non-NGO members and NGO members unlikely to adopt ("C2" HHs)
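The sample design above can be written down as a small lookup table. This makes it easy to confirm that the group sizes add up to 330 households and to compare the design with the household types actually present in a data file. The sketch below is a minimal Python version. The `DESIGN` layout and the helpers `design_totals` and `compare_with_data` are hypothetical and are not part of the IFPRI files.

```python
# Hypothetical encoding of the AgriTech sample design (slide 9).
# Key: household type code; value: (village type, number of households).
DESIGN = {
    "A": ("A", 110),   # NGO-member adopters, AgriTech villages
    "C1": ("A", 55),   # non-adopters, AgriTech villages
    "B": ("B", 110),   # NGO-member likely adopters, non-AgriTech villages
    "C2": ("B", 55),   # non-likely adopters, non-AgriTech villages
}

def design_totals(design):
    """Return households per village type and the overall total."""
    by_village = {}
    for village, n in design.values():
        by_village[village] = by_village.get(village, 0) + n
    return by_village, sum(by_village.values())

def compare_with_data(design, hh_types):
    """Compare design counts with household type codes found in a data file.

    hh_types: one type code per household record.
    Returns {type: (expected, observed)} for every type whose counts differ.
    """
    observed = {}
    for t in hh_types:
        observed[t] = observed.get(t, 0) + 1
    diffs = {}
    for t in set(design) | set(observed):
        expected = design[t][1] if t in design else 0
        if expected != observed.get(t, 0):
            diffs[t] = (expected, observed.get(t, 0))
    return diffs

print(design_totals(DESIGN))  # ({'A': 165, 'B': 165}, 330)
```

For example, a file with only 54 "C2" records would be reported as `{'C2': (55, 54)}`.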
  • 10. What procedures should one follow before proceeding to statistical analysis in a software package? (Example: SPSS)
  • 11. 1. Identify the data file format and convert the file into the format of the software you will use (for SPSS, a *.sav data file).
        2. Make sure that ALL variables and observations have been converted into SPSS format.
        3. Identify the characteristics of the variables needed for the analysis.
        4. Keep file names short.
        5. Preferably avoid spaces in file names.
        6. Keep the data files organized in one place, in a single folder.
        7. Whenever we work on the data, append the new commands to the previous program (syntax) file.
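Once the original and the converted files are loaded as tables, step 2 and the naming advice in steps 4-5 can be checked mechanically. The Python sketch below assumes both versions have been exported to CSV. The helper names `conversion_problems` and `file_name_ok` are hypothetical, and so is the 30-character limit. None of these are SPSS rules.

```python
import csv
import io

def conversion_problems(source_rows, converted_rows):
    """Step 2: check that every variable and observation survived conversion.

    Both arguments are lists of dicts (one per observation), e.g. from
    csv.DictReader. Returns a list of problem descriptions (empty if none).
    """
    problems = []
    src_vars = set(source_rows[0]) if source_rows else set()
    dst_vars = set(converted_rows[0]) if converted_rows else set()
    lost = sorted(src_vars - dst_vars)
    if lost:
        problems.append(f"variables lost in conversion: {lost}")
    if len(source_rows) != len(converted_rows):
        problems.append(f"observations: {len(source_rows)} -> {len(converted_rows)}")
    return problems

def file_name_ok(name, max_len=30):
    """Steps 4-5: short file name without spaces (max_len is an arbitrary choice)."""
    return len(name) <= max_len and " " not in name

src = list(csv.DictReader(io.StringIO("hhid,thana,income\n1,A,100\n2,B,200\n")))
dst = list(csv.DictReader(io.StringIO("hhid,income\n1,100\n2,200\n")))
print(conversion_problems(src, dst))  # ["variables lost in conversion: ['thana']"]
print(file_name_ok("agtech_hh_2006.sav"), file_name_ok("AgTech household file 2006.sav"))  # True False
```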
  • 12. How do we check the quality of data? A few things need to be checked before we proceed with any statistical analysis:
     1. Missing values
     2. Wrong coding
     3. Outliers
     4. Number of digits in variables (especially value variables)
     5. Unique ID numbers for the observations
     6. Appropriate variable types, i.e. string, numeric, etc.
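Most of this checklist can be automated in a single pass over the records. The Python sketch below makes several assumptions. Every value is read as a string from a CSV export. Only blank cells count as missing, although real surveys often use codes such as 9 or 99, which would need to be declared. The variable names `hhid`, `sex` and `income`, the allowed codes and the digit limit are all invented for illustration. The outlier check (item 3) is deliberately left to the routines discussed next.

```python
from collections import Counter

def quality_report(rows, id_var, valid_codes, numeric_vars, max_digits,
                   missing_codes=("",)):
    """Apply the slide-12 checklist to a list of dict records (string values).

    valid_codes:   {variable: set of allowed codes} for coded variables
    numeric_vars:  variables that must parse as numbers
    max_digits:    {variable: maximum number of integer digits} (hypothetical limits)
    missing_codes: strings treated as missing (assumption: blank only)
    """
    report = {"missing": Counter(), "bad_codes": [], "non_numeric": [],
              "too_many_digits": [], "duplicate_ids": []}
    for i, row in enumerate(rows):
        for var, value in row.items():
            value = value.strip()
            if value in missing_codes:          # 1. missing values
                report["missing"][var] += 1
                continue
            if var in valid_codes and value not in valid_codes[var]:
                report["bad_codes"].append((i, var, value))  # 2. wrong coding
            if var in numeric_vars:             # 6. variable type
                try:
                    float(value)
                except ValueError:
                    report["non_numeric"].append((i, var, value))
                    continue
                integer_part = value.lstrip("+-").split(".")[0]
                if var in max_digits and len(integer_part) > max_digits[var]:
                    report["too_many_digits"].append((i, var, value))  # 4. digits
    id_counts = Counter(row[id_var] for row in rows)  # 5. unique IDs
    report["duplicate_ids"] = sorted(k for k, n in id_counts.items() if n > 1)
    return report

rows = [
    {"hhid": "101", "sex": "1", "income": "2500"},
    {"hhid": "102", "sex": "3", "income": ""},
    {"hhid": "102", "sex": "2", "income": "25x0"},
    {"hhid": "104", "sex": "1", "income": "2500000000"},
]
report = quality_report(rows, "hhid", {"sex": {"1", "2"}}, {"income"}, {"income": 7})
print(report["duplicate_ids"])  # ['102']
```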
  • 13. SPSS has some good routines for detecting outliers:
     - There is always the FREQUENCIES routine, of course.
     - The PLOTS command can do scatterplots of 2 variables.
     - The EXAMINE procedure includes an option for printing out the cases with the 5 lowest and 5 highest values.
     - The REGRESSION command can print out scatterplots (particularly good is *ZRESID by *ZPRED, which is a plot of the standardized residuals by the standardized predicted values). In addition, the regression procedure will produce output on CASEWISE DIAGNOSTICS, which indicate which cases are extreme outliers.
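Listing the five lowest and five highest cases, as EXAMINE does, is a quick first pass over a value variable such as income. The same listing is easy to reproduce outside SPSS. The Python sketch below is not EXAMINE itself. The function name `extreme_values` and the record layout are hypothetical.

```python
def extreme_values(rows, id_var, var, k=5):
    """Return the k lowest and k highest cases of a numeric variable as
    (value, id) pairs, in the spirit of the SPSS EXAMINE extreme-values table."""
    cases = []
    for row in rows:
        try:
            cases.append((float(row[var]), row[id_var]))
        except (KeyError, ValueError):
            continue  # skip records where the value is absent or not numeric
    cases.sort()
    lowest = cases[:k]
    highest = sorted(cases, reverse=True)[:k]
    return lowest, highest

rows = [{"hhid": str(i), "income": str(i * 100)} for i in range(1, 13)]
rows[6]["income"] = "99999"   # hhid 7: a suspicious value, e.g. a typing error
lowest, highest = extreme_values(rows, "hhid", "income")
print(highest[0])  # (99999.0, '7')
```

A value that sits far above the rest of the highest cases, like hhid 7 here, is the kind of case to check against the questionnaire.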
  • 14. Detecting the problem: scatterplots and frequencies can reveal atypical cases. We can also look for cases with very large residuals. Suspicious correlations sometimes indicate the presence of outliers.
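For the two-variable case, looking for cases with very large residuals can be sketched in three steps. First, fit a straight line by ordinary least squares. Second, divide each residual by the standard error of the estimate. Third, flag cases beyond a cut-off. A cut-off of 3 is a common convention, but it is an assumption here. The function name and the data are hypothetical. This is not a reimplementation of SPSS's *ZRESID or CASEWISE output, although it serves the same purpose.

```python
from math import sqrt
from statistics import mean

def flag_large_residuals(x, y, threshold=3.0):
    """Fit y = a + b*x by ordinary least squares and flag large residuals.

    Each residual is divided by the standard error of the estimate,
    sqrt(SSE / (n - 2)). Returns (index, standardized residual) pairs whose
    absolute value exceeds the threshold.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    see = sqrt(sum(r * r for r in resid) / (n - 2))
    if see == 0:
        return []  # perfect fit: nothing to flag
    return [(i, r / see) for i, r in enumerate(resid) if abs(r / see) > threshold]

x = list(range(1, 21))
y = [2 * xi for xi in x]
y[9] += 50  # a data-entry style error in the 10th case
print([i for i, _ in flag_large_residuals(x, y)])  # [9]
```

Only the 10th case (index 9) is flagged. The other cases keep small standardized residuals even though the outlier pulls the fitted line slightly.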
  • 15. The difference between STATA and SPSS: probably the most critical difference is that STATA includes additional routines (e.g. rreg, qreg) for addressing the problem of outliers, which we will discuss in future classes.
