sumandro_anatomy_of_nsso_data_opendatacamp_20120324

Presentation made at OpenDataCamp in Bangalore (24th March 2012) on the organisation of unit-level data published by the National Sample Survey Organisation, Govt. of India.

  1. 1. Anatomy of NSSO Data
     Sumandro Chattapadhyay
     @ajantriks
     @ajantriks.net
  2. 2. Contents
     1. Pre-History
     2. Glossary
     3. Schedules
     4. Organisation of Data
     5. The Extraction
     6. Looking at the Data
  3. 3. 1. Pre-History
     1862: British Administration constituted the Statistical Committee for preparation of forms for primary data collection, followed by the publication of the first Statistical Abstract of British India (1840-1865)
     1881: First Decennial Population Census begins
     1914: Directorate of Statistics was established in Calcutta; it later became the Directorate of Commercial Intelligence and Statistics, which was entrusted with the compilation of colonial trade statistics
     1916: Indian Industrial Commission
     1925: Economic Enquiry Committee
     1939: Wholesale Price Index collection and calculation begins
     1947: P. C. Mahalanobis was appointed the Honorary Statistical Advisor
     1949: The Central Statistical Unit was established
     1951: Central Statistical Organization and the Department of Statistics are established. They continue to be the major organisations for collection of national-level data
  5. 5. 2. Glossary
     Round: Each round of data collection by NSSO, usually of annual duration
     Schedule: Each thematic focus for data collection, multiple Schedules per Round
     Thick Round: Major data collection rounds repeated every 5 years (hence called quinquennial rounds)
     Thin Round: Minor data collection rounds
     State-Region: Usually a cluster of three or more districts in a state
     Fixed-Width File: Fixed-width text files are data files in text format specified by fixed column widths, pad character and left/right alignment
     .do File: A Stata file format. Collection of Stata commands
     .dta File: A Stata file format for data files, similar to Excel, readable by R
     .smcl File: A Stata file format for log files, automatically records the Stata commands and results
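To make the fixed-width definition concrete, here is a minimal Python sketch. The field names, the widths and the zero pad character are illustrative assumptions and are not taken from any NSSO layout file.

```python
# Hypothetical layout: (field name, column width). Not an actual NSSO layout.
FIELDS = [("serial", 2), ("age", 3), ("education", 2)]

def pack(record, pad="0"):
    """Right-align each value in its column, filling the left side with `pad`."""
    return "".join(str(record[name]).rjust(width, pad) for name, width in FIELDS)

def unpack(line):
    """Cut a line back into fields using only the column widths."""
    out, start = {}, 0
    for name, width in FIELDS:
        out[name] = line[start:start + width]
        start += width
    return out

line = pack({"serial": 7, "age": 45, "education": 12})
print(line)          # 0704512
print(unpack(line))  # values come back as padded strings
```

Because there are no delimiters, the widths are the only thing that tells a reader where one field ends and the next begins.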
  7. 7. 3. Schedules
     Main themes for thick rounds / quinquennial surveys
     - Consumer expenditure
     - Employment and Unemployment
     - Debt and Investment
     - Manufacturing Enterprises (Organised and Unorganised)
     Main themes for thin rounds
     - Participation and Expenditure in Education
     - Particulars of Slum and Housing Condition
     - Morbidity and Healthcare
     - Situation Assessment Survey of Farmers
     - Land and Livestock Holding
  9. 9. 4. Organisation of Data
     Organisation of Raw Data
     - The fixed-width file (.txt)
     - The binary coding of information
     The Supporting Files
     - The ‘schedule’ file – survey questionnaire
     - The ‘layout’ file – how the information is organised in data files
     - The ‘readme’ file – how different data sets are organised
     - The state and district codes
     Level
     - Coding information about single entity in multiple rows
  10. 10. 4. Organisation of Data
      Raw Data
      12121212121212121212232323232323232323343434343434343434
      Layout
      Column 1-2: Person Serial Number
      Column 3-4: Age of the Person
      Column 5-6: Educational Status
      …
      Schedule
      Q.1: What is the serial number of the person?
      Q.2: What is the age of the person?
      Q.3: What is the educational status of the person? [12 = up to class X; 23 = class X-XII; 34 = graduate and higher]
      …
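The relationship between raw data, layout and schedule on this slide can be sketched in Python. The column ranges and the code list come from the slide; the variable names and the `parse_record` helper are assumptions made for illustration, and the input is only the first six characters of the raw data string.

```python
# Layout from the slide: 1-based, inclusive column ranges.
LAYOUT = {
    "person_serial_no": (1, 2),
    "age": (3, 4),
    "educational_status": (5, 6),
}

# Code list from question Q.3 of the schedule on the slide.
EDUCATION_CODES = {
    "12": "up to class X",
    "23": "class X-XII",
    "34": "graduate and higher",
}

def parse_record(line):
    """Slice one fixed-width line into named fields using LAYOUT."""
    # Layout columns are 1-based and inclusive; Python slices are
    # 0-based with an exclusive end, hence `start - 1`.
    return {name: line[start - 1:end] for name, (start, end) in LAYOUT.items()}

record = parse_record("121212")
record["educational_status_label"] = EDUCATION_CODES.get(
    record["educational_status"], "unknown"
)
print(record)
```

The layout file supplies the positions and the schedule supplies the meaning of each code, so both are needed before the digits can be read as data.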
  14. 14. 4. Organisation of Data
      Raw Data with Levels
      120112121212121212121212021212121212121212122301232323232323232323
      Layout
      Column 1-2: Person Serial Number
      Column 3-4: Level Code
      Column 5-6: Age of the Person [if Level = 01]; Educational Status [if Level = 02]
      …
      Schedule
      Q.1: What is the serial number of the person?
      Q.2: What is the age of the person?
      Q.3: What is the educational status of the person? [12 = up to class X; 23 = class X-XII; 34 = graduate and higher]
      …
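With levels, the meaning of columns 5-6 depends on the level code, and one person is spread over several rows. A minimal Python sketch of recombining those rows follows; the `merge_levels` helper and the three sample rows are illustrative assumptions (rows truncated to six columns), not real survey records.

```python
from collections import defaultdict

# Per the slide's layout: what columns 5-6 hold for each level code.
LEVEL_FIELD = {"01": "age", "02": "educational_status"}

def merge_levels(lines):
    """Combine rows that share a person serial number into one record."""
    people = defaultdict(dict)
    for line in lines:
        serial, level, value = line[0:2], line[2:4], line[4:6]
        field = LEVEL_FIELD.get(level)
        if field is None:
            continue  # unknown level code: skipped in this sketch
        people[serial][field] = value
    return dict(people)

# Illustrative rows: person 12 has levels 01 and 02, person 23 only level 01.
rows = ["120112", "120212", "230123"]
print(merge_levels(rows))
```

The person serial number acts as the key that ties the level rows together; in this sketch a person with a missing level simply ends up with fewer fields.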
  20. 20. 5. The Extraction
      Converting NSSO raw data to tabular form (Comma Separated) using Stata
      - The .do file: Set of Stata commands for extraction
      - The ‘infix’ command: Mapping variables to data columns
      - The ‘var’ command: Labeling the variables
      - The levels: Multiple data rows for single entity
      - The .dta file: The Stata spreadsheet format
      - The .smcl file: The Stata commands and results log file
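The slide describes this step in Stata. As a rough analogue only, and not the author's .do file, the same conversion from fixed-width lines to comma-separated rows can be sketched with Python's standard `csv` module. The column map, the variable names and the input lines below are hypothetical.

```python
import csv
import io

# Hypothetical column map (name, start, end) as 0-based slice bounds,
# playing the role the slide gives to the variable-to-column mapping.
COLUMNS = [("person_serial_no", 0, 2), ("age", 2, 4), ("educational_status", 4, 6)]

def fixed_width_to_csv(lines, out):
    """Write one CSV row per fixed-width line, preceded by a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in COLUMNS])
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        writer.writerow([line[start:end] for _, start, end in COLUMNS])

# Illustrative input lines, not real survey records.
buffer = io.StringIO()
fixed_width_to_csv(["121212\n", "232323\n"], buffer)
print(buffer.getvalue())
```

To process a real file, an open file handle can be passed in place of the list of lines, and another opened for writing in place of `buffer`.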
  22. 22. NSSO – Raw Data
  23. 23. NSSO – Extracted Data
  24. 24. Sumandro Chattapadhyay
      @ajantriks
      @ajantriks.net
