India Census Data Processing
Published in: Technology

    • 1. DATA CAPTURE IN CENSUS OF INDIA
        Registrar General & Census Commissioner, India
        Visit our website at www.censusindia.gov.in
    • 2. FEATURES OF INDIAN CENSUS
        • India is a large country with a population of more than a billion, so its Census is one of the world's largest administrative and statistical exercises.
        • Diversity in languages: schedules are filled in 16 languages.
        • 2 million enumerators were deployed in the 2001 Census, a number likely to increase further in the 2011 Census.
    • 3. FEATURES OF INDIAN CENSUS (Contd.)
        • The Census, conducted using the 'canvasser' method, is carried out in two phases:
          – House-listing
          – Population Enumeration
        • The Census Organization has experimented with new IT innovations from the beginning.
        • Technology is required particularly for data capture and processing, mainly because of the large volume and the need for speedier tabulation and release of Census results.
    • 4. MODE FOR DATA CAPTURE & PROCESSING SINCE 1961
        Census              1961           1971           1981           1991           2001
        Population          439 million    548 million    683 million    846 million    1,028 million
        Data collected (%)  100            100            100            100            100
        Data captured (%)   5              15             25             45             100
        Mode                Hand punch     Key punch      Data entry     Data entry     Scanning/ICR
        Time taken          8-9 years      8-9 years      8-9 years      7-8 years      3-5 years
    • 5. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Important Considerations
        • Conventional data entry is not suitable for such a large volume of data (228 million schedules for a population of 1,028 million).
        • Advanced IT tools and techniques are available.
        • All the collected information must be captured and processed.
        • Data entry is complicated by the multiplicity of languages and responses and by the size (A3) of the Census Schedule.
    • 6. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Important Considerations (Contd.)
        • Retrieving original documents for correction is labour-intensive.
        • Reduce the time span from 5-8 years to 3-5 years.
        • A compact, reliable and efficient archival system.
        • Better workflow management.
    • 7. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Selection and Consequent Action
        • Evaluation of the available technologies (OMR/OCR/ICR):
          – Trial run with NCS and DRS OMR.
          – Trial run with various ICR vendors.
        • Opted for ICR technology (TIS eFlow).
        • IT infrastructure in all 15 Data Centers upgraded to meet the new requirements.
    • 8. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Model Conceived for Implementation
        • The services of a System Integrator (SI) were hired to guide and assist in implementing ICR technology.
        • A unique outsourcing model:
          – The SI works on our premises for better communication and control
          – Data security, safety and confidentiality are maintained
          – Capacity building (training and guiding IT staff)
          – Production-linked payment to the SI
    • 9. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Workflow of ORGI (Office of the Registrar General, India), TIS eFlow Characteristics
        • Designs the data capture workflow
        • Presents a graphical view of the system
        • Monitors processing and workflow in real time
        • Allows applications to be customized and custom features to be added
    • 10. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Workflow Modules
        • Scan Portal, File Portal, Controller
        • FormID, Manual FormID
        • RC Processing [OCR/ICR]
        • Tile, Completion, CAC & Exception
        • Export
    • 11. DATA CAPTURE & PROCESSING IN 2001 CENSUS: ORGI Workflow Stages
        Prepare Batch → Scanning → Recognition → Tiling → Completion → Exception → Export/Archival (ASCII file to server)
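The stage order on slide 11 can be modelled as a linear pipeline. The sketch below is a minimal illustration, not eFlow's API: `Stage`, `next_stage` and `run_batch` are hypothetical names. It also simplifies routing by sending every batch through every stage, whereas on the slides only characters that operators refer onward actually reach Exception.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Hypothetical model of the workflow stages on slide 11, in order."""
    PREPARE_BATCH = 1
    SCANNING = 2
    RECOGNITION = 3
    TILING = 4
    COMPLETION = 5
    EXCEPTION = 6
    EXPORT = 7


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after Export."""
    order = list(Stage)  # enum members iterate in definition order
    i = order.index(stage)
    return order[i + 1] if i + 1 < len(order) else None


def run_batch(batch_id: str) -> List[str]:
    """Move one batch through every stage and return an audit trail."""
    trail: List[str] = []
    stage: Optional[Stage] = Stage.PREPARE_BATCH
    while stage is not None:
        trail.append(f"{batch_id}:{stage.name}")
        stage = next_stage(stage)
    return trail


if __name__ == "__main__":
    for step in run_batch("B0001"):
        print(step)
```

Keeping the stage order in one place lets a controller report where each batch currently sits. This is the kind of real-time monitoring that slide 9 attributes to eFlow.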
    • 12. DATA CAPTURE & PROCESSING IN 2001 CENSUS: LAN SETUP, ORGI DATA CENTERS
        Stations on the LAN: server, controller station, scanning station, recognition stations, tiling & completion stations, exception stations, export station.
        • Forms are fed through the scanner(s) batch by batch, and the form images are stored on a network disk.
        • Character images are recognised automatically, field by field.
        • At tile/correction stations, operators correct unrecognised characters.
        • Supervisors handle exceptional cases referred by operators.
        • The supervisor exports completed batches as ASCII files for further processing.
        • The supervisor monitors the workflow and balances the load across the different stages of operation.
    • 13. DATA CAPTURE & PROCESSING IN 2001 CENSUS: eFlow Customization
        • Customization of the scanning software for batching the images
        • Optimization of batch size for moving images and data over the network
        • Customization of workflow management to reduce the workload on the Manual Identification station
    • 14. DATA CAPTURE & PROCESSING IN 2001 CENSUS: eFlow Customization (Contd.)
        • Development of new Management Information tools for operators, daily production status, etc.
        • Creation of JUSTICR.mdb to recognize the writing patterns of Indian enumerators
        • Creation and implementation of various static and dynamic dictionaries for CAC
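Slide 14 mentions static and dynamic dictionaries for Computer Assisted Coding (CAC). The following is a minimal sketch of the idea. It assumes a dictionary that maps normalised response text to a code and suggests matches as the coding operator types. The entries and codes ("L01" and so on) are made-up placeholders, not real Census code lists, and `suggest_codes` and `learn` are hypothetical helpers. `learn` stands in for a dynamic dictionary that grows as operators confirm new responses.

```python
from typing import Dict, List, Tuple

# Hypothetical static dictionary: normalised response text -> code.
# The codes are illustrative placeholders, not real Census code lists.
MOTHER_TONGUE_DICT: Dict[str, str] = {
    "hindi": "L01",
    "bengali": "L02",
    "telugu": "L03",
    "marathi": "L04",
    "tamil": "L05",
}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so variant spellings compare equal."""
    return " ".join(text.lower().split())


def suggest_codes(typed: str, dictionary: Dict[str, str],
                  limit: int = 5) -> List[Tuple[str, str]]:
    """Return (entry, code) pairs whose entry starts with what was typed."""
    prefix = normalise(typed)
    hits = sorted((k, v) for k, v in dictionary.items() if k.startswith(prefix))
    return hits[:limit]


def learn(entry: str, code: str, dictionary: Dict[str, str]) -> None:
    """Add an operator-confirmed entry, as a dynamic dictionary might."""
    dictionary.setdefault(normalise(entry), code)


if __name__ == "__main__":
    working = dict(MOTHER_TONGUE_DICT)  # copy, so the static list stays fixed
    print(suggest_codes("Te", working))
```

Because the operator only confirms a suggested code instead of looking it up in a manual, this is one way the throughput gain from knowledge-based dictionaries (slide 20) could arise.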
    • 15. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Results Achieved
        • For the first time, 100% of the data was captured, processed and released within five years of the Census
        • Auto-recognition rate of 90%, with false positives below 2%
        • Considerable financial savings
        • IT skills assimilated within the organisation
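The slides quote an auto-recognition rate of 90% with false positives below 2%, but they do not define either figure. The sketch below assumes one plausible definition:

- The recognition rate is the share of characters the engines accepted without flagging.
- The false-positive rate is the share of those accepted characters that later proved wrong.

`BatchCounts` and `meets_target` are hypothetical names. The default thresholds follow the integrator targets listed on slide 56 (> 90% and < 2%).

```python
from dataclasses import dataclass


@dataclass
class BatchCounts:
    total_chars: int         # characters in the recognised (numeric) fields
    auto_accepted: int       # characters the engines accepted without flagging
    accepted_but_wrong: int  # accepted characters later found to be wrong


def recognition_rate(c: BatchCounts) -> float:
    """Share of all characters that needed no operator attention."""
    return c.auto_accepted / c.total_chars


def false_positive_rate(c: BatchCounts) -> float:
    """Assumption: measured against accepted characters, since only those
    can be wrong without having been flagged."""
    return c.accepted_but_wrong / c.auto_accepted if c.auto_accepted else 0.0


def meets_target(c: BatchCounts, min_rr: float = 0.90, max_fp: float = 0.02) -> bool:
    """True when recognition exceeds min_rr and false positives stay below max_fp."""
    return recognition_rate(c) > min_rr and false_positive_rate(c) < max_fp


if __name__ == "__main__":
    batch = BatchCounts(total_chars=10_000, auto_accepted=9_200, accepted_but_wrong=150)
    print(recognition_rate(batch), false_positive_rate(batch), meets_target(batch))
```

The definition matters. Dividing the wrong-but-accepted count by all characters instead of accepted ones would report a lower false-positive rate for the same batch.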
    • 16. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Results Achieved (Contd.)
        • Manual coding was replaced by Computer Assisted Coding (CAC) for:
          – Scheduled Caste / Scheduled Tribe
          – Languages spoken, education level
          – Migration particulars, NIC (National Industrial Classification) and NCO (National Classification of Occupations)
        • Indigenous data capture for other projects:
          – Economic Census
          – Sample Registration System
          – Verbal Autopsy
    • 17. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Difficulties Experienced
        • Colour drop-out could not be used at the scanning stage
        • Bad images were difficult to handle during scanning
        • Bad/back images caused by variation in paper and print quality
        • Overwriting and use of whitener; grid lines recognised as "1"
        • Limited recognition of Indian languages affected throughput
    • 18. DATA CAPTURE & PROCESSING IN 2001 CENSUS: Difficulties Experienced (Contd.)
        • Operational constraints in Manual Identification
        • No powerful tools for online load balancing among the various stages of eFlow
        • No concurrent quality check at each stage of eFlow
        • No auto-coding features for textual responses
        • Failure to recognise even a single image meant the whole batch had to be redone
    • 19. LESSONS LEARNT FOR THE FUTURE
        • Outsourcing in a controlled environment is beneficial and cost-effective
        • Good-quality paper
        • ICR-friendly form design
        • Bar codes for better workflow and inventory management
        • Good-quality printing
    • 20. LESSONS LEARNT FOR THE FUTURE (Contd.)
        • Special training for enumerators in filling in the forms
        • For CAC, knowledge-based dictionaries to increase throughput
        • Concurrent quality-check procedures along the lines of those used in the USA and UK
    • 21. DATA CAPTURE & PROCESSING: Technology for 2011 Census
        • Continuation of ICR technology:
          – International and national experience shows that, to date, there is no better substitute for scanning and ICR technology
          – Expertise and competence in using ICR technology are available within the organization
    • 22. DATA CAPTURE & PROCESSING: Technology for 2011 Census (Contd.)
        • Use more efficient scanners with image enhancement, noise removal, colour drop-out, better throughput, and on-the-spot detection and correction of bad images through built-in software.
        • Use an improved version of the ICR software with better recognition and built-in enhanced workflow management.
        • Use new Auto/Computer Assisted Coding features in the ICR software.
    • 23. Thank you. Visit Our Website at www.censusindia.gov.in
    • 24. STEPS INVOLVED IN THE eFLOW PROCESS
        • Intelligent Character Recognition (ICR) technology is used to extract handwritten or machine-printed (typeset) characters from the scanned images and generate a computer-processable data file. In brief, the following steps are involved:
        • Scanning: Paper forms are scanned to create bitmap image files.
        • File Portal: The image file registration module in eFlow, which registers the images as input to the next activity.
        • Form Identification: Automatically identifies the images of the various schedules using the Empty Form Image (EFI) templates created during the design stage.
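Form Identification compares each scanned image against the Empty Form Image (EFI) templates. The following is a minimal sketch of that matching step, with some stated assumptions:

- Images and templates are binarised crops of the pre-printed part of each schedule, so handwriting does not count against a match.
- `agreement`, `identify_form` and the 0.9 threshold are hypothetical.
- Raw pixel agreement stands in for the more robust features a production recogniser would use.

A form that matches no template well enough returns `None`, which corresponds to routing it to Manual Identification.

```python
from typing import Dict, List, Optional

Bitmap = List[List[int]]  # 1 = black pixel, 0 = white pixel


def agreement(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels on which two equally sized bitmaps agree."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("bitmaps must be non-empty and the same size")
    same = sum(1 for x, y in zip(flat_a, flat_b) if x == y)
    return same / len(flat_a)


def identify_form(image: Bitmap, efis: Dict[str, Bitmap],
                  threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching EFI name, or None to send the form to Manual ID."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, efi in efis.items():
        score = agreement(image, efi)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    efis = {"HH_A": [[1, 1, 0], [0, 1, 0], [0, 1, 0]],
            "HH_B": [[0, 0, 1], [0, 1, 1], [1, 0, 0]]}
    scanned = [[1, 1, 0], [0, 1, 0], [0, 1, 1]]  # one noisy pixel
    print(identify_form(scanned, efis, threshold=0.8))
```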
    • 25. STEPS INVOLVED IN THE eFLOW PROCESS (Contd.)
        • Manual Identification: Forms that could not be identified because of bad images are matched manually by an operator on the computer with the help of the EFIs.
        • Processing: This module is the heart and brain of ICR technology. It automatically recognizes the data (numerals/alphabetic characters) in the images with the help of various engines (CGK, AEG, KADMOS, TISICR, etc.).
        • Tile: This module displays images of the same recognized digit together, so that characters wrongly recognized by the system can be spotted and corrected, enhancing the accuracy and quality of the data.
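The Tile step groups character images by the digit the system recognised. A misread then stands out among its neighbours, for example an "8" sitting in the tile of "3"s. The following is a minimal sketch of that grouping. `CharImage`, `build_tiles` and `flagged_for_completion` are hypothetical names, and each character is reduced to an identifier plus its recognised digit instead of an actual image.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class CharImage(NamedTuple):
    char_id: str     # hypothetical id: batch/form/field/position
    recognised: str  # digit the recognition engines settled on


def build_tiles(chars: Iterable[CharImage]) -> Dict[str, List[str]]:
    """Group character images by recognised digit: one tile per digit."""
    tiles: Dict[str, List[str]] = defaultdict(list)
    for ch in chars:
        tiles[ch.recognised].append(ch.char_id)
    return dict(tiles)


def flagged_for_completion(tiles: Dict[str, List[str]],
                           flagged: Set[str]) -> List[str]:
    """Characters the tiling operator marked as wrong move on to Completion."""
    return [cid for digit in sorted(tiles) for cid in tiles[digit] if cid in flagged]


if __name__ == "__main__":
    chars = [CharImage("b1/f1/age/0", "3"), CharImage("b1/f1/age/1", "3"),
             CharImage("b1/f2/age/0", "8"), CharImage("b1/f3/age/0", "3")]
    tiles = build_tiles(chars)
    print(tiles)
    print(flagged_for_completion(tiles, {"b1/f3/age/0"}))
```

Reviewing one tile per digit lets an operator check many characters at a glance, instead of reading every form field by field.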
    • 26. STEPS INVOLVED IN THE eFLOW PROCESS (Contd.)
        • Completion: Unrecognized characters, and recognized characters marked as wrong during Tiling, are presented for correction with their images displayed alongside.
        • Exception: Any character image that an operator at the Completion station cannot interpret is corrected at the Exception station by an officer competent to make the decision.
        • Export: The system exports the data generated in the above steps to the server for further processing such as editing, aggregation and tabulation.
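The Completion and Exception steps form a two-level escalation. An operator corrects what they can and refers the rest, and an officer decides the referred cases. The following is a minimal sketch under stated assumptions:

- Each station is modelled as a function that returns a corrected character or `None` ("refer onward").
- `resolve` and `Resolver` are hypothetical names.
- The "unresolved" outcome is added only so the sketch handles every input. On the slide, the Exception officer is expected to make the final decision.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A station returns the corrected character, or None to refer it onward.
Resolver = Callable[[str], Optional[str]]


def resolve(char_ids: List[str], operator: Resolver,
            officer: Resolver) -> Tuple[Dict[str, str], List[str]]:
    """Route each character through Completion, then Exception if referred."""
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []
    for cid in char_ids:
        value = operator(cid)       # Completion station
        if value is None:
            value = officer(cid)    # referred to the Exception station
        if value is None:
            unresolved.append(cid)  # hypothetical: nobody could decide
        else:
            resolved[cid] = value
    return resolved, unresolved


if __name__ == "__main__":
    operator_answers = {"a": "3", "b": None, "c": None}
    officer_answers = {"b": "7"}
    print(resolve(["a", "b", "c"],
                  lambda cid: operator_answers[cid],
                  lambda cid: officer_answers.get(cid)))
```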
    • 27. eFLOW CONTROLLER
    • 28. e-FLOW WORKFLOW FOR ORGI
    • 29. EXAMPLE – BACK IMAGE
    • 30. EXAMPLE – IMPROPER GRID LINES
    • 31. EXAMPLE – USE OF WHITENER; CASUAL WRITING PATTERN
    • 32. CAC OF MOTHER TONGUE
    • 33. CAC OF HIGHEST EDUCATION LEVEL ATTAINED
    • 34. CAC OF NATIONAL INDUSTRIAL CLASSIFICATION (NIC)
    • 35. HOUSEHOLD SCHEDULE IMAGE OF SIDE A
    • 36. HOUSEHOLD SCHEDULE IMAGE OF SIDE B
    • 37. FORM-ID STATION
    • 38. MANUAL-ID STATION
    • 39. IMAGE AFTER FORMOUT IN PROCESSING
    • 40. SEGMENTATION OF A FIELD IN PROCESSING
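    • 41. VOTING IN PROCESSING (figure: four ICR engines, ICR 1 to ICR 4, read the same character as 3, 3, 8 and 3; the majority vote gives 3, while a unanimous vote gives no result, shown as "?")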
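The voting figure contrasts two rules for combining several ICR engines' readings of one character. The following is a minimal sketch of both rules:

- Majority voting accepts a reading that more than half the engines agree on.
- Unanimous voting accepts a reading only when every engine agrees.

`majority_vote` and `unanimous_vote` are hypothetical names. Real engines also report confidence scores, which this sketch ignores.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(readings: Sequence[str]) -> Optional[str]:
    """Return the reading chosen by more than half of the engines, else None."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count * 2 > len(readings) else None


def unanimous_vote(readings: Sequence[str]) -> Optional[str]:
    """Return the reading only if every engine agrees, else None."""
    if readings and all(r == readings[0] for r in readings):
        return readings[0]
    return None


if __name__ == "__main__":
    engines = ["3", "3", "8", "3"]  # the four readings shown on the slide
    print(majority_vote(engines), unanimous_vote(engines))
```

With the slide's readings, majority voting accepts "3" while unanimous voting rejects the character. The stricter rule sends more characters to operators but lets fewer misreads through, which trades recognition rate against false positives.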
    • 42. FINAL RESULT IN PROCESSING
    • 43. TILING STATION
    • 44. COMPLETION STATION [Field mode display]
    • 45. EXCEPTION STATION (screenshot panels: Form, Field, Date, Original Form Image Viewer, Exception Area)
    • 46. EXPORT STATION
    • 47. HOUSEHOLD SCHEDULE – SIDE A (marked fields: Mother Tongue & Other Languages, Name of SC/ST, Education, Religion)
    • 48. HOUSEHOLD SCHEDULE – SIDE B (marked fields: NCO, NIC, Place of Birth & Last Residence)
    • 49. DATA CAPTURE & PROCESSING: Selection of Technology (OMR/OCR/ICR) in 2001
        Recognition of handwritten descriptive entries in different languages is beyond the capabilities of known ICR software. A conscious decision was therefore taken to recognise only numeric characters, leaving the rest to be handled through image-enabled Computer Assisted Coding (CAC). The following key features were introduced in the data capture solution.
        Parameters for selecting the ICR software:
        • Highest recognition rate and lowest percentage of false positives, with customization and assured support and training
        • Organized workflow in a LAN environment with centralized controls and a Computer Assisted Coding facility
        • Built-in quality enhancement tools to trap wrongly recognized characters and facilitate corrective action
        • Multiple engines with a voting algorithm
        • Ability to incorporate validation rules to trap inconsistent entries and wrong recognition
        • Learning capabilities of the engines
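Slide 49 asks for validation rules that trap inconsistent entries and wrong recognition. The following is a minimal sketch of how such rules might be expressed as small named predicates over a record of recognised numeric fields. The field names (`age`, `years_at_residence`) and the 0-120 limit are hypothetical, chosen for illustration. They are not taken from the 2001 schedule or from eFlow's rule syntax.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]          # field name -> recognised characters
Rule = Callable[[Record], bool]  # True when the record passes the rule

DIGITS = re.compile(r"\d+")


def numeric(field: str) -> Rule:
    """The field must contain digits only (only numerals are recognised)."""
    return lambda r: bool(DIGITS.fullmatch(r.get(field, "")))


def in_range(field: str, lo: int, hi: int) -> Rule:
    """The field must be numeric and fall within [lo, hi]."""
    return lambda r: numeric(field)(r) and lo <= int(r[field]) <= hi


def not_more_than(field: str, other: str) -> Rule:
    """Cross-field check: field must not exceed other."""
    return lambda r: (numeric(field)(r) and numeric(other)(r)
                      and int(r[field]) <= int(r[other]))


# Hypothetical field names and limits, for illustration only.
RULES: Dict[str, Rule] = {
    "age is numeric": numeric("age"),
    "age within 0-120": in_range("age", 0, 120),
    "residence duration <= age": not_more_than("years_at_residence", "age"),
}


def failed_rules(record: Record) -> List[str]:
    """Names of every rule the record breaks, in rule order."""
    return [name for name, rule in RULES.items() if not rule(record)]


if __name__ == "__main__":
    print(failed_rules({"age": "34", "years_at_residence": "40"}))
```

A record that fails any rule would be routed for operator review instead of exported. This is how a misread such as "3l" in a numeric field gets caught.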
    • 50. DATA CAPTURE & PROCESSING: Parameters for Selecting the Scanner
        – Speed to match our volume
        – Duty cycle (life and production tolerance)
        – Duplex scanning is mandatory
        – Minimum resolution of 200 dpi
        – Image enhancement facilities such as noise removal, deskewing, cropping and contrast adjustment
        – Hopper size and scanning path (U, J or flat belt)
        – Maintenance and training services
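The first scanner parameter, speed to match the volume, is a sizing calculation. The number of scanners needed is the ceiling of total sheets ÷ (sheets per minute × 60 × hours per day × working days × utilisation). Only the 228 million schedules comes from the slides. The sketch further assumes one sheet per schedule, with both sides captured in a single duplex pass. Every speed, shift and utilisation value below is a hypothetical placeholder, not a Kodak 7520 specification.

```python
import math


def scanner_capacity(sheets_per_minute: float, hours_per_day: float,
                     working_days: int, utilisation: float) -> float:
    """Sheets one scanner can process over the whole campaign."""
    return sheets_per_minute * 60 * hours_per_day * working_days * utilisation


def scanners_needed(total_sheets: int, sheets_per_minute: float,
                    hours_per_day: float, working_days: int,
                    utilisation: float) -> int:
    """Smallest whole number of scanners that covers the volume."""
    per_scanner = scanner_capacity(sheets_per_minute, hours_per_day,
                                   working_days, utilisation)
    return math.ceil(total_sheets / per_scanner)


if __name__ == "__main__":
    # 228 million schedules is from slide 5; the other figures are placeholders.
    print(scanners_needed(228_000_000, sheets_per_minute=50,
                          hours_per_day=16, working_days=300, utilisation=0.6))
```

The utilisation factor accounts for jams, rescans and batch changeovers. It is also where the duty-cycle and bad-image-handling parameters on this slide feed into the estimate.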
    • 51. DATA CAPTURE & PROCESSING: Selection of Scanner/Hardware/ICR Software
        • A high-level technical committee evaluated and selected the above items on the basis of capabilities demonstrated by various vendors.
        • As a result, CMC was selected as System Integrator, and ACER and HP for computer hardware running Windows NT 4.0.
        • Kodak Model 7520 scanners; TIS for the ICR software.
        • National Informatics Centre carried out the LAN cabling and inspection of hardware.
        • Upgradation of 15 Data Centers.
    • 52. SETUP AT D.P. DIVISION (HQ)
        HARDWARE
        • Server: P-III 800 MHz, 512 MB RAM, 6 × 36 GB HDD, CD drive and 1.44 MB floppy drive
        • 40/80 GB DLT drives
        • 100 MB Zip drives
        • CD writer
        • Local Area Network
        • Intelligent workstations: P-III 800 MHz, 128 MB RAM, 9 GB HDD, CD drive and 1.44 MB floppy drive
        • Laser and line-matrix printers
        SOFTWARE
        • Operating systems: Windows 98, Windows NT
        • Latest software packages: IMPS, MS Office, MS Visual Studio, MS SQL Server, ISM Publisher (Hindi, English), Adobe Publishing Collection
    • 53. SETUP AT D.D.E. CENTRES: 15 locations (state capitals)
        HARDWARE
        • High-speed scanners: 24
        • Servers (45): P-III 800 MHz, 512 MB RAM, 6 × 36 GB HDD, CD drive and 1.44 MB floppy drive
        • 40/80 GB DLT drives
        • 100 MB Zip drives, CD writer
        • Local Area Network with 24 workstations per server
        • Intelligent workstations: P-III 800 MHz, 128 MB RAM, 9 GB HDD
        • Laser and line-matrix printers
        SOFTWARE
        • Operating systems: Windows NT, Windows 98
        • Latest software packages: eFlow, MS Office, software package for Computer Assisted Coding
    • 54. SNAPSHOTS OF HARDWARE RESOURCES: Distribution of PCs for the various stages of form processing using eFlow (Household project)
        Columns: File Portal, Form ID, Processing, Export, Merge, Controller, Sub-total | Scan, RC, Manual ID, Exception, Sub-total | Tile, Completion, Sub-total | Total PCs. A dash marks an empty cell. The slide colour-codes PCs as un-manned, supervisory staff and operators' PCs.
        1. Ahmedabad: 1, 3, 6, 1, 1, 1, 13 | 1, 3, 2, 5, 11 | 7, 26, 33 | 57
        2. Bangalore: 1, 3, 6, 1, 1, 1, 13 | 1, 3, 2, 6, 12 | 7, 27, 34 | 59
        3. Bhopal
           eflow1: 1, 2, 3, 1, 1, 1, 9 | 1, 1, 1, 3, 6 | 3, 13, 16 | 31
           eflow2: 1, 2, 3, 1, 1, 1, 9 | 1, 1, 1, 3, 6 | 3, 13, 16 | 31
           eflow3: 1, 2, 3, 1, 1, 1, 9 | -, 1, 1, 3, 5 | 3, 13, 16 | 30
        4. Bhubaneswar: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 4, 9 | 5, 19, 24 | 43
        5. Chandigarh
           eflow1: 1, 1, 4, 1, 1, 1, 9 | 1, 3, 1, 3, 8 | 4, 15, 19 | 36
           eflow2: 1, 1, 4, 1, 1, 1, 9 | 1, 3, 1, 3, 8 | 4, 15, 19 | 36
        6. Chennai
           eflow1: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 3, 8 | 4, 15, 19 | 37
           eflow2: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 3, 8 | 4, 16, 20 | 38
        7. Delhi
           eflow1: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 3, 8 | 4, 15, 19 | 37
           eflow2: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 3, 8 | 4, 15, 19 | 37
           eflow3: 1, 1, 3, 1, 1, 1, 8 | -, 3, 1, 3, 7 | 4, 13, 17 | 32
           eflow4: (no figures given)
        8. Guwahati: 1, 2, 5, 1, 1, 1, 11 | 1, 3, 1, 4, 9 | 6, 20, 26 | 46
    • 55. SNAPSHOTS OF HARDWARE RESOURCES (Contd.): Distribution of PCs for the various stages of form processing using eFlow (Household project)
        Columns: File Portal, Form ID, Processing, Export, Merge, Controller, Sub-total | Scan, RC, Manual ID, Exception, Sub-total | Tile, Completion, Sub-total | Total PCs. A dash marks an empty cell. The slide colour-codes PCs as un-manned, supervisory staff and operators' PCs.
        9. Hyderabad
           eflow1: 1, 2, 5, 1, 1, 1, 11 | 1, 3, 1, 4, 9 | 5, 19, 24 | 44
           eflow2: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 4, 9 | 5, 19, 24 | 43
        10. Jaipur: 1, 3, 7, 1, 1, 1, 14 | 1, 3, 2, 6, 12 | 7, 28, 35 | 61
        11. Kolkata
           eflow1: 1, 1, 3, 1, 1, 1, 8 | 1, 3, 1, 3, 8 | 4, 14, 18 | 34
           eflow2: 1, 1, 3, 1, 1, 1, 8 | 1, 3, 1, 3, 8 | 4, 13, 17 | 33
        12. Lucknow
           eflow1: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 4, 9 | 5, 18, 23 | 42
           eflow2: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 4, 9 | 5, 18, 23 | 42
           eflow3: 1, 2, 5, 1, 1, 1, 11 | -, 3, 1, 4, 8 | 5, 19, 24 | 43
        13. Mumbai
           eflow1: 1, 2, 3, 1, 1, 1, 9 | 1, 3, 1, 3, 8 | 4, 15, 19 | 36
           eflow2: 1, 2, 3, 1, 1, 1, 9 | 1, 3, 1, 3, 8 | 4, 15, 19 | 36
           eflow3: 1, 2, 4, 1, 1, 1, 10 | -, 3, 1, 3, 7 | 4, 15, 19 | 36
        14. Patna
           eflow1: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 4, 9 | 5, 18, 23 | 42
           eflow2: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 4, 9 | 5, 18, 23 | 42
        15. Trivandrum: 1, 2, 4, 1, 1, 1, 10 | 1, 3, 1, 3, 8 | 4, 16, 20 | 38
        Total: 28, 54, 114, 28, 28, 28, 280 | 24, 78, 31, 101, 234 | 128, 480, 608 | 1122
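The sub-totals and totals on slides 54 and 55 can be cross-checked mechanically. The sketch below transcribes the 28 eFlow instances that carry figures into a hypothetical `ROWS` string. Delhi eflow4 is listed without any figures and is omitted. The four empty Scan cells are written as "-". The script reports any sub-total, row total or column total that does not add up. On this transcription it reports no discrepancies. The Scan column sums to 24, which agrees with the 24 high-speed scanners on slide 53.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, List[Optional[int]]]

# Transcribed from slides 54-55. Column order: File Portal, Form ID, Processing,
# Export, Merge, Controller, Sub-total | Scan, RC, Manual ID, Exception,
# Sub-total | Tile, Completion, Sub-total | Total PCs.  "-" = empty cell.
ROWS = """
Ahmedabad          1 3 6 1 1 1 13  1 3 2 5 11  7 26 33  57
Bangalore          1 3 6 1 1 1 13  1 3 2 6 12  7 27 34  59
Bhopal/eflow1      1 2 3 1 1 1 9   1 1 1 3 6   3 13 16  31
Bhopal/eflow2      1 2 3 1 1 1 9   1 1 1 3 6   3 13 16  31
Bhopal/eflow3      1 2 3 1 1 1 9   - 1 1 3 5   3 13 16  30
Bhubaneswar        1 2 4 1 1 1 10  1 3 1 4 9   5 19 24  43
Chandigarh/eflow1  1 1 4 1 1 1 9   1 3 1 3 8   4 15 19  36
Chandigarh/eflow2  1 1 4 1 1 1 9   1 3 1 3 8   4 15 19  36
Chennai/eflow1     1 2 4 1 1 1 10  1 3 1 3 8   4 15 19  37
Chennai/eflow2     1 2 4 1 1 1 10  1 3 1 3 8   4 16 20  38
Delhi/eflow1       1 2 4 1 1 1 10  1 3 1 3 8   4 15 19  37
Delhi/eflow2       1 2 4 1 1 1 10  1 3 1 3 8   4 15 19  37
Delhi/eflow3       1 1 3 1 1 1 8   - 3 1 3 7   4 13 17  32
Guwahati           1 2 5 1 1 1 11  1 3 1 4 9   6 20 26  46
Hyderabad/eflow1   1 2 5 1 1 1 11  1 3 1 4 9   5 19 24  44
Hyderabad/eflow2   1 2 4 1 1 1 10  1 3 1 4 9   5 19 24  43
Jaipur             1 3 7 1 1 1 14  1 3 2 6 12  7 28 35  61
Kolkata/eflow1     1 1 3 1 1 1 8   1 3 1 3 8   4 14 18  34
Kolkata/eflow2     1 1 3 1 1 1 8   1 3 1 3 8   4 13 17  33
Lucknow/eflow1     1 2 4 1 1 1 10  1 3 1 4 9   5 18 23  42
Lucknow/eflow2     1 2 4 1 1 1 10  1 3 1 4 9   5 18 23  42
Lucknow/eflow3     1 2 5 1 1 1 11  - 3 1 4 8   5 19 24  43
Mumbai/eflow1      1 2 3 1 1 1 9   1 3 1 3 8   4 15 19  36
Mumbai/eflow2      1 2 3 1 1 1 9   1 3 1 3 8   4 15 19  36
Mumbai/eflow3      1 2 4 1 1 1 10  - 3 1 3 7   4 15 19  36
Patna/eflow1       1 2 4 1 1 1 10  1 3 1 4 9   5 18 23  42
Patna/eflow2       1 2 4 1 1 1 10  1 3 1 4 9   5 18 23  42
Trivandrum         1 2 4 1 1 1 10  1 3 1 3 8   4 16 20  38
"""

TOTALS = [28, 54, 114, 28, 28, 28, 280, 24, 78, 31, 101, 234, 128, 480, 608, 1122]


def parse(text: str) -> List[Row]:
    """Turn each transcribed line into (instance name, list of counts)."""
    rows: List[Row] = []
    for line in text.strip().splitlines():
        name, *cells = line.split()
        rows.append((name, [None if c == "-" else int(c) for c in cells]))
    return rows


def span_sum(cells: List[Optional[int]], lo: int, hi: int) -> int:
    """Sum cells[lo:hi], counting empty cells as zero."""
    return sum(c or 0 for c in cells[lo:hi])


def check_table(rows: List[Row]) -> List[str]:
    """Describe every sub-total, row total or column total that does not add up."""
    problems: List[str] = []
    for name, c in rows:
        if span_sum(c, 0, 6) != c[6]:
            problems.append(f"{name}: first sub-total")
        if span_sum(c, 7, 11) != c[11]:
            problems.append(f"{name}: second sub-total")
        if span_sum(c, 12, 14) != c[14]:
            problems.append(f"{name}: third sub-total")
        if span_sum(c, 6, 7) + span_sum(c, 11, 12) + span_sum(c, 14, 15) != c[15]:
            problems.append(f"{name}: row total")
    for i, expected in enumerate(TOTALS):
        if sum(c[i] or 0 for _, c in rows) != expected:
            problems.append(f"column {i + 1}: total")
    return problems


if __name__ == "__main__":
    print(check_table(parse(ROWS)) or "all sub-totals and totals agree")
```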
    • 56. DATA CAPTURE & PROCESSING: Role of the Integrator
        • Supply, installation and on-site maintenance of scanners
        • Supply and installation of form processing software
        • Manage the LAN and load balancing from one stage to another
        • Provide a software core team centrally at ORGI HQ
        • Impart operational training to the staff at each location
        • Provide software personnel at each site
        • Provide scanner operators and carry out scanning operations
        • Achieve > 90% recognition rate and < 2% false positives