India Census Data Processing

  • The areas which can be covered through a kiosk is to be discussed in this slide. Kiosks should provide information on all the areas mentioned in the slide and elaborated in the subsequent slides.

Transcript

  • 1. DATA CAPTURE IN CENSUS OF INDIA Registrar General & Census Commissioner, India Visit Our Website at www.censusindia.gov.in
  • 2. FEATURES OF INDIAN CENSUS
    • India is a large country with a population of more than a billion; its Census is therefore one of the world's largest administrative and statistical exercises
    • Diversity in languages – Schedules filled in 16 languages
    • 2 million enumerators deployed in 2001 Census – likely to increase further in 2011 census.
  • 3. FEATURES OF INDIAN CENSUS (Contd..)
    • The Census is conducted using the ‘canvasser’ method, in two phases:
      • House-listing
      • Population Enumeration
    • Census Organization has experimented with new IT innovations since the beginning
    • Technology is required particularly for data capture/processing – mainly due to large volume and for speedier tabulation & release of Census results
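  • 4. MODE FOR DATA CAPTURE & PROCESSING SINCE 1961
    • Census 1961: population 439 million; collection 100%; capture 5%; mode Hand Punch; time taken 8-9 years
    • Census 1971: population 548 million; collection 100%; capture 15%; mode Key Punch; time taken 8-9 years
    • Census 1981: population 683 million; collection 100%; capture 25%; mode Data Entry; time taken 8-9 years
    • Census 1991: population 846 million; collection 100%; capture 45%; mode Data Entry; time taken 7-8 years
    • Census 2001: population 1,028 million; collection 100%; capture 100%; mode Scanning/ICR; time taken 3-5 years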
  • 5. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • Important Considerations
    • Conventional data entry is not suitable for such a large volume of data (228 million schedules for a population of 1,028 million).
    • Availability of advanced IT tools and techniques.
    • Capture and process all the collected information.
    • Complexities in data entry due to the multiplicity of languages/responses and the A3 size of the Census Schedule.
  • 6. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • Important Considerations (Contd..)
    • Retrieval of original documents for correction is labor-intensive.
    • Reduce the time span from 5-8 years to 3-5 years.
    • Compact, reliable and efficient archival system.
    • Better workflow management.
  • 7. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • Selection and Consequent Action
    • Evaluation of various available technologies (OMR/OCR/ICR).
        • Trial run with NCS and DRS OMR.
        • Trial Run with various ICR vendors.
    • Opted for ICR technology (TIS eFlow)
    • IT Infrastructure in all the 15 Data Centers upgraded to meet the new requirement.
  • 8. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • Model Conceived for implementation
    • Services of System Integrator hired to guide and assist in the implementation of ICR technology.
    • A unique model for outsourcing
        • SI to work in our premises, for
          • better communication and control
          • maintaining data security, safety and confidentiality
        • Capacity building (training and guidance for IT staff)
        • Production Linked payment to SI
  • 9. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • Work Flow of ORGI (TIS eFlow characteristics)
      • Designs the data capture workflow
      • Presents a graphical view of the system
      • Monitors the processing and workflow in real time
      • Allows applications to be customized and custom features to be added
  • 10. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • Workflow Modules
      • Scan Portal, File Portal, Controller
      • FormID, Manual FormID
      • RC Processing [OCR/ICR]
      • Tile, Completion, CAC & Exception
      • Export
  • 11. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • ORGI Workflow Stages (diagram labels)
      • Prepare Batch, Scanning, Recognition, Tiling, Completion, Exception, Export / Archival
      • ASCII file, Server
  • 12. DATA CAPTURE & PROCESSING IN 2001 CENSUS: LAN SETUP - ORGI DATA CENTERS
    • Stations shown: Server, Controller station, Scanning station, Recognition stations, Tiling & Completion stations, Exception stations, Export station
    • Forms are fed through the scanner(s) batch by batch
    • Character images are automatically recognised, field by field
    • Tile/Correction station: unrecognised characters are corrected by operators
    • Supervisors handle exceptional cases referred by operators
    • Supervisor exports completed batches as ASCII files for further processing
    • Supervisor monitors the workflow and balances the load at different stages of operation
    • Form images are stored on a network disk
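The supervisor's task of monitoring the workflow and balancing the load across stages can be illustrated with a small sketch. This is not eFlow's controller; the stage names are taken from the slide, while the batch identifiers, the `stage_load` and `busiest_stage` helpers and the idea of counting batches per stage are assumptions made purely for illustration.

```python
from collections import Counter

# Stage names follow the slide; the order is assumed to be the processing order.
STAGES = ["scanning", "recognition", "tiling", "completion", "exception", "export"]

def stage_load(batch_stage):
    """Count how many batches are currently waiting at each stage."""
    counts = Counter(batch_stage.values())
    return {stage: counts.get(stage, 0) for stage in STAGES}

def busiest_stage(batch_stage):
    """Return the stage holding the most batches (earliest stage wins a tie)."""
    load = stage_load(batch_stage)
    return max(STAGES, key=lambda stage: load[stage])

# Hypothetical snapshot: batch id -> stage where the batch currently sits.
snapshot = {
    "B001": "completion",
    "B002": "completion",
    "B003": "recognition",
    "B004": "export",
}
load = stage_load(snapshot)
bottleneck = busiest_stage(snapshot)
print(load)
print("move operators to:", bottleneck)
```

A supervisor could use such a count to decide where to assign additional operators; the real system may weigh batches by size or age rather than by simple count.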
  • 13. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • eFlow customization
      • Customization of scanning software for batching the images
      • Optimization of batch size for network movement of images and data
      • Customization of workflow management to reduce the workload on the Manual Identification station
  • 14. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • eFlow customization (Contd..)
      • Development of new Management Information tools for operators and daily production status etc
      • Creation of JUSTICR.mdb to recognize Indian enumerators' writing patterns
      • Creation and implementation of various static and Dynamic Dictionaries for CAC
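A dictionary-driven lookup for Computer Assisted Coding (CAC) might look roughly like the sketch below. The slides do not describe how the static and dynamic dictionaries were structured, so the dictionary contents, the code values and the `suggest_codes` helper are all hypothetical; the approximate matching uses Python's standard `difflib` module as a stand-in for whatever matching the real system used.

```python
import difflib

# Hypothetical static dictionary: response text -> code (codes are invented).
MOTHER_TONGUE_CODES = {
    "HINDI": "H01",
    "BENGALI": "H02",
    "TELUGU": "H03",
    "MARATHI": "H04",
}

def suggest_codes(keyed_text, dictionary, limit=3):
    """Return candidate (name, code) pairs for what the operator keyed in.

    An exact match is returned on its own; otherwise close spellings are
    offered so the operator can pick the right entry.
    """
    key = keyed_text.strip().upper()
    if key in dictionary:
        return [(key, dictionary[key])]
    close = difflib.get_close_matches(key, list(dictionary), n=limit, cutoff=0.6)
    return [(name, dictionary[name]) for name in close]

exact = suggest_codes("hindi", MOTHER_TONGUE_CODES)
fuzzy = suggest_codes("Bengalee", MOTHER_TONGUE_CODES)
unknown = suggest_codes("XYZ", MOTHER_TONGUE_CODES)
print(exact, fuzzy, unknown)
```

A "dynamic" dictionary could be approximated by adding newly confirmed spellings to the mapping as operators resolve them, though that is an interpretation rather than something the slides state.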
  • 15. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • Results Achieved
      • For the first time, 100% of the data was captured, processed and released within five years of the Census
      • Auto recognition rate of 90% and false positives below 2%
      • Considerable financial saving
      • Assimilation of IT skills internally in the organisation.
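The two quality figures quoted above can be computed along the following lines. The slides do not define the terms, so this sketch assumes that the auto recognition rate is the share of characters accepted without operator intervention, and that the false positive rate is the share of those auto-accepted characters that turned out to be wrong. The sample counts are invented.

```python
def recognition_metrics(total_chars, auto_recognized, auto_recognized_wrong):
    """Return (recognition_rate, false_positive_rate) under the assumed definitions."""
    if total_chars <= 0 or auto_recognized <= 0:
        raise ValueError("counts must be positive")
    recognition_rate = auto_recognized / total_chars
    false_positive_rate = auto_recognized_wrong / auto_recognized
    return recognition_rate, false_positive_rate

# Invented sample: 10,000 characters, 9,100 auto-accepted, 150 of those wrong.
rate, false_positive = recognition_metrics(10_000, 9_100, 150)
meets_target = rate > 0.90 and false_positive < 0.02
print(f"recognition {rate:.1%}, false positives {false_positive:.2%}, target met: {meets_target}")
```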
  • 16. DATA CAPTURE & PROCESSING IN 2001 CENSUS
    • Results Achieved (Contd..)
    • Manual Coding was replaced by Computer Assisted Coding
            • Scheduled Caste/Scheduled Tribe
            • Languages spoken, Education level
            • Migration particulars, NIC and NCO
    • Indigenous data capture for other projects
            • Economic Census
            • Sample Registration System
            • Verbal Autopsy
  • 17. DATA CAPTURE & PROCESSING IN 2001 CENSUS Difficulties Experienced
      • Unable to use color drop-out at scanning stage
      • Difficult to handle bad images during scanning stages.
      • Bad/Back Images due to variation in paper/print quality
      • Overwriting and use of whitener; grid lines recognized as 1
      • Limitations in recognizing Indian languages affected the throughput
  • 18. DATA CAPTURE & PROCESSING IN 2001 CENSUS Difficulties Experienced (Contd..)
      • Operational Constraints in Manual Identification
      • No powerful tools for online Load balancing among various stages of eflow
      • Lack of concurrent quality check at each stage of eflow
      • Lack of Auto coding features for textual responses
      • Non-recognition of even a single image means the whole batch must be redone
  • 19. LESSONS LEARNT FOR FUTURE
      • Outsourcing in controlled environment beneficial and cost-effective
      • Good quality of paper
      • ICR friendly Form Design
      • Use of Bar Code for better work flow and Inventory management
      • Good quality printing
  • 20. LESSONS LEARNT FOR FUTURE (Contd..)
      • Special training for enumerators in filling the forms
      • For CAC, use knowledge-based dictionaries to increase throughput
      • Use of concurrent quality check procedures along the lines of those used in the USA and UK
  • 21. DATA CAPTURE & PROCESSING Technology for 2011 Census
        • Continuation of ICR Technology
          • International and national experience shows that, to date, there is no better substitute for scanning and ICR technology
          • Expertise and competence gained in using ICR technology are available within the organization
  • 22. DATA CAPTURE & PROCESSING Technology for 2011 Census (contd..)
        • Use more efficient scanners with facilities for image enhancement, noise removal, color drop-out, better throughput, and on-the-spot detection and correction of bad images (through in-built software).
        • Use of improved version of ICR software with better recognition and built-in enhanced workflow management capability.
        • Use new features in Auto/Computer Assisted Coding in ICR software
  • 23. Thank you. Visit Our Website at www.censusindia.gov.in
  • 24. Steps involved in e-Flow Process
    • Intelligent Character Recognition (ICR) technology is used to extract handwritten or machine-printed (typeset) characters from the scanned images and generate a computer-processable data file. In brief, the following steps are involved in using ICR technology.
    • Scanning :- Paper-based forms are scanned to create bitmap image files.
    • File Portal :- The image file registration module in eFlow; it registers the images as input to the next activity.
    • Form Identification :- Automatically identifies the images of the various schedules based on the Empty Form Image (EFI) templates created during the design stage.
  • 25. Steps involved in e-Flow Process
    • Manual Identification : Forms left unidentified because of bad images are matched manually by the operator on the computer with the help of EFIs.
    • Processing : This module is the heart and brain of the ICR technology. It automatically recognizes the data (numerals/alpha) from the images with the help of various engines (CGK, AEG, KADMOS, TISICR, etc.).
    • Tile : This module displays the images of similar digits in one place so that any character wrongly recognized by the system can be identified for correction, thus enhancing the accuracy and quality of the data.
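The idea behind the Tile step, showing all snippets that were recognized as the same digit side by side so that an odd one out is easy to spot, can be sketched as a simple grouping. The snippet identifiers and the `build_tiles` helper are hypothetical; the real module works on image snippets displayed on screen rather than on identifiers.

```python
from collections import defaultdict

def build_tiles(recognized):
    """Group character snippets by the digit the system assigned to them."""
    tiles = defaultdict(list)
    for snippet_id, digit in recognized:
        tiles[digit].append(snippet_id)
    return dict(tiles)

# Hypothetical recognition output: (snippet id, recognized digit).
recognized = [
    ("img-001", "3"),
    ("img-002", "8"),
    ("img-003", "3"),
    ("img-004", "1"),
]
tiles = build_tiles(recognized)
for digit in sorted(tiles):
    # An operator would scan each group and flag snippets that do not belong.
    print(digit, tiles[digit])
```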
  • 26. STEPS INVOLVED IN eFLOW PROCESS
    • Completion :- Characters that were unrecognized, or marked as wrongly recognized during Tiling, are presented for correction alongside their images.
    • Exception :- If a character image cannot be understood by the operator at the Completion station (module), it is corrected at the Exception station by an officer competent to make the decision.
    • Export :- The system exports the data generated in the above steps to the server for further processing such as editing, aggregation and tabulation.
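The hand-off between Processing, Tile, Completion and Exception described in these steps can be summarized as a small decision function. This is a sketch of the routing logic as the slides describe it, not eFlow code; the function name and its three inputs are assumptions chosen for illustration.

```python
def route_character(recognized_value, flagged_in_tiling, operator_value):
    """Return (final_value, stage_that_settled_it) for one character image.

    recognized_value:  result from the recognition engines, or None if unrecognized
    flagged_in_tiling: True if a tiling operator marked the result as wrong
    operator_value:    value keyed at the Completion station, or None if the
                       operator could not read the image
    """
    if recognized_value is not None and not flagged_in_tiling:
        return recognized_value, "processing"
    if operator_value is not None:
        return operator_value, "completion"
    # Left for an officer at the Exception station to decide.
    return None, "exception"

accepted = route_character("7", False, None)
corrected = route_character("1", True, "7")
escalated = route_character(None, False, None)
print(accepted, corrected, escalated)
```

Only values settled at one of these stages would then move on to Export.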
  • 27. eFLOW CONTROLLER
  • 28. e-FLOW WORKFLOW FOR ORGI
  • 29. EXAMPLE – BACK IMAGE
  • 30. EXAMPLE – IMPROPER GRID LINES
  • 31. EXAMPLE – USE OF WHITENER Casual writing pattern
  • 32. CAC OF MOTHER TONGUE
  • 33. CAC OF HIGHEST EDUCATION LEVEL ATTAINED
  • 34. CAC OF NATIONAL INDUSTRIAL CLASSIFICATION NIC
  • 35. HOUSEHOLD SCHEDULE IMAGE OF SIDE A
  • 36. HOUSEHOLD SCHEDULE IMAGE OF SIDE B
  • 37. FORM-ID STATION
  • 38. MANUAL-ID STATION
  • 39. IMAGE AFTER FORMOUT IN PROCESSING
  • 40. SEGMENTATION OF A FIELD IN PROCESSING
  • 41. VOTING IN PROCESSING
    • Four engines (ICR 1 to ICR 4) return the readings 3, 3, 8, 3
    • Majority = 3; Unanimous = ?
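The voting slide can be illustrated with a minimal majority-vote sketch. The slides do not say how eFlow actually combines engine results, so the strict-majority rule and the `vote` helper below are assumptions; the sample readings are the ones shown on the slide.

```python
from collections import Counter

def vote(readings):
    """Return (majority, unanimous) for one character read by several engines.

    majority:  the value returned by more than half of the engines, else None
    unanimous: the value returned by every engine, else None
    """
    value, hits = Counter(readings).most_common(1)[0]
    majority = value if hits > len(readings) / 2 else None
    unanimous = value if hits == len(readings) else None
    return majority, unanimous

majority, unanimous = vote(["3", "3", "8", "3"])
print("Majority =", majority, "| Unanimous =", unanimous)
```

With the slide's readings the majority result is 3, while no unanimous result exists, which matches the "Unanimous = ?" shown on the slide.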
  • 42. FINAL RESULT IN PROCESSING
  • 43. TILING STATION
  • 44. COMPLETION STATION [Field mode display]
  • 45. EXCEPTION STATION Form Field Date Original Form Image Viewer Exception Area
  • 46. EXPORT STATION
  • 47. HOUSEHOLD SCHEDULE - SIDE A
    • Fields highlighted: Mother Tongue & Other languages, Name of SC/ST, Education, Religion
  • 48. HOUSEHOLD SCHEDULE - SIDE B
    • Fields highlighted: NCO, NIC, Place of Birth & Last residence
  • 49. DATA CAPTURE & PROCESSING Selection of technology OMR/OCR / ICR in 2001
    • Recognition of handwritten descriptive entries in different languages is beyond the capabilities of the known ICR software. A conscious decision was therefore taken to recognize only numeric characters, leaving the rest to be handled through image-enabled Computer Assisted Coding (CAC). The following key features were introduced in the data capture solution.
    • Parameters for selecting the ICR Software
    • Highest recognition rate and lowest percentage of false positive with customization and assured support & Training
    • Facility of organized workflow in LAN environment with centralized controls with Computer Assisted Coding facility.
    • In built quality enhancement tools to trap the wrongly recognized characters so as to facilitate corrective action.
    • Use of multiple engines with a voting algorithm. Ability to incorporate validation rules to trap inconsistent entries/wrong recognition. Learning capabilities of engines.
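Validation rules that trap inconsistent entries or wrong recognition could take a form like the sketch below. The field names, code values and the three rules are invented for illustration and are not taken from the Census schedule or from the ICR software.

```python
def check_record(record, rules):
    """Return the names of all rules that the record violates."""
    return [name for name, is_valid in rules if not is_valid(record)]

# Hypothetical rules: each pairs a name with a predicate that is True when the record is plausible.
RULES = [
    ("age_in_range", lambda r: 0 <= r["age"] <= 120),
    ("sex_code_known", lambda r: r["sex"] in {1, 2}),
    ("young_child_never_married", lambda r: not (r["age"] < 10 and r["marital_status"] != 1)),
]

# A misread digit (for example 18 captured as 180) is caught by the range rule.
suspect = {"age": 180, "sex": 1, "marital_status": 1}
clean = {"age": 18, "sex": 2, "marital_status": 1}
suspect_failures = check_record(suspect, RULES)
clean_failures = check_record(clean, RULES)
print(suspect_failures, clean_failures)
```

Records that fail a rule would be sent back for operator review rather than exported.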
  • 50. DATA CAPTURE & PROCESSING
    • Parameters for selecting the scanner
      • Speed to match with our volume
      • Duty cycle (life and production tolerance)
      • Must be duplex scanning
      • Minimum resolution of 200 dpi
      • Image enhancement facilities such as noise removal, deskewing, cropping and contrast adjustment
      • Hopper size and scanning path (U, J or flat belt)
      • Maintenance & Training services
  • 51. DATA CAPTURE & PROCESSING
    • Selection of Scanner/Hardware/ICR software
    • A high-level technical committee evaluated and selected the above items on the basis of capabilities demonstrated by the various vendors
    • As a result, CMC was selected as System Integrator, and ACER and HP for computer hardware running Windows NT 4.0
    • Kodak Module 7520 Scanner, TIS for ICR software
    • National Informatics Centre carried out the LAN cabling and the inspection of hardware
    • Upgradation of 15 Data Centers
  • 52. SETUP AT D.P. DIVISION (HQ)
    • HARDWARE
      • Server: P-III, 800 MHz, 512 MB, 6*36 GB HDD, CD & 1.44 MB Floppy Drive
      • 40/80 GB DLT Drives
      • 100 MB Zip Drives
      • CD Writer
      • Local Area Network
      • Intelligent Workstations: P-III, 800 MHz, 128 MB, 9 GB HDD, CD & 1.44 MB Floppy Drive
      • Laser & Line Matrix Printer
    • SOFTWARE
      • Operating Systems: Windows 98, Windows NT
      • Latest Software Packages: IMPS, MS-Office, MS Visual Studio, MS SQL Server, ISM Publisher (Hindi, English), Adobe Publishing Collection
  • 53. SETUP AT D.D.E. CENTRES
    • 15 Locations (State Capitals)
    • HARDWARE
      • High Speed Scanner: 24 (Nos.)
      • Server (45 No.): P-III, 800 MHz, 512 MB, 6*36 GB HDD, CD & 1.44 MB Floppy Drive
      • 40/80 GB DLT Drives
      • 100 MB Zip Drives, CD Writer
      • Local Area Network, 24 Workstations with each Server
      • Intelligent Workstations: P-III, 800 MHz, 128 MB, 9 GB HDD
      • Laser & Line Matrix Printer
    • SOFTWARE
      • Operating Systems: Windows NT, Windows 98
      • Latest Software Packages: E-FLOW, MS-OFFICE, Software Package for Computer Assisted Coding
  • 54. SNAPSHOTS OF HARDWARE RESOURCES
  • 55. SNAPSHOTS OF HARDWARE RESOURCES
  • 56. DATA CAPTURE & PROCESSING
    • Role of the Integrator
      • Supply, Installation and On-site Maintenance of SCANNERS.
      • Supply, Installation of Form Processing Software.
      • Manage LAN and load balancing from one stage to another.
      • Provide Software Core-Team centrally at ORGI HQ.
      • Impart operational training to the staff at each location.
      • Provide Software Personnel at each site
      • Provide scanner operators and carry out Scanning operations
      • Achieve > 90% recognition rate and < 2% false positive