Data Extraction, Visualization and Processing with application to census and election of Nepal

Santosh Ghimire – 066 BCT 533
Subit Raj Pokharel – 066 BCT 538
Sudip Kafle – 066 BCT 539

[Diagram: Data linked to three stages: Extraction, Processing and Visualization]

[Figure: System Block Diagram]

Different Sets of Data Available in Different File Formats
• Set 1
  › Election data
  › GIS data for the coordinates of districts
• Set 2
  › District population based on ethnicity
• Set 3
  › Data for district-level indicators

Extraction
[Diagram: source file → parser → Database]
The parser extracts data from each file and saves it to the database.

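A minimal sketch of the parse-and-save step for a tree-structured (XML) file. It is written in Python rather than the project's C#, uses SQLite in place of SQL Server, and assumes a hypothetical XML layout and a hypothetical `election_result` table; the slides do not describe the real files or schema, and the vote counts are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout with made-up values: the required data sits
# between opening and closing tags, as the Parsing slide describes for XML.
SAMPLE_XML = """
<results>
  <result><district>Achham</district><party>Party A</party><votes>1200</votes></result>
  <result><district>Achham</district><party>Party B</party><votes>950</votes></result>
  <result><district>Baglung</district><party>Party A</party><votes>800</votes></result>
</results>
"""

def parse_results(xml_text):
    """Walk the XML tree and yield (district, party, votes) tuples."""
    root = ET.fromstring(xml_text.strip())
    for result in root.iter("result"):
        yield (
            result.findtext("district"),
            result.findtext("party"),
            int(result.findtext("votes")),
        )

def save_results(conn, rows):
    """Insert parsed rows into a hypothetical election_result table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS election_result "
        "(district TEXT, party TEXT, votes INTEGER)"
    )
    conn.executemany("INSERT INTO election_result VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    save_results(conn, parse_results(SAMPLE_XML))
    for row in conn.execute("SELECT * FROM election_result"):
        print(row)
```

Keeping parsing and saving in separate functions means a PDF or HTML parser can feed the same `save_results` step.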
Parsing
• An XML file has a tree (node) structure.
  › The required data lies between the opening and closing tags.
• A PDF has no standard format for storing data.
  › The file is first converted to plain text.
• HTML has a DOM structure.
  › Unlike in XML, the data may not be represented structurally.
• Data from PDF and HTML files is extracted using regular expressions.

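For PDF and HTML sources the slides mention regular expressions. The sketch below, in Python and under assumed layouts, pulls district/population pairs from text that a PDF has already been converted into, and from HTML table rows. The patterns and sample strings are hypothetical, and the PDF-to-text conversion tool is not named in the slides, so that step is not shown.

```python
import re

# Hypothetical layout of PDF-derived text: a district name followed by its
# population, possibly with no space in between (e.g. "Arghakhanchi202462").
PDF_LINE = re.compile(r"^([A-Z][A-Za-z]+)[ \t]*(\d+)[ \t]*$", re.MULTILINE)

# Hypothetical HTML table row carrying the same two fields.
HTML_ROW = re.compile(
    r"<tr>\s*<td>\s*([^<]+?)\s*</td>\s*<td>\s*(\d+)\s*</td>\s*</tr>",
    re.IGNORECASE,
)

def extract_from_text(text):
    """Return (district, population) pairs found in PDF-derived text."""
    return [(name, int(num)) for name, num in PDF_LINE.findall(text)]

def extract_from_html(html):
    """Return (district, population) pairs found in HTML table rows."""
    return [(name, int(num)) for name, num in HTML_ROW.findall(html)]

pdf_text = "Achham 228990\nArghakhanchi202462\n"
html = "<table><tr><td>Baglung</td><td>268240</td></tr></table>"
print(extract_from_text(pdf_text))  # [('Achham', 228990), ('Arghakhanchi', 202462)]
print(extract_from_html(html))      # [('Baglung', 268240)]
```

Regular expressions over HTML break easily when the markup changes, so they suit pages whose layout is stable.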
Data Management
• An admin must log in to manage data.
• Admins can add, update and delete data after searching by various criteria.
• Only an admin can register a new admin.

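One way to enforce these rules, sketched in Python with a hypothetical `AdminRegistry` class: login is checked before any data change, and registering an admin requires a logged-in admin. The slides do not say how credentials are stored; salted PBKDF2 hashing is an assumed choice here, and the `bootstrap` method (also hypothetical) creates the first admin.

```python
import hashlib
import hmac
import os

class AdminRegistry:
    """Hypothetical admin store enforcing the Data Management rules."""

    def __init__(self):
        self._admins = {}        # username -> (salt, password hash)
        self._logged_in = set()  # usernames with an active session

    @staticmethod
    def _hash(password, salt):
        # Salted PBKDF2-SHA256; the iteration count is an illustrative choice.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def _store(self, username, password):
        salt = os.urandom(16)
        self._admins[username] = (salt, self._hash(password, salt))

    def bootstrap(self, username, password):
        """Create the very first admin; refused once any admin exists."""
        if self._admins:
            raise PermissionError("an admin already exists")
        self._store(username, password)

    def login(self, username, password):
        record = self._admins.get(username)
        if record is None:
            return False
        salt, expected = record
        if hmac.compare_digest(expected, self._hash(password, salt)):
            self._logged_in.add(username)
            return True
        return False

    def require_login(self, username):
        """Guard to call before any add, update or delete of data."""
        if username not in self._logged_in:
            raise PermissionError("admin login required")

    def register_admin(self, current_user, username, password):
        """Only a logged-in admin can register a new admin."""
        self.require_login(current_user)
        if username in self._admins:
            raise ValueError("admin already registered")
        self._store(username, password)
```

Usage: after `registry.bootstrap("root", "...")` and a successful `registry.login("root", "...")`, the call `registry.register_admin("root", "editor", "...")` succeeds, while the same call made on behalf of a user who has not logged in raises `PermissionError`.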
Population in Nepal
District        Population
Achham          228990
Arghakhanchi    202462
Baglung         268240
Baitadi         234002

Visualization on Map
• Uses the Google Maps API.
• JavaScript is used on the client side.
• jQuery and JSON are used to implement AJAX.
[Diagram: map shown on the web page → user sets new criteria for the map → web server acknowledges the request and sends map data in JSON format → new map shown]

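A sketch of the server side of this AJAX exchange, in Python for illustration (the project used ASP.NET): the handler parses the criteria JSON sent by the client, filters districts and returns map data as JSON. The handler name, the request and response fields and the `min` filter are hypothetical; the populations come from the Population in Nepal slide and the coordinates are placeholders, not real GIS values. The client side (jQuery sending the request, the Google Maps API drawing the points) is not shown.

```python
import json

def handle_map_request(criteria_json, districts):
    """Return map data as JSON for the criteria sent by the client.

    criteria_json: JSON text from the AJAX call, e.g.
        '{"indicator": "population", "min": 230000}'
    districts: list of dicts with "name", "lat", "lng" and indicator values.
    """
    criteria = json.loads(criteria_json)
    indicator = criteria["indicator"]
    minimum = criteria.get("min", 0)
    points = [
        {"district": d["name"], "lat": d["lat"], "lng": d["lng"], "value": d[indicator]}
        for d in districts
        if d[indicator] >= minimum
    ]
    return json.dumps({"indicator": indicator, "points": points})

# Populations from the slides; lat/lng are placeholders, not real coordinates.
DISTRICTS = [
    {"name": "Achham", "lat": 0.0, "lng": 0.0, "population": 228990},
    {"name": "Arghakhanchi", "lat": 0.0, "lng": 0.0, "population": 202462},
    {"name": "Baglung", "lat": 0.0, "lng": 0.0, "population": 268240},
    {"name": "Baitadi", "lat": 0.0, "lng": 0.0, "population": 234002},
]

print(handle_map_request('{"indicator": "population", "min": 230000}', DISTRICTS))
```

Sending only the filtered points keeps each response small, so the page can redraw the map without a full reload.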
Visualization with Tag Cloud
• Gives an overview of how the data is spread.
• One dimension is represented by the text displayed (e.g. the name of a district).
• The other dimension is represented by the weight (font size and color) of the text.
  › Implemented using CSS.
• The weight of each tag is calculated statistically from population.

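The slides only say that the tag weight is calculated statistically from population. A sketch assuming simple min-max scaling: each population is mapped to a weight between 0 and 1, which sets the font size and picks one of five hypothetical CSS classes (`tag-1` to `tag-5`) whose colours would be defined in the stylesheet.

```python
def tag_weights(populations):
    """Min-max scale each population to a weight between 0 and 1."""
    lo, hi = min(populations.values()), max(populations.values())
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return {name: (pop - lo) / span for name, pop in populations.items()}

def render_tag_cloud(populations, min_px=12, max_px=36, levels=5):
    """Build HTML spans whose font size and CSS colour class follow the weight."""
    spans = []
    for name, w in tag_weights(populations).items():
        px = round(min_px + w * (max_px - min_px))
        level = min(levels, 1 + int(w * levels))  # tag-1 .. tag-5, colours set in CSS
        spans.append(f'<span class="tag tag-{level}" style="font-size:{px}px">{name}</span>')
    return "\n".join(spans)

POPULATION = {"Achham": 228990, "Arghakhanchi": 202462, "Baglung": 268240, "Baitadi": 234002}
print(render_tag_cloud(POPULATION))
```

Log scaling is a common alternative when a few values dwarf the rest, since linear scaling would then shrink most tags to the minimum size.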
[Figure: districts grouped into eight proposed states, labelled State No. 1 to State No. 8]

Analyzing Feasibility of Federal States
• Districts can be selected to form a new state.
• Aggregate data for each state is obtained from the database.
  › The data can be the top caste, the top parties in the election, or the development index.
• The coefficient of variation is used to judge whether the proposed states are feasible.

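The coefficient of variation is the standard deviation divided by the mean, CV = σ / μ, so it measures spread relative to the average. A sketch assuming it is applied to one aggregated indicator (here population) across the proposed states, where a lower CV means more evenly balanced states. The slides do not say which indicator or which standard deviation (population or sample) was used, and the two-state grouping below is hypothetical.

```python
from statistics import mean, pstdev

def state_totals(assignment, district_values):
    """Aggregate a district-level indicator into per-state totals.

    assignment: district name -> state label chosen by the user.
    """
    totals = {}
    for district, state in assignment.items():
        totals[state] = totals.get(state, 0) + district_values[district]
    return totals

def coefficient_of_variation(values):
    """CV = population standard deviation / mean."""
    values = list(values)
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / mu

POPULATION = {"Achham": 228990, "Arghakhanchi": 202462, "Baglung": 268240, "Baitadi": 234002}
# Hypothetical grouping of the four districts into two states.
assignment = {"Achham": "State A", "Baitadi": "State A",
              "Arghakhanchi": "State B", "Baglung": "State B"}

totals = state_totals(assignment, POPULATION)
print(totals)                                             # {'State A': 462992, 'State B': 470702}
print(round(coefficient_of_variation(totals.values()), 4))  # 0.0083
```

Because CV is dimensionless, the same check can be reused for indicators on different scales, such as population and a development index.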
Facts Finder
• Informative facts are extracted from the raw data in the database.
• Users can choose from multiple criteria.
• Nested SQL queries are used.

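A sketch of a nested query, using SQLite in place of the project's SQL Server (the subquery pattern itself is standard SQL): the inner `SELECT` yields one reference value and the outer `SELECT` filters districts against it. The `district` table and the `districts_above` function are hypothetical; the populations are those on the Population in Nepal slide. The user-chosen criterion is passed as a query parameter rather than pasted into the SQL string.

```python
import sqlite3

def make_db():
    """In-memory stand-in for the project's database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE district (name TEXT PRIMARY KEY, population INTEGER);
        INSERT INTO district VALUES
            ('Achham', 228990), ('Arghakhanchi', 202462),
            ('Baglung', 268240), ('Baitadi', 234002);
    """)
    return conn

def districts_above(conn, reference=None):
    """Districts whose population exceeds the average, or a chosen district's.

    The inner SELECT yields one reference value; the outer SELECT filters on it.
    """
    if reference is None:
        inner, params = "SELECT AVG(population) FROM district", ()
    else:
        inner, params = "SELECT population FROM district WHERE name = ?", (reference,)
    sql = ("SELECT name, population FROM district "
           f"WHERE population > ({inner}) ORDER BY population DESC")
    return conn.execute(sql, params).fetchall()

conn = make_db()
print(districts_above(conn))                  # [('Baglung', 268240), ('Baitadi', 234002)]
print(districts_above(conn, "Arghakhanchi"))  # [('Baglung', 268240), ('Baitadi', 234002), ('Achham', 228990)]
```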
Methodology
• Programming languages and libraries
  › C# with ASP.NET, JavaScript, jQuery
• Microsoft SQL Server 2008 as the database engine
• Web technologies
  › JSON, AJAX
• Google Maps API

Project Management
• Each phase is divided into small chunks.
  › Chunks are assigned to team members.
• An online repository is hosted on BitBucket.org.
  › Managed with TortoiseHg, a Mercurial client.
  › Keeps work synchronized among all members.
• Weekly discussions with a senior developer at YIPL Nepal.
