Data Warehouse techniques on Intermediate Census and Demographic Statistics Web sites

  1. Data Warehouse techniques on Intermediate Census and Demographic Statistics Web sites
     “View data from different points of view”
     Vincenzo Patruno - ISTAT
     TES course: Techniques for Data Dissemination, Madrid, 9th April 2003
     http://cens.istat.it
     http://demo.istat.it
     [Image: M. C. Escher, Relativity]
  2. Contents
     • What a Data Warehouse is
     • The User Interface: how users make queries
     • Data modelling: two ways to organise data
     • Software environment
     • Costs and maintenance
  3. Data Warehouse
     • A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect.
     • A data warehouse is a collection of data designed to support management decision making.
     • A data warehouse is a computer system designed to give business decision makers instant access to information by copying data from existing systems and storing it for use by executives.
  4. Data Warehouse
     “A data warehouse is a copy of transaction data specifically structured for querying and reporting.”
     (Ralph Kimball's definition on page 310 of The Data Warehouse Toolkit, John Wiley & Sons, 1996)
     Queries and reports generated from data stored in a data warehouse may or may not be used for analysis.
  5. User Interface
     The user interface is the door to the data.
     The goal of demo.istat.it and cens.istat.it has been a user interface that permits:
     • Easy data access
     • Handy parameter selection
     • Fast database queries
     and that makes it easy to show data from different points of view using Internet technologies.
  6. User Interface
     On the Web that means:
     • building the user interface as an HTML page
     • using available technologies (DHTML)
     • using the HTTP protocol to send queries and to obtain results
  7. User Interface
     Building a user interface on the Web is completely different from traditional programming because of:
     • DHTML limits
     • Cross-browser problems
     • The nature of HTTP
  8. User Interface
     Because of this, in general:
     • Web programmers tend to develop systems in depth
     • This means users have to click many times to obtain results
     Our goal has been to obtain a user interface that behaves like a traditional application.
  9. User Interface
     [Diagram: one window divided into frames 1, 2 and 3]
     The adopted solution places frames 1 and 2, used to select parameters and variables, in the same window as frame 3, which shows results according to the reporting policy adopted.
  10. User Interface
      In this way, users can always use the same window:
      • to perform queries
      • to see results
      • to save data
      Advantages
      • Easy and fast access to data
      Disadvantages
      • It doesn't look so good
  11. User Interface
      How they work
  12. Data Model
      During the interface analysis we talked about:
      • Analysis variables
      • Dimensions
      • Data shown from different points of view
      Both systems are built according to the modelling techniques used to build data warehouses.
  13. Data Model
      The user interface is built around a data structure suitable to be queried from different points of view.
      But to do this, it is very important first of all to build a good conceptual data model.
      “The conceptual data model isn't an exercise in intellectual gymnastics for engineers but the starting point to build good software systems”
  14. Dimensional Modelling
      • Dimensional modelling (DM) is a favourite modelling technique in data warehousing.
      • In DM, a model of tables and relations is built with the purpose of maximising decision support and query performance in relational databases.
      • It is an excellent technique for building data models that optimise OLAP performance.
  15. Dimensional Modelling
      In contrast, conventional E-R models are designed to:
      • Remove redundancy in data models
      • Facilitate the retrieval of individual records having certain critical identifiers
      • Optimise OLTP performance
  16. Dimensional Modelling
      OLTP: On Line Transaction Processing
      A class of programs that facilitates and manages transaction-oriented applications (typically for data entry and retrieval transactions).
      OLAP: On Line Analytical Processing
      Enables a user to easily and selectively extract and view data from different points of view.
  17. Dimensional Modelling
      OLTP-based system = “gets the data in”
      OLAP-based system = “gets the data out”
  18. Dimensional Modelling
      • The data warehouse exists to answer questions people have about the “business”.
      • Dimensional modelling techniques ensure that the DW design reflects the way users think about the “business” and that the DW can be used to answer their questions.
      • A dimensional model (DM) captures the measurements of importance to a “business” and the parameters by which the measurements are broken out.
      • It is an excellent tool for identifying and classifying the important business components in a subject area.
  19. Dimensional Modelling
      How to build a Dimensional Model
      • A DM is built around a “business subject” (in our case a “statistical subject”).
      • This means we first have to identify the subject to be modelled and all the measures that describe it.
      • At the same time we have to identify the parameters by which a measurement can be viewed.
  20. Dimensional Modelling
      The measurements are referred to as FACTS.
      The parameters by which a fact can be viewed are referred to as DIMENSIONS.
      The level of detail of the measures in the fact table is referred to as the GRAIN.
      N.B. It is crucial that every row in the fact table be recorded at exactly the same level of detail.
  21. Dimensional Modelling
      [Diagram: a fact table linked to two dimensions]
      Class of Employees (dimension): EmployeesClass_PK, EmployeesClass_Name
      Enterprise Facts (fact table): Employees, Profits, Loss, EmployeesClass_FK, Geography_FK
      Geography (dimension): Geography_PK, Municipality, Province, Region, Geographical Area
  22. Dimensional Modelling
      … or, speaking about demo.istat.it
      [Diagram: a fact table linked to two dimensions]
      Resident Population Facts (fact table): Male, Female, Male Married, Female Married, ……, Age_FK, Geography_FK
      Geography (dimension): Geography_PK, Municipality, Province, Region, Geographical Area
      Age (dimension): Age_PK, Age_desc
  23. Dimensional Modelling
      • A Dimensional Model does not change much when implemented in a relational database. (In this form the DM is referred to as a Star Schema.)
      • Each box of dimension attributes becomes a table in the database, referred to as a dimension table.
      • The fact table becomes a very large table containing a very large number of rows. It contains the measures plus foreign keys that relate each measurement to the appropriate rows in each of the dimension tables.
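
A star schema of this kind can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the mSQL schema used for demo.istat.it: the table and column names follow the Resident Population diagram, while the data types, the reduced set of measures and all row values are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, identified by a primary key.
CREATE TABLE geography (
    geography_pk INTEGER PRIMARY KEY,
    municipality TEXT, province TEXT, region TEXT, geographical_area TEXT
);
CREATE TABLE age (
    age_pk INTEGER PRIMARY KEY,
    age_desc TEXT
);
-- Fact table: measures plus one foreign key per dimension.
CREATE TABLE resident_population_facts (
    male INTEGER, female INTEGER,
    age_fk INTEGER REFERENCES age(age_pk),
    geography_fk INTEGER REFERENCES geography(geography_pk)
);
""")

# Hypothetical rows, for illustration only.
conn.executemany("INSERT INTO geography VALUES (?, ?, ?, ?, ?)", [
    (1, "Town A", "Province X", "Region R", "Centre"),
    (2, "Town B", "Province X", "Region R", "Centre"),
    (3, "Town C", "Province Y", "Region R", "Centre"),
])
conn.executemany("INSERT INTO age VALUES (?, ?)",
                 [(18, "18 years"), (19, "19 years")])
conn.executemany("INSERT INTO resident_population_facts VALUES (?, ?, ?, ?)", [
    (10, 12, 18, 1), (20, 21, 18, 2), (5, 7, 18, 3), (11, 9, 19, 1),
])

def males_by_province(conn, age_pk):
    """Join the fact table to the Geography dimension and sum one measure."""
    return conn.execute("""
        SELECT g.province, SUM(f.male)
        FROM resident_population_facts AS f
        JOIN geography AS g ON g.geography_pk = f.geography_fk
        WHERE f.age_fk = ?
        GROUP BY g.province
        ORDER BY g.province
    """, (age_pk,)).fetchall()

result = males_by_province(conn, 18)
print(result)
```

The same fact table can be viewed from another point of view simply by grouping on a different dimension attribute, for example `g.region` instead of `g.province`.
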
  24. Dimensional Modelling
      This is what we have done with demo.istat.it: we have implemented the fact table in a relational database.
  25. Dimensional Modelling
      We have also created aggregate tables along the “Geography” dimension.
      [Diagram: an aggregate fact table linked to two dimensions]
      Resident Population Facts AGG_BY_PROVINCE (fact table): Male, Female, Male Married, Female Married, ……, Age_FK, Province_FK
      Province (dimension): Province_PK, Province
      Age (dimension): Age_PK, Age_desc
  26. Dimensional Modelling
      [Diagram: hierarchy of Resident Population Facts tables, from the coarsest to the finest grain]
      • Resident Population Facts AGG_BY_ITALY
      • Resident Population Facts AGG_BY_GEOGRAPHICAL_AREA
      • Resident Population Facts AGG_BY_REGION
      • Resident Population Facts AGG_BY_PROVINCE
      • Resident Population Facts BASE
  27. Dimensional Modelling
      The aggregate tables contain the same facts as the base fact table, but they are recorded at a different GRAIN.
      It is an excellent way to manage hierarchies.
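
As a rough sketch of the idea, an aggregate table can be derived from the base fact table with a single `CREATE TABLE ... AS SELECT`. The example uses Python's sqlite3 module rather than mSQL, a simplified schema (the aggregate stores the province name directly instead of a Province_FK), and made-up row values; all of these are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geography (
    geography_pk INTEGER PRIMARY KEY, municipality TEXT, province TEXT
);
-- Base fact table: grain = one row per municipality and age.
CREATE TABLE base_facts (
    male INTEGER, female INTEGER, age_fk INTEGER, geography_fk INTEGER
);
INSERT INTO geography VALUES
    (1, 'Town A', 'Province X'),
    (2, 'Town B', 'Province X'),
    (3, 'Town C', 'Province Y');
INSERT INTO base_facts VALUES
    (10, 12, 18, 1), (20, 21, 18, 2), (5, 7, 18, 3), (11, 9, 19, 1);

-- Aggregate table: same facts, grain = one row per province and age.
CREATE TABLE agg_by_province AS
SELECT g.province AS province, f.age_fk AS age_fk,
       SUM(f.male) AS male, SUM(f.female) AS female
FROM base_facts AS f
JOIN geography AS g ON g.geography_pk = f.geography_fk
GROUP BY g.province, f.age_fk;
""")

# A province-level query can read the small aggregate table directly...
from_agg = conn.execute("""
    SELECT province, male, female FROM agg_by_province
    WHERE age_fk = 18 ORDER BY province
""").fetchall()

# ...and must give the same answer as summing the base fact table.
from_base = conn.execute("""
    SELECT g.province, SUM(f.male), SUM(f.female)
    FROM base_facts AS f
    JOIN geography AS g ON g.geography_pk = f.geography_fk
    WHERE f.age_fk = 18
    GROUP BY g.province ORDER BY g.province
""").fetchall()
print(from_agg == from_base)
```

The trade-off is storage and refresh work in exchange for query speed: each aggregate has to be rebuilt whenever the base table is reloaded.
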
  28. Dimensional Modelling
      Dimensional Models are not always implemented in relational databases.
      Several vendors offer multidimensional databases (MDDB), which store information in a different format, often referred to as a cube.
  29. Dimensional Modelling
      This is what we have done with cens.istat.it: we have implemented the fact table in a multidimensional database.
  30. MDDB
      A multidimensional database (MDDB) is a specialised storage facility that allows data to be stored in a matrix-like format.
      • It contains all possible values resulting from crossing all dimensions and all measures.
      • The whole of these values is referred to as the Nway Cube.
  31. MDDB
      In addition, the process of building an MDDB also summarises the raw data according to hierarchical dimensions.
      Summarised data is stored in data structures referred to as subcubes.
      An MDDB stores its data as an Nway Cube and zero or more subcubes.
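
The relationship between the Nway cube and a subcube can be illustrated in a few lines of plain Python. This is a conceptual sketch, not how SAS/MDDB stores data: the dimension names, the raw rows and the `build_cube` helper are all hypothetical, and the dictionary keeps only non-empty cells rather than every possible crossing.

```python
from collections import defaultdict

# Hypothetical raw rows: (region, province, age, sex, residents).
DIMENSIONS = ("region", "province", "age", "sex")
RAW = [
    ("R1", "P1", 18, "M", 10),
    ("R1", "P1", 18, "F", 12),
    ("R1", "P2", 18, "M", 20),
    ("R1", "P2", 19, "M", 4),
    ("R2", "P3", 18, "F", 7),
]

def build_cube(rows, crossing):
    """Sum the measure over every combination of the dimensions in `crossing`."""
    positions = [DIMENSIONS.index(d) for d in crossing]
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        cube[key] += row[-1]  # the last field is the measure
    return dict(cube)

# Nway cube: the crossing of all dimensions.
nway = build_cube(RAW, DIMENSIONS)

# Subcube: data summarised over a smaller crossing (region x sex).
region_sex = build_cube(RAW, ("region", "sex"))
print(region_sex)
```
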
  32. MDDB
      Subcubes are built to enhance reporting speed.
      If a subcube does not exist for a particular aggregate query, that is, if no subcube defines the exact crossing required to answer the query, the aggregate data will be derived from the smallest subcube that can provide the data.
      If no subcube can provide the data, it is derived from the Nway cube.
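
The selection rule described here can be sketched as a small function. It is an illustration under stated assumptions, not SAS/MDDB's actual algorithm: a subcube is taken to "provide the data" when its crossing contains every dimension the query needs, "smallest" is measured by a cell count, and the subcube names and sizes are invented.

```python
def choose_source(query_dims, subcubes, nway_name="NWAY"):
    """Pick the smallest subcube whose crossing covers the query, else the Nway cube.

    `subcubes` maps a name to (dimensions in the crossing, number of cells).
    """
    needed = frozenset(query_dims)
    candidates = [
        (size, name)
        for name, (dims, size) in subcubes.items()
        if needed <= frozenset(dims)  # crossing covers all required dimensions
    ]
    if not candidates:
        return nway_name  # no subcube can answer: fall back to the Nway cube
    return min(candidates)[1]

# Hypothetical subcubes with made-up cell counts.
SUBCUBES = {
    "by_region_sex": (("region", "sex"), 40),
    "by_region_age_sex": (("region", "age", "sex"), 4000),
    "by_region_province": (("region", "province"), 110),
}

print(choose_source(("region", "sex"), SUBCUBES))    # exact crossing exists
print(choose_source(("region",), SUBCUBES))          # several subcubes qualify
print(choose_source(("municipality",), SUBCUBES))    # none qualifies
```
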
  33. MDDB
      If you know of common queries that can be answered using a smaller set of crossings, you can create a subcube that specifies the exact crossing required.
      Subcubes are often referred to as Data Marts.
  34. Summary
      We have seen two ways to implement the fact table:
      • As a Relational Database (for demo.istat.it)
      • As a Multidimensional Database (for cens.istat.it)
      Now we are going to see the tools used to implement the DB and how the programs that connect the DB with the user interface have been built.
  35. Developing environment
      To implement demo.istat.it we've used:
      • mSQL as relational database (http://www.hughes.com.au/)
      • PHP as programming language (http://www.php.net/)
      PHP is a server-side, cross-platform, HTML-embedded scripting language.
  36. HTML
      Welcome.html:
          <HTML>
          <BODY bgcolor="#FFFFFF">
          <H1>Welcome!</H1>
          </BODY>
          </HTML>
      [Diagram: a client and a Web server holding HTML pages, with arrows 1 and 2]
      1. The client requests Welcome.html.
      2. The Web server sends Welcome.html to the client.
      Welcome.html is interpreted by the browser and displayed on the screen.
  37. PHP
      Welcome.php:
          <?php
          // The page is produced on the server; single quotes let the
          // HTML attribute keep its double quotes.
          print '<HTML>
          <BODY bgcolor="#FFFFFF">
          <H1>Welcome!</H1>
          </BODY>
          </HTML>';
          ?>
      [Diagram: a client, a Web server holding PHP pages, the PHP interpreter and an RDBMS, with arrows 1, 2 and 3]
      1. The client requests Welcome.php.
      2. The PHP interpreter runs Welcome.php.
      3. Results are sent to the client.
  38. SAS
      In contrast, to implement cens.istat.it we've used:
      • SAS/MDDB (to build multidimensional databases)
      • SAS/IntrNet (to run SAS programs on the Web)
      • DAB (a tool that automatically generates all the programs and all the JavaScript needed to query the MDDB via the Web)
  39. SAS
      [Diagram: a client, a Web server holding HTML pages and the cgi-bin Broker, the SAS programs and the MDDB, with arrows 1 to 4]
      1. The client asks to run a SAS program (for example by sending a form).
      2. The SAS broker (a cgi-bin program) calls the SAS program, which is stored in an independent area.
      3. The SAS program runs and accesses the data in the MDDB.
      4. Results are sent to the client.
  40. Costs and maintenance
      How much does it cost to build and to maintain both systems, in terms of:
      • Money
      • People
      • Time
  41. Costs and maintenance
      To build demo.istat.it we needed to build:
      • The relational database containing the fact table
      • The aggregate tables
      • The dimension tables
      • All the programs that query the tables and format the output as an HTML page and as a CSV file
      • All the JavaScript that manages the user interface
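
The "format the output as an HTML page and as a CSV file" step can be sketched as two small functions. The original programs were written in PHP; this Python version is only an analogous illustration, and the function names `to_csv` and `to_html_table` are hypothetical. The sample figures are the Viterbo and Rieti values from the Age = 18 table in this presentation.

```python
import csv
import html
import io

def to_csv(header, rows):
    """Serialise a query result as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def to_html_table(header, rows):
    """Serialise the same result as an HTML table, escaping every cell."""
    def tr(cells, tag):
        cells_html = "".join(
            f"<{tag}>{html.escape(str(c))}</{tag}>" for c in cells
        )
        return f"<tr>{cells_html}</tr>"
    lines = ["<table>", tr(header, "th")]
    lines += [tr(row, "td") for row in rows]
    lines.append("</table>")
    return "\n".join(lines)

header = ("Province", "Single Males", "Total Males")
rows = [("Viterbo", 1574, 1575), ("Rieti", 830, 830)]

csv_text = to_csv(header, rows)
html_text = to_html_table(header, rows)
print(csv_text)
print(html_text)
```

Keeping the query result separate from its rendering means one query program can serve both the on-screen table and the downloadable file.
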
  42. Costs and maintenance
      To build cens.istat.it we needed to:
      • Build the MDDB containing the Nway cube and all the Data Marts, using SAS tools
      • Generate all the SAS programs that query the tables and format the output as an HTML page and as a CSV file
      • Generate all the JavaScript that manages the user interface
      SAS programs and JavaScript are generated automatically by SAS/DAB.
  43. Costs and maintenance
      Software:
      • mSQL (free of charge for certain organisations, otherwise US $250)
      • PHP (completely free of charge)
      Effort: 3 weeks' work, 2 people
      Hardware: IBM/AIX RS/6000 43P workstation, 9 GB hard disk
      Each database changes once a year; every year we create a new DB.
      Time needed to load new data: 5 minutes
  44. Costs and maintenance
      Software:
      • SAS
      • SAS/MDDB
      • SAS/IntrNet
      • SAS/DAB (free of charge)
      Effort: 3 months' work, 10 people, 1 SAS adviser
      Hardware: IBM AIX server, 40 GB hard disk
      Databases don't change.
  45. Summary
      We have seen so far:
      • What a Data Warehouse is
      • The features of the data structures
      • The features of the user interface
      • The developing environment
  47. Resident population on 1st January 2001 (Age = 18, Region = Lazio)

      Province    Single Males  Married Males  Divorced Males  …    Total Males  …
      Viterbo             1574  ...            ...             ...         1575  ...
      Rieti                830  ...            ...             ...          830  ...
      Rome               19510  ...            ...             ...        19511  ...
      Latina              3314  ...            ...             ...         3317  ...
      Frosinone           3297  ...            ...             ...         3304  ...

  48. Resident population on 1st January 2001 (Age = All, Region = Lazio)

      Province    Single Males  Married Males  Divorced Males  …    Total Males  …
      Viterbo            59131          79068  ...             ...       143470  ...
      Rieti              31060          40010  ...             ...        73819  ...
      Rome              829036         948885  ...             ...      1843238  ...
      Latina            112325         133470  ...             ...       252280  ...
      Frosinone         104665         129906  ...             ...       242108  ...

  49. Resident population on 1st January 2001 (Age = All, Region = Lazio)

      Province    Single Males  Married Males  Divorced Males  …    Total Males  Density  Total F/M
      Viterbo            59131          79068  ...             ...       143470     79.0  ...
      Rieti              31060          40010  ...             ...        73819     52.6  ...
      Rome              829036         948885  ...             ...      1843238    668.7  ...
      Latina            112325         133470  ...             ...       252280    217.6  ...
      Frosinone         104665         129906  ...             ...       242108    147.3  ...

  53. GIS: Geographic Information System
      Essentially, a GIS is a computer-assisted information management system for geographically referenced data.
      It contains two closely integrated databases:
      • The spatial database contains information in the form of digital co-ordinates. These can be points, lines, or polygons.
      • The attribute database contains information about the characteristics or qualities of the spatial features (e.g. demographic information).
      GIS is sometimes seen as a set of tools for analysing spatial data.
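
The link between the two databases can be sketched in plain Python: a spatial table of polygons and an attribute table share a feature identifier, so a location can be resolved to a feature and then to its attributes. Everything here is hypothetical (the feature ids, the square polygons, the resident counts and the helper names); the point-in-polygon test is the standard ray-casting method, and a real system would use GIS software instead.

```python
# Spatial database: feature id -> polygon as a list of (x, y) vertices.
SPATIAL = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

# Attribute database: feature id -> characteristics of that feature.
ATTRIBUTES = {
    "A": {"name": "Area A", "residents": 1200},
    "B": {"name": "Area B", "residents": 800},
}

def point_in_polygon(x, y, polygon):
    """Ray casting: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def attributes_at(x, y):
    """Find the feature containing the point, then look up its attributes."""
    for feature_id, polygon in SPATIAL.items():
        if point_in_polygon(x, y, polygon):
            return ATTRIBUTES[feature_id]
    return None

print(attributes_at(1, 1))
print(attributes_at(3, 1))
print(attributes_at(5, 5))
```
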
  54. Questions
      Is it possible to build a Web-based GIS system?
      Is it possible to combine a Web warehouse system with a GIS component?
  56. Web System Architecture
      [Diagram: a client connected over HTTP to a server, which sits on top of the data warehouse]
      Server components:
      • Linux RedHat
      • Apache HTTP Server
      • PHP4
      • PostgreSQL
      • PostGIS
      • MapServer
      • MapScript
      Data Warehouse: spatial and statistical data
  57. Conclusions
      • A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect.
      • A data warehouse is a collection of data designed to support management decision making.
      • A data warehouse is a computer system designed to give business decision makers instant access to information by copying data from existing systems and storing it for use by executives.
      • A data warehouse is a copy of transaction data specifically structured for querying and reporting.
      • Geomarketing means using geography to make efficient business decisions.
      • Geomarketing answers crucial questions concerning marketing, company sales and other fields.
      • Geomarketing is a complete database of commercial and marketing information built around a geographical system.
  58. Thank you for your attention
      My e-mail: patruno@istat.it
      My address:
      Vincenzo Patruno
      ISTAT - DCIT - Central Direction of Information Technology - Security and Web Technologies
      Via C. Balbo, 16
      00184 Rome - Italy