IT tools for statistics, visualization, open data


Seminar "Opening Financial Data in Turkey: transparency, accessibility and citizen involvement"
"IT tools for statistics, visualization, open data"
Carlo Vaccari
Ankara, April 19 2012

Published in: Technology, Business

  1. EU TWINNING PROJECT TR 08 IB FI 02 “Improving Data Quality in Public Accounts”
     AB EŞLEŞTİRME PROJESİ “Kamu Hesaplarında Veri Kalitesinin Artırılması”
     IT tools for statistics, visualization, open data
     Carlo Vaccari (ISTAT / Formez)
     Twinning Project “Improving data quality in public accounts”
  2. Data warehouse
     Business Intelligence is used to analyze data; Business Intelligence processing operates on a Data Warehouse.
     A Data Warehouse is a collection of data that supports decision making and has the following characteristics:
     • oriented to the subject of interest
     • integrated and consistent
     • representative of the temporal evolution
     • non-volatile
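The four characteristics listed on this slide can be illustrated with a very small star schema. The following is only a sketch using Python's built-in `sqlite3` module; the table names (`dim_entity`, `dim_time`, `fact_payments`), the columns and all figures are hypothetical and are not taken from the twinning project.

```python
import sqlite3

def build_warehouse():
    """Create a tiny in-memory star schema (all names and figures are hypothetical)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Subject-oriented: one fact table about payments.
        -- Integrated: facts refer to shared, consistent dimension keys.
        CREATE TABLE dim_entity (entity_id INTEGER PRIMARY KEY, name TEXT, entity_type TEXT);
        -- Temporal evolution: every fact is tied to a period.
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_payments (
            entity_id INTEGER REFERENCES dim_entity(entity_id),
            time_id   INTEGER REFERENCES dim_time(time_id),
            amount    REAL
        );
    """)
    con.executemany("INSERT INTO dim_entity VALUES (?, ?, ?)",
                    [(1, "Municipality A", "municipality"),
                     (2, "Province B", "province")])
    con.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                    [(1, 2010, 12), (2, 2011, 12)])
    # Non-volatile: rows are only appended here, never updated or deleted.
    con.executemany("INSERT INTO fact_payments VALUES (?, ?, ?)",
                    [(1, 1, 100.0), (1, 2, 120.0), (2, 1, 300.0), (2, 2, 310.0)])
    con.commit()
    return con

def totals_by_year(con):
    """Aggregate the fact table along the time dimension."""
    return con.execute("""
        SELECT t.year, SUM(f.amount)
        FROM fact_payments f
        JOIN dim_time t ON t.time_id = f.time_id
        GROUP BY t.year
        ORDER BY t.year
    """).fetchall()
```

Calling `totals_by_year(build_warehouse())` returns one `(year, total)` row per year, which is the kind of summarized view that reporting and OLAP tools are built on.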
  3. Data Warehouse tools
     From operational data to data warehouse
     [Diagram labels: operational data; current transactional procedures; Data Warehouse; dashboards; data mining; advanced OLAP tools; reporting]
  4. Dashboard
     Dashboard: a data visualization tool that displays the current status of metrics and key performance indicators (KPIs) for an enterprise. Dashboards consolidate and arrange numbers, metrics and sometimes performance scorecards on a single screen.
     Various kinds of dashboards:
     - “Business Dashboard”: business-related dashboard
     - “Executive Dashboard”: dashboard meant to be used by CEOs, managers, etc.
     - “Operational Dashboard”: dashboard that monitors day-to-day activity
     Dashboards are designed to help us monitor what’s going on at a glance.
  5. Dashboard
  6. Dashboard
  7. OLAP
     OnLine Analytical Processing: decision support software that allows the user to quickly analyze information that has been summarized into multidimensional views and hierarchies.
     OLAP tools are used to perform trend analysis on financial information.
     - Multidimensional data
     - Many operators
     - Complex, non-predefined analysis
     - Data: not operational; current and historical
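Two of the most common OLAP operators, roll-up and slice, can be sketched in a few lines of plain Python. This is an illustration only: the dimensions (`region`, `entity_type`, `year`) and the fact rows are hypothetical, and a real OLAP tool would work on a data warehouse rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, entity_type, year, amount).
DIMENSIONS = ("region", "entity_type", "year")
FACTS = [
    ("North", "municipality", 2010, 100.0),
    ("North", "municipality", 2011, 120.0),
    ("North", "province", 2011, 300.0),
    ("South", "municipality", 2011, 80.0),
]

def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the measure is the last field
    return dict(totals)

def slice_cube(facts, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [row for row in facts if row[position] == value]
```

For example, `roll_up(FACTS, ("region",))` summarizes by region only, while `roll_up(slice_cube(FACTS, "year", 2011), ("region", "entity_type"))` first fixes the year and then keeps two dimensions. Because the user chooses the dimensions at query time, the analysis does not have to be predefined.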
  8. An OLAP implementation with BO (Italian case)
     [Diagram panels: E/R outline; DFM Fact outline; DFM Functionality outline. Diagram labels: Business Rules, Analysis, Development, BO Report, BO Universe, ETL, EDW, ETL, Data Mart]
  9. Tools
     Examples of tools for data management and Business Intelligence (open-source applications):
     - Google Refine
     - Business Intelligence open-source tools: software with a fee for support
  10. Visualization techniques
      Visualization techniques (cartography, advanced visualization tools):
      - Mindmaps
      - Displaying news
      - Displaying data
      - Displaying connections
      - Displaying websites
      - Articles & resources
      - Tools and services
      Tableau
  11. Visualization techniques
      Infographics tools:
      Many Eyes
      On social network visualizations:
  12. Visualization techniques
      Google Public Data Explorer: a simple way to start presenting data using advanced visualization techniques is Google Public Data Explorer, a tool with which any organization can show its data on the Web so that users can find, explore, and share it.
      Two steps are available for using GPDE:
      1 - MoF can start testing the tool by uploading datasets for visualization and exploration by privileged users
      2 - in a second phase MoF can agree with Google on a formal insertion of its data in the Dataset Directory
      Organizations that have chosen this way of publishing (often in addition to their website) include the World Bank, IMF, OECD, Eurostat, etc.
  13. Visualization techniques
      VIDI
      The VIDI suite is a set of modules for Drupal (an open CMS) designed to enable the creation of visual data displays. Using VIDI tools you can display changes in data values over time, relate data in various ways to geographical maps, or display static datasets through different types of charts. You can use the Dataviz website to create visual data displays.
      Two ways to use it:
      1. Use VIDI on the website http://www.dataviz.org by loading your data, choosing among the available visualizations and storing your visualization
      2. Download the VIDI modules and install them in your Drupal website, then import your datasets and prepare data displays
  14. Visualization techniques
      Future: HTML5, the new standard for the Web
      See some show: effect from
  15. Open Data
      Open Data:
      - data freely available to everyone
      - elementary (raw) data
      - to use and republish
      - without restrictions from copyrights or patents
      Best practice: World Bank, by country, by topic or by indicator (1000+).
      All indicators are available as table, map and graph, and are downloadable as xls and xml.
      The WB website also offers modules to directly access WB data from the Stata and “R” statistical tools.
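The slide mentions xls and xml downloads and modules for Stata and R; the same kind of direct access can be sketched in Python against the World Bank's public web API. This is a swapped-in illustration, not something shown in the talk. The URL layout (`/v2/country/{code}/indicator/{id}?format=json`) and the two-element `[metadata, records]` response shape are assumptions to check against the current API documentation, and the sample payload below is invented for the example.

```python
import json
from urllib.parse import urlencode

def indicator_url(country, indicator, base="https://api.worldbank.org/v2"):
    """Build the request URL for one country and one indicator (assumed URL layout)."""
    query = urlencode({"format": "json"})
    return f"{base}/country/{country}/indicator/{indicator}?{query}"

def parse_series(payload):
    """Return {year: value} from a [metadata, records] JSON response.

    Records whose value is null (no observation) are skipped.
    """
    _metadata, records = json.loads(payload)
    return {int(r["date"]): r["value"] for r in records if r["value"] is not None}

# Invented sample payload, shaped like the assumed response; values are not real data.
SAMPLE = (
    '[{"page": 1, "pages": 1},'
    ' [{"date": "2011", "value": 2.0},'
    '  {"date": "2010", "value": null},'
    '  {"date": "2009", "value": 1.5}]]'
)
```

In real use the text returned by fetching `indicator_url("TR", "SP.POP.TOTL")` would be passed to `parse_series` in place of `SAMPLE`; the network call is left out so that the sketch runs offline.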
  16. Open Data
      Tim Berners-Lee: Linked Data associated with gold stars, like the ones you got in school.
      1 - make your stuff available on the web (whatever format)
      2 - make it available as structured data (e.g. Excel instead of an image scan)
      3 - non-proprietary format (e.g. csv not xls)
      4 - use URLs to identify things, so that people can point at your stuff
      5 - link your data to other people’s data to provide context
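The five steps above can be read as a simple cumulative rating. The sketch below assumes that each star requires all the previous ones, which is the usual reading of the scheme but is not stated on the slide; the `Dataset` fields are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """Hypothetical description of a published dataset, one flag per star."""
    on_web: bool       # 1: available on the web, whatever format
    structured: bool   # 2: structured data (e.g. a spreadsheet, not a scan)
    open_format: bool  # 3: non-proprietary format (e.g. csv, not xls)
    uses_urls: bool    # 4: things are identified by URLs
    linked: bool       # 5: linked to other people's data

def stars(dataset):
    """Count stars cumulatively: stop at the first requirement that is not met."""
    count = 0
    for met in (dataset.on_web, dataset.structured, dataset.open_format,
                dataset.uses_urls, dataset.linked):
        if not met:
            break
        count += 1
    return count
```

Under this reading, an Excel file published on a website gets two stars, and converting it to csv earns the third.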
  17. Linked Data
  18. Open Data tools: CKAN
      CKAN stands for Comprehensive Knowledge Archive Network.
      Developed by the Open Knowledge Foundation (OKFN).
      An open-source package that makes data accessible by providing tools to streamline publishing, sharing, finding and using data.
      CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.
      Used by many central (dk, no, uk) and local governments.
      Features:
      - Publish & Find Datasets (import, keywords, versioning)
      - Store & Manage Data (raw data, metadata, statistics, geo-)
      - Engage with Users & Others (community management)
      - Customize & Extend (APIs, extensions, open source)
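The "Find Datasets" and "APIs" features can be illustrated with a small client-side sketch. It assumes a CKAN instance that exposes the Action API endpoint `/api/3/action/package_search` and answers with a JSON body of the form `{"success": ..., "result": {"results": [...]}}`; older installations may differ. The host `ckan.example.org` and the sample payload are hypothetical.

```python
import json
from urllib.parse import urlencode

def search_url(site, query, rows=10):
    """Build a dataset search URL for a CKAN site (assumes the Action API)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{site.rstrip('/')}/api/3/action/package_search?{params}"

def dataset_titles(payload):
    """Extract dataset titles from a package_search response body."""
    body = json.loads(payload)
    if not body.get("success"):
        raise ValueError("CKAN reported a failed request")
    # Fall back to the machine name when a dataset has no human-readable title.
    return [pkg.get("title") or pkg["name"] for pkg in body["result"]["results"]]

# Invented sample response, shaped like the assumed API output.
SAMPLE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "budget-2011", "title": "Budget 2011"},
            {"name": "payments-2011", "title": ""},
        ],
    },
})
```

In real use, the body fetched from `search_url("https://ckan.example.org", "budget")` would replace `SAMPLE`; the request itself is omitted so that the sketch runs without network access.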
  19. Open Data tools: / Drupal
      Drupal is an open-source CMS (Content Management System) often used in Open Data
      code released as OSS (a modified Drupal version), used also for India → Open Government Platform
      Drupal OpenData working group
      Journalism: