
Maria F.Romano, Methodological innovations to estimate illegal economy


13° Conferenza Nazionale di statistica 4-5-6 luglio 2018
# FUTURO
Centro Congressi Ergife Via Aurelia, 619 Roma

Published in: Education


  1. Methodological innovations to estimate illegal economy
     Maria Francesca Romano, Institute of Economics & EMbeDS - Scuola Superiore Sant’Anna
  2. Aims
     Starting point
     • Research directed by Guido M. Rey resulted in the volume «La mafia come impresa. Analisi del sistema economico criminale e delle politiche di contrasto» (2017).
     • The chapter «Dalle parole ai numeri: estrarre dati dalle sentenze della magistratura» presents the results of the analysis of about 5,000 judgments issued by the Corte di Cassazione.
     Goals
     • Extend the results obtained from text mining of the judgments by combining multiple data sources.
     • Evaluate the completeness and reliability of the data.
     • Organize database(s) aimed at estimating statistical models.
  3. Exercise: Integration of data from multiple sources
     ① Judgments issued by the Corte di Cassazione (www.italgiure.it): Open Data PA
     ② Orbis: database of companies, accessible with the resources of EMbeDS (Economics and Management in the era of Data Science), a project that won the MIUR selection of Departments of Excellence 2018-2022, http://embeds.santannapisa.it/
     A subset of 308 judgments was extracted from the 4,632 judgments (from 2012 to September 2016) selected because they contain one or more of the words “corruzione”, “concussione”, “turbativa” and “appalto”. The judgments in the subset are those:
     • issued in 2014
     • containing references to professional roles held in the Public Administration
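The selection described on this slide (keywords, year of issue, reference to a Public Administration role) could be sketched as a simple filter. This is only an illustration under stated assumptions, not the procedure actually used: the `Judgment` record, the `select_subset` function and the `ROLE_TERMS` list are hypothetical, and plain substring matching is assumed (it would, for example, also match "subappalto").

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence

# Keywords quoted on the slide.
KEYWORDS = ("corruzione", "concussione", "turbativa", "appalto")
# Hypothetical examples of Public Administration roles (not from the source).
ROLE_TERMS = ("pubblico ufficiale", "sindaco", "assessore")


@dataclass
class Judgment:
    judgment_id: str  # hypothetical identifier
    year: int         # year of issue
    text: str         # full text of the judgment


def mentions_any(text: str, terms: Sequence[str]) -> bool:
    """True if at least one term occurs in the text (case-insensitive substring)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def select_subset(judgments: Iterable[Judgment], year: int = 2014,
                  keywords: Sequence[str] = KEYWORDS,
                  role_terms: Sequence[str] = ROLE_TERMS) -> List[Judgment]:
    """Keep judgments issued in `year` that mention a keyword and a PA role."""
    return [j for j in judgments
            if j.year == year
            and mentions_any(j.text, keywords)
            and mentions_any(j.text, role_terms)]
```

In practice the role detection would more plausibly rely on the semantic tagging produced in Step 1 than on a fixed list of terms.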
  4. Step 1: Import texts of judgments and text mining
     Through the TalTaC2 package:
     • Creation of a corpus with the texts of the judgments
     • Vocabulary (words and lemmas)
     • Grammatical and semantic tagging
     • Identification of multiwords and segments
     • Text mining
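To give a rough idea of what the vocabulary and segment steps produce, here is a minimal sketch in plain Python. It does not use TalTaC2 and does not reproduce its algorithms: the tokenizer, the function names and the `min_count` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Runs of letters (accented letters included); digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> List[str]:
    """Split a text into lower-cased word forms."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def vocabulary(corpus: Iterable[str]) -> Counter:
    """Count the occurrences of each word form over the whole corpus."""
    counts: Counter = Counter()
    for text in corpus:
        counts.update(tokenize(text))
    return counts


def repeated_segments(corpus: Iterable[str], n: int = 2,
                      min_count: int = 2) -> Dict[Tuple[str, ...], int]:
    """Return the n-word segments that occur at least `min_count` times."""
    counts: Counter = Counter()
    for text in corpus:
        tokens = tokenize(text)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {seg: c for seg, c in counts.items() if c >= min_count}
```

Repeated segments such as ("associazione", "mafiosa") are the natural candidates for multiwords; lemmatization and grammatical tagging need linguistic resources that this sketch leaves out.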
  5. Information chart
     As the following figure shows, the information is centred on the single criminal event, which involves actors (individuals or groups), is detected or sanctioned, takes place in a specific geographical location, on a definite date (or period), in a given manner, and with a given economic value.
     [Figure: diagram centred on the criminal event (WHAT), linked to persons, criminal association, public body and company (WHO), period (WHEN), place (WHERE), manner (HOW), economic value in euro, and the court or police that sanctioned or detected it. Relation labels: belongs to / works for, involves, to the detriment of, together with.]
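The information chart can be read as a record structure with one entry per criminal event. The following is a minimal sketch of such a record; the class and field names are hypothetical, chosen only to mirror the WHO / WHEN / WHERE / WHAT / HOW / economic value dimensions of the chart.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Actor:
    name: str
    kind: str  # e.g. "person", "criminal association", "public body", "company"


@dataclass
class CriminalEvent:
    judgment_id: str                                       # judgment the event comes from
    what: str                                              # WHAT: type of offence
    who: List[Actor] = field(default_factory=list)         # WHO: actors involved
    together_with: List[Actor] = field(default_factory=list)
    to_detriment_of: List[Actor] = field(default_factory=list)
    when: Optional[str] = None                             # WHEN: date or period
    where: Optional[str] = None                            # WHERE: geographical location
    how: Optional[str] = None                              # HOW: manner of the offence
    economic_value_eur: Optional[float] = None             # economic value in euro
    detected_or_sanctioned_by: Optional[str] = None        # court or police


# Usage with invented values, for illustration only.
event = CriminalEvent(
    judgment_id="J-001",
    what="corruzione",
    who=[Actor("Nome1", "person")],
    to_detriment_of=[Actor("Comune X", "public body")],
    where="Lazio",
)
```

Optional fields default to `None` because a judgment does not necessarily state every dimension.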
  6. Guidelines followed for matching with Orbis
     • The matching procedure must be automatic or automatable: repeatable with lists obtained from a larger number of judgments and without "manual" choices.
     • The presence of unredacted data on natural persons does not pose privacy problems, because this information is not extracted "per se": it is the premise for obtaining a correct and reliable matching, and the data are still treated statistically (anonymously).
  7. Step 2: Matching with Orbis (1)
     «Batch search» (automatic) in two consecutive steps:
     • Companies: list obtained from TalTaC2 by exporting the company name and the identifier of the judgment
     • Persons (defendants): list obtained from TalTaC2 by exporting the graphic forms semantically tagged as «defendants» (multiword graphic forms with name and surname, or surname and name) together with the date of birth
  8. Step 2: Matching with Orbis (2)
     RESULTS of the «Batch search» (automatic) in two consecutive steps:
     • Companies. Input: 400 companies, of which 228 with an A score, corresponding to 186 unique companies (the same company name appears in several judgments, or is written by the judges in several variants)
     • Persons (defendants). Input: 408 defendants (unique, no repetitions); 16 validated records (automatic comparison between date of birth and part of the social security number) + 6 individual companies
     The automated process produces a matching score for each record. Our quality indicator uses the following scoring criteria:
     A | Excellent | total score >= 95%
     B | Good | total score between 85 and 94%
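The two automatic checks on this slide can be sketched as follows. This is an assumption-laden illustration, not the authors' code: `grade` treats every score from 85 up to (but not including) 95 as B, and `birth_matches_tax_code` assumes that the "part of the social security number" is the birth-date portion of the Italian codice fiscale (characters 7-11: two-digit year, month letter, day, with 40 added to the day for women). Both function names are hypothetical.

```python
from datetime import date
from typing import Optional

# Month letters used in the codice fiscale, January to December.
MONTH_CODES = "ABCDEHLMPRST"


def grade(score_percent: float) -> Optional[str]:
    """Map a matching score to the quality grades on the slide (None if below B)."""
    if score_percent >= 95:
        return "A"
    if score_percent >= 85:
        return "B"
    return None


def birth_matches_tax_code(birth: date, tax_code: str) -> bool:
    """Compare a date of birth with the date encoded in a 16-character codice fiscale.

    Codes altered for homonymy (digits replaced by letters) are not handled
    and are simply reported as non-matching.
    """
    code = tax_code.strip().upper()
    if len(code) != 16:
        return False
    year_part, month_part, day_part = code[6:8], code[8], code[9:11]
    if not (year_part.isdigit() and day_part.isdigit()):
        return False
    if month_part not in MONTH_CODES:
        return False
    day = int(day_part)
    if day > 40:  # women: day of birth + 40
        day -= 40
    return (int(year_part) == birth.year % 100
            and MONTH_CODES.index(month_part) + 1 == birth.month
            and day == birth.day)
```

Because only two digits of the year are encoded, a match confirms consistency with the date of birth rather than proving identity, which is why it is used here as a validation step after the name-based search.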
  9. Step 3: Information contribution from Orbis: variables with high information potential
     What data do we add to those already available?
     • Company status
     • Business size
     • Statistical classification of activities
     • Start year
     • Budget data
     • ….
     BUT ALSO THE NAMES OF THE TOP MANAGEMENT AND OWNERS
     Again with a view to anonymous treatment, these names can be used to identify a network of companies. They are not interesting "per se" (we are not a detective agency) but as holders of other individual and/or family companies (founded after the outcome of the judgment).
     NB: the names of the defendants appear unredacted in the Corte di Cassazione source, as it is the last court level.
  10. TalTaC results: The automatic classification of judgments
  11. How to interpret clusters
      First 11 words characterizing the 2 main identified clusters:
      Cluster 1 (n=119): presence of organized crime | Cluster 2 (n=177): extortion by a public official (concussione) / corruption in the PA
      cosca | pubblico ufficiale
      associazione mafiosa | concussione
      associazione | privato
      Nome1 | costrizione
      sodalizio | corruzione
      partecipazione | induzione
      conversazione | servizio
      estorsione | CP
      ndrangheta | ufficio
      clan | abuso
      Nome2 | prescrizione
  12. Not just text mining but help in the interpretation
      The novelty of the approach presented here is the interaction between the results of the textual analysis and the new information that can be acquired from other databases (administrative or not).
      The questions we would like to answer:
      • Do companies that appear in the judgments have different characteristics from those that do not?
      • Do the companies that appear in the judgments differ from one cluster to the other?
      Example: do they differ by company size, economic sector, geographical location?
  13. Regions and companies by cluster
      Cluster 1 = offences + organized crime; Cluster 2 = offences and PA. Provisional and partial data.
      Region | Cluster 1: # judgments | Cluster 1: # companies | Cluster 2: # judgments | Cluster 2: # companies | Total: # judgments | Total: # companies
      Abruzzo | | | 1 | 1 | 1 | 1
      Calabria | 11 | 33 | 1 | 1 | 12 | 34
      Campania | 6 | 21 | 8 | 13 | 14 | 34
      Emilia-Romagna | | | 2 | 7 | 2 | 7
      Lazio | 1 | 1 | 5 | 6 | 6 | 7
      Liguria | 1 | 1 | 1 | 2 | 2 | 3
      Lombardia | 1 | 13 | 6 | 17 | 7 | 30
      Marche | | | 3 | 8 | 3 | 8
      Molise | | | 1 | 1 | 1 | 1
      Piemonte | | | 1 | 10 | 1 | 10
      Puglia | | | 5 | 14 | 5 | 14
      Sardegna | | | 1 | 1 | 1 | 1
      Sicilia | 5 | 13 | 5 | 11 | 10 | 24
      Toscana | 1 | 1 | 4 | 10 | 5 | 11
      Veneto | | | 4 | 17 | 4 | 17
      Total | 26 | 83 | 48 | 119 | 74 | 202
  14. Companies by national legal form
      Provisional and partial data.
      National legal form | Number of companies
      Consortium + Consortium with external activity | 4
      Cooperative company (SCARL + SCARLPA) | 4
      Joint stock company - SPA | 25
      Limited liability company - SRL | 121
      Limited partnership - SAS | 2
      One-person company with limited liability - SRLU | 21
      One-person joint stock company - SPA | 3
      Sole proprietorship | 2
      n.d. | 4
      Total | 186
      To be added: 22 one-person companies obtained from the list of defendants.
  15. Companies by status
      Provisional and partial data.
      Status | Number of companies
      Active | 135
      Active (default of payment) | 1
      Bankruptcy | 1
      Dissolved | 5
      Dissolved (bankruptcy) | 16
      Dissolved (liquidation) | 5
      Dissolved (merger or take-over) | 6
      In liquidation | 11
      Status unknown | 6
      Total | 186
  16. Companies by geographical area and status
      Provisional and partial data.
      Area | Active | Others | Status unknown | Total
      ITC - Northwest | 29 | 12 | 1 | 42
      ITH - Northeast | 22 | 12 | | 34
      ITI - Centre | 33 | 9 | | 42
      ITF - South | 26 | 8 | 4 | 38
      ITG - Insular Italy | 15 | 4 | 1 | 20
      (blank) | 10 | 0 | | 10
      Total | 135 | 45 | 6 | 186
      Others: Active (default of payment), Bankruptcy, Dissolved, Dissolved (bankruptcy), Dissolved (liquidation), Dissolved (merger or take-over), In liquidation.
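The three-way grouping used in this table (Active, Others, Status unknown) can be checked against the detailed status counts of the previous slide. The sketch below uses the counts reported there; the function names are hypothetical.

```python
from typing import Dict

# Detailed status counts as reported on the "Companies by status" slide.
STATUS_COUNTS: Dict[str, int] = {
    "Active": 135,
    "Active (default of payment)": 1,
    "Bankruptcy": 1,
    "Dissolved": 5,
    "Dissolved (bankruptcy)": 16,
    "Dissolved (liquidation)": 5,
    "Dissolved (merger or take-over)": 6,
    "In liquidation": 11,
    "Status unknown": 6,
}


def status_group(status: str) -> str:
    """Map a detailed status to the three groups used in the area table."""
    if status in ("Active", "Status unknown"):
        return status
    return "Others"


def aggregate(counts: Dict[str, int]) -> Dict[str, int]:
    """Sum the detailed counts within each group."""
    grouped: Dict[str, int] = {}
    for status, n in counts.items():
        group = status_group(status)
        grouped[group] = grouped.get(group, 0) + n
    return grouped


print(aggregate(STATUS_COUNTS))
# {'Active': 135, 'Others': 45, 'Status unknown': 6}
```

The grouped totals agree with the Total row of the area table (135 + 45 + 6 = 186).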
  17. Discussion
      The potential sources of data and information are many, and each is organized according to its own purposes. Using them for statistical purposes requires taking into account some aspects that are sometimes neglected when talking about Big Data or Open Data:
      • The completeness of the information
      • The time reference of the information that has been, or could be, acquired
  18. Final goal: the «statistical» database
      The database thus obtained will allow reconstructions and analyses starting from any element (company, public body, persons, period, place, etc.), provided that it is correctly identified as such within the texts of the judgments.
      It is therefore necessary to use several tools:
      • Text mining, to process the information contained in the texts of the judgments and transform it into data that can be analysed statistically
      • Validation and integration of these data with information from other administrative databases / records
      The greater the completeness and reliability of the other databases, the greater the information value of the analysis carried out on the statistical database.
  19. Credits
      Thanks to: Fabrizio Alboni, Daniela Arlia, Antonella Baldassarini, Lorenzo Bartalini, Pietro Battiston, Sergio Bolasco, Alberto di Martino, Giuseppe Di Vetta, Pasquale Pavone, Guido M. Rey
