SlideShare a Scribd company logo
IT Architectures for Handling
Big Data in Official Statistics:
the Case of Scanner Data in Istat
Gianluca D’Amato, Annunziata Fiore,
Domenico Infante, Antonella Simone,
Giorgio Vinci, Antonino Virgillito
Istat
Scanner Data Workshop
Rome 2 October 2015
About me
• Head of unit «Architectures for business
intelligence, mobile and big data»
• Project manager of the UNECE project Big
Data in Official Statistics
• Leader of the IT group in the Istat Scanner
Data project
Abstract
• In this talk we present the issues and challenges
related to dealing with datasets of big size such as
those involved in the Scanner Data project at Istat
• We illustrate the IT architecture backing the testing
phase of the project, currently in place, and the ideas
for the production architecture
• The motivations behind the design are explained as
well as the solutions introduced as part of a larger
scope approach to the modernisation of tools and
techniques used for data storage and processing in
Istat, envisioning the future challenges posed by the
adoption of Big Data and Data Science in NSIs
Data Size - Current
Received and stored data from
6 chains in 6 provinces
30% of all stores available from Nielsen in these provinces
2 and a half years time span
Space occupation of 1 year of microdata in the database
426 Million records
18Gb
550 Stores
210,000 Products
Data Size - Expected
Further 29 provinces will be
received by the end of the year
Estimated size of 1 year of microdata
47Gb
142Gb
6 chains only
All available chains
Actual occupation
2 years of microdata only
36Gb
Current occupation of DB space
200Gb
100Gb
now
End 2015
29 provinces
6 chains
?
Indexes, views, aggregations,
classifications…
540Gb2016?
EVERYTHING!
The Problem with Size…
• Query time is not satisfactory
– Aggregated intermediate results in tables
– IT support required
– More space!
• «Difficult» to extract data for analysis
• Data growth is not predictable
• DBAs do not guarantee proper backup
when space occupation is over 500Gb
Architectural Elements
• Data ingestion
– SFTP
– Custom Java code
• Data architecture
– Relational DB
• Tools
– SAS, Excel…
– Business Analytics Platform (MicroStrategy)
Business Analyitics Platforms
• Allow to access data stored in different data sources and
represent it in a common multi-dimensional schema that is
easier to query and navigate
– Automatically aggregate measures at different levels of
dimentionality
• Normally used in enterprise contexts to facilitate management
of large-heterogeneous data warehouses
• Enable interactive analysis
– Reports: results of queries in a tabular format that can be
browsed and downloaded in Excel or CSV file
– Dashboard: free navigation on data through the creation of
interactive visualizations in drag-and-drop mode
• First example of use in Istat with large data sizes
Load Pre-process
IT Architecture: Testing Phase
SFTP
Views
Reports andVisualizations
DB
SAS
Microstrategy
Control Dashboard
Load Pre-process
Data Ingestion
SFTP SAS
Control Dashboard
Data is sent by Nielsen in form of compressed text files via SFTP (secure
channel)
The SFTP server is located in Istat data center and it is protected by strict
security policies
Load Pre-process
Data Ingestion
SFTP SAS
Control Dashboard
Received data are handled by programs written in Java
- Load: performs integrity checks on received files, loads data in the DB,
logs received files and estimates discounts
- Pre-process: performs quality checks at record level, discards dirty data
The whole acquisition process is controlled by a web dashboard
Load Pre-process
Data Access and Analysis
SFTP
Views
Reports andVisualizations
DB
SAS
Microstrategy
Data can be accessed in two ways:
• Extraction from the DB
• Materialized views were created in order to facilitate import in SAS
• Use of a business analytics tool (MicroStrategy) for reporting,
visualization and browsing of the data
Load Pre-process
Data Access and Analysis
SFTP
Views
Reports andVisualizations
DB
SAS
Microstrategy
In both access modes the results of common interrogations were pre-computed at
different levels of aggregations and were provided as views or reports
This allowed to speed-up access time at the cost of additional space on the DB
Preliminary Analysis
• Analysis of quality of data
– Anomalies
– Distributions
– Nulls
– …
• Exploratory analysis
– Time series of turnover and quantity
– Distribution per coicop group
– …
Turnover per Market
• Total quantity per week and province
Navigation of DB
Production Data Platform
• We are setting up a new data platform for the
production architecture based on Big Data tools
– 7 nodes Hadoop Cluster
– Hadoop: parallel storage and processing platform, de-facto
standard for Big Data
• Features:
– All historical data always online for interactive analysis
– Possibility of retaining historical data indefinitely
– Costruction of a global historical data warehouse of prices
data
– Database is used only for processing current, online data
(computation of indexes)
– Easier to perform large-scale analysis
Ingestion
IT Architecture: Production Phase
SFTP
Reports and
Visualizations
Control Dashboard SAS
Processing
of indexes
Oracle DB Hadoop
Extraction for
offline analysis
Enhanced data warehouse
Sample
selection
Current data Historical data
Conclusions
• The scanner data project represents a
challenging testbed for experimenting new
approaches in the IT support to analysis and
production
• Objective is get faster results and more
efficient processes
• The concept of «Big Data» is not merely a
matter of size but rather of new opportunities
• Technology can give the answers, now it’s
time to make new questions
Questions

More Related Content

What's hot

Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Dataconomy Media
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
Rob Winters
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Dataconomy Media
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
Denodo
 
How OpenTable uses Big Data to impact growth by Raman Marya
How OpenTable uses Big Data to impact growth by Raman MaryaHow OpenTable uses Big Data to impact growth by Raman Marya
How OpenTable uses Big Data to impact growth by Raman Marya
Data Con LA
 
Data mining tools used in business intelligence
Data mining tools used in business intelligenceData mining tools used in business intelligence
Data mining tools used in business intelligence
Nithya Ravi
 
Empowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingEmpowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark Streaming
Databricks
 
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph TechnologyOracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
InfiniteGraph
 
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Dataconomy Media
 
RWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseRWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use Case
Databricks
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache Spark
Databricks
 
Business Insight
Business InsightBusiness Insight
Business Insight
Microsoft
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
Sudheer Kondla
 
Why IT Should Consider Agile Modern Data Delivery Platform
Why IT Should Consider Agile Modern Data Delivery PlatformWhy IT Should Consider Agile Modern Data Delivery Platform
Why IT Should Consider Agile Modern Data Delivery Platform
syed_javed
 
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Data Con LA
 
The Life of an Internet of Things Electron
The Life of an Internet of Things ElectronThe Life of an Internet of Things Electron
The Life of an Internet of Things Electron
DataWorks Summit/Hadoop Summit
 
Obfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataObfuscating LinkedIn Member Data
Obfuscating LinkedIn Member Data
DataWorks Summit
 
Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Data Virtualization Reference Architectures: Correctly Architecting your Solu...Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Denodo
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
DataWorks Summit
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
Hal Kalechofsky
 

What's hot (20)

Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
How OpenTable uses Big Data to impact growth by Raman Marya
How OpenTable uses Big Data to impact growth by Raman MaryaHow OpenTable uses Big Data to impact growth by Raman Marya
How OpenTable uses Big Data to impact growth by Raman Marya
 
Data mining tools used in business intelligence
Data mining tools used in business intelligenceData mining tools used in business intelligence
Data mining tools used in business intelligence
 
Empowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingEmpowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark Streaming
 
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph TechnologyOracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
 
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
 
RWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseRWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use Case
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache Spark
 
Business Insight
Business InsightBusiness Insight
Business Insight
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Why IT Should Consider Agile Modern Data Delivery Platform
Why IT Should Consider Agile Modern Data Delivery PlatformWhy IT Should Consider Agile Modern Data Delivery Platform
Why IT Should Consider Agile Modern Data Delivery Platform
 
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 
The Life of an Internet of Things Electron
The Life of an Internet of Things ElectronThe Life of an Internet of Things Electron
The Life of an Internet of Things Electron
 
Obfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataObfuscating LinkedIn Member Data
Obfuscating LinkedIn Member Data
 
Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Data Virtualization Reference Architectures: Correctly Architecting your Solu...Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Data Virtualization Reference Architectures: Correctly Architecting your Solu...
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
 

Similar to IT Architectures for Handling Big Data in Official Statistics: the Case of Scanner Data in Istat - Antonio Virgillito

Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
Alistair Hamilton
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
datastack
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
MohammedShahid562503
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
RojaT4
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
Skillwise Group
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
Amr Kamel Deklel
 
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
Big Data Value Association
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
Skillwise Group
 
Business analytics and data visualisation
Business analytics and data visualisationBusiness analytics and data visualisation
Business analytics and data visualisation
Shwetabh Jaiswal
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
Selvaraj Kesavan
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
ShivanandaVSeeri
 
BD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdfBD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdf
eramfatima43
 
Microstrategy Overview
Microstrategy OverviewMicrostrategy Overview
Microstrategy Overview
Roberto Zerbini
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Denodo
 
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Pentaho
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
IdontKnow66967
 
Lecture1
Lecture1Lecture1
Lecture1
Manish Singh
 
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Matt Stubbs
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
Simon Belak
 
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
Big Data Spain
 

Similar to IT Architectures for Handling Big Data in Official Statistics: the Case of Scanner Data in Istat - Antonio Virgillito (20)

Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
 
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Business analytics and data visualisation
Business analytics and data visualisationBusiness analytics and data visualisation
Business analytics and data visualisation
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
BD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdfBD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdf
 
Microstrategy Overview
Microstrategy OverviewMicrostrategy Overview
Microstrategy Overview
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
 
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 

More from Istituto nazionale di statistica

Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
Istituto nazionale di statistica
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
Istituto nazionale di statistica
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
Istituto nazionale di statistica
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica1414a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica14
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 

More from Istituto nazionale di statistica (20)

Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
14a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica1414a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica14
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 

Recently uploaded

clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
GeorgeMilliken2
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
imrankhan141184
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
MJDuyan
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
MysoreMuleSoftMeetup
 
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptxChapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Denish Jangid
 
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Leena Ghag-Sakpal
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Fajar Baskoro
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
ssuser13ffe4
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
haiqairshad
 
ZK on Polkadot zero knowledge proofs - sub0.pptx
ZK on Polkadot zero knowledge proofs - sub0.pptxZK on Polkadot zero knowledge proofs - sub0.pptx
ZK on Polkadot zero knowledge proofs - sub0.pptx
dot55audits
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
Solutons Maths Escape Room Spatial .pptx
Solutons Maths Escape Room Spatial .pptxSolutons Maths Escape Room Spatial .pptx
Solutons Maths Escape Room Spatial .pptx
spdendr
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
Celine George
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 

Recently uploaded (20)

clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptxChapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptx
 
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
ZK on Polkadot zero knowledge proofs - sub0.pptx
ZK on Polkadot zero knowledge proofs - sub0.pptxZK on Polkadot zero knowledge proofs - sub0.pptx
ZK on Polkadot zero knowledge proofs - sub0.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
Solutons Maths Escape Room Spatial .pptx
Solutons Maths Escape Room Spatial .pptxSolutons Maths Escape Room Spatial .pptx
Solutons Maths Escape Room Spatial .pptx
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 

IT Architectures for Handling Big Data in Official Statistics: the Case of Scanner Data in Istat - Antonio Virgillito

  • 1. IT Architectures for Handling Big Data in Official Statistics: the Case of Scanner Data in Istat Gianluca D’Amato, Annunziata Fiore, Domenico Infante, Antonella Simone, Giorgio Vinci, Antonino Virgillito Istat Scanner Data Workshop Rome 2 October 2015
  • 2. About me • Head of unit «Architectures for business intelligence, mobile and big data» • Project manager of the UNECE project Big Data in Official Statistics • Leader of the IT group in the Istat Scanner Data project
  • 3. Abstract • In this talk we present the issues and challenges related to dealing with datasets of big size such as those involved in the Scanner Data project at Istat • We illustrate the IT architecture backing the testing phase of the project, currently in place, and the ideas for the production architecture • The motivations behind the design are explained as well as the solutions introduced as part of a larger scope approach to the modernisation of tools and techniques used for data storage and processing in Istat, envisioning the future challenges posed by the adoption of Big Data and Data Science in NSIs
  • 4. Data Size - Current Received and stored data from 6 chains in 6 provinces 30% of all stores available from Nielsen in these provinces 2 and a half years time span Space occupation of 1 year of microdata in the database 426 Million records 18Gb 550 Stores 210,000 Products
  • 5. Data Size - Expected Further 29 provinces will be received by the end of the year Estimated size of 1 year of microdata 47Gb 142Gb 6 chains only All available chains
  • 6. Actual occupation 2 years of microdata only 36Gb Current occupation of DB space 200Gb 100Gb now End 2015 29 provinces 6 chains ? Indexes, views, aggregations, classifications… 540Gb2016? EVERYTHING!
  • 7. The Problem with Size… • Query time is not satisfactory – Aggregated intermediate results in tables – IT support required – More space! • «Difficult» to extract data for analysis • Data growth is not predictable • DBAs do not guarantee proper backup when space occupation is over 500Gb
  • 8. Architectural Elements • Data ingestion – SFTP – Custom Java code • Data architecture – Relational DB • Tools – SAS, Excel… – Business Analytics Platform (MicroStrategy)
  • 9. Business Analyitics Platforms • Allow to access data stored in different data sources and represent it in a common multi-dimensional schema that is easier to query and navigate – Automatically aggregate measures at different levels of dimentionality • Normally used in enterprise contexts to facilitate management of large-heterogeneous data warehouses • Enable interactive analysis – Reports: results of queries in a tabular format that can be browsed and downloaded in Excel or CSV file – Dashboard: free navigation on data through the creation of interactive visualizations in drag-and-drop mode • First example of use in Istat with large data sizes
  • 10. Load Pre-process IT Architecture: Testing Phase SFTP Views Reports andVisualizations DB SAS Microstrategy Control Dashboard
  • 11. Load Pre-process Data Ingestion SFTP SAS Control Dashboard Data is sent by Nielsen in form of compressed text files via SFTP (secure channel) The SFTP server is located in Istat data center and it is protected by strict security policies
  • 12. Load Pre-process Data Ingestion SFTP SAS Control Dashboard Received data are handled by programs written in Java - Load: performs integrity checks on received files, loads data in the DB, logs received files and estimates discounts - Pre-process: performs quality checks at record level, discards dirty data The whole acquisition process is controlled by a web dashboard
  • 13. Load Pre-process Data Access and Analysis SFTP Views Reports andVisualizations DB SAS Microstrategy Data can be accessed in two ways: • Extraction from the DB • Materialized views were created in order to facilitate import in SAS • Use of a business analytics tool (MicroStrategy) for reporting, visualization and browsing of the data
  • 14. Load Pre-process Data Access and Analysis SFTP Views Reports andVisualizations DB SAS Microstrategy In both access modes the results of common interrogations were pre-computed at different levels of aggregations and were provided as views or reports This allowed to speed-up access time at the cost of additional space on the DB
  • 15. Preliminary Analysis • Analysis of quality of data – Anomalies – Distributions – Nulls – … • Exploratory analysis – Time series of turnover and quantity – Distribution per coicop group – …
  • 17. • Total quantity per week and province
  • 19. Production Data Platform • We are setting up a new data platform for the production architecture based on Big Data tools – 7 nodes Hadoop Cluster – Hadoop: parallel storage and processing platform, de-facto standard for Big Data • Features: – All historical data always online for interactive analysis – Possibility of retaining historical data indefinitely – Costruction of a global historical data warehouse of prices data – Database is used only for processing current, online data (computation of indexes) – Easier to perform large-scale analysis
  • 20. Ingestion IT Architecture: Production Phase SFTP Reports and Visualizations Control Dashboard SAS Processing of indexes Oracle DB Hadoop Extraction for offline analysis Enhanced data warehouse Sample selection Current data Historical data
  • 21. Conclusions • The scanner data project represents a challenging testbed for experimenting new approaches in the IT support to analysis and production • Objective is get faster results and more efficient processes • The concept of «Big Data» is not merely a matter of size but rather of new opportunities • Technology can give the answers, now it’s time to make new questions