2. My Professional Timeline
BI ACADEMY Conference - Stuttgart, 9/3/2016 - Stefano Cazzella 2
Timeline 2001 - 2015 (figure):
• Master degree in Software Engineering, University of Rome « La Sapienza »
• Project Manager, Datamat S.p.A. (a Finmeccanica company)
• Business Intelligence Specialist, Business Consultant, Delivery Manager
• Methodology: industrialization of the delivery phase
• Sopra Steria Group (Consulting – IT Services – Software Solutions)
3. BI Trends
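Figure: evolution of BI, with axes Business Value and Time, through successive stages:
• Data Integration
• Descriptive
• Predictive
• Prescriptive
• Deep learning
Technologies labeled in the figure: Data Warehouse, Business Intelligence, Simulation & forecasting, Optimization & automation, Semantic & AI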
Digital transformation of every market
Data explosion: exponential growth of digital data
4. Disruptive scenario
Innovative technologies
•Internet of Things
•Big Data
•Distributed computing
•In Memory systems
•Cloud
•Mobile
•… more …
Complex architectures
•Data federation
•Data store
•NoSQL
•Distributed file system
•Appliances
•Real-time data integration
•… more …
Business transformations
•Frenetic time-to-market
•API / service economy
•Data-driven company
•Business process automation
•… more …
5. New processes? Roles?
Waterfall process: Business, Design, Build
Iterative process: Business, Design, Build
Roles shown for both processes: Business Analyst, Engineer, Technician
Additional role shown in the figure: Data Scientist
6. Project Layers for Data Mart
Business
•Dimensional Fact Model
Design
•Relational model
Build
•DBMS specific DDL
7. Why the Dimensional Fact Model?
• Formal language: a well-specified syntax and an unequivocal interpretation (semantics) based on a sound algebraic definition
• Simple and effective graphical notation (representation)
• Does not imply any technical/implementation choice
• Specifically designed to represent multi-dimensional models
8. Multi-dimensional model
The SALES event: on Nov. 25th, 2014, Store 2 sold 10 pieces of Product X for a total revenue of € 220.
Figure: 3-dimensional SALES hyper-space with axes Product (Product X, Product Y, Product Z), Store (Store 1, Store 2, Store 3) and Day. The cell at (Product X, Store 2, Nov. 25th, 2014) holds the measures Units sold: 10 pieces and Revenue: € 220.
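The hyper-space idea can be sketched in a few lines of Python: each cell is addressed by its coordinates along the three dimensions and stores the measures of the event. This is only an illustrative sketch. The first cell is the SALES event from the slide; the other two cells, the `roll_up` helper and the ISO date format are assumptions made for the example.

```python
from collections import defaultdict

# Each cell of the SALES hyper-space is addressed by (product, store, day)
# and holds the measures of that event. Only the first cell comes from the
# slide; the other two are made-up values for illustration.
sales = {
    ("Product X", "Store 2", "2014-11-25"): {"units_sold": 10, "revenue": 220.0},
    ("Product X", "Store 1", "2014-11-25"): {"units_sold": 4, "revenue": 88.0},
    ("Product Y", "Store 2", "2014-11-26"): {"units_sold": 3, "revenue": 45.0},
}

DIMENSIONS = ("product", "store", "day")


def roll_up(cube, keep):
    """Sum the measures over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(lambda: {"units_sold": 0, "revenue": 0.0})
    for coordinates, measures in cube.items():
        key = tuple(coordinates[i] for i in positions)
        result[key]["units_sold"] += measures["units_sold"]
        result[key]["revenue"] += measures["revenue"]
    return dict(result)


# Aggregate away Store and Day, keeping only the Product dimension.
by_product = roll_up(sales, keep=("product",))
```

Rolling up along a dimension is the basic operation that the hierarchies of a multi-dimensional model make possible; here both measures are simply summed.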
10. Data Mart building process
Business user’s needs → Requirements definition → Multidimensional data model (Dimensional Fact Model)
Implementation strategy → Model transformation → Logical data model (Relational model: tables, columns, etc.)
Technical knowledge → Model transformation → Physical data model (DDL with indexes, partitions, etc.)
Deployment → Data Mart
12. Business - From requirement to DFM
• Context: weblog analytics, i.e. the analysis of the visits to several web sites belonging to different domains (e.g. Google Analytics)
• Requirement: monitor and analyze the number of visits and their monthly and daily average duration for each page of the websites, or for each domain, broken down by the geographic region of the visitors' IP addresses.
DFM annotations shown in the figure: Domain definition, Aggregation rules, Optional dependencies
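As a rough illustration of what a DFM captures for this requirement, the fact schema can be written down as plain data. This is a sketch, not the notation or the data model of any tool: the class names, the domain names ("Counter", "Duration") and the levels inside each hierarchy are assumptions derived from the requirement text.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    name: str
    domain: str        # hypothetical domain name, e.g. "Counter"
    aggregation: str   # aggregation rule, e.g. "SUM" or "AVG"


@dataclass
class Hierarchy:
    name: str
    levels: list       # ordered from the finest level to the coarsest


@dataclass
class FactSchema:
    name: str
    measures: list = field(default_factory=list)
    hierarchies: list = field(default_factory=list)

    def grain(self):
        """The fact grain is the finest level of every hierarchy."""
        return [hierarchy.levels[0] for hierarchy in self.hierarchies]


# The levels below are an assumed reading of the requirement.
visits = FactSchema(
    name="VISITS",
    measures=[
        Measure("number_of_visits", domain="Counter", aggregation="SUM"),
        Measure("visit_duration", domain="Duration", aggregation="AVG"),
    ],
    hierarchies=[
        Hierarchy("Page", ["page", "website", "domain"]),
        Hierarchy("Time", ["day", "month"]),
        Hierarchy("Viewer", ["ip", "region"]),
    ],
)
```

Note that nothing in this description refers to tables, keys or a DBMS, which is the point of modeling at the business layer first.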
13. Design choice
Reference ROLAP model:
•Star schema (denormalized dimension tables)
•Snowflake (hierarchies implemented by tables in 3NF)
Hierarchy implementation strategy (for every dimension):
•Use natural key (the dimension attribute PK column)
•Use surrogate key (add a new column with no business meaning)
•Use slowly changing dimension (SCD) of type 2
•Use implicit dimension (no dimension table, only a column in the fact table)
Domain / data type association:
•Text VARCHAR(250) ; Currency NUMBER(9,2) ; etc.
Standard naming conventions and abbreviations:
•Table name prefix (D for Dimensions, F for Facts) ; Number NBR ; etc.
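The domain / data type association and the naming conventions are essentially lookup tables applied during the model transformation. A minimal sketch, assuming an underscore as word separator and upper-case names; the helper functions are hypothetical, while the two data types, the D/F prefixes and the NBR abbreviation are the examples given on the slide:

```python
# Domain-to-data-type association, using the two examples from the slide.
DOMAIN_TYPES = {
    "Text": "VARCHAR(250)",
    "Currency": "NUMBER(9,2)",
}

# Table prefixes and abbreviations from the slide.
TABLE_PREFIX = {"dimension": "D", "fact": "F"}
ABBREVIATIONS = {"NUMBER": "NBR"}


def physical_name(logical_name):
    """Upper-case the name, apply abbreviations and join words with '_'."""
    words = logical_name.upper().split()
    return "_".join(ABBREVIATIONS.get(word, word) for word in words)


def table_name(kind, logical_name):
    """Prefix the table name according to its role (dimension or fact)."""
    return f"{TABLE_PREFIX[kind]}_{physical_name(logical_name)}"


def column_definition(logical_name, domain):
    """Render one column as '<NAME> <TYPE>' using the type of its domain."""
    return f"{physical_name(logical_name)} {DOMAIN_TYPES[domain]}"
```

For example, `table_name("fact", "visits")` yields `F_VISITS` and `column_definition("page title", "Text")` yields `PAGE_TITLE VARCHAR(250)`. Keeping these rules in one place is what lets the same DFM be transformed consistently across projects.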
14. Transform the DFM into a Relational Model
Technical design choices:
• Reference ROLAP model: star schema
• Hierarchy Viewer: use surrogate key
• Hierarchy Page: SCD type 2
• Hierarchy Time: denormalized with natural key
Callouts on the resulting relational model (figure): Fact grain, Surrogate key, SCD-2 (Start date, End date)
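To make the SCD type 2 choice concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in DBMS. The table and column names (`D_PAGE`, `PAGE_URL`, `WEBSITE`, `START_DATE`, `END_DATE`), the sample values and the `apply_scd2` helper are assumptions for illustration, not the model shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE D_PAGE (
        PAGE_ID    INTEGER PRIMARY KEY,   -- surrogate key, no business meaning
        PAGE_URL   TEXT NOT NULL,         -- natural key of the page
        WEBSITE    TEXT NOT NULL,         -- attribute whose history is tracked
        START_DATE TEXT NOT NULL,         -- first day this version is valid
        END_DATE   TEXT                   -- NULL while the version is current
    )
""")


def apply_scd2(conn, page_url, website, change_date):
    """Close the current version of the page (if any) and open a new one."""
    conn.execute(
        "UPDATE D_PAGE SET END_DATE = ? WHERE PAGE_URL = ? AND END_DATE IS NULL",
        (change_date, page_url),
    )
    cursor = conn.execute(
        "INSERT INTO D_PAGE (PAGE_URL, WEBSITE, START_DATE) VALUES (?, ?, ?)",
        (page_url, website, change_date),
    )
    return cursor.lastrowid


# The same page moves to another website: two versions, two surrogate keys.
first_id = apply_scd2(conn, "/home", "shop.example", "2016-01-01")
second_id = apply_scd2(conn, "/home", "store.example", "2016-03-01")
history = conn.execute(
    "SELECT PAGE_ID, WEBSITE, START_DATE, END_DATE FROM D_PAGE ORDER BY PAGE_ID"
).fetchall()
```

Because the fact table references the surrogate key rather than the natural key, each visit stays linked to the version of the page that was valid when the visit happened.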
15. Build choice
Choose the DBMS:
•Microsoft SQL Server – Oracle DBMS – SAP HANA – Apache Hive / Hadoop
Generate constraints?
•Generate unique keys / primary keys / integrity constraints (foreign keys)
Add specific indexes:
•Add clustered indexes / column-store indexes / bitmap indexes / etc.
Define table partitions:
•Organize fact tables in partitions (by hash, value, range, etc.)
Distribute data over multiple volumes:
•Define file groups / tablespaces for tables, partitions, indexes
16. In-Memory Computing Engine
SAP HANA Architecture (figure):
• Session management
• Request Processing / Execution Control: SQL Parser, SQL Script, Calc. Engine, MDX
• Transaction Manager, Metadata Manager, Authorization Manager
• Relational Engines: Row Store, Column Store
• Persistence Layer: Page Management, Logger
• Disk Storage: Data Volumes, Log Volumes
Physical design options:
• Row tables versus Column tables
• Partitioning by HASH, RANGE, ROUNDROBIN
• Use extended tables for warm data
17. Physical model and DDL
Implementation choices & best practice:
• DBMS: SAP HANA
• All tables are column tables
• Fact F_VISITS partitioned by HASH on DAY
• Fact F_VISITS indexed by PAGE
DDL callouts (figure): Create a column table; Partition by HASH; BTREE index; Unload priority for memory optimization; Preload columns for performance optimization
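The last transformation step, from the physical model to a DDL script, can be sketched as simple text generation. The function below is a hypothetical helper, and the column names, data types and partition count are assumptions. The statements follow the general shape of SAP HANA DDL (`CREATE COLUMN TABLE ... PARTITION BY HASH`) but have not been validated against a HANA instance; the index type, unload priority and preload options mentioned on the slide are deliberately left out.

```python
def hana_fact_ddl(table, columns, hash_column, partitions, index_column):
    """Render DDL text for a hash-partitioned column table plus one index."""
    column_lines = ",\n".join(
        f"    {name} {data_type}" for name, data_type in columns
    )
    create_table = (
        f"CREATE COLUMN TABLE {table} (\n{column_lines}\n)\n"
        f"PARTITION BY HASH ({hash_column}) PARTITIONS {partitions};"
    )
    create_index = (
        f"CREATE INDEX IDX_{table}_{index_column} ON {table} ({index_column});"
    )
    return [create_table, create_index]


# Hypothetical columns for the F_VISITS fact table.
statements = hana_fact_ddl(
    table="F_VISITS",
    columns=[
        ("DAY_ID", "DATE"),
        ("PAGE_ID", "INTEGER"),
        ("VIEWER_ID", "INTEGER"),
        ("NBR_VISITS", "INTEGER"),
    ],
    hash_column="DAY_ID",
    partitions=4,
    index_column="PAGE_ID",
)
print("\n\n".join(statements))
```

Generating the script from the model, rather than writing it by hand, means that switching DBMS or partitioning strategy only changes the generator's inputs and templates.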
18. BI Modeler
• To apply a model-driven approach, BI project teams need a software tool to:
  • Manage (draw) all the models: DFM, relational, etc.
  • Support (and drive) the model transformation process
• There were (and still are) not many tools able to do that so, in 2006, I started working on the development of …
http://bimodeler.com
19. DEMO
Create a DFM from scratch:
• Define the fact schema and its measures
• Add some dimensions / hierarchies
• Define and associate domains to attributes and measures
Transform a DFM into a relational data model:
• Define an implementation strategy for hierarchies
• Associate data types to domains
• Apply a naming convention
Add physical properties to the relational model:
• Choose a DBMS
• Create partitions
• Create indexes
• Generate DDL script