Comparison of Various Analytics & Reporting Architectures
A.Rajendran, CEO, Team Business Solutions (rajen@teamcomputers.com)

1. Objective
Generating analytics & reports from the Transaction System

2. Problems in Analytics & Reporting Today
High Loads on Servers/Storage and Time Delay: While generating reports/analytics,
the most challenging issues are a) response times, and b) the intensity of load on the
CPU, memory and storage.

These lead to a need for extensive increases in the capacity/configuration of the servers
and network bandwidth, making the cost per report very high.

Lack of Flexibility: In addition, users have very limited flexibility to choose the
columns they need and to apply filters in combinations of their choice. This makes it
difficult to explore all possible scenarios and understand them clearly enough to take
informed business decisions.

3. Factors Causing the above Problems
SQL Limitations

   1. Join conditions in queries create a Cartesian-product effect (cross joins),
      restricted to varying degrees by the chosen conditions, producing a varying
      number of RESULTANT records to handle. The load generated also depends on the
      degree to which the join fields are indexed.
      This creates two major problems: a) compute load on the server, requiring heavy
      CPU and memory usage, and b) bloating of data, either in memory or in storage,
      depending on whether the data is kept transient or permanent.

   2. Filter conditions in queries lead to the execution of CONDITIONAL
      EXPRESSIONS on every record in the JOIN product.
             This creates a large compute load, proportional to the number of
             RESULTANT records.

   3. Group By conditions in queries lead to the execution of AGGREGATION
      functions.
             This again creates a large compute load, varying with the extent of
             indexing on the fields used for GROUP BY.

   4. Order By clauses in queries make the system rely heavily on SORTING
      algorithms, which need very large SORT BUFFER sizes.
             This has a huge impact on memory usage and compute load.
      (All four cost drivers are illustrated in the sample query after this list.)
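
As a minimal sketch (the table and column names here are hypothetical, not taken from
the source), a single reporting query typically combines all four cost drivers at once:

    -- Hypothetical reporting query combining all four cost drivers:
    -- joins (record product), a filter (evaluated per joined record),
    -- a GROUP BY (aggregation) and an ORDER BY (sorting).
    SELECT   c.region,
             p.category,
             SUM(o.amount) AS total_sales,   -- aggregation over each group
             COUNT(*)      AS order_count
    FROM     orders o
    JOIN     customers c ON c.customer_id = o.customer_id  -- join load varies with indexing
    JOIN     products  p ON p.product_id  = o.product_id
    WHERE    o.order_date >= DATE '2012-01-01'             -- filter runs on every resultant record
    GROUP BY c.region, p.category                          -- aggregation load
    ORDER BY total_sales DESC;                             -- sort buffer / memory load

The architectures compared below differ mainly in where and when the cost of a query
like this is paid.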

These problems multiply when we do more ad hoc reporting.

4. Various Solutions for the Problems Above
Over time, various solutions have been devised to overcome the above problems: some
of them shift the load to a time other than when the reports are run, while others
genuinely reduce the load itself.

The following are the various distinct approaches used in these solutions:

      1. Direct Queries from Live Transaction System (say, Crystal Reports / Jasper etc)
      2. Direct Queries from Mirror of Transaction System (Crystal Reports / Jasper etc)
      3. Direct Queries from Data Warehouse
      4. OLAP over Mirror of Transaction System
      5. OLAP over Data Warehouse
      6. Direct Queries from QlikView (QV)


Details of a Typical Sample Transaction System:
Database Size – 1 TB, Tables – 100
The six methods listed above can each be used to obtain reports & analytics, and each
of these approaches is discussed separately in the next sections.

1. Direct Queries from Live Transaction System (say, Crystal Reports / Jasper etc)

In this method, the query is run on the live transaction system every time a report is
requested. However, since this is a live system, repeated report runs increase its load,
and the time taken to get the output is high. The responsiveness of the system for
regular transaction users is also considerably degraded.

                                        Potentially Slow Transaction and Report Response


2. Direct Queries from Mirror of Transaction System (Crystal Reports / Jasper etc)

In this method a copy (mirror) of the transaction system is created. The copy holds not
only the live data but also all the historical data. Running report & analysis queries on
this copy still takes a considerable amount of time, as the system is loaded with a huge
and ever-increasing volume of historical data. However, the actual transaction users are
freed from the sluggish response seen in option 1.
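
As a rough sketch of how such a mirror accumulates history (the live and mirror schema
names and the orders table are hypothetical), a periodic copy job might append new
transactions like this:

    -- Hypothetical incremental copy job, run periodically: append any new
    -- transaction records to the mirror, which retains all history.
    INSERT INTO mirror.orders
    SELECT *
    FROM   live.orders o
    WHERE  o.order_id > (SELECT COALESCE(MAX(order_id), 0) FROM mirror.orders);

Because nothing is summarized or purged, every reporting query still pays the full join,
filter and aggregation cost over the growing history.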
                                                            Potentially Slow Report Response

3. Direct Queries from Data Warehouse

When a data warehouse is created, only the master tables are saved as-is; summary-level
information is stored for all the transaction tables.

The summarization itself still has to go through the joins, Group By and Order By,
leading to the problems highlighted above.
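
A rough sketch of such a summarization step, continuing the hypothetical tables from the
earlier example (DATE_TRUNC is PostgreSQL-style syntax):

    -- Illustrative warehouse load step: the join and GROUP BY costs from
    -- section 3 are paid here, at summarization time, not at report time.
    CREATE TABLE sales_summary AS
    SELECT   c.region,
             p.category,
             DATE_TRUNC('month', o.order_date) AS sales_month,
             SUM(o.amount) AS total_sales
    FROM     orders o
    JOIN     customers c ON c.customer_id = o.customer_id
    JOIN     products  p ON p.product_id  = o.product_id
    GROUP BY c.region, p.category, DATE_TRUNC('month', o.order_date);

The cost has moved into this load job, and reports are limited to whatever level of
detail the summary preserves.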
                                   Potentially Slow Summarization and Report Response

4. OLAP over Mirror of Transaction System

In this method an OLAP cube is created over the copy of the transaction system, and the
data is fetched from the cube. Traditional OLAP technology pre-aggregates the data,
which limits the flexibility of choosing dimensions and measures in any report and
makes ad hoc reporting very cumbersome (every new combination requires a new CUBE
to be built).
                                              Potentially No Flexibility for Ad Hoc Reporting

5. OLAP over Data Warehouse

This approach reduces the DWH load from the Joins, Group By and Order By that would
otherwise be generated repeatedly whenever reports are run by different users at
different times. However, this load is shifted to the CUBE refresh phase, where the
OLAP engine pre-calculates all the chosen Measures across the various predefined
permutations of Multiple Dimension Values.

It is interesting to note that users will never make use of all these permutations, so
exhaustively calculating every permutation and combination is a wasteful effort.
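
A rough stand-in for this exhaustive pre-calculation is SQL's GROUP BY CUBE extension
(supported in, e.g., PostgreSQL, Oracle and SQL Server), applied here to the hypothetical
sales_summary table from the earlier sketch:

    -- Illustrative cube refresh: CUBE pre-aggregates every combination of
    -- the three dimensions (2^3 = 8 grouping sets), including combinations
    -- that no user may ever actually query.
    CREATE TABLE sales_cube AS
    SELECT   region,
             category,
             sales_month,
             SUM(total_sales) AS total_sales
    FROM     sales_summary
    GROUP BY CUBE (region, category, sales_month);

With n dimensions the cube produces 2^n grouping sets, which is why refresh time and
storage grow rapidly as dimensions are added.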
                                                   Potentially Time-Consuming and High Cost


6. Direct Queries from QlikView (QV)

QV extracts the data from the transaction system and saves it in raw QVD files, which
are compressed by up to 90% of the original database size by way of normalization.
Together these files form a copy of the complete database.
Join, Group By and Order By operations are run on these QVD files, eliminating the load
on the live transaction system; hence the response time is quick.

QV is architected on in-memory technology, where the calculations are performed post
facto rather than pre-aggregated. This gives the end user flexible, dynamic, ad hoc
reporting through a user-friendly GUI, and reduces the overall time for report creation.

                                    Potentially Quick Response, Flexible and Cost-Effective

Comparison of Reporting Architectures
