1. Ad-Hoc OLAP databases with Yertl and HANA
Radek Kotowicz (radoslaw.kotowicz@sap.com)
http://blogs.perl.org/users/radek_kotowicz
http://www.ariba.com/about/sap-ariba
2. © 2013 Ariba - an SAP company. All rights reserved. 2Public
OLTP
3.
OLAP schema
4.
Difference
OLTP: normalization
OLAP: de-normalization
5.
Consequences
OLTP
• Enforced integrity
• Easily extensible design
• Flexible analytics
• Slow for analytical queries operating on large result sets
OLAP
• Lack of integrity, or only loose integrity
• Simple queries
• Less flexible/extensible analytics
• Performant analytics
• Interface for operating on hyper-cubes
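To make the trade-off concrete, here is a minimal sketch with hypothetical in-memory data (the `users`/`auctions` records and helper names are illustrative, not from the talk). The normalized form stores each user attribute once; the de-normalized form copies it into every fact row, which makes grouping trivial but leaves copies that can drift out of sync.

```python
from typing import Dict, List

# Normalized (OLTP-style): each user attribute is stored once and referenced by key.
users: Dict[int, Dict[str, str]] = {
    1: {"name": "alice", "region": "EMEA"},
    2: {"name": "bob", "region": "NA"},
}
auctions: List[Dict] = [
    {"id": 10, "user_id": 1, "savings": 100.0},
    {"id": 11, "user_id": 1, "savings": 50.0},
    {"id": 12, "user_id": 2, "savings": 20.0},
]


def denormalize(facts: List[Dict], dims: Dict[int, Dict[str, str]]) -> List[Dict]:
    """Copy the user attributes into every auction row (OLAP-style wide rows)."""
    return [{**fact, **dims[fact["user_id"]]} for fact in facts]


def savings_by_region(rows: List[Dict]) -> Dict[str, float]:
    """On wide rows an aggregate needs no join: one pass groups by region."""
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["savings"]
    return totals


wide = denormalize(auctions, users)
print(savings_by_region(wide))  # {'EMEA': 150.0, 'NA': 20.0}

# The price: alice's region now lives in two extra places. One update in the
# normalized model leaves the de-normalized copies stale until they are reloaded.
users[1]["region"] = "APJ"
stale = [row for row in wide if row["user_id"] == 1 and row["region"] != "APJ"]
print(len(stale))  # 2
```

For an ad-hoc analytical copy that is periodically reloaded, this kind of staleness is usually acceptable, which is why the loose integrity on the OLAP side is tolerated.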
7.
Goal
1. Get a DB where analytical queries can be executed without impacting the transactional database
2. No setup
3. An analytical tool for non-technical users
4. Reports need to be quick
5. Data loading time is not crucial
8.
A possible scenario
[Diagram: RDBMS -> JDBC -> OLAP cubes (mondrian-olap gem on JRuby) -> MDX]
9.
More Perl-aware …
[Diagram: RDBMS -> JDBC -> OLAP cubes -> XMLA/HTTP -> Mondrian-XML-A-Consumer -> wxWidgets]
10.
Still not Perl-centric
[Diagram: RDBMS -> JDBC -> OLAP cubes -> XMLA/HTTP -> Mondrian-XML-A-Consumer -> wxWidgets]
• Do I want to build a pivot table engine from scratch?
• SQL-level filtering too weak
• No persistence
11.
Perl as an ETL tool + HANA DB + Excel for presentation
[Diagram: RDBMS -> ETL::Yertl (DBI/ODBC) -> OLAP cubes -> ODBO -> Excel]
12.
The main concepts of HANA
On current CPUs, we can expect to process 1 MB per ms and with parallel processing on 16 cores more than 10 MB per ms. To put this into context, to look for a single dimension compressed in 4 bytes, we can scan 2.5 million tuples for qualification in 1 ms.
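The 2.5 million figure follows directly from the scan rate divided by the value width. A quick back-of-envelope check, assuming decimal megabytes (10^6 bytes), which is what makes the quoted numbers come out exactly; with binary MiB the 16-core figure would be about 2.62 million:

```python
# Back-of-envelope check of the quoted scan rates (decimal megabytes assumed).
BYTES_PER_MB = 1_000_000


def tuples_per_ms(mb_per_ms: float, bytes_per_value: int) -> float:
    """How many fixed-width compressed values fit into the data scanned in 1 ms."""
    return mb_per_ms * BYTES_PER_MB / bytes_per_value


single_core = tuples_per_ms(1, 4)     # ~1 MB per ms on one core
sixteen_cores = tuples_per_ms(10, 4)  # >10 MB per ms on 16 cores (lower bound)

print(f"{single_core:,.0f} tuples/ms on one core")     # 250,000 tuples/ms on one core
print(f"{sixteen_cores:,.0f} tuples/ms on 16 cores")   # 2,500,000 tuples/ms on 16 cores
```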
13.
Compression
Column data is of uniform type; therefore, there are some opportunities for storage size optimizations available in column-oriented data that are not available in row-oriented data.
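One common example of such an optimization in column stores is dictionary encoding: each distinct value is stored once and the column itself becomes a list of small integer codes. The sketch below is a generic illustration with a hypothetical `commodity` column, not HANA's actual internal format. It also shows why the 4-byte scan above works: a predicate is translated to its code once and the scan then compares integers only.

```python
from typing import List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once (sorted) and replace values by integer codes."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def bits_per_code(dictionary: List[str]) -> int:
    """Minimum bit width needed to address every dictionary entry."""
    return max(1, (len(dictionary) - 1).bit_length())


def scan_equals(dictionary: List[str], codes: List[int], value: str) -> List[int]:
    """Predicate scan: translate the literal once, then compare integers only."""
    if value not in dictionary:
        return []
    target = dictionary.index(value)
    return [row for row, code in enumerate(codes) if code == target]


commodity = ["IT", "Travel", "IT", "Office", "IT", "Travel"]
dictionary, codes = dictionary_encode(commodity)
print(dictionary)                            # ['IT', 'Office', 'Travel']
print(codes)                                 # [0, 2, 0, 1, 0, 2]
print(bits_per_code(dictionary))             # 2
print(scan_equals(dictionary, codes, "IT"))  # [0, 2, 4]
```

Keeping the dictionary sorted is a deliberate choice: code order then matches value order, so range predicates can also be answered on the codes.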
14.
Data Loading
15.
Yertl
• yfrom - Build YAML from another format (like JSON or CSV)
• ygrok - Build YAML by parsing lines of plain text
• ysql - Query SQL databases in a Yertl workflow
• ymask - Mask a data structure to display only the desired fields
• yq - Filter YAML through a command-line program
• yto - Change YAML to another format (like JSON)
[Diagram: EXTRACT -> FILTER / TRANSFORM -> LOAD]
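The tools chain together as stages that each consume and emit a stream of documents. The sketch below mimics that shape with Python generators over dicts. It is an analogy only: the function names and data are hypothetical and are not Yertl's API or CLI options; CSV input and JSON-lines output stand in for the "yfrom"/"yto" roles and `mask` for the "ymask" role.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, str]


def extract_csv(text: str) -> Iterator[Doc]:
    """EXTRACT: one document per CSV row (the role yfrom plays for CSV)."""
    yield from csv.DictReader(io.StringIO(text))


def where(docs: Iterable[Doc], predicate: Callable[[Doc], bool]) -> Iterator[Doc]:
    """FILTER: drop documents that fail a predicate."""
    return (doc for doc in docs if predicate(doc))


def mask(docs: Iterable[Doc], fields: List[str]) -> Iterator[Doc]:
    """TRANSFORM: keep only the desired fields (in the spirit of ymask)."""
    for doc in docs:
        yield {field: doc[field] for field in fields if field in doc}


def to_json_lines(docs: Iterable[Doc]) -> str:
    """LOAD/output: serialise each document as one JSON line (the role of yto)."""
    return "".join(json.dumps(doc) + "\n" for doc in docs)


raw = "id,title,status,owner\n1,Laptops,open,alice\n2,Chairs,closed,bob\n3,Desks,open,carol\n"
docs = extract_csv(raw)
docs = where(docs, lambda d: d["status"] == "open")
docs = mask(docs, ["id", "title"])
print(to_json_lines(docs), end="")
# {"id": "1", "title": "Laptops"}
# {"id": "3", "title": "Desks"}
```

Because every stage is lazy, documents flow through one at a time, much like records flowing through a shell pipe between the command-line tools.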
17.
How does it work?
[Diagram: DBI -> YAML -> YAML -> DBI]
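A rough model of that flow: a source query turns each row into one YAML document, the documents travel as a `---`-separated stream, and a target statement with placeholders is executed once per document. The sketch uses two in-memory SQLite databases and hypothetical table names; its tiny emitter/parser handles only flat mappings (scalars written as JSON literals, which YAML 1.2 also accepts), and the `:name` placeholders are sqlite3's named style, not necessarily the placeholder syntax Yertl uses in its query files.

```python
import json
import sqlite3
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def read_rows(conn: sqlite3.Connection, sql: str) -> Iterator[Row]:
    """Source side: run a query and yield each row as a dict (one document per row)."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    for values in cur:
        yield dict(zip(names, values))


def to_yaml_stream(rows: Iterable[Row]) -> str:
    """Emit a '---'-separated stream of flat YAML mappings."""
    out = []
    for row in rows:
        out.append("---\n")
        out.extend(f"{key}: {json.dumps(value)}\n" for key, value in row.items())
    return "".join(out)


def from_yaml_stream(text: str) -> Iterator[Row]:
    """Parse only the flat subset produced by to_yaml_stream (not general YAML)."""
    doc: Row = {}
    started = False
    for line in text.splitlines():
        if line == "---":
            if started:
                yield doc
            doc, started = {}, True
        elif line:
            key, _, value = line.partition(": ")
            doc[key] = json.loads(value)
    if started:
        yield doc


def write_rows(conn: sqlite3.Connection, sql: str, rows: Iterable[Row]) -> None:
    """Target side: execute a DML with named placeholders once per document."""
    for row in rows:
        conn.execute(sql, row)


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE auction (id INTEGER, title TEXT, created TEXT)")
src.executemany("INSERT INTO auction VALUES (?, ?, ?)",
                [(1, "Laptops", "2015-08-01"), (2, "Chairs", "2015-08-03")])

stream = to_yaml_stream(read_rows(src, "SELECT id, title FROM auction ORDER BY id"))
print(stream, end="")

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE auction_copy (id INTEGER, title TEXT)")
write_rows(dst, "INSERT INTO auction_copy (id, title) VALUES (:id, :title)",
           from_yaml_stream(stream))
dst.commit()
```

The printed stream contains two documents, `id: 1` / `title: "Laptops"` and `id: 2` / `title: "Chairs"`, each preceded by `---`. Executing the DML once per document is also what makes this path slow over a remote connection, as the performance slides discuss.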
18.
Get_recent_auctions.sql (source query file)
19.
insert_auctions.hsql (target query file)
20.
For complex filtering
22.
HCP (HANA Cloud Platform) architecture
24.
What about the performance of such an ETL process?
If you kick off a data load with a single ysql into a trial HANA instance, you probably won't get a speed above 20-60k rows per hour… but we need to bear in mind that:
1. Yertl runs DML statements one by one
2. Each statement is auto-committed
3. You're loading over one ODBC connection that is routed through one TLS tunnel
4. The trial instance imposes some constraints on connections
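20-60k rows per hour is only about 5.6 to 16.7 rows per second, i.e. roughly 60-180 ms per row, which suggests the time goes into per-statement and per-commit round trips (points 1-3) rather than into the database itself. The sketch below contrasts the two loading patterns on a local SQLite database with a hypothetical `auction` table; it counts commits instead of timing them, since local timings say little about a remote, tunnelled connection. Whether a given HANA ODBC setup batches `executemany`-style calls is an assumption not covered by the talk.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[int, str]
INSERT = "INSERT INTO auction (id, title) VALUES (?, ?)"


def new_target() -> sqlite3.Connection:
    """Stand-in for the target database (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auction (id INTEGER PRIMARY KEY, title TEXT)")
    return conn


def load_row_by_row(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Slow pattern: one statement per row, each committed on its own."""
    commits = 0
    for row in rows:
        conn.execute(INSERT, row)
        conn.commit()
        commits += 1
    return commits


def load_batched(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 1000) -> int:
    """Faster pattern: send rows in batches and commit once per batch."""
    commits = 0
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            conn.executemany(INSERT, batch)
            conn.commit()
            commits += 1
            batch = []
    if batch:  # flush the final, partial batch
        conn.executemany(INSERT, batch)
        conn.commit()
        commits += 1
    return commits


rows = [(i, f"auction {i}") for i in range(2500)]
print(load_row_by_row(new_target(), rows))  # 2500
print(load_batched(new_target(), rows))     # 3
```

Opening several connections and loading disjoint slices in parallel is another common lever, but it runs into point 4 on a trial instance.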
25.
If that’s still too slow
• SAP Data Services
• HANA studio import
26.
HANA studio import
27.
HANA studio import
29.
Defining views
• Attribute views (dimensions) – typically modelling entities such as product, user, commodity, etc.
• Analytical views – facts surrounded by dimensions, with some defined aggregates
• Calculation views – extensions of analytical views, e.g. for multi-fact reporting
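The relationship between the first two view types is that of a star schema. As a rough analogy only (plain SQLite tables and views driven from Python, not HANA modelling objects; all table and column names are hypothetical), dimension tables play the role of attribute views and an aggregating view over the fact table plays the role of an analytical view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Attribute-view analogues: one table per dimension.
CREATE TABLE dim_commodity (commodity_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT, region TEXT);

-- Facts reference the dimensions by key.
CREATE TABLE fact_auction (
    auction_id   INTEGER PRIMARY KEY,
    commodity_id INTEGER REFERENCES dim_commodity,
    user_id      INTEGER REFERENCES dim_user,
    savings      REAL
);

-- Analytical-view analogue: facts joined to dimensions with defined aggregates.
CREATE VIEW savings_by_commodity_region AS
SELECT c.name AS commodity, u.region AS region,
       COUNT(*) AS auctions, SUM(f.savings) AS total_savings
FROM fact_auction f
JOIN dim_commodity c ON c.commodity_id = f.commodity_id
JOIN dim_user u ON u.user_id = f.user_id
GROUP BY c.name, u.region;
""")

conn.executemany("INSERT INTO dim_commodity VALUES (?, ?)", [(1, "IT"), (2, "Travel")])
conn.executemany("INSERT INTO dim_user VALUES (?, ?, ?)",
                 [(1, "alice", "EMEA"), (2, "bob", "NA")])
conn.executemany("INSERT INTO fact_auction VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 100.0), (2, 1, 1, 50.0), (3, 2, 2, 20.0), (4, 1, 2, 30.0)])

for row in conn.execute("SELECT * FROM savings_by_commodity_region ORDER BY commodity, region"):
    print(row)
# ('IT', 'EMEA', 2, 150.0)
# ('IT', 'NA', 1, 30.0)
# ('Travel', 'NA', 1, 20.0)
```

A report tool then only needs to query the view; a multi-fact calculation view would correspond to combining two such aggregates, which is not shown here.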
34.
“Good” ETL tool
35.
References
• A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, Hasso Plattner Institute for IT Systems Engineering, University of Potsdam
• SAP HANA Essentials eBook, Jeffrey Word, http://saphanabook.com/
• CPAN: https://metacpan.org/release/ETL-Yertl
• Perl Blogs:
  • http://blogs.perl.org/users/preaction/2015/01/managing-sql-data-with-yertl.html
  • http://blogs.perl.org/users/radek_kotowicz/2015/08/moving-data-around-with-yertl-over-odbc-to-hana.html
36.
Q & A