2. Data Lake with SAP HANA Cloud
SAP HANA Cloud, data lake is one of the components that make up SAP HANA Cloud. It's
composed of two components: Data Lake IQ and Data Lake Files. Data Lake IQ enables
efficient storage and high-performance analytics of relational data at petabyte scale, and is
based on SAP IQ technology.
With SAP HANA Cloud, data lake, you can ingest data at high speed from multiple sources,
including non-SAP cloud storage providers. It's an integrated part of SAP HANA Cloud,
sharing the security mechanisms, tenancy models, and tools that operate within SAP
HANA Cloud.
SAP HANA Cloud, data lake is built to be scalable and to accommodate increases in data
volume, user count, and workload complexity.
3. Components
The two main components of SAP HANA Cloud, data lake are:
◦ Data Lake IQ: an efficient, disk-optimized relational store based on the on-premise SAP IQ
product. It's enabled by default when you provision a data lake instance, whether that instance
is standalone or managed by an SAP HANA database instance within SAP HANA Cloud.
◦ Data Lake Files: a secure, managed object storage service that hosts structured,
semi-structured, and unstructured data files. You can query files stored in Data Lake Files in a
relational format by using the data lake's SQL on Files feature. This lets you analyze data
whose value is not yet known at low cost, and the data is also easy to share with other
processing tools. Data Lake Files is likewise enabled by default when you provision a data lake
instance, whether standalone or managed by an SAP HANA database instance.
4. Provisioning Options
SAP HANA Cloud, data lake can be provisioned and used in two different ways:
◦ Managed data lake: the data lake is provisioned as part of SAP HANA Cloud, SAP HANA
database provisioning, and a remote connection between the SAP HANA database and Data
Lake IQ is created automatically. The easiest way to access the data in a managed data lake is
through SAP HANA virtual tables in the SAP HANA Database Explorer. You can, however, also
access the data lake independently.
◦ Standalone data lake: the data lake is provisioned independently of any other SAP HANA
Cloud services, so it is not automatically connected to any other SAP HANA Cloud instances
you might have. You can access your data in the data lake with the SAP HANA Database
Explorer, dbisql, isql, or any of the supported data lake client interfaces.
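The virtual-table route for a managed data lake can be sketched in Python. This is a minimal illustration, not SAP's documented procedure: the SAP HANA Python driver `hdbcli` is real, but the host, credentials, and table name `DL_SALES` are placeholders, and it assumes the virtual table already exists (a managed data lake sets up the remote source for you).

```python
def build_top_query(table: str, limit: int) -> str:
    """Build a simple TOP-n query against a (virtual) table."""
    return f'SELECT TOP {int(limit)} * FROM "{table}"'


def query_virtual_table(host: str, user: str, password: str,
                        table: str, limit: int = 10):
    """Fetch a few rows from a data lake table through its SAP HANA virtual table.

    All connection details here are placeholders.
    """
    from hdbcli import dbapi  # SAP HANA client; imported lazily (optional dependency)

    # SAP HANA Cloud endpoints listen on port 443 and require TLS.
    conn = dbapi.connect(address=host, port=443, user=user,
                         password=password, encrypt=True)
    try:
        cur = conn.cursor()
        cur.execute(build_top_query(table, limit))
        return cur.fetchall()
    finally:
        conn.close()


print(build_top_query("DL_SALES", 5))  # → SELECT TOP 5 * FROM "DL_SALES"
```

The same `SELECT` would work unchanged from dbisql or the SAP HANA Database Explorer; only the connection handling differs per client.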
6. Costing Factors
◦ Relational Engine Compute: the number of vCPUs dedicated to the data lake Relational Engine (memory included).
◦ SQL on Files File Scan: the amount of file data scanned by queries during a given month.
◦ File Access API Calls: the number of read, write, and metadata requests made against data lake Files storage. A starting estimate is 4,000,000 API calls per TB of file storage.
◦ Relational Engine Storage: data stored in the data lake Relational Engine format to facilitate OLAP analysis.
◦ File Storage: files stored in data lake Files storage.
◦ Backup Storage: the backup storage space of the data lake Relational Engine.
◦ Network Data Transfer: the amount of network traffic generated by reading from the system.
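The 4,000,000-calls-per-TB starting estimate above translates directly into a quick helper. This is only a sketch of that rule of thumb; actual call volume depends entirely on workload.

```python
CALLS_PER_TB = 4_000_000  # starting estimate from the costing factors above


def estimated_api_calls(file_storage_tb: float) -> int:
    """Rough monthly File Access API call estimate for a given file storage size."""
    return round(file_storage_tb * CALLS_PER_TB)


print(estimated_api_calls(2.5))  # → 10000000  (2.5 TB of file storage)
```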
7. Sizing Factors
Coordinator:
◦ 8 GB of memory per vCPU on the coordinator node.
◦ For larger databases with a higher number of transactions (DDL/DML), a higher vCPU value for the coordinator node is recommended.
◦ The minimum is 2 vCPUs and the maximum is 64 vCPUs.
Workers:
◦ Larger worker nodes improve single-user performance (scale-up), while more worker nodes provide higher concurrency (scale-out).
◦ The minimum number of worker nodes is 1 and the maximum is 10. The minimum per worker node is 2 vCPUs and the maximum is 64 vCPUs.
Storage:
◦ The minimum is 1 TB (AWS/GCP) or 4 TB (Azure); the maximum is 90 TB.
◦ After creating the data lake Relational Engine database, you can increase the amount of storage but cannot decrease it.
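These limits can be captured in a small validity check. This sketch uses only the figures above; the provider keys `aws`/`gcp`/`azure` are labels chosen here, not SAP identifiers.

```python
COORD_VCPUS = (2, 64)      # coordinator min/max vCPUs
WORKER_VCPUS = (2, 64)     # per-worker min/max vCPUs
WORKER_NODES = (1, 10)     # min/max worker nodes
MIN_STORAGE_TB = {"aws": 1, "gcp": 1, "azure": 4}
MAX_STORAGE_TB = 90
GB_MEMORY_PER_VCPU = 8     # coordinator memory ratio stated above


def check_sizing(coordinator_vcpus: int, worker_nodes: int,
                 worker_vcpus: int, storage_tb: float, provider: str):
    """Return (within_limits, coordinator_memory_gb) for a proposed configuration."""
    ok = (COORD_VCPUS[0] <= coordinator_vcpus <= COORD_VCPUS[1]
          and WORKER_NODES[0] <= worker_nodes <= WORKER_NODES[1]
          and WORKER_VCPUS[0] <= worker_vcpus <= WORKER_VCPUS[1]
          and MIN_STORAGE_TB[provider] <= storage_tb <= MAX_STORAGE_TB)
    return ok, coordinator_vcpus * GB_MEMORY_PER_VCPU


print(check_sizing(4, 2, 8, 4, "azure"))   # → (True, 32)
print(check_sizing(4, 2, 8, 2, "azure"))   # → (False, 32): 2 TB is below Azure's 4 TB minimum
```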
8. Costing Estimate
Costing is based on capacity units (CUs).
As a ballpark estimate, 6 vCPUs and 4 TB of Relational Engine
storage, at up to 3,700 CUs on the CPEA model, come to roughly
~4,000 AUD/month without discounts.
Sizing still needs to be properly estimated against our use cases.
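Treating the single data point above (~3,700 CUs ≈ ~4,000 AUD/month) as a reference, a rough linear extrapolation looks like the sketch below. Actual CPEA pricing involves tiers and discounts, so this is strictly back-of-the-envelope.

```python
# Ballpark reference point from the note above (CPEA model, no discounts).
REFERENCE_CUS = 3_700
REFERENCE_AUD_PER_MONTH = 4_000


def monthly_cost_aud(capacity_units: float) -> float:
    """Linear extrapolation from one reference point; real pricing may not be linear."""
    return capacity_units * REFERENCE_AUD_PER_MONTH / REFERENCE_CUS


print(round(monthly_cost_aud(3_700)))  # → 4000
print(round(monthly_cost_aud(7_400)))  # → 8000 (double the CUs, double the cost under this model)
```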