This document discusses tuning Apache Spark performance for Apache Kylin cube building. It explains that Kylin is moving more jobs to Spark to improve performance. Key tuning areas covered include Spark on YARN memory configuration, executor/driver sizing, dynamic resource allocation, RDD partitioning, shuffle handling, compression, and deployment modes. The document provides recommended Spark configurations for Kylin and emphasizes that understanding Spark tuning will help users run Kylin more efficiently as it incorporates more Spark functionality.
2. Background
• Kylin 2.0 started to use Spark as the Cube build engine
• Proven to improve build performance by 2x to 3x
• Requires Spark tuning experience
• Kylin 2.5 will move more jobs onto Spark
• Convert to HFile (KYLIN-3427)
• Merge segments (KYLIN-3441)
• Merge dictionaries on YARN (KYLIN-3471)
• Fact distinct columns in Spark (KYLIN-3442)
• In the future, the Spark engine will replace MR
4. Agenda
• Why Spark
• Spark on YARN Model
• Spark Executor Memory Model
• Executor/Driver memory/core configuration
• Dynamic Resource Allocation
• RDD Partitioning
• Shuffle
• Compression
• DFS Replication
• Deploy Modes
• Other Tips
5. Why Apache Spark
• Fast, memory-centric distributed computing framework
• Flexible API
• Spark Core
• DataFrames, Datasets and SparkSQL
• Spark Streaming
• MLlib/SparkR
• Language support
• Java, Scala, Python, R
• Deployment options:
• Standalone/YARN/Mesos/Kubernetes (Spark 2.3+)
6. Spark on YARN memory model
• Overhead memory
• The JVM itself needs memory to run
• By default: executor memory * 0.1, minimum 384 MB
• Executor memory
7. Spark on YARN memory model (cont.)
• If you allocate 4 GB to an executor, Spark will request:
• 4 * 0.1 + 4 = 4.4 GB as the container memory from YARN
• From our observation, the default factor (0.1) is a little small for Kylin;
executors are very likely to be killed.
• Give 1 GB or more as overhead memory
• spark.yarn.executor.memoryOverhead=1024
• From Kylin 2.5, 1 GB of overhead is requested by default.
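The container request above can be sketched in a few lines of Python (a hypothetical helper, not Spark or Kylin code), using the default 0.1 factor and the 384 MB floor:

```python
# Hypothetical helper showing how the YARN container size follows from
# executor memory plus overhead, per the rule described above.

def yarn_container_mb(executor_mb, overhead_factor=0.1, min_overhead_mb=384):
    """Container = executor memory + max(factor * executor memory, floor)."""
    overhead = max(int(executor_mb * overhead_factor), min_overhead_mb)
    return executor_mb + overhead

# The 4 GB example from the slide: ~4.4 GB requested from YARN.
print(yarn_container_mb(4096))   # 4505 MB
# With the recommended explicit 1 GB overhead, the request becomes 5 GB.
print(4096 + 1024)               # 5120 MB
```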
8. Spark executor memory model
• Reserved memory
• 300 MB, reserved to avoid OOM
• Spark memory
• spark.memory.fraction=0.6
• For both storage/cache and
execution (shuffle, sort)
• spark.memory.storageFraction=0.5:
storage and execution each get half
• User memory
• The rest is for user code execution
9. Spark executor memory model (cont.)
• An example:
• Given an executor 4 GB of memory, its max. storage/execution memory is:
• (4096 – 300) * 0.6 = 2.27 GB
• If the executor needs to run computation (sorting/shuffling), the space for
the RDD cache can shrink to:
• 2.27 GB * 0.5 = 1.13 GB
• User memory:
• (4096 – 300) * 0.4 = 1.52 GB
• When you have big dictionaries, consider allocating more to user
memory
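As a sketch, the arithmetic above in Python (a hypothetical helper, not Spark code); the slide's GB figures correspond to the MB values divided by 1000:

```python
# Reproduces the unified-memory arithmetic from the 4 GB example above.
RESERVED_MB = 300

def memory_regions_mb(executor_mb, fraction=0.6, storage_fraction=0.5):
    usable = executor_mb - RESERVED_MB
    spark_mem = usable * fraction            # shared by storage and execution
    storage = spark_mem * storage_fraction   # can shrink under execution pressure
    user_mem = usable * (1 - fraction)       # user code, e.g. big dictionaries
    return spark_mem, storage, user_mem

spark_mem, storage, user_mem = memory_regions_mb(4096)
print(spark_mem, storage, user_mem)  # ~2277.6, ~1138.8, ~1518.4 MB
```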
10. Executor memory/core configuration
• Check how much memory per core is available in your Hadoop cluster
• To maximize resource utilization, use a similar ratio for Spark.
• For example, a cluster has 100 cores and 500 GB memory. You can allocate 1
core and 5 GB (1 GB for overhead, 4 GB for the executor) for each executor instance.
• If you use multiple cores in one executor, increase the memory
accordingly
• e.g., 2 cores + 10 GB per instance.
• Use no more than 40 GB of memory per instance
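A quick sketch of the sizing rule (the cluster numbers come from the example above; the 2-core breakdown into heap plus overhead is my assumption, extrapolated from the 1-core case):

```python
# Match each executor's memory/core ratio to the cluster-wide ratio.
cluster_cores, cluster_mem_gb = 100, 500
gb_per_core = cluster_mem_gb / cluster_cores   # 5 GB per core

# 1 core: 5 GB total = 1 GB overhead + 4 GB executor heap (as on the slide).
# 2 cores: scale memory accordingly.
cores = 2
total_gb = cores * gb_per_core                 # 10 GB per instance
executor_heap_gb = total_gb - 1                # minus 1 GB overhead
print(total_gb, executor_heap_gb)              # 10.0 9.0
```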
11. Driver memory configuration
• Kylin does not collect data to the driver, so you can configure fewer resources
for the driver
• spark.driver.memory=2g
• spark.driver.cores=1
12. More instances with fewer cores, or fewer instances
with more cores?
• Spark active task number = instances * (cores / instance)
• Both can achieve similar parallelism
• If you use more cores in one executor, tasks can share references in the
same JVM
• Share big objects like dictionaries
• With Spark dynamic resource allocation, use 1 core per instance.
14. Dynamic resource allocation
• Static allocation does not fit Kylin.
• Cubing is done layer by layer; each layer's size is different
• Workload is unbalanced: small -> medium -> big -> extremely big -> small -> tiny
• DRA is highly recommended.
• With DRA enabled, give each executor 1 core.
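A hedged sketch of the pass-through settings involved (the min/max bounds below are illustrative, not Kylin defaults; DRA on YARN also requires the external shuffle service to be set up on the NodeManagers):

```
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
kylin.engine.spark-conf.spark.executor.cores=1
```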
15. RDD partitioning
• An RDD partition is similar to a file split in MapReduce;
• Spark prefers many small partitions over fewer big partitions
• Kylin splits partitions by estimated file size (after aggregation), by default 1
partition per 10 MB:
• kylin.engine.spark.rdd-partition-cut-mb=10
• The real size may vary as the estimation might be inaccurate
• This may affect performance greatly!
• Min/max partition caps:
• kylin.engine.spark.min-partition=1
• kylin.engine.spark.max-partition=5000
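The cut logic can be sketched like this (a simplified model of the behavior described above, not actual Kylin code):

```python
import math

def rdd_partitions(estimated_mb, cut_mb=10, min_part=1, max_part=5000):
    """One partition per cut_mb of estimated output, clamped to the caps."""
    n = math.ceil(estimated_mb / cut_mb)
    return max(min_part, min(n, max_part))

print(rdd_partitions(250))       # 25: a 250 MB layer at 10 MB per partition
print(rdd_partitions(200_000))   # 5000: clamped by the max-partition cap
print(rdd_partitions(3))         # 1: never below the min-partition cap
```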
16. Partition number is important
• When the partition number is too low
• Less parallelism, low resource utilization ratio
• Executor OOM (especially when using "mapPartitions")
• When the partition number is too high
• Shuffle is slow
• Many small files are generated
• Pay attention if you observe a job with > 1000 partitions
17. Partition number can be wild in certain cases
• If your cube has Count Distinct or TopN measures, the estimated size may
be far larger than the actual size, causing too many partitions.
• Tune the parameter manually, at the Cube level, according to the actual Cuboid
file size:
• kylin.engine.spark.rdd-partition-cut-mb=100
• Or, reduce the max. partition number:
• kylin.engine.spark.max-partition=500
• KYLIN-3453 Make the size estimation more accurate
• KYLIN-3472 TopN in Spark is slow
18. Shuffle
• Spark shuffle is similar to MapReduce's
• Partitions the mapper's output and sends each partition only to its reducer;
• The reducer buffers data in memory, sorts, aggregates and then reduces.
• But with differences
• Spark sorts the data on the map side, but doesn't merge it on the reduce side;
• If users need the data sorted and call "sortByKey" or similar, Spark will re-sort
the data. The re-sort is not aware that the map output is already sorted.
• The sorting is in memory, spilling to disk when memory is full
19. Shuffle (cont.)
• Shuffle spill
• Spill memory = (executor memory – 300 MB) * spark.memory.fraction * (1 –
spark.memory.storageFraction)
• Spilled files won't be merged until the data is requested; merging happens on the fly
• If you need the data sorted, Spark is slower than MR.
• SPARK-2926 tries to introduce MR-style merge sort.
• Kylin's "Convert to HFile" step needs the values to be sorted. Spark may
spend 2x the time of MR on this step.
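Plugging slide 9's 4 GB executor into the spill formula above (a hypothetical helper, mirroring the default fractions):

```python
def spill_memory_mb(executor_mb, fraction=0.6, storage_fraction=0.5):
    """Execution memory available before shuffle data spills to disk."""
    return (executor_mb - 300) * fraction * (1 - storage_fraction)

print(spill_memory_mb(4096))  # ~1138.8 MB, i.e. the 1.13 GB execution region
```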
20. Compression
• Compression can significantly reduce IO
• By default Kylin enables compression for MR in `conf/kylin_job_conf.xml`,
but not for Spark
• If your Hadoop cluster did not enable compression, you may see 2x-sized files
generated when switching from the MR to the Spark engine
• Manually enable compression by adding:
• kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
• kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
• Kylin 2.5 will enable compression by default.
21. Compression (cont.)
• 40% performance improvement + 50% disk savings
(no compression vs. compression, merging segments on Spark)
22. DFS replication
• Kylin keeps 2 replicas for intermediate files, configured in
`kylin_job_conf.xml` and `kylin_hive_conf.xml`
• But this does not work for Spark
• Manually add:
• kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
• Saves 1/3 of disk space
• Kylin 2.5 will enable this by default.
23. Deployment modes
• Spark on YARN has two deploy modes
• Cluster: the driver runs inside the application master
• Client: the driver runs in the client process
• When developing/debugging, use `client` mode;
• Starts fast, with detailed log messages printed to the console
• Occupies client node memory
• In production deployments, use `cluster` mode.
• Kylin 2.5 will use `cluster` mode by default
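As a sketch, the mode can be selected through Kylin's Spark pass-through prefix (the property name follows Spark's standard `spark.submit.deployMode`; treating this as an assumption about how one would set it in `kylin.properties`):

```
kylin.engine.spark-conf.spark.submit.deployMode=cluster
```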
24. Other tips
• Pre-upload the YARN archive
• Avoids uploading big files repeatedly
• Accelerates job startup
• Run the Spark history server for troubleshooting
• Makes it much easier to identify bottlenecks
• https://kylin.apache.org/docs/tutorial/cube_spark.html
26. Key takeaway
• Kylin will move more jobs to Spark
• Mastering Spark tuning will help you run Kylin better
• Kylin aims to provide an out-of-the-box user experience with Spark, like with MR.
27. We are hiring
Apache Kylin
dev@kylin.apache.org
Kyligence Inc
info@kyligence.io
Editor's Notes
spark.shuffle.memoryFraction and spark.storage.memoryFraction are deprecated. They are replaced by spark.memory.fraction. See https://spark.apache.org/docs/2.1.2/configuration.html
https://spark.apache.org/docs/2.1.2/running-on-yarn.html
https://0x0fff.com/spark-architecture-shuffle/
https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/4-shuffleDetails.md
https://issues.apache.org/jira/browse/SPARK-2926
In one case, MR ("convert to HFile") took 6 min while Spark took 11 min; even after enlarging Spark memory, the performance didn't improve.