5. Person with laptop vs. server in datacenter
❖ Laptop: network-bound (WiFi, VPN)
➢ Benefits from compression
❖ Server: CPU-bound, but many cores
➢ Benefits from multiprocessing
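Why a network-bound client benefits from compression can be shown with a small sketch. It uses only the standard `zlib` module; the CSV payload is made up for illustration and is not taken from the talk.

```python
import zlib

# Hypothetical CSV payload standing in for an exported result set.
rows = [f"{i},user_{i % 100},2019-01-{(i % 28) + 1:02d}\n" for i in range(10_000)]
raw = "".join(rows).encode("utf-8")

# Compress once, as a transport layer would before sending over WiFi or VPN.
compressed = zlib.compress(raw, 6)
ratio = len(raw) / len(compressed)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, ratio: {ratio:.1f}x")
```

The actual ratio depends heavily on the data; repetitive text columns compress well, random identifiers much less so.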
6. PyEXASOL
❖ Websocket protocol
❖ Python 3.6+
❖ HTTP transport, compression
❖ Multiprocessing
❖ Feature-rich
❖ For analysts with laptops
❖ For parallel machine learning
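A minimal sketch of what the laptop use case might look like, assuming the `pyexasol` package is installed. The DSN, user, and password are placeholders, and `export_table` is a helper name invented for this example.

```python
# Connection options; the DSN and credentials below are placeholders.
CONNECT_OPTIONS = {
    "dsn": "exasol-host:8563",
    "user": "analyst",
    "password": "secret",
    "compression": True,  # compress traffic, useful over WiFi or VPN
}


def export_table(table_name, options=CONNECT_OPTIONS):
    """Fetch a whole table into a pandas DataFrame via HTTP transport."""
    import pyexasol  # third-party; imported lazily so the sketch loads without it

    conn = pyexasol.connect(**options)
    try:
        return conn.export_to_pandas(table_name)
    finally:
        conn.close()
```

Calling `export_table("my_table")` would need a reachable Exasol instance and pandas installed.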
11. How it looks in Exasol
EXPORT my_table INTO CSV
AT 'http://27.1.0.30:33601' FILE '000.csv'
AT 'http://27.1.0.31:41733' FILE '001.csv'
AT 'http://27.1.0.32:45014' FILE '002.csv'
AT 'http://27.1.0.33:42071' FILE '003.csv'
AT 'http://27.1.0.34:36669' FILE '004.csv'
AT 'http://27.1.0.35:36794' FILE '005.csv'
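The statement above has one `AT ... FILE ...` pair per HTTP endpoint. As a rough sketch, such a statement could be assembled from a list of `host:port` strings like this; `build_parallel_export` is a hypothetical helper, not a PyEXASOL function, and it does no identifier quoting or escaping.

```python
def build_parallel_export(table, proxies):
    """Return an EXPORT statement with one AT ... FILE ... pair per proxy."""
    lines = [f"EXPORT {table} INTO CSV"]
    for index, proxy in enumerate(proxies):
        # Files are numbered 000.csv, 001.csv, ... in the order of the proxies.
        lines.append(f"AT 'http://{proxy}' FILE '{index:03d}.csv'")
    return "\n".join(lines)


proxies = ["27.1.0.30:33601", "27.1.0.31:41733", "27.1.0.32:45014"]
sql = build_parallel_export("my_table", proxies)
print(sql)
```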
12. Other features
❖ Multi-host connection strings
❖ SQL formatter
❖ SSL encryption
❖ Local config (.ini file)
❖ Metadata requests
❖ Profiling of last query
❖ Built-in UDF script output server
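To illustrate multi-host connection strings: Exasol connection strings can contain ranges such as `exasol1..3:8563`. The sketch below expands such a string into individual hosts. It is a simplified illustration, not PyEXASOL's own parser, and it assumes each comma-separated part carries its own port or falls back to 8563.

```python
import re

DEFAULT_PORT = 8563  # Exasol's default port

# Matches "prefix<start>..<end>", e.g. "exasol1..3" or "10.0.0.11..14".
RANGE_PATTERN = re.compile(r"^(.+?)(\d+)\.\.(\d+)$")


def expand_dsn(dsn):
    """Expand a connection string into a list of (host, port) tuples."""
    result = []
    for part in dsn.split(","):
        host, _, port = part.strip().partition(":")
        port = int(port) if port else DEFAULT_PORT
        match = RANGE_PATTERN.match(host)
        if match:
            prefix, start, end = match.group(1), int(match.group(2)), int(match.group(3))
            result.extend((f"{prefix}{n}", port) for n in range(start, end + 1))
        else:
            result.append((host, port))
    return result


print(expand_dsn("exasol1..3:8563"))
```

A client can then pick one of the expanded hosts at random and fall back to the others if the connection fails.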
15. UDF Framework
❖ Run anything in Exasol
❖ In a massively parallel manner
❖ Very flexible
❖ Great for IMPORT / EXPORT
❖ But:
➢ Requires some skill
➢ Lacks a practical manual
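As a small illustration of the UDF idea, here is a sketch of a Python UDF body together with a local stand-in for the context object, so the logic can be tried outside the database. The `amount` column and the `FakeContext` class are invented for this example; inside Exasol the `run` function would live in a `CREATE ... SCRIPT` definition and be called once per group.

```python
def run(ctx):
    """Body of a hypothetical SET script: count rows and sum `amount` per group."""
    total = 0
    count = 0
    while True:
        total += ctx.amount
        count += 1
        if not ctx.next():  # next() returns False after the last row of the group
            break
    ctx.emit(count, total)


class FakeContext:
    """Local stand-in for the context object Exasol passes to run()."""

    def __init__(self, amounts):
        self._amounts = list(amounts)
        self._pos = 0
        self.emitted = []

    @property
    def amount(self):
        return self._amounts[self._pos]

    def next(self):
        self._pos += 1
        return self._pos < len(self._amounts)

    def emit(self, *values):
        self.emitted.append(values)


ctx = FakeContext([10, 20, 30])
run(ctx)
print(ctx.emitted)  # [(3, 60)]
```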
16. Parallel import from MySQL
❖ MariaDB JDBC driver
❖ Protocol compression
❖ ~ 200 MySQL hosts
❖ ~ 376,400 shards
❖ Up to 800 parallel threads
❖ 500,000,000 rows in 10 minutes
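A back-of-the-envelope sketch of the numbers above. The talk does not say how shards were assigned to threads; round-robin assignment is an assumption made here purely for illustration, and `assign_shards` is a hypothetical helper.

```python
def assign_shards(shard_ids, workers):
    """Split shard ids into `workers` groups, round-robin."""
    groups = [[] for _ in range(workers)]
    for i, shard in enumerate(shard_ids):
        groups[i % workers].append(shard)
    return groups


groups = assign_shards(range(376_400), 800)
sizes = [len(group) for group in groups]

# Throughput implied by 500,000,000 rows in 10 minutes.
rows_per_second = 500_000_000 / (10 * 60)

print(f"shards per thread: {min(sizes)} to {max(sizes)}")
print(f"rows per second: {rows_per_second:,.0f}")
```

With these figures each thread handles roughly 470 shards, and the overall rate works out to about 833,000 rows per second.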
17. Hadoop ORC support
❖ Columnar format
❖ Native Java API
❖ VectorizedRowBatch (fast!)
❖ Exasol can pre-sort data
❖ Optional bloom filters
❖ Many custom settings
20. Memory pitfall
[Diagram: node memory split into Exasol DB RAM, Cluster OS, and UDF]
❖ Exasol DB RAM: pre-allocated during startup; set in EXAoperation (DB RAM, Huge Pages)
❖ Cluster OS: required for the OS
❖ UDF: tiny!
❖ Always monitor memory usage
❖ Exasol may lose parallelism if more than 4 GB of RAM is used per node
❖ Increase this limit if you need it (contact Customer Support)
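One simple way to keep an eye on memory from inside a Python process is the standard `resource` module. The sketch below reports the peak resident set size of the current process only; the 4 GB figure on the slide applies per node, so comparing a single process against it is a rough assumption made for illustration.

```python
import resource
import sys

LIMIT_MB = 4 * 1024  # the 4 GB per-node figure from the slide


def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def check_memory(limit_mb=LIMIT_MB):
    """Return (used_mb, within_limit) for the current process."""
    used = peak_rss_mb()
    return used, used <= limit_mb


used_mb, within_limit = check_memory()
print(f"peak RSS: {used_mb:.1f} MB, within limit: {within_limit}")
```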
21. Long transaction pitfall
[Diagram: Process 1, Process 2, Process 3, Process 4]
❖ Some groups are slow
❖ Not obvious
❖ Hard to profile
❖ Holding locks
❖ Holding resources
❖ QUERY_TIMEOUT helps
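`QUERY_TIMEOUT` is an Exasol session parameter, given in seconds, after which a running statement is aborted. A minimal sketch of building the statement follows; `query_timeout_sql` is a hypothetical helper, and the 600-second value is only an example.

```python
def query_timeout_sql(seconds):
    """Return an ALTER SESSION statement that sets QUERY_TIMEOUT in seconds."""
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds < 0:
        raise ValueError("seconds must be a non-negative integer")
    return f"ALTER SESSION SET QUERY_TIMEOUT = {seconds}"


statement = query_timeout_sql(600)
print(statement)
```

The resulting string can be passed to the `execute` method of an open connection before starting a long-running import or export.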
22. Gists
❖ Exasol UDF: Import from MySQL
❖ Exasol UDF: Import from Hadoop ORC
❖ Exasol UDF: Export to Hadoop ORC
Icons from the Noun Project
Thank you!