In this session, Neil Avery covers the planning and operation of your KSQL deployment, including under-the-hood architectural details. You will learn about the various deployment models, how to track and monitor your KSQL applications, how to scale in and out and how to think about capacity planning. This is part 3 out of 3 in the Empowering Streams through KSQL series.
Neil is a senior engineer and technologist at Confluent, the
company founded by the creators of Apache Kafka®. He has over
20 years of experience working on distributed computing,
messaging and stream processing. He has built or redesigned
commercial messaging platforms and distributed caching products, and has
developed large-scale bespoke systems for tier-1 banks.
After a period at ThoughtWorks, he went on to build some of the
first distributed risk engines in financial services. In 2008 he
launched a startup that specialised in distributed data analytics and
visualization. Prior to joining Confluent, he was the CTO at a
fintech consultancy.
Neil Avery
Senior Engineer and Technologist, Confluent
Housekeeping Items
● This session will last about an hour.
● This session will be recorded.
● You can submit your questions by entering them into the GoToWebinar panel.
● The last 10-15 minutes will consist of Q&A.
● The slides and recording will be available after the talk.
First things first
• Getting KSQL binaries is easy
Download: https://www.confluent.io/download/
Confluent Open Source 4.1+ (free)
Confluent Enterprise 4.1+ (30-day trial)
• Links to downloads, docs, news, examples, etc.
https://confluent.io/ksql
• KSQL code is open source (Apache license)
https://github.com/confluentinc/ksql
Deploying KSQL – Getting Started
• For development, e.g. on your laptop, use the Confluent CLI:
$ confluent start
• Starts up a full set of services:
ZooKeeper & Kafka Broker
Schema Registry
KSQL Server
REST Proxy
Kafka Connect
Control Center
Deploying KSQL – Starting KSQL Server
• KSQL Server acts as a Kafka client
Run it on nodes separate from the Kafka Brokers
• Provide a configuration file of settings
• From your installation directory:
$ bin/ksql-server-start config/ksql-server.properties
KSQL Server Configuration
• The configuration file has only a few mandatory options:
bootstrap.servers – where to find the Kafka Broker(s)
listeners – ports on which to listen for connections from the KSQL CLI
• Optional:
ksql.service.id – a name to group together a pool of KSQL Servers
• Optionally, add any property the embedded Kafka consumers and producers or Kafka Streams API would understand
e.g. security configurations
• Example:
bootstrap.servers=broker1:9092
listeners=http://localhost:8088
ksql.streams.commit.interval.ms=1000
producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
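Since only two options are mandatory, a pre-flight check before starting the server is cheap. A minimal sketch, assuming bash and awk and a file written in `key=value` form; `check_ksql_config` is a hypothetical helper, not part of the KSQL distribution, and it only checks that each key is present and not commented out, not that its value is valid.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with KSQL): confirm that a KSQL
# Server properties file sets bootstrap.servers and listeners.
check_ksql_config() {
  local file="$1" key missing=0
  for key in bootstrap.servers listeners; do
    # Trim whitespace around the key; commented-out lines ("#...") never match.
    if ! awk -F= -v k="$key" '
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == k { found = 1 }
        END { exit (found ? 0 : 1) }
      ' "$file"; then
      echo "missing mandatory option: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_ksql_config config/ksql-server.properties && bin/ksql-server-start config/ksql-server.properties`.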
Connecting to a Secured Kafka cluster
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=\
  org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<name of the user KSQL should use>" \
  password="<the password>";
The exact settings you need will vary depending on which SASL mechanism your
Kafka cluster uses and how your SSL certificates are signed. For full details,
please refer to the Security section of the Kafka documentation
<http://kafka.apache.org/documentation.html#security>
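To keep credentials out of version-controlled config, one option is to append the SASL settings at deploy time. A sketch assuming SASL/PLAIN over SSL as above; `append_sasl_plain` and the environment variables `KSQL_SASL_USER` / `KSQL_SASL_PASSWORD` are hypothetical names chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch: append SASL_SSL + PLAIN client settings to a KSQL Server properties file.
# Credentials come from hypothetical environment variables KSQL_SASL_USER and
# KSQL_SASL_PASSWORD so they are not committed alongside the rest of the config.
append_sasl_plain() {
  local file="$1"
  # Abort with a message if either variable is unset or empty.
  : "${KSQL_SASL_USER:?KSQL_SASL_USER is not set}"
  : "${KSQL_SASL_PASSWORD:?KSQL_SASL_PASSWORD is not set}"
  # The JAAS value is written on a single line, so no continuation backslashes.
  # Values containing double quotes or backslashes would need escaping first.
  cat >> "$file" <<EOF
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="${KSQL_SASL_USER}" password="${KSQL_SASL_PASSWORD}";
EOF
  # The file now holds a password: make it readable by its owner only.
  chmod 600 "$file"
}
```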
Connecting to a Schema Registry (Optional)
• Add the Schema Registry address to the same configuration file
• If your Schema Registry is secured, you will also need to set the KSQL_OPTS environment variable when starting KSQL Server to specify the connection credentials
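A sketch of both steps, assuming the `ksql.schema.registry.url` property and Schema Registry's default port 8081; `set_schema_registry_url` is a hypothetical helper. The KSQL_OPTS line assumes an HTTPS Schema Registry whose certificate is trusted via a standard JVM trust store; check the KSQL security docs for what your setup actually needs.

```bash
#!/usr/bin/env bash
# Sketch: set (or replace) ksql.schema.registry.url in a KSQL Server properties
# file. The URL passed in is your own; 8081 is Schema Registry's default port.
set_schema_registry_url() {
  local file="$1" url="$2"
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  # Keep every line except an existing Schema Registry URL (grep exits 1 when
  # it selects no lines, which is fine here).
  grep -v '^ksql\.schema\.registry\.url=' "$file" > "$file.tmp" || true
  mv "$file.tmp" "$file"
  printf 'ksql.schema.registry.url=%s\n' "$url" >> "$file"
}

# For an HTTPS Schema Registry signed by a private CA, one option (an assumption
# about your setup) is to pass standard JVM trust-store properties via KSQL_OPTS:
#   KSQL_OPTS="-Djavax.net.ssl.trustStore=/path/to/truststore.jks \
#              -Djavax.net.ssl.trustStorePassword=<password>" \
#     bin/ksql-server-start config/ksql-server.properties
```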
Starting KSQL CLI
• KSQL CLI interfaces with a KSQL Server over HTTP
• Start by specifying the address of the target KSQL Server
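The CLI takes the server URL as its argument (e.g. `bin/ksql http://localhost:8088`). Because that URL is the server's `listeners` setting, one convenient approach is to derive it from the server's config file. A sketch assuming bash and awk; `ksql_cli_command` is a hypothetical helper that only prints the command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a KSQL CLI command that targets the first address
# listed in the server's `listeners` option (comma-separated in the file).
ksql_cli_command() {
  local file="$1" url
  url=$(awk -F= '
    { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
    key == "listeners" {
      sub(/^[^=]*=[ \t]*/, "")     # strip "listeners=" from the line
      split($0, addrs, ",")
      print addrs[1]
      exit
    }' "$file")
  if [ -z "$url" ]; then
    echo "no listeners option in $file" >&2
    return 1
  fi
  # A wildcard bind address such as http://0.0.0.0:8088 should be replaced
  # with the server's host name when connecting from another machine.
  echo "bin/ksql $url"
}
```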
Starting KSQL (preview) web interface
• 1. Download https://s3.amazonaws.com/ksql-experimental-ui/ksql-experimental-ui-0.1.war
• 2. Copy the war file into the ksql/ui folder
• 3. Run ksql-server-start, then open:
http://localhost:8080/index.html
Log Files
• See config/log4j.properties or config/log4j-rolling.properties
Patterns & Best Practices
• KSQL Server pools
– per team / project / use-case
• “headless” (queries loaded from a file at startup) vs. “interactive” (statements submitted via the CLI or REST API)
$ bin/ksql-server-start config/my.properties --queries-file /path/to/foo.sql
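In headless mode the queries file is simply a sequence of KSQL statements. The sketch below writes a small illustrative one; the topic `pageviews`, its columns, the derived stream, and the function name `write_queries_file` are all hypothetical.

```bash
#!/usr/bin/env bash
# Write an illustrative headless-mode queries file. Quoted 'EOF' stops the shell
# from expanding anything inside the KSQL text.
write_queries_file() {
  cat > "$1" <<'EOF'
-- Register the source topic as a stream with JSON-encoded values.
CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- Persistent query: continuously filter into a new stream (and topic).
CREATE STREAM pageviews_no_home AS
  SELECT * FROM pageviews WHERE pageid <> 'home';
EOF
}

# Usage (requires a running Kafka cluster):
#   write_queries_file /path/to/foo.sql
#   bin/ksql-server-start config/my.properties --queries-file /path/to/foo.sql
```

Every server in a pool started with the same file and the same `ksql.service.id` runs the same set of queries, which is what makes this mode easy to package per team or use case.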
Scaling
• KSQL Servers are Kafka clients
• Queries act as Consumer Groups
• Partitions are the limit of scale-out
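A rough way to reason about that limit, under a simplified model assumed here: a persistent query reading one topic gets one Kafka Streams task per input partition, each task runs on one stream thread, and the threads per server come from `ksql.streams.num.stream.threads`. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified model (an assumption, not KSQL's exact scheduling): for one query
# over one source topic, active threads = min(servers * threads, partitions).
#   usage: max_active_threads SERVERS THREADS_PER_SERVER PARTITIONS
max_active_threads() {
  local servers=$1 threads=$2 partitions=$3
  local total=$(( servers * threads ))
  if (( total < partitions )); then
    echo "$total"
  else
    echo "$partitions"
  fi
}

# Threads that will sit idle for that query under the same model.
idle_threads() {
  local total=$(( $1 * $2 ))
  echo $(( total - $(max_active_threads "$1" "$2" "$3") ))
}
```

For example, four servers with four stream threads each, running a query over an 8-partition topic: `max_active_threads 4 4 8` prints 8 and `idle_threads 4 4 8` prints 8, so adding servers only helps once the topic has more partitions.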
Scaling your data model
• Partitions: one topic can feed multiple queries, and its partition count determines how far each query can scale horizontally
• Query performance: ~200k/second (resilience provided by the Kafka log)
• Partitions are the limit of scale-out
Scaling your data model
• Partitions are the limit of scale-out – over 30k per typical server
• Query throughput is largely determined by (de)serialization cost
• Latency considerations
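A back-of-the-envelope sizing sketch. The per-server rate is an input you measure for your own queries and message format, since (de)serialization cost dominates; `servers_needed` and its partition warning are illustrative assumptions, not a KSQL tool.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope sizing sketch (hypothetical helper, not a KSQL tool).
#   usage: servers_needed TARGET_RATE PER_SERVER_RATE PARTITIONS
# Rates are messages/second; PER_SERVER_RATE is whatever you measured for your
# own queries, since (de)serialization cost dominates throughput.
servers_needed() {
  local target=$1 per_server=$2 partitions=$3 n
  n=$(( (target + per_server - 1) / per_server ))   # ceiling division
  if (( n < 1 )); then n=1; fi
  # Servers beyond the partition count get no partitions to work on.
  if (( n > partitions )); then
    echo "warning: $n servers needed but only $partitions partitions;" \
         "add partitions before adding servers" >&2
  fi
  echo "$n"
}
```

With a hypothetical measured rate of 25000 messages/second per server, `servers_needed 100000 25000 12` prints 4.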
Scaling KSQL Server
• Using K8s & Docker (create pods of Server instances – deploy using an application.sql file)
• Monitor using Control Center, Datadog, or other tools
• Latency considerations
KSQL-Server JMX metrics
$ export JMX_PORT=1099 && bin/ksql-server-start config/nicks-ksqlserver.properties
• Attach JConsole or tool of choice
• OR
• jstatd: run it on every host!
• Remotely connect and use VisualVM
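On JDK 8, jstatd will not start without a security policy. A minimal sketch, assuming JDK 8 (the policy references `tools.jar`, which no longer exists from JDK 9 on). jstatd's RMI registry also defaults to port 1099, which collides with the `JMX_PORT=1099` used above when both run on the same host, so the sketch passes `-p` with another port; 1199 and the helper names are arbitrary choices.

```bash
#!/usr/bin/env bash
# Write a permissive jstatd policy (JDK 8 layout: tools.jar under ${java.home}/..).
# Quoted 'EOF' keeps ${java.home} literal; the JVM expands it, not the shell.
write_jstatd_policy() {
  cat > "$1" <<'EOF'
grant codebase "file:${java.home}/../lib/tools.jar" {
    permission java.security.AllPermission;
};
EOF
}

# Start jstatd in the background with that policy. The RMI registry port
# (default 1099) is moved to avoid clashing with a KSQL Server JMX_PORT=1099.
start_jstatd() {
  local policy="${1:-$HOME/jstatd.all.policy}" port="${2:-1199}"
  write_jstatd_policy "$policy"
  jstatd -p "$port" -J-Djava.security.policy="$policy" &
}
```

Run `start_jstatd` on each KSQL host, then add each host as a remote host in VisualVM, pointing it at port 1199. AllPermission is broad, so restrict network access to that port.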
Drivers of Load and Throughput
• Messages
- message size (same as for any Kafka Client)
- message format (JSON is more expensive in CPU)
• Message (de)serialization is the most CPU-intensive aspect of any query
- in throughput testing, all queries are CPU-bound
- start with 4 cores minimum
• Use SSDs if you run any joins or aggregations
• Relative resource demand:
  Query type        CPU   RAM     Disk
  Project, filter   n/a   Medium  None
  Join              n/a   High    Medium
  Aggregate         n/a   High    High
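The relative-demand table and the sizing guidelines above can be captured as a small lookup for host planning. This is a hypothetical helper that only restates the slide's relative levels. CPU is omitted because the table lists it as n/a; in practice every query is CPU-bound on (de)serialization. It also restates the 4-core starting point and the SSD rule for joins and aggregations.

```bash
#!/usr/bin/env bash
# Hypothetical lookup restating the relative-demand table (CPU omitted: n/a).
#   usage: resource_profile project|filter|join|aggregate
resource_profile() {
  case "$1" in
    project|filter) echo "RAM=Medium DISK=None" ;;
    join)           echo "RAM=High DISK=Medium" ;;
    aggregate)      echo "RAM=High DISK=High" ;;
    *) echo "unknown query type: $1" >&2; return 1 ;;
  esac
}

# Starting point for a host running the given query types: at least 4 cores,
# and SSDs as soon as any join or aggregation is present.
#   usage: plan_host TYPE...
plan_host() {
  local ssd=no t
  for t in "$@"; do
    resource_profile "$t" > /dev/null || return 1
    case "$t" in join|aggregate) ssd=yes ;; esac
  done
  echo "min_cores=4 ssd=$ssd"
}
```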