6. Using non-replicated MergeTree tables and duplicating data through a Distributed table - we
call it "poor man's replication".
When you use it, you have to do a lot of work yourself: recovery, consistency control,
etc.
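As a rough illustration of the manual work mentioned above, here is a minimal sketch of
writing each batch to every replica yourself and then checking for divergence. It does not
talk to ClickHouse: `FakeReplica`, `insert_everywhere` and `find_divergent` are hypothetical
names, and comparing row counts is only one assumed, simplistic consistency check.

```python
from dataclasses import dataclass, field


@dataclass
class FakeReplica:
    """Hypothetical in-memory stand-in for one server with a plain MergeTree table."""
    name: str
    rows: list = field(default_factory=list)
    available: bool = True

    def insert(self, batch):
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        self.rows.extend(batch)


def insert_everywhere(replicas, batch):
    """Write the batch to each replica; return the names of replicas that missed it."""
    missed = []
    for replica in replicas:
        try:
            replica.insert(batch)
        except ConnectionError:
            # Nothing retries this for us: the caller has to re-send the batch later.
            missed.append(replica.name)
    return missed


def find_divergent(replicas):
    """Return the names of replicas holding fewer rows than the fullest replica."""
    expected = max(len(r.rows) for r in replicas)
    return [r.name for r in replicas if len(r.rows) != expected]


replica_a = FakeReplica("replica_a")
replica_b = FakeReplica("replica_b", available=False)  # simulate an outage
missed = insert_everywhere([replica_a, replica_b], [1, 2, 3])
print("missed:", missed, "divergent:", find_divergent([replica_a, replica_b]))
```

Tracking missed batches and re-sending them after an outage is the part that Replicated
tables otherwise do for you.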
I recommend using real replication (Replicated tables) in almost all cases. But there are
notable exceptions:
- if you really hate ZooKeeper, or you are afraid to have any piece of Java code in your
infrastructure;
- if you already have a data processing pipeline with other databases that are already
replicated "by hand", and you just want to integrate ClickHouse in the same way;
- if you want your replicas to be as independent as possible;
- if you want a solution that is conceptually as simple as possible.
Replicated tables are not slower at insertion than plain MergeTree, if you measure
throughput with large enough batches. ZooKeeper synchronization only contributes to latency.
But for INSERTs, additional latency of up to hundreds of milliseconds usually doesn't matter.
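To see why a fixed latency barely affects throughput once batches are large, here is a
back-of-the-envelope sketch. It assumes a single writer issuing INSERTs one after another
and a very simple cost model (time to write the rows plus a fixed coordination delay). The
function name and both constants are illustrative assumptions, not measurements.

```python
def rows_per_second(batch_rows, write_rate, extra_latency_s=0.0):
    """Throughput of one sequential writer under a simple, assumed cost model.

    Each INSERT is modelled as: time to write the rows + a fixed extra latency.
    """
    seconds_per_insert = batch_rows / write_rate + extra_latency_s
    return batch_rows / seconds_per_insert


# Assumed numbers, for illustration only:
WRITE_RATE = 1_000_000  # rows per second for a plain MergeTree insert
EXTRA_LATENCY = 0.1     # seconds of coordination overhead per INSERT

for batch in (1_000, 100_000, 10_000_000):
    plain = rows_per_second(batch, WRITE_RATE)
    replicated = rows_per_second(batch, WRITE_RATE, EXTRA_LATENCY)
    print(f"batch={batch:>10,}  replicated/plain throughput = {replicated / plain:.2f}")
```

Under this model the ratio approaches 1 as the batch grows, because the fixed delay is
amortized over more rows. With several concurrent writers the delay matters even less.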
Also, Replicated tables are heavier: with plain MergeTree tables, ClickHouse will tolerate
many thousands of tables per server, while with ReplicatedMergeTree that is difficult (but
usually you should have just a few big tables).
The boss has spoken: just use Replicated tables.