01 Oct 2017, 17:00

confluent-kafka-go vs sarama benchmarks

There are two popular Go clients for Kafka: confluent-kafka-go and sarama (by Shopify). Recently I came across some speculation that sarama is faster than the Confluent client because of cgo-related overhead. In fact, the situation is the opposite: the Confluent client is much faster than sarama. I didn't find any direct performance comparison of the two, so I decided to whip one up.

I just tested with a single Kafka broker running locally on my laptop. The easiest way to achieve this is to use the Confluent CLI (available in Confluent Open Source version 3.3.0 and later):

confluent start

confluent-kafka-go benchmarks

The benchmarks were run with the following test configuration:

{
    "Brokers": "localhost",
    "Topic": "test",
    "GroupID": "testgroup",
    "PerfMsgCount": 1000000,
    "PerfMsgSize": 100,
    "Config": ["api.version.request=true"]
}

Benchmark results (msgs/s, with throughput in parentheses):

ProduceFuncDR: 128446 (12.250 Mb/s), 116746 (11.134 Mb/s)
ProduceFunc:   70625 (6.735 Mb/s), 240971 (22.981 Mb/s), 195318 (18.627 Mb/s)
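As a sanity check, the Mb/s figures follow directly from the message rate and the 100-byte message size, dividing by 1024² bytes per Mb. A quick sketch in Go (the helper function name is mine):

```go
package main

import "fmt"

// throughputMBps converts a message rate (msgs/s) and a per-message size
// (bytes) into throughput in Mb/s, using 1024*1024 bytes per Mb, which is
// the convention the benchmark output above uses.
func throughputMBps(msgsPerSec, msgSize int) float64 {
	return float64(msgsPerSec*msgSize) / (1024 * 1024)
}

func main() {
	// 128446 msgs/s at 100 bytes each, as in the first ProduceFuncDR run.
	fmt.Printf("%.3f Mb/s\n", throughputMBps(128446, 100)) // prints 12.250 Mb/s
}
```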

18 Mar 2017, 17:00

Low/High Watermark Offsets

Writing this down because I keep forgetting where to look it up: the lowest and highest available offsets for a specific Kafka topic/partition can be determined from the command line as follows.

Highest Available Kafka Topic/Partition Offset:

From the Confluent Platform directory:

./bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list <host>:<port> --topic <topic-name> --partition <partition-number> --time -1

Lowest Available Kafka Topic/Partition Offset:

./bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list <host>:<port> --topic <topic-name> --partition <partition-number> --time -2

Omit the --partition parameter to retrieve offsets for all partitions of the topic.
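The --time values aren't arbitrary: -1 and -2 are Kafka's special timestamp sentinels for the latest and earliest available offset, and sarama exports the same values as sarama.OffsetNewest and sarama.OffsetOldest. A trivial Go helper to keep them straight (the constants and function here are mine, mirroring those values):

```go
package main

import "fmt"

// Kafka's special timestamp sentinels, as passed to GetOffsetShell's --time
// flag. sarama exports these same values as OffsetNewest and OffsetOldest.
const (
	OffsetNewest int64 = -1 // highest available offset
	OffsetOldest int64 = -2 // lowest available offset
)

// offsetTime returns the --time argument for the requested end of the log.
func offsetTime(newest bool) int64 {
	if newest {
		return OffsetNewest
	}
	return OffsetOldest
}

func main() {
	fmt.Println(offsetTime(true), offsetTime(false)) // prints -1 -2
}
```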

20 Jan 2017, 14:00

Scaling Down Apache Kafka

Apache Kafka is designed to scale up to handle trillions of messages per day. Kafka is well known for its large-scale deployments (LinkedIn, Netflix, Microsoft, Uber …), but it has an efficient implementation and can be configured to run surprisingly well on systems with limited resources for low-throughput use cases as well. Here are the key settings you'll need to change to get Kafka running on your low-end VPS or Raspberry Pi:

Java heap settings

The default Java heap sizes for ZooKeeper and Kafka are 512 MB and 1 GB respectively. By overriding KAFKA_HEAP_OPTS, you can take these down to as low as 4 MB and 32 MB:

KAFKA_HEAP_OPTS="-Xmx4M -Xms4M" bin/zookeeper-server-start etc/kafka/zookeeper.properties
KAFKA_HEAP_OPTS="-Xmx32M -Xms32M" bin/kafka-server-start etc/kafka/server.properties

Kafka config settings

The most important configuration setting to tweak is log.cleaner.dedupe.buffer.size. For Kafka newer than v0.9.0, this is set to 128 MB by default, pre-allocating 128 MB of Java heap space. You can reduce it to as low as ~11 MB. You could also disable log compaction altogether by setting log.cleaner.enable to false, but you won't want to do this if you're using Kafka to keep track of consumer offsets, as these are stored in a compacted topic.
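For example, in etc/kafka/server.properties (the exact byte value here is an illustration: ~11 MB expressed in bytes):

```properties
# Shrink the log cleaner's dedupe buffer down from the 128 MB default.
log.cleaner.dedupe.buffer.size=11534336

# Alternatively, disable log compaction entirely -- but not if you rely on
# Kafka-stored consumer offsets, which live in a compacted topic.
# log.cleaner.enable=false
```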

Finally, there are other parameters that could potentially be tuned down as well, for example background.threads (threads have an associated memory cost); however, I've never bothered. I've successfully deployed and run Kafka as part of a hobby project on a low-end VPS with just the two changes above.