20 Jan 2017, 19:46

Scaling Down Apache Kafka

Apache Kafka is designed to scale up to handle trillions of messages per day. It is well known for its large-scale deployments (LinkedIn, Netflix, Microsoft, Uber …), but its implementation is efficient, and it can be configured to run surprisingly well on systems with limited resources for low-throughput use cases. Here are the key settings you’ll need to change to get Kafka running on your low-end VPS or Raspberry Pi:

Java heap settings

The default Java heap sizes for ZooKeeper and Kafka are 512 MB and 1 GB respectively. With appropriate configuration properties, you can take these down to as low as 4 MB and 32 MB by setting KAFKA_HEAP_OPTS when starting each process:

KAFKA_HEAP_OPTS="-Xmx4M -Xms4M" bin/zookeeper-server-start etc/kafka/zookeeper.properties
KAFKA_HEAP_OPTS="-Xmx32M -Xms32M" bin/kafka-server-start etc/kafka/server.properties

Kafka config settings

The most important configuration setting to tweak is log.cleaner.dedupe.buffer.size. For Kafka versions after 0.9.0, this defaults to 128 MB, which pre-allocates 128 MB of Java heap space, far more than a 32 MB heap can hold. You can reduce this to as low as ~11 MB. You could also disable log compaction altogether by setting log.cleaner.enable to false, but you won’t want to do this if you’re using Kafka to keep track of consumer offsets, as these are stored in compacted topics.
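
As a rough sketch, the change could look like this in etc/kafka/server.properties (the file passed to kafka-server-start above). The value is given in bytes; 11534336 is simply 11 × 1024 × 1024, an assumed stand-in for "about 11 MB", so treat it as a starting point rather than a verified minimum:

```properties
# Shrink the log cleaner's dedupe buffer from the 128 MB default.
# 11534336 bytes = 11 * 1024 * 1024; an assumed value for "~11 MB".
log.cleaner.dedupe.buffer.size=11534336

# Leave log compaction on if Kafka stores your consumer offsets.
log.cleaner.enable=true
```

If the broker fails to start after this change, raising the buffer size slightly is the first thing to try.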

Finally, there are other parameters that could potentially be tuned down as well, for example background.threads (threads have an associated memory cost), but I’ve never bothered. I’ve successfully deployed and run Kafka as part of a hobby project on a low-end VPS with just the above two changes.