Overview
We have been using the nats bench CLI tool to benchmark the performance of NATS JetStream across different parameters.
In our use case, we require 100K parallel ephemeral consumers subscribed to the stream. We chose JetStream with ephemeral consumers because we stream real-time data to our browser application, and it must support message replay so that it can get the latest state of all subscribed subjects before loading.
After running the benchmarks, we saw that publisher throughput decreases rapidly as the number of parallel consumers increases.
We run the command below to benchmark 10KB payloads with a message count of 100K. It is supposed to create <sub> ephemeral consumers and <pub> parallel publishers:
nats bench bar --js --pub <pub> --sub <sub> --size 10000 --msgs 100000 --maxbytes=50GB --no-progress --purge
Results
Here is the output for different values of pub and sub.
Instance: AWS EC2 m5.2xlarge (8-core CPU, 32 GB RAM)
- pub 1, sub 1: Publisher Throughput: 19082.0 msgs/sec
- pub 1, sub 10: Publisher Throughput: 8432.67 msgs/sec
- pub 1, sub 100: Publisher Throughput: 1047.33 msgs/sec
- pub 1, sub 1000: Publisher Throughput: 83.33 msgs/sec
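The same results can also be read as a total delivery rate across all consumers. Assuming each ephemeral consumer receives every published message (an assumption about how the bench consumers behave, not something the output above reports), the aggregate rate is roughly publisher throughput multiplied by the number of consumers. The sketch below only re-expresses the reported figures with awk; the `fanout` function is a hypothetical helper, and the calculation measures nothing and does not explain the cause.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Payload size used by the benchmarks above (--size 10000), in bytes.
payload_bytes=10000

# Hypothetical helper: print the aggregate delivery rate for one result.
# ASSUMPTION: every consumer receives every published message, so
# aggregate msgs/sec = consumers * publisher msgs/sec.
# $1 = number of consumers, $2 = reported publisher msgs/sec
fanout() {
  awk -v subs="$1" -v rate="$2" -v size="$payload_bytes" 'BEGIN {
    agg = subs * rate
    printf "consumers=%d aggregate=%.0f msgs/sec (%.1f MB/sec)\n", subs, agg, agg * size / 1e6
  }'
}

# Reported results from the list above (pub = 1 in every run).
fanout 1 19082.0
fanout 10 8432.67
fanout 100 1047.33
fanout 1000 83.33
```

Under that assumption, the totals come out at roughly 19K, 84K, 105K and 83K msgs/sec, so from 10 consumers upward the aggregate stays in a similar range while the per-consumer (and therefore publisher) rate falls.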
How to replicate the test
- install nats-cli
- install nats-server
- run nats-server in JetStream mode:
nats-server -c server.conf --js
- run the nats bench commands below:
nats bench bar --js --pub 1 --sub 1 --size 10000 --msgs 100_000 --maxbytes=50GB --no-progress --purge
nats bench bar --js --pub 1 --sub 10 --size 10000 --msgs 100_000 --maxbytes=50GB --no-progress --purge
nats bench bar --js --pub 1 --sub 100 --size 10000 --msgs 10_000 --maxbytes=50GB --no-progress --purge
nats bench bar --js --pub 1 --sub 1000 --size 10000 --msgs 1000 --maxbytes=50GB --no-progress --purge
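The four replication commands differ only in `--sub` and `--msgs`, so they can be driven from a small loop. This is a minimal bash sketch, assuming `nats` is on the PATH; `run_benchmarks` and `DRY_RUN` are hypothetical names belonging to this script, not nats options. `DRY_RUN` defaults to 1 so the commands are only printed, and the message counts are written as plain integers, as in the first command in this post.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Consumer counts and the matching message counts from the commands above.
SUBS=(1 10 100 1000)
MSGS=(100000 100000 10000 1000)

# Hypothetical helper: print (DRY_RUN=1, the default) or execute
# (DRY_RUN=0) each benchmark run in turn.
run_benchmarks() {
  local i
  local -a cmd
  for i in "${!SUBS[@]}"; do
    cmd=(nats bench bar --js --pub 1 --sub "${SUBS[$i]}" --size 10000
         --msgs "${MSGS[$i]}" --maxbytes=50GB --no-progress --purge)
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      echo "${cmd[*]}"   # dry run: show the command only
    else
      "${cmd[@]}"        # real run: needs a JetStream-enabled nats-server
    fi
  done
}

run_benchmarks
```

Saved as, say, `run-bench.sh`, it would be executed for real with `DRY_RUN=0 bash run-bench.sh`; the file name is arbitrary.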
Things we tried and our observations
- Horizontal scaling with 3-server and 5-server clusters on AWS m5.2xlarge instances (8-core CPU, 32 GB RAM). Publisher throughput was similar; no significant improvement.
- Vertical scaling up to m5.8xlarge (32-core CPU, 128 GB RAM) in single-server, 3-server and 5-server configurations. No significant improvement. The CPU was underutilised, barely reaching 25 percent even with 10K consumers.
- Setting the nats bench options --consumerbatch and --pubbatch to 1000 (default is 100). No significant improvement.
- Enabling multi-subject mode (--multisubject). No significant improvement.
- Using a 5-server cluster, sending publishes only to the leader server and connecting consumers only to the remaining 4 non-leader servers. No significant improvement.
- In all of the above cases, the data showed that the publisher throughput matched the subscriber throughput of each individual consumer.
- Using NATS core publish instead of JetStream publish. This improved performance drastically; however, we want to understand whether there is any drawback to using NATS core publish over NATS JetStream publish.
Expected behaviour
The publisher throughput should not be significantly affected by increasing the number of parallel consumers.
Questions
- The publisher throughput decreases rapidly as parallel consumers are added. Is this behaviour expected in JetStream? What is the reason behind it? Is there a way to increase JetStream publish throughput?
- What is the maximum number of parallel ephemeral consumers that JetStream supports per stream and per server?
- Is there any drawback to using NATS core publish instead of NATS JetStream publish? As I understand it, as long as the subject is part of a JetStream stream, messages will be retained, persisted and replicated in both cases.
- Why is NATS core publish performance unaffected by an increasing number of consumers, while NATS JetStream publish performance is affected?
- Are there any limitations in the nats bench tool that could be affecting the results?