The memory leak in ConcurrentQueue
We ran into a memory issue recently in RavenDB, which had a pretty interesting root cause. Take a look at the following code and see if you can spot what is going on:
ConcurrentQueue<Buffer> _buffers = new();

void FlushUntil(long maxTransactionId)
{
    List<Buffer> toFlush = new();
    while (_buffers.TryPeek(out var buffer) &&
           buffer.TransactionId <= maxTransactionId)
    {
        if (_buffers.TryDequeue(out buffer))
        {
            toFlush.Add(buffer);
        }
    }
    FlushToDisk(toFlush);
}
The code handles flushing data to disk based on the maximum transaction ID. Can you see the memory leak?
Under heavy load, this runs just fine. The problem shows up when the load is over: if we stop writing new items to the system, it keeps a lot of memory alive, even though there is no reason for it to do so.
The reason for that is the call to TryPeek(). You can read the source directly, but the basic idea is that a peek has to guard against a concurrent TryDequeue(). If you are not careful, you may encounter something called a torn read.
Let’s explain that in detail. Suppose we store a large struct in the queue and call TryPeek() and TryDequeue() concurrently. TryPeek() starts copying the struct to the caller at the same time that TryDequeue() copies it out and zeroes the slot. Since a large struct cannot be copied in a single atomic operation, TryPeek() can observe a value that is half original and half zeroed: an invalid value that never logically existed.
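To make the torn read concrete, here is an illustrative sketch (not the actual ConcurrentQueue internals): it deterministically simulates the bad interleaving, where the "reader" copies half the struct, the "writer" zeroes the slot, and the reader then finishes its copy from the cleared slot.

```csharp
// A 32-byte struct -- far larger than anything the CPU can copy atomically.
struct LargeValue
{
    public long A, B, C, D;
}

static class TornReadDemo
{
    // Simulates the interleaving deterministically: the reader copies the
    // first half, the writer zeroes the slot (as a dequeue would), and the
    // reader finishes its copy from the now-cleared slot.
    public static LargeValue SimulateTornRead()
    {
        var slot = new LargeValue { A = 1, B = 2, C = 3, D = 4 };
        var copy = default(LargeValue);
        copy.A = slot.A;        // reader copies the first half...
        copy.B = slot.B;
        slot = default;         // ...writer zeroes the slot...
        copy.C = slot.C;        // ...reader finishes from the cleared slot
        copy.D = slot.D;
        return copy;            // half original, half zeroed: a torn value
    }
}
```

In the real queue the two copies race on separate threads, so the tear is rare and non-deterministic, which is exactly why the implementation has to defend against it.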
To handle that, once you use TryPeek(), the queue will no longer zero out dequeued values. This means that until the current queue segment is completely full and a new one is allocated, the segment retains references to those buffers, leading to an interesting memory leak.
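One way to sidestep the issue entirely is to avoid TryPeek(): dequeue eagerly, and hold back the first buffer that is past the cutoff until a later flush. The sketch below illustrates the idea under the assumption of a single consumer thread (this is an illustration, not necessarily the fix we shipped); Buffer, Enqueue, and FlushToDisk are stand-ins for the real types.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

class Buffer
{
    public long TransactionId;
    public byte[] Data = new byte[1024];
}

class Flusher
{
    private readonly ConcurrentQueue<Buffer> _buffers = new();
    private Buffer? _pending; // dequeued too early, kept for the next call

    public List<Buffer> Flushed = new(); // stand-in for actual disk writes

    public void Enqueue(Buffer b) => _buffers.Enqueue(b);

    public void FlushUntil(long maxTransactionId)
    {
        List<Buffer> toFlush = new();
        // First, retry the buffer held back by a previous call.
        if (_pending != null && _pending.TransactionId <= maxTransactionId)
        {
            toFlush.Add(_pending);
            _pending = null;
        }
        // No TryPeek(): dequeue directly, so slots get cleared as usual.
        while (_pending == null && _buffers.TryDequeue(out var buffer))
        {
            if (buffer.TransactionId <= maxTransactionId)
                toFlush.Add(buffer);
            else
                _pending = buffer; // too new, hold it for a future flush
        }
        FlushToDisk(toFlush);
    }

    private void FlushToDisk(List<Buffer> toFlush) => Flushed.AddRange(toFlush);
}
```

Because the consumer never peeks, the queue keeps zeroing out dequeued slots and the segments stop pinning old buffers; the only extra state is a single held-back reference.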