When in doubt, batch requests
When in doubt, always try to batch your operations.
In the context of machine learning operations (MLOps), I’ve learned that batching requests is almost always the better answer. Why? Mainly because network hops are expensive and inference is optimized for matrix math. If you must choose between 100 single requests or one batched request of similar total size, choose the batched request. The batched request takes longer in absolute terms, but amortized over the number of requests in the batch, the time per request is shorter.
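To make that trade-off concrete, here is a minimal sketch comparing the two approaches against a hypothetical HTTP prediction service (the `/predict` and `/predict_batch` endpoints and the localhost URL are assumptions, not a real API): every single request pays the network and serialization overhead again, while the batched request pays it once.

```python
import time

import requests  # assumption: the service speaks plain HTTP/JSON

API = "http://localhost:8000"  # hypothetical endpoint, not a real service

def predict_one_by_one(items):
    # 100 network hops: each request pays connection, serialization and queueing overhead again
    return [requests.post(f"{API}/predict", json={"item": x}).json() for x in items]

def predict_batched(items):
    # 1 network hop: the overhead is paid once and amortized over the whole batch
    return requests.post(f"{API}/predict_batch", json={"items": items}).json()

items = list(range(100))

start = time.perf_counter()
predict_one_by_one(items)
print("single requests:", time.perf_counter() - start)

start = time.perf_counter()
predict_batched(items)
print("batched request:", time.perf_counter() - start)
```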
Let me share two real examples:
- Example 1 (Model state storage): We built an API that had to do a `get_state()` call to a database somewhere else. We did this around 200 times per batched request, which is a lot. In the end we parallelized this operation, but if we had built out a `get_state_batch()`, I’m sure it would have been even faster. We did exactly that with `set_state()`, for which we had a `set_state_batch()`, and it was orders of magnitude faster (see the first sketch after this list).
- Example 2 (Model inference): We built an API that did model inference, and we took some engineering time to make this API operate on batched requests. This took some cross-team effort because of request multiplexing and load balancing, but in the end we managed to squeeze 100 requests together into a single big request. Because of that, we cut our machines down from 5 to 1, resulting in large cost savings. Total time per request was larger, but the average latency over the individual requests was a lot shorter (see the second sketch after this list).
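For the state-storage case, here is a minimal sketch of what a `get_state_batch()` could look like. The `db` client is an assumption (anything Redis-like with a bulk `mget` read next to the single-key `get` would do); only the `get_state`/`get_state_batch` names come from the post.

```python
# Hypothetical sketch: fetching model state for ~200 keys per batched request.
# `db` stands in for whatever database client the API uses; it is assumed to
# expose a bulk read (mget) next to the single-key read (get).

def get_state(db, key):
    # one network round trip per key -> ~200 round trips in total
    return db.get(key)

def get_state_batch(db, keys):
    # one network round trip for all keys; the fan-out happens server-side
    return db.mget(keys)

# Usage:
#   states = [get_state(db, k) for k in keys]   # slow: N network hops
#   states = get_state_batch(db, keys)          # fast: 1 network hop
```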
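For the inference case, a small sketch of why batching helps on the compute side: one matrix multiply over a stacked batch replaces 100 small ones, which is exactly what inference libraries and hardware are optimized for. The "model" below is just a random weight matrix, purely for illustration.

```python
# Illustration only: the "model" is a random weight matrix; the point is that
# one matrix multiply over a stacked batch replaces 100 small ones.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((512, 10))      # stand-in for a trained model

def infer_one(features):
    return features @ weights                 # (512,) @ (512, 10) -> (10,)

def infer_batch(batch):
    return batch @ weights                    # (100, 512) @ (512, 10) -> (100, 10)

features = rng.standard_normal((100, 512))    # 100 incoming requests

one_by_one = np.stack([infer_one(x) for x in features])
batched = infer_batch(features)
assert np.allclose(one_by_one, batched)       # same answers, far less per-call overhead
```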
See also Request Batch on martinfowler.com