Speed in and of itself can be a feature; see ruff for example. Here’s a list of tricks that I’ve found useful for speeding up code.
- Profile. The first rule of profiling is to never trust your gut. The computer works in mysterious ways and the bottleneck might not be where you think it is. Profile first and really measure what is slow.
- Don’t do it (elimination). The best code is code that doesn’t exist. Speed up your code by removing things. Are there calculations or steps that you can skip?
- Do it once (caching). Can we speed things up by doing them once? Can we remove duplicate work? Is there work whose result you can cache and then reuse?
- Do it once 2 (pre-computing). Can we speed things up by doing some calculations before the routine starts and storing the results on disk, as opposed to caching them in memory while it runs?
- Do it efficiently (vectorization/parallelization). Can we do things more efficiently? Can we exploit matrix structure and operate on whole arrays instead of single rows? Can we run steps in parallel that currently run sequentially?
- Do it less accurately (quantization). Can we be less accurate? Can we quantize the model? Can we reduce precision from float32 to float16 or float8? Can we sacrifice some accuracy for speed? Are there other knobs and trade-offs that can speed things up, such as reducing other parameters?
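
To make the "profile" and "do it once" tricks concrete, here is a minimal sketch using only the Python standard library (`cProfile`, `pstats`, `functools.lru_cache`). The functions `slow_square`, `cached_square`, `total`, and `profile_report` are hypothetical names made up for illustration, and the `time.sleep` call merely stands in for real work.

```python
import cProfile
import functools
import io
import pstats
import time


def slow_square(n: int) -> int:
    """Hypothetical expensive function: sleeps briefly to simulate real work."""
    time.sleep(0.001)
    return n * n


@functools.lru_cache(maxsize=None)
def cached_square(n: int) -> int:
    """Same result as slow_square, but each distinct input is computed only once."""
    return slow_square(n)


def total(square, values):
    """Sum the squares of `values` using the given square function."""
    return sum(square(v) for v in values)


def profile_report(func, *args) -> str:
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


values = [1, 2, 3] * 10  # 30 calls, but only 3 distinct inputs

# Profile: measure where the time goes instead of guessing.
report = profile_report(total, slow_square, values)

# Do it once: the cached version reaches slow_square only for distinct inputs.
uncached_result = total(slow_square, values)
cached_result = total(cached_square, values)
cache_info = cached_square.cache_info()
```

Printing `report` lists the functions that dominate the run time, and `cache_info` shows how many calls were served from the cache rather than recomputed. Caching only pays off when inputs repeat and the function has no side effects, so it is worth checking both before reaching for it.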
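
The "do it efficiently" and "do it less accurately" tricks can be sketched in a similar way. This example assumes NumPy is installed; the arrays `rows` and `weights` are made-up data, not anything from a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
rows = rng.random((1_000, 64), dtype=np.float32)  # hypothetical feature matrix
weights = rng.random(64, dtype=np.float32)        # hypothetical weight vector

# Row by row: one Python-level dot product per row.
scores_loop = np.array([np.dot(row, weights) for row in rows])

# Vectorized: a single matrix-vector product over all rows at once.
scores_vec = rows @ weights

# Less accurate: the same product in float16 stores half as many bytes per
# value, at the cost of a small numerical error.
rows_half = rows.astype(np.float16)
scores_half = rows_half @ weights.astype(np.float16)
max_rel_error = float(np.max(np.abs(scores_half - scores_vec) / scores_vec))
```

The vectorized version gives the same scores as the loop while moving the iteration out of Python. The float16 version halves the memory footprint, but whether that translates into speed depends on the hardware: accelerators with native half-precision support tend to benefit, while float16 arithmetic on a CPU can be slower than float32. This is exactly the kind of assumption worth profiling.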