
Some things people only learn the hard way. This is one of them: I learned it by making the same mistake one too many times.

Here’s a nice tip: push all your data input/output (IO) to the boundaries of your application. This means that all your reads and writes should be defined in the “outer” layer of your application. If you find a pd.read_csv() somewhere in the middle of your code, inside one of your embedding layers, then something is wrong. Instead, define this IO operation at the top (near the entrypoint), create an object that represents the data, and pass that object around.
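To make the tip concrete, here is a minimal sketch of what "read at the top, pass an object around" could look like. It uses the standard library `csv` module as a stand-in for `pd.read_csv()` so it runs without extra dependencies. The `Sale` class, the column names, and the function names are all hypothetical, chosen only for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import TextIO


@dataclass(frozen=True)
class Sale:
    """In-memory representation of one input row (hypothetical schema)."""
    product: str
    amount: float


def load_sales(source: TextIO) -> list[Sale]:
    """IO boundary: the only place that knows the data arrives as CSV."""
    reader = csv.DictReader(source)
    return [Sale(row["product"], float(row["amount"])) for row in reader]


def total_by_product(sales: list[Sale]) -> dict[str, float]:
    """Inner layer: works on plain objects and never touches a file."""
    totals: dict[str, float] = {}
    for sale in sales:
        totals[sale.product] = totals.get(sale.product, 0.0) + sale.amount
    return totals


def main(source: TextIO) -> dict[str, float]:
    """Entrypoint: read once at the top, then pass the object around."""
    sales = load_sales(source)
    return total_by_product(sales)


if __name__ == "__main__":
    # An in-memory stream stands in for an opened CSV file (assumed data).
    demo = io.StringIO("product,amount\napple,1.5\npear,2.0\napple,0.5\n")
    print(main(demo))
```

Only `load_sales` cares where the data comes from. Swapping the CSV file for a database or an API would mean touching that one function, while `total_by_product` stays exactly the same.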

This tip is very closely related to these two great blogs:

The basic gist is that there is a difference between impure (and policy) code and pure code, and that we must separate the two. If you put your IO somewhere in your pure code, you are mixing things that should not be mixed, like oil and water.
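One way to picture the pure/impure split is a pure core wrapped in a thin impure shell. The sketch below is only an illustration under assumed names (`summarize` and `run` are hypothetical): the pure function can be checked with plain values, while the shell is the single place that reads and writes.

```python
import io
import json
from typing import Iterable, TextIO


def summarize(values: Iterable[float]) -> dict[str, float]:
    """Pure: the result depends only on the arguments, with no side effects."""
    items = list(values)
    if not items:
        return {"count": 0, "mean": 0.0}
    return {"count": len(items), "mean": sum(items) / len(items)}


def run(source: TextIO, sink: TextIO) -> None:
    """Impure shell: read input, call the pure core, write the result."""
    values = [float(line) for line in source if line.strip()]
    json.dump(summarize(values), sink)


if __name__ == "__main__":
    # In-memory streams stand in for real files or sockets (assumed data).
    out = io.StringIO()
    run(io.StringIO("1\n2\n3\n"), out)
    print(out.getvalue())
```

Because `summarize` never touches a file, testing it needs no fixtures or temporary directories, just a list of numbers.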