Here’s another lesson that I learned about building machine learning systems.
The lesson I want to share today is that you should be able to find your past machine learning runs again.
Running experiments and logging them in MLflow is not enough, because that alone does not mean you can find them again later.
I recently ran into this myself. To decide which model we wanted to continue developing, I had to compile an overview of all the models we had previously trained. I thought we were covered because we had MLflow, but in practice it turned out to be quite difficult to match MLflow runs to the experiments we had actually performed.
It was difficult because we had so many runs and iterations, and we had stored everything together in a single MLflow experiment. In hindsight, I wish I had been more disciplined when logging the runs, because it would have saved me time later.
So here is the trick I want to give you: add the id of the ticket you are working on as a parameter in MLflow. You are most likely using some form of Kanban where each ticket has a unique id, so why not use it? The next time you browse your old runs, you can filter on these ids to pare down your search results. This makes it much easier to link old runs to the experiments you actually did, because that work is usually described in the ticket.