1 minute read

Recently I’ve been using a lot of pytest fixtures instead of a tests/data folder and my life has been much better ever since!

What are fixtures?

From the pytest docs

Software test fixtures initialize test functions. They provide a fixed baseline so that tests execute reliably and produce consistent, repeatable, results. These are accessed by test functions through arguments; for each fixture used by a test function there is typically a parameter (named after the fixture) in the test function’s definition.

For example

import pytest
import pandas as pd

def my_df():
    return pd.DataFrame({"foo": [1,2,3]})

def test_df(my_df): # <= my_df now refers to the fixture 
    assert do_something(my_df) 

This is what happens when you don’t have to think about your test data…

❌Don’t: Use test data

Without fixtures you don’t really have to think about your test data, so without proper guardrails it becomes a mindless append-only fiesta. Look at this (very real) test data folder over 1gb!

And this is your repository on fixtures:

✅ Do: Use pytest fixtures

With fixtures you keep test data in code, and that means that you have to be more intentional about it. You have to think about what is the smallest piece of data that you need for your test to pass. By virtue of thinking this through you create lightweight test structures that make everything else that it touches also light.

def raw_dataset_df():
    data = {
        "pupilid": [1, 2],
        "exerciseid": [3, 4],
        "correct": [1, 0],
    yield pd.DataFrame(data=data)


The simple act of using pytest fixtures over a folder with test data has made our repositories significantly lighter (in terms of raw files) and our code significantly better. I highly suggest you try it out!