Tip: Use pytest fixtures instead of test data
Recently I’ve been using a lot of pytest fixtures instead of a tests/data
folder and my life has been much better ever since!
What are fixtures?
From the pytest docs
Software test fixtures initialize test functions. They provide a fixed baseline so that tests execute reliably and produce consistent, repeatable, results. These are accessed by test functions through arguments; for each fixture used by a test function there is typically a parameter (named after the fixture) in the test function’s definition.
For example
import pytest
import pandas as pd
@pytest.fixture
def my_df():
return pd.DataFrame({"foo": [1,2,3]})
def test_df(my_df): # <= my_df now refers to the fixture
assert do_something(my_df)
This is what happens when you don’t have to think about your test data…
❌Don’t: Use test data
Without fixtures you don’t really have to think about your test data, so without proper guardrails it becomes a mindless append-only fiesta. Look at this (very real) test data folder over 1gb!
And this is your repository on fixtures:
âś… Do: Use pytest fixtures
With fixtures you keep test data in code, and that means that you have to be more intentional about it. You have to think about what is the smallest piece of data that you need for your test to pass. By virtue of thinking this through you create lightweight test structures that make everything else that it touches also light.
@pytest.fixture
def raw_dataset_df():
data = {
"pupilid": [1, 2],
"exerciseid": [3, 4],
"correct": [1, 0],
}
yield pd.DataFrame(data=data)
Conclusion
The simple act of using pytest fixtures over a folder with test data has made our repositories significantly lighter (in terms of raw files) and our code significantly better. I highly suggest you try it out!
Comments