1 minute read

I refactored a piece of code but I’m not sure if I made it better.

This is the code before. It grabs an object from s3, unpacks it, and passes it through a csv reader turning it into a list of dicts.

def read_csv_data(self, bucket_name, key):
    s3_object = self._s3_resource.object(bucket_name, key)
    data = s3_object.get()['body'].read().decode('utf-8').splitlines()
    return [*csv.dictreader(data)]

This is the code after. The main thing I did was pull out the decoding in a separate function and added some type hints (because type hints are love).

from typing import List, Dict

def decode_s3_object(self, s3_object) -> List[str]: 
    return s3_object.get()['body'].read().decode('utf-8').splitlines()

def read_csv_data(self, bucket_name: str, key: str) -> List[Dict[str, str]]:
    s3_object = self._s3_resource.object(bucket_name, key)
    data = self.decode_s3_object(s3_object)
    return [*csv.dictreader(data)]

Had trouble mocking. The reason I pulled out this function was because I had trouble mocking the data variable. At some point I got so annoyed at myself I just said “Screw it I’ll put it in a separate function.”

Short functions are easier to test. With short functions you end up writing more tests because the tests are easier to write. And the tests are easier to write because your functions are short. In that sense, short functions and tests form a positive feedback loop together. I found it much easier to write two tests for the second version than a single test for the first version because in the second case I could mock the output of decode_s3_object.

Anyway, what do you think? Did the code improve by pulling out this function?