
Here's a quick piece of code that I've been using repeatedly in my projects. Currently, I'm working on storing large amounts of data efficiently, dealing with millions of records that need to fit into packets under 4KB. This makes compression and optimized storage crucial. Since I've found this technique to be quite handy, I thought it'd be helpful to share it with you all, hoping you find it useful as well.

Imagine you have a list of integers lst = [1, 2, 3, ..., 100, 2, 3, 100] in Python that you need to compress, and you also know that no value in the list exceeds a known maximum (100 in this case).

What you can then do, if you want to compress and store this array, is the following: pack it into bytes, compress it with zlib, and then base64 encode it.
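Before wrapping this in a class, here is a minimal sketch of those three steps on their own. It assumes every value fits in a 2-byte unsigned short (0 to 65535), and the function names `encode_ints` and `decode_ints` are made up for illustration:

```python
import base64
import struct
import zlib
from typing import List


def encode_ints(values: List[int]) -> str:
    """Pack ints as little-endian unsigned shorts, compress, then base64 encode."""
    # "<" = little endian, "H" = 2-byte unsigned short, repeated once per value
    packed = struct.pack(f"<{len(values)}H", *values)
    return base64.b64encode(zlib.compress(packed)).decode("ascii")


def decode_ints(blob: str) -> List[int]:
    """Reverse the steps: base64 decode, decompress, then unpack."""
    raw = zlib.decompress(base64.b64decode(blob))
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


values = [1, 2, 3, 100, 2, 3, 100]
blob = encode_ints(values)
assert decode_ints(blob) == values  # the round trip gives back the same list
```

The base64 step is only there so the result can travel as text; if your storage accepts raw bytes, you could stop after the zlib step and save roughly a third of the size.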

import base64
import struct
import zlib
from typing import Iterable, List, Union


class CappedIntBlob(List[int]):
    """Capped integer Base64 Large Object aka CappedIntBlob.
    
    Represents a list of 2-byte unsigned short integers, capped at max_val

    When converted to str, will convert all integers into 2-byte values and then base64 encode them after compressing with zlib.

    The constructor accepts either a base64 string or an iterable of integers.
    """

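    def __init__(self, contents: Union[str, Iterable[int]], max_val: int = 10):
        """Constructor
        @param contents: the contents, either a list of ints or a base64 string with 2-byte unsigned short integer representation.
        @param max_val: cap applied to every value taken from an iterable; must fit in an unsigned short (0 to 65535).
        """
        # "H" can only hold 0..65535, so a larger cap would make packing fail later
        if not 0 <= max_val <= 0xFFFF:
            raise ValueError("max_val must be between 0 and 65535")
        self.max = max_val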
        lst: Iterable[int]
        if isinstance(contents, str):
            # If input is a string, decode and decompress it
            bs: bytes = zlib.decompress(base64.b64decode(contents))
            # < = little endian, H = unsigned short with 2-byte size integers
            lst = struct.unpack("<" + "H" * (len(bs) // 2), bs)
        elif isinstance(contents, Iterable):
            lst = [min(int(x), self.max) for x in contents]
        else:
            raise ValueError("Expecting a string or an iterable for contents")

        super().__init__(lst)

    def __str__(self):
        """Convert to string
        @return: base64 string encoding for list of ints in 2 byte unsigned short integer representation.
        """
        return base64.b64encode(zlib.compress(struct.pack("<" + "H" * len(self), *self))).decode("utf-8")

And this is how you use it:

>>> CappedIntBlob([1,2,3])
[1, 2, 3]

>>> str(CappedIntBlob([1,2,3]))
'eJxjZGBiYGYAAAAaAAc='

>>> CappedIntBlob("eJxjZGBiYGYAAAAaAAc=")
[1, 2, 3]

# Passes: with the default max_val of 10, both lists are capped to [5, 10, 10]
assert str(CappedIntBlob([5, 10, 20])) == str(CappedIntBlob([5, 100, 200]))
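
To get a feel for whether the encoded form fits the 4KB packet budget mentioned at the start, a rough size check could look like the sketch below. The sample data and the `PACKET_LIMIT` name are assumptions made for illustration. The sample is also very repetitive, so real data will likely compress less well:

```python
import base64
import json
import struct
import zlib

PACKET_LIMIT = 4096  # assumed 4KB budget, in bytes

# Hypothetical sample data: 10,000 values that never exceed 100
values = [i % 101 for i in range(10_000)]

as_json = json.dumps(values)  # plain-text baseline for comparison
packed = struct.pack(f"<{len(values)}H", *values)  # 2 bytes per value
as_blob = base64.b64encode(zlib.compress(packed)).decode("ascii")

# Sizes of the JSON text, the raw packed bytes, and the final base64 string
print(len(as_json), len(packed), len(as_blob))

fits = len(as_blob) <= PACKET_LIMIT
print("fits in one packet:", fits)
```

How much you gain depends on how repetitive the values are, so it is worth running a check like this on a sample of your own records.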
