Here’s a quick piece of code that I keep reusing in my projects. I’m currently working on storing large amounts of data efficiently: millions of records that need to fit into packets under 4KB, which makes compression and compact storage crucial. I’ve found this technique handy, so I thought I’d share it in the hope that you find it useful as well.

Imagine you have a list of integers in Python, lst = [1,2,3.., 100, 2, 3, 100], that you need to compress, and you also know the maximum value it can contain (100 in this case).

To compress and store such a list, you can pack it into bytes, compress the bytes with zlib, and then base64 encode the result. The class below wraps those three steps.
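Written out by hand, the three steps look roughly like this. This is a minimal sketch using only the standard library; the sample list and the variable names are made up for illustration.

```python
import base64
import struct
import zlib

values = [1, 2, 3, 100, 2, 3, 100]

# Step 1: pack each integer as a little-endian 2-byte unsigned short.
packed = struct.pack(f"<{len(values)}H", *values)

# Step 2: compress the raw bytes with zlib.
compressed = zlib.compress(packed)

# Step 3: base64 encode so the result is plain ASCII text.
encoded = base64.b64encode(compressed).decode("ascii")

# Reversing the three steps recovers the original list.
raw = zlib.decompress(base64.b64decode(encoded))
decoded = list(struct.unpack(f"<{len(raw) // 2}H", raw))
```

For a list this short the zlib header outweighs any savings, so the benefit only shows up on longer lists.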

import base64
import struct
import zlib
from typing import Iterable, List, Union


class CappedIntBlob(List[int]):
    """Capped integer Base64 Large Object aka CappedIntBlob.
    
    Represents a list of integers capped at max_val, each stored as a 2-byte unsigned short (so values must lie between 0 and 65535).

    When converted to str, the integers are packed as 2-byte values, compressed with zlib, and then base64 encoded.

    The constructor accepts either a base64 string or an iterable of integers.
    """

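    def __init__(self, contents: Union[str, Iterable[int]], max_val: int = 10):
        """Constructor
        @param contents: the contents, either an iterable of ints or a base64 string holding zlib-compressed 2-byte unsigned short integers.
        @param max_val: the cap applied to integers passed in as an iterable; must not exceed 65535 to fit in 2 bytes.
        """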
        self.max = max_val
        lst: Iterable[int]
        if isinstance(contents, str):
            # If the input is a string, base64 decode it and then decompress it
            bs: bytes = zlib.decompress(base64.b64decode(contents))
            # "<" = little endian, "H" = 2-byte unsigned short; the count prefix repeats "H"
            lst = struct.unpack(f"<{len(bs) // 2}H", bs)
        elif isinstance(contents, Iterable):
            lst = [min(int(x), self.max) for x in contents]
        else:
            raise ValueError("Expecting a string or an iterable for contents")

        super().__init__(lst)

    def __str__(self):
        """Convert to string
        @return: base64 string encoding for list of ints in 2 byte unsigned short integer representation.
        """
        # Pack as little-endian 2-byte unsigned shorts, then compress and base64 encode
        packed = struct.pack(f"<{len(self)}H", *self)
        return base64.b64encode(zlib.compress(packed)).decode("utf-8")

And this is how you use it:

>>> CappedIntBlob([1,2,3])
[1, 2, 3]

>>> str(CappedIntBlob([1,2,3]))
'eJxjZGBiYGYAAAAaAAc='

>>> CappedIntBlob("eJxjZGBiYGYAAAAaAAc=")
[1, 2, 3]

# Values above max_val (default 10) are capped, so both lists encode to the same string.
assert str(CappedIntBlob([5, 10, 20])) == str(CappedIntBlob([5, 100, 200]))
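To tie this back to the 4KB packet limit, here is a rough sketch of how you might check whether an encoded list fits. The helper encode_capped, the PACKET_LIMIT constant, and the sample data are all hypothetical and not part of the class above. I am also assuming the limit is counted in base64 characters.

```python
import base64
import struct
import zlib

PACKET_LIMIT = 4096  # assumed 4KB budget, counted in base64 characters


def encode_capped(values, max_val=100):
    """Hypothetical helper: cap, pack as 2-byte shorts, compress, base64 encode."""
    capped = [min(int(v), max_val) for v in values]
    packed = struct.pack(f"<{len(capped)}H", *capped)
    return base64.b64encode(zlib.compress(packed)).decode("ascii")


# Synthetic, highly repetitive sample data; real data will compress less well.
values = [i % 101 for i in range(2000)]

# Baseline: the same packed bytes, base64 encoded without compression.
uncompressed = base64.b64encode(struct.pack(f"<{len(values)}H", *values))
encoded = encode_capped(values)

print(len(uncompressed), len(encoded))
fits = len(encoded) <= PACKET_LIMIT
```

With this sample the uncompressed base64 string is larger than the limit while the compressed one fits. How much you gain in practice depends entirely on how repetitive your data is.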
