Enum vs Flag for bitmasks in Python
Ever wondered what’s behind those funny looking Unix access permission values like 600
✋, 777
⚠, or 666
😈? Well, they are
octal representations, where each of the three rightmost digits represents a different part of the permissions for owner, group, and others.
Each of these digits is created from the sum of its component bits. As a result, specific bits add to the sum as it is represented by a numeral:
- The read bit adds 4 to its total (in binary
100
), - The write bit adds 2 to its total (in binary
010
), and - The execute bit adds 1 to its total (in binary
001
).
These values never produce ambiguous combinations and therefore each sum represents a unique set of permissions. Let’s take 600
as an example. The first digit 6
represents the permissions for the owner. In binary it is 110
and the only way to create it is from the sum of component bits above by adding 010
(write access) to 100
(read access). In other words, the owner has permission to read and write the file, but not to execute it.
Using this system for permissions is optimized for data storage. But also turning permissions on (or off) can be done efficiently in what is known as bit-masking.
Bitmasks
You see, to turn certain bits on, the OR
operator can be used. This operator follows the rule that X OR 1 = 1
and X OR 0 = X
. If from the example above we now also want the owner to have executive permission on the file, we simply use OR
with 1 (execute value) on the current permission value: 6 OR 1 = 7
, or in binary:
110
OR
001
= 111
We can also turn off certain bits by using the AND
operator, which follows the rule that X AND 1 = X
and X AND 0 = 0
. To turn off a bit we use it with 0
. Interestingly we can leave the bit as it was when using 1
.
Turning bits on and off is also referred to as “masking on” and “masking off”.
Using this very old technique to store permission parts is very cool! But are there other use-cases where it makes sense? There are still today a ton of them, although they have become unpopular due to much more efficient hardware allowing the use of a set or dictionary to store the representations that a bit would otherwise hold.
Filtering using bit masking
In Bit I needed a way to store the supported features for each of the numerous objects representing network APIs. Supported features could range from understanding bech32 addresses or segwit transactions. Instead of using a dictionary to map each API to its supported set of features, or store a set of features on each API, I chose to use a system similar to the Unix access permission values.
Making use of bit masking allows for fast filtering of APIs depending on specific features. This can be achieved by using a single AND
operator. We just use a filter value Y
that represents the features we want to filter for, and the value X
representing the supported features of some API object. Calculating X AND Y
gives another value Z
. But recall that each bit in Z
will either be 0 if the bit in Y
was 0, or it will be the bit of X
itself.
Can you see the subtle implication from this? If the n’th bit of our filtering value Y
was 0 then the n’th bit of the result Z
will be 0. There is no loss of information because we didn’t care if the bit value in X
signaled support for this feature or not. But if the n’th bit of Y
was 1 then we also want the n’th bit of Z
to be 1. This is only possible if the corresponding bit in X
was 1, hence signaling support for this feature. We can thus compare the result Z
to Y
itself. If X AND Y == Y
, then and only then will the value of X
signal for (at least) all the features we are filtering for with the value Y
.
Let’s take a quick example using Unix permission values again. Let’s say we have a list of files with permission values and want to filter them to keep only those that we are allowed to read, which is the value 4 or in binary 100
. One of the files may have a permission value of 5, which in binary is 101
and thus corresponds to read and execute permission.
A filter to see if at least the read permission was set on this file would be checked as follows:
101
AND
100
= 100
We see that the result on the last line is exactly the same value as our filter. This is just as we expected and we would therefore keep it.
If a file gives us permission to write and execute, but not to read, it is represented as a 3 or in binary 011
. Applying our filter:
011
AND
100
= 000
The result is different from our filter and we therefore immediately know that the filter was not satisfied.
The real beauty comes from applying complex filters with multiple bits turned on. If we would want to filter for e.g. read and execute permissions, the filter would be 101
and just as before we would check if the final AND
result would equal this value.
A> The value Z
resulting from using the AND
operator on a file’s permission and filter value is the intersection of the permissions in the file and the filter.
Bitmask filtering in Python
Using Python 3.6+
Since Python 3.6 a class Flag
exists for bit mask operations.
>>> from enum import IntFlag, auto
>>>
>>> class Permissions(IntFlag):
... EXECUTE = auto()
... WRITE = auto()
... READ = auto()
...
>>> class PermissionSet:
... def __init__(self, *flags):
... self._permission = Permissions(0) # Initiate no permissions
... for flag in flags:
... self._permission |= Permissions[flag.upper()]
... def __contains__(self, item):
... return (self._permission & item._permission) == item._permission
We can use this PermissionSet
class to store the permissions of a file as follows:
>>> class File:
... def __init__(self, name, *, permission):
... self.name = name
... self._permissions = PermissionSet(*permission)
Let’s use it in action!
>>> # A filter:
>>> permission_filter = PermissionSet("read", "execute")
>>>
>>> # The dummy files:
>>> files = []
>>> files.append(File("file_rw", permission=("read", "write"))) # Should be excluded from filter
>>> files.append(File("file_re", permission=("read", "execute"))) # Should be included from filter
>>> files.append(File("file_w", permission=("read", "execute"))) # Should be included from filter
>>> files.append(File("file_we", permission=("write", "execute"))) # Should be excluded from filter
>>> files.append(File("file_rwe", permission=("read", "write", "execute"))) # Should be included from filter
>>>
>>> # Filter the files:
>>> for f in filter(lambda f: permission_filter in f._permissions, files):
... print(f.name)
...
'file_re'
'file_w'
'file_rwe'
Using Python 3.5
However, also before Python 3.6 this could be achieved using the IntEnum
class, however with manual adjustment of the underlying permission values.
>>> from enum import IntEnum
>>>
>>> class Permissions(IntEnum):
... EXECUTE = 1 << 0
... WRITE = 1 << 1
... READ = 1 << 2
...
>>> class PermissionSet:
... def __init__(self, *flags):
... self._permission = 0 # Initiate no permissions
... for flag in flags:
... self._permission |= Permissions[flag.upper()]
... def __contains__(self, item):
... return (self._permission & item._permission) == item._permission
We can then use it in the exact same way we already did above in the example for IntFlag
.
What do you think about this technique? Is it elegant? Or over-engineering? Let me know in the comment below.