Integers are ubiquitous, both in Python code and the virtual machine.
However, our representation of integers is somewhat clunky and inefficient.
There is an old technique for handling ints efficiently in dynamic language VMs, called tagged pointers.
A normal object pointer always has its low bits set to zero, as objects are 8- or 16-byte aligned in memory.
We can use those low bits as “tags” to denote the meaning of the high bits.
Historically, the low bit has been set to zero to indicate that the high bits are a pointer, and to 1 to indicate that the high bits are an integer.
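As a minimal sketch of the historical scheme described above (low bit 0 for a pointer, low bit 1 for an integer), the tagging and untagging operations could look like the following. The names tagged_ref, tag_int, untag_int, tag_ptr and untag_ptr are hypothetical and are not part of CPython; the sketch also assumes that right-shifting a negative signed value is an arithmetic shift, which holds on all mainstream compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: one machine word holding either a pointer or an int. */
typedef uintptr_t tagged_ref;

#define TAG_MASK ((uintptr_t)1)  /* the low bit is the tag */
#define TAG_INT  ((uintptr_t)1)  /* low bit 1: high bits are an integer */

/* True if the word holds an integer rather than a pointer. */
static inline bool is_tagged_int(tagged_ref r) {
    return (r & TAG_MASK) == TAG_INT;
}

/* Store an integer in the high bits. The value must fit in one bit fewer
 * than the word size. The shift is done on the unsigned type so that
 * negative values do not trigger undefined behaviour. */
static inline tagged_ref tag_int(intptr_t value) {
    return ((uintptr_t)value << 1) | TAG_INT;
}

/* Recover the integer; assumes an arithmetic (sign-preserving) right shift. */
static inline intptr_t untag_int(tagged_ref r) {
    return (intptr_t)r >> 1;
}

/* An aligned pointer already has a zero low bit, so it is stored unchanged. */
static inline tagged_ref tag_ptr(void *p) {
    assert(((uintptr_t)p & TAG_MASK) == 0);
    return (uintptr_t)p;
}

static inline void *untag_ptr(tagged_ref r) {
    assert(!is_tagged_int(r));
    return (void *)r;
}
```

With this layout, a pointer costs nothing to untag, while an integer costs one shift. The price is one bit of integer range: on a 64-bit machine, only 63-bit signed values fit.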
We already use another tagging scheme within the interpreter and in the frame stack, where the tag indicates whether the reference is borrowed.
This technique is used in many older VMs and runtimes, including many Lisps and Smalltalk.
It is also used in newer VMs, including Ruby.
It is not used in high-performance JavaScript VMs because all numbers in JS are floats, so a different scheme, called “NaN boxing”, is used instead.
Why do this?
Speed.
Python is notoriously slow at integer handling. While tagged integers are not as fast as C integers, they are at least an order of magnitude faster than the boxed integers that Python currently uses. The tags will add some small cost to non-integer operations, but very little as most operations already know whether they are operating on an int or another type.
How would we implement this?
Fortunately, there is no need to implement this all at once, everywhere.
We would add it to the interpreter first, using tagged ints to support faster iteration and arithmetic.
Then we would add it to a few core objects that make heavy use of ints, such as range and enumerate.
Thereafter, we would broaden the use of tagged pointers, adding support for them to builtin collections like list, tuple and dict.
Finally, we would add them to the C API so that third-party code can avoid the overhead of tagged pointer ↔ PyObject * conversions.
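To illustrate the first step, here is a rough sketch of what an interpreter fast path for addition might look like, assuming the simple one-bit scheme in which a low bit of 1 marks an integer. The function try_add_tagged and its helpers are hypothetical names, not existing CPython functions. When either operand is not a tagged int, or the result does not fit in the tagged range, the function reports failure and the caller would fall back to the existing boxed-integer path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: one machine word holding either a pointer or an int. */
typedef uintptr_t tagged_ref;

/* Largest and smallest values that fit once one bit is given up to the tag. */
#define TAGGED_INT_MAX (INTPTR_MAX / 2)
#define TAGGED_INT_MIN (INTPTR_MIN / 2)

static inline tagged_ref tag_int(intptr_t value) {
    assert(value >= TAGGED_INT_MIN && value <= TAGGED_INT_MAX);
    return ((uintptr_t)value << 1) | 1;
}

/* Assumes an arithmetic right shift for negative values. */
static inline intptr_t untag_int(tagged_ref r) {
    return (intptr_t)r >> 1;
}

/* Try to add two tagged ints without touching the heap.
 * Returns true and stores the tagged sum in *result on success.
 * Returns false if either operand is not a tagged int or the sum is
 * out of range; the caller then takes the general (boxed) path. */
static inline bool try_add_tagged(tagged_ref a, tagged_ref b,
                                  tagged_ref *result) {
    /* Both low bits must be 1: a single AND tests both operands. */
    if ((a & b & 1) == 0) {
        return false;
    }
    /* Each operand fits in one bit fewer than the word size, so this
     * addition cannot overflow intptr_t. */
    intptr_t sum = untag_int(a) + untag_int(b);
    if (sum > TAGGED_INT_MAX || sum < TAGGED_INT_MIN) {
        return false;
    }
    *result = tag_int(sum);
    return true;
}
```

The common case is a single mask test, two shifts, an add and a range check, with no allocation and no reference counting. Because the fallback stays in place, this kind of fast path can be introduced in the interpreter without changing any other part of the system.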
Tagging schemes
The exact tagging scheme we use is likely to change as a result of experimentation and experience, but here is a possible scheme: