I don't think it's possible to apply this trick to 64-bit floats on a 64-bit architecture, which OP mentions in the last sentence. You need a 52 x 52 -> 104 product. Modular 64 x 64 -> 64 multiplication gives you the 64 bottom bits exactly, and widening 32 x 32 -> 64 multiplication approximately gives you the top 32 bits. That leaves 104 - 64 - 32 = 8 bits that are not accounted for at all. Compare with the 32-bit case, where the same arithmetic gives 46 - 32 - 16 = -2, i.e. a 2-bit overlap the method relies on.
Yeah, it feels intuitively like it should work, but even with full 64-bit modular multiply the bounds don't overlap. I should probably delete that sentence, but on the other hand maybe the wild goose chase will lead to someone finding a better trick. For now my double-precision multiply is just direct expansion.
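For what it's worth, "direct expansion" presumably means the classic schoolbook split into 32-bit halves. A minimal sketch of that full 64 x 64 -> 128 multiply (the function name `mul64_full` is mine, not from the post):

```c
#include <stdint.h>

/* Hypothetical sketch of "direct expansion": a full 64x64 -> 128-bit
   product built from four 32x32 -> 64 partial products, as a soft-float
   core might do when no widening 64-bit multiply is available. */
static void mul64_full(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* covers bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* covers bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* covers bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* covers bits  64..127 */

    /* Sum the middle partial products; this cannot overflow 64 bits
       (at most 3 * (2^32 - 1), well under 2^34). */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

Four multiplies plus a handful of adds, versus the two multiplies the trick gets away with in the 32-bit case.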
Link to Mark Owen’s excellent QFP library for soft float on Cortex-M0+ (ARMv6-M) and Cortex-M3/M4 (ARMv7-M).
https://www.quinapalus.com/qfplib.html
Nice write-up here, too; I like the idea of a firm float.
Since the trick works on the mantissa only, without the hidden 1 included, I wonder if the number of bits in the mantissa was chosen because of it.
Not really.
The IEEE single-precision format with its hidden bit replaced earlier single-precision formats, most of which had a 24-bit significand and a 7-bit exponent. Those widths were chosen for alignment on byte boundaries, which simplified software emulation of hardware floating-point units at a time when many computers lacked hardware FPUs.
When the idea of a hidden bit freed one bit in the memory format, a decision was required: use it for the exponent or for the significand?
The experience of using the older single-precision formats before 1980 indicated that the frequent overflows and underflows caused by a 7-bit exponent were more annoying than insufficient precision in the significand, so it was decided to use the hidden bit for increasing the exponent from 7 bits to 8 bits.
By 1980, efficient implementation in hardware had become the main consideration in the design of number formats, not whether some tricks could be useful for software emulation. The use of a hidden bit is inconvenient for software, which is why it had not been used earlier.
Some earlier formats had also used an 8-bit exponent and a hidden bit, but they differed from the IEEE format in that the exponent was byte-aligned and the sign bit was stored over the hidden bit. This simplified software emulation, but it had the disadvantage that comparing floating-point numbers was more complicated, since the bytes could not be compared in the same order as for integers.
I appreciate this warning near the top: "This post contains floating point. Floating point is known to the State of California to cause confusion and a fear response in mammalian bipeds."
I wisely hit the back button :)
I saw an explanation (perhaps Fabien's) which made it click - floating point is just segment:offset addressing for numbers!
Then the full horror of it hit me.
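The analogy does hold up if you unpack the bits: the exponent plays the role of the segment and the mantissa the offset, with the scale factor being a power of two instead of 16. A small sketch for normal (non-denormal, non-special) singles; the helper `float_fields` is my own naming, not anything from the post:

```c
#include <stdint.h>
#include <string.h>

/* Split an IEEE binary32 into its fields: sign, unbiased exponent
   ("segment"), and stored 23-bit mantissa ("offset").
   Only meaningful for normal numbers; denormals, infinities and
   NaNs need their own handling. */
static void float_fields(float f, uint32_t *sign, int *exp2, uint32_t *man)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* safe type pun */
    *sign = bits >> 31;
    *exp2 = (int)((bits >> 23) & 0xFF) - 127;  /* remove the bias */
    *man  = bits & 0x7FFFFF;                   /* hidden 1 not stored */
}
```

For 6.5f (= 1.101 binary x 2^2) this yields sign 0, exponent 2, and mantissa 0x500000, i.e. the fraction bits .101 left-justified in the 23-bit field.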