Fast division by 2n-1

Michael Schmidt / 24 min read / 2026 Jan 25

I recently found this code snippet in the source code of pic-scale-safe:

Rust
fn div_round_by_1023(v: u32) -> u32 {
    let round = 1 << 9;
    let w = v + round;
    ((w >> 10) + w) >> 10
}

This function computes for inputs with just a few bit shifts and additions in 32-bit arithmetic. Quite efficient.

But that's not all. This trick only 21 bits for the intermediate results. Other approaches like the multiply-add method require 31 bits for intermediate results to perform the same rounded division by 1023 in the range . While not the case for division by 1023, in general this trick needs at most one additional bit, which can be the difference between being able to use 32-bit arithmetic or having to resort to 64-bit arithmetic. This is especially important in SIMD code and code the compiler is supposed to auto-vectorize.

As I hinted, this trick doesn't work just for division by 1023. In general, it works for any divisor of the form for inputs . Here is the generalized version:

Rust
fn div_round_by_2pn_m1(v: u32, n: u32) -> u32 {
    let round = 1 << (n - 1);
    let w = v + round;
    ((w >> n) + w) >> n
}

This function returns exactly for all inputs . For larger inputs, the results are typically close, but not exact. This makes it an approximation for rounded division by .

Unfortunately, it's not obvious at all why this approximation is exact for , and why it stops working at .

In this article, I will answer both questions and generalize the trick to (1) increase the input range and (2) support floor and ceil division as well.

Contents

Results

Before I start with the derivations and proofs, here is a summary of the results.

  1. can be approximated using:

    The approximation is exact for all inputs .

  2. can be approximated using:

    The approximation is exact for all inputs .

  3. can be approximated using:

    The approximation is exact for all inputs .

Note

These are theoretical results. In practice, integer overflow has to be carefully considered in order to determine the true range of inputs for which a particular implementation is exact.

The below tool determines bounds for correctness automatically based on the given settings.

Code Generation

I also implemented a little code gen tool that uses these results to generate Rust code. You can set , the iteration count, the rounding mode, and the integer type all operations will be performed in. Everything (except for the integer type) can also be made variable at runtime by checking the Parameter box.

n:Iterations:Rounding:Int type:

Rust
/// Returns `round(v / (2^n - 1))`.
///
/// Returned values are correct if both:
/// 1. the approximation is exact: v < 2^(2n) + 2^(n-1) - 1
/// 2. no overflow occurs:         v < 2^32 - 2^(n-1) - round(v/(2^n-1)).
fn div_round_by_2pn_m1(v: u32, n: u32) -> u32 {
    debug_assert!(n != 0, "Division by zero");
    let round = 1 << (n - 1);
    let w = v + round;
    ((w >> n) + w) >> n
}
Limitation

If cannot be represented by the chosen integer type, generated code may fail to compile or panic at runtime (if is a parameter). Such cases are mostly nonsensical anyway, so just avoid them.

Proving correctness

Formally, the approximation is defined as follows: Let , then:

Info

This looks more complicated than it really is. is just x >> n in code, and is the same as . So really, the approximation is just two nested rounded divisions (implemented as bit shifts in code). I will keep it in floor division form for the rest of the article, since it makes the proofs easier.

To capture how good the approximation is, I will introduce an error term defined as follows:

Consequently, the approximation is exact for an input iff .

Now, I will write a bit differently. Let for . While a bit unusual, it should be obvious that all integers can be uniquely expressed in this form. I choose this form because it has the nice property that .

Proof:

(Note: since is defined as .)

Info

All proofs that use this form of will make heavy use of the following properties:

This makes it possible to simplify the error term:

Now it is easy to show that .

All that is left to do, is to show that for all inputs , it holds that . This is done in two cases:

  1. : This corresponds to .

    Two cases:

    1. .
    2. .
  2. : This corresponds to .

    .

    Note: While this case algebraically includes negative values for , the bounds are implied by . So this cannot be taken as proof that the approximation works for negative inputs.

Taken together, this proves , which means that the approximation is exactly equal to rounded division by for those input.

Further, proving that the approximation is not exact for is easy. This number corresponds to , which results in a non-zero error term:

Therefore, is the smallest (non-negative) input for which the approximation is not exact.

Showing failure points

After proving that the approximation fails, I thought it might also be interesting to see it fail for a few values of . So here are the smallest inputs where the approximation starts to differ from the actual result for from 1 to 15:

nFirst non-exact input vv rewrittenApprox­imation
1434
21756
367910
42631719
510393334
641276566
716447129130
865663257258
9262399513514
10104908710251026
11419532720492050
121677926340974098
136711295981938194
142684436471638516386
1510737582073276932770

These numbers were found experimentally using brute-force search.

Extending the input range

Depending on the use case, an input range of might be too small. However, the trick can be extended to support larger inputs.

The main insight here is this equality:

Derived from:

This makes it possible to recursively rewrite the division by into a series of divisions by .

Let be the approximation of rounded division by after iterations, defined as follows:

The trick from above then corresponds to .

Let's see the smallest inputs the approximations start to fail for:

n
1
2
3
4
5
6-
7--
8--

These numbers were found experimentally using a brute-force search. I aborted the search when it took too long, so some entries are missing.

The pattern is very clear. It seems that is exact for all inputs . So the trick can be extended to support arbitrarily large input ranges by increasing the number of iterations.

In code, this is implemented as follows:

Rust
fn div_round_2pn_m1_iters(v: u32, n: u32, iters: u8) -> u32 {
    let round = 1 << (n - 1);
    let w = v + round;
    let mut r = w >> n; // R_1
    for _ in 1..iters {
        r = (r + w) >> n; // R_{i+1}
    }
    r
}

Proving correctness for extended ranges

Let be defined as before: . Further, let be the i-th iteration of the approximation as defined above and . As before, is exact for inputs iff .

I will start by simplifying the error term :

With the error term in a nicer form, it's now easy to show by induction that :

Since corresponds to , this shows that the approximations (for any number of iterations) are exact for this range. (Again, the bounds are implied by .)

To make things simpler going forward, I derived an explicit formula for by unrolling the recursion:

Proof for the explicit error term:

The correctness of the explicit error term formula can be shown using induction over :

  • Base case ():

  • Induction step (from to ):

Using the explicit formula for , it's easy to show that results in , making the approximation non-exact for that input. corresponds to . Plugging this into the explicit formula for gives:

Before I can prove the rest, I need split similar to how I split into . Let for and at iteration .

Plugging this into the explicit formula for gives:

This way of representing reveals that is just plus some small correction term that depends on and but not .

It's rather easy to show that is always the case, which implies that .

Proof:

Now I will determine the range for the two terms inside the floor function:

  • Range for the first term:

  • Range for the second term:

    Note that:

    • .
    • , so . Therefore, .

    Therefore:

From those two ranges, it follows that .

However, that's not all that can be shown about . There is more structure to it. This structure becomes obvious by looking at a few concrete values. So are the possible values of (with ) depending on :

valuesNote
0
1
2┃ always zero
3
4
5
6
......┃ always -1 or 0
14
15

The table shows that starts being always zero for and then switches to being either or for .

This pattern persists across all values for , but the switch occurs at different values . I will call the value where the switch occurs the critical value, denoted as . Here are a few values of I determined experimentally (missing values were too slow to compute):

n
11371531
2152185341
319735854681
41172734369-
51331057--

The pattern is .

Showing that is easy.

Proof:

Since has already been proven, I just have to show that the lower bound of the expression inside the floor function is at least . That expression has its minimum when is minimal () and is maximal (). Plugging in those values gives:

With this proven, 2 facts about are now known:

  1. .
  2. .

Since corresponds to and , it follows from (2) that . This range of corresponds to inputs . Together with the earlier result for , this proves that the approximation is exact for all inputs .

Other rounding modes

As it turns out, other rounding modes can also be implemented by making a small modification to the approximation. Similar to how was defined, and can be defined for floor and ceiling division, respectively:

where:

is exact for all inputs , and is exact for all inputs .

Proving correctness for floor division

The proof for floor division is very similar to the proof for rounded division. The main difference is that is split differently. Let with defined similarly to before. Let be the error term for floor division after iterations. Simplifying the error term gives:

Note that has the same recursive structure as (rounded division). Therefore, the rest of the proof follows the same steps as before, leading to the conclusion that is exact for all inputs . Writing out this proof will be left as an exercise to the reader.

Proving correctness for ceiling division

Same game again. Let with defined similarly to before. Let be the error term for ceiling division after iterations. Simplifying the error term gives:

Same as before, has the same recursive structure as (rounded division). Therefore, the rest of the proof follows the same steps as before, leading to the conclusion that is exact for all inputs . Writing out this proof will be left as an exercise to the reader.

Interesting extra: Division by

Similar to how:

something similar is also true for division by :

So, is turning one plus into a minus enough to make the trick work for division by ? No. Well, it is almost enough.

As it turns out, the number added to (called round in the code) has to be changed for floor and ceiling division. Furthermore, the evenness of the iteration count also matters.

Rust
enum RoundingMode {
    Floor,
    Round,
    Ceil,
}
fn div_2pn_p1(v: u32, n: u32, i: u32, mode: RoundingMode) -> u32 {
    let round = match mode {
        RoundingMode::Floor => 0,
        RoundingMode::Round => 1 << (n - 1),
        RoundingMode::Ceil => 1 << n,
    };
    let w = v + round - i % 2;
    let mut r = w >> n;
    for _ in 1..i {
        r = (w - r) >> n;
    }
    r
}

As before, here are tables for the smallest inputs the approximations start to fail for. First for rounded division by :

n
1
2
3
4-
5---
6----
7----
8-----

Second for ceiling division by :

n
1
2
3
4-
5---
6----
7----
8-----

And lastly for floor division by :

n
1
2
3
4-
5--
6--
7--
8---

Floor division fails at for odd iteration counts, because it tries to compute v - 1. This either (1) panics or (2) underflows to a huge number. If (3) the backing number type is signed, so that w = -1 can be represented, then the function returns -1, which is also incorrect. No matter what, it's wrong.

If is ignored, the table for floor division looks like this:

n
1
2
3
4-
5---
6----
7----
8-----

Better. However, this makes the trick less useful for floor division by , since it requires an even iteration count or special handling for .

In any case, this is all I have for division by . I haven't formally analyzed the error terms, so I can't explain why it works and why the weirdness around odd iteration counts exists.

For those inclined, this might be a fun exercise to spend the weekend on.