longlong.h – support functions for multi-word arithmetic¶
Auxiliary asm macros¶
-
umul_ppmm
(high_prod, low_prod, multipler, multiplicand)¶ Multiplies two single limb integers
MULTIPLER
andMULTIPLICAND
, and generates a two limb product inHIGH_PROD
andLOW_PROD
.
-
smul_ppmm
(high_prod, low_prod, multipler, multiplicand)¶ As for
umul_ppmm()
but the numbers are signed.
-
udiv_qrnnd
(quotient, remainder, high_numerator, low_numerator, denominator)¶ Divides an unsigned integer, composed by the limb integers
HIGH_NUMERATOR
and\LOW_NUMERATOR
, byDENOMINATOR
and places the quotient inQUOTIENT
and the remainder inREMAINDER
.HIGH_NUMERATOR
must be less thanDENOMINATOR
for correct operation.
-
sdiv_qrnnd
(quotient, remainder, high_numerator, low_numerator, denominator)¶ As for
udiv_qrnnd()
but the numbers are signed. The quotient is rounded towards \(0\). Note that as the quotient is signed it must lie in the range \([-2^63, 2^63)\).
-
count_leading_zeros
(count, x)¶ Counts the number of zero-bits from the msb to the first non-zero bit in the limb
x
. This is the number of stepsx
needs to be shifted left to set the msb. Ifx
is \(0\) then count is undefined.
-
count_trailing_zeros
(count, x)¶ As for
count_leading_zeros()
, but counts from the least significant end. Ifx
is zero then count is undefined.
-
add_ssaaaa
(high_sum, low_sum, high_addend_1, low_addend_1, high_addend_2, low_addend_2)¶ Adds two limb integers, composed by
HIGH_ADDEND_1
andLOW_ADDEND_1
, and\HIGH_ADDEND_2
andLOW_ADDEND_2
, respectively. The result is placed inHIGH_SUM
andLOW_SUM
. Overflow, i.e.carry out, is not stored anywhere, and is lost.
-
add_sssaaaaaa
(high_sum, mid_sum, low_sum, high_addend_1, mid_addend_1, low_addend_1, high_addend_2, mid_addend_2, low_addend_2)¶ Adds two three limb integers. Carry out is lost.
-
sub_ddmmss
(high_difference, low_difference, high_minuend, low_minuend, high_subtrahend, low_subtrahend)¶ Subtracts two limb integers, composed by
HIGH_MINUEND_1
andLOW_MINUEND_1
, andHIGH_SUBTRAHEND_2
andLOW_SUBTRAHEND_2
, respectively. The result is placed in\HIGH_DIFFERENCE
andLOW_DIFFERENCE
. Overflow, i.e.carry out is not stored anywhere, and is lost.
-
sub_dddmmmsss
(high_diff, mid_diff, low_diff, high_minuend_1, mid_minuend_1, low_minuend_1, high_subtrahend_2, mid_subtrahend_2, low_subtrahend_2)¶ Subtracts two three limb integers. Borrow out is lost.
-
byte_swap
(x)¶ Swap the order of the bytes in the word \(x\), i.e. most significant byte becomes least significant byte, etc.
-
invert_limb
(invxl, xl)¶ Computes an approximate inverse
invxl
of the limbxl
, with an implicit leading~`1`. More formally it computes:invxl = (B^2 - B*x - 1)/x = (B^2 - 1)/x - B
Note that \(x\) must be normalised, i.e.with msb set. This inverse makes use of the following theorem of Torbjorn Granlund and Peter Montgomery~citep[Lemma~8.1]{GraMon1994}:
Let \(d\) be normalised, \(d < B\), i.e.it fits in a word, and suppose that \(m d < B^2 \leq (m+1) d\). Let \(0 \leq n \leq B d - 1\). Write \(n = n_2 B + n_1 B/2 + n_0\) with \(n_1 = 0\) or \(1\) and \(n_0 < B/2\). Suppose \(q_1 B + q_0 = n_2 B + (n_2 + n_1) (m - B) + n_1 (d-B/2) + n_0\) and \(0 \leq q_0 < B\). Then \(0 \leq q_1 < B\) and \(0 \leq n - q_1 d < 2 d\).
In the theorem, \(m\) is the inverse of \(d\). If we let
m = invxl + B
and \(d = x\) we have \(m d = B^2 - 1 < B^2\) and \((m+1) x = B^2 + d - 1 \geq B^2\).The theorem is often applied as follows: note that \(n_0\) and \(n_1 (d-B/2)\) are both less than \(B/2\). Also note that \(n_1 (m-B) < B\). Thus the sum of all these terms contributes at most \(1\) to \(q_1\). We are left with \(n_2 B + n_2 (m-B)\). But note that \((m-B)\) is precisely our precomputed inverse
invxl
. If we write \(q_1 B + q_0 = n_2 B + n_2 (m-B)\), then from the theorem, we have \(0 \leq n - q_1 d < 3 d\), i.e.the quotient is out by at most \(2\) and is always either correct or too small.
-
udiv_qrnnd_preinv
(q, r, nh, nl, d, di)¶ As for
udiv_qrnnd()
but takes a precomputed inversedi
as computed byinvert_limb()
. The algorithm, in terms of the theorem above, is:nadj = n1*(d-B/2) + n0 xh, xl = (n2+n1)*(m-B) xh, xl += nadj + n2*B ( xh, xl = n2*B + (n2+n1)*(m-B) + n1*(d-B/2) + n0 ) _q1 = B - xh - 1 xh, xl = _q1*d + nh, nl - B*d = nh, nl - q1*d - d so that xh = 0 or -1 r = xl + xh*d where xh is 0 if q1 is off by 1, otherwise -1 q = xh - _q1 = xh + 1 + n2