fft.h – SchoenhageStrassen FFT¶
Split/combine FFT coefficients¶

mp_size_t fft_split_limbs(mp_limb_t **poly, mp_srcptr limbs, mp_size_t total_limbs, mp_size_t coeff_limbs, mp_size_t output_limbs)¶
Split an integer
(limbs, total_limbs)
into coefficients of lengthcoeff_limbs
limbs and store as the coefficients ofpoly
which are assumed to have space foroutput_limbs + 1
limbs per coefficient. The coefficients of the polynomial do not need to be zeroed before calling this function, however the number of coefficients written is returned by the function and any coefficients beyond this point are not touched.

mp_size_t fft_split_bits(mp_limb_t **poly, mp_srcptr limbs, mp_size_t total_limbs, flint_bitcnt_t bits, mp_size_t output_limbs)¶
Split an integer
(limbs, total_limbs)
into coefficients of the given number ofbits
and store as the coefficients ofpoly
which are assumed to have space foroutput_limbs + 1
limbs per coefficient. The coefficients of the polynomial do not need to be zeroed before calling this function, however the number of coefficients written is returned by the function and any coefficients beyond this point are not touched.

void fft_combine_limbs(mp_limb_t *res, mp_limb_t **poly, slong length, mp_size_t coeff_limbs, mp_size_t output_limbs, mp_size_t total_limbs)¶
Evaluate the polynomial
poly
of the givenlength
atB^coeff_limbs
, whereB = 2^FLINT_BITS
, and add the result to the integer(res, total_limbs)
throwing away any bits that exceed the given number of limbs. The polynomial coefficients are assumed to have at leastoutput_limbs
limbs each, however any additional limbs are ignored.If the integer is initially zero the result will just be the evaluation of the polynomial.

void fft_combine_bits(mp_limb_t *res, mp_limb_t **poly, slong length, flint_bitcnt_t bits, mp_size_t output_limbs, mp_size_t total_limbs)¶
Evaluate the polynomial
poly
of the givenlength
at2^bits
and add the result to the integer(res, total_limbs)
throwing away any bits that exceed the given number of limbs. The polynomial coefficients are assumed to have at leastoutput_limbs
limbs each, however any additional limbs are ignored. If the integer is initially zero the result will just be the evaluation of the polynomial.
Test helper functions¶

void fermat_to_mpz(mpz_t m, mp_limb_t *i, mp_size_t limbs)¶
Convert the Fermat number
(i, limbs)
moduloB^limbs + 1
to anmpz_t m
. Assumesm
has been initialised. This function is used only in test code.
Arithmetic modulo a generalised Fermat number¶

void mpn_negmod_2expp1(mp_limb_t *z, const mp_limb_t *a, mp_size_t limbs)¶
Set
z
to the negation of the Fermat number \(a\) moduloB^limbs + 1
. The inputa
is expected to be fully reduced, and the output is fully reduced. Aliasing is permitted.

void mpn_addmod_2expp1_1(mp_limb_t *r, mp_size_t limbs, mp_limb_signed_t c)¶
Adds the signed limb
c
to the generalised Fermat numberr
moduloB^limbs + 1
. The compiler should be able to inline this for the case that there is no overflow from the first limb.

void mpn_normmod_2expp1(mp_limb_t *t, mp_size_t limbs)¶
Given
t
a signed integer oflimbs + 1
limbs in two’s complement format, reducet
to the corresponding value modulo the generalised Fermat numberB^limbs + 1
, whereB = 2^FLINT_BITS
.

void mpn_mul_2expmod_2expp1(mp_limb_t *t, mp_limb_t *i1, mp_size_t limbs, flint_bitcnt_t d)¶
Given
i1
a signed integer oflimbs + 1
limbs in two’s complement format reduced moduloB^limbs + 1
up to some overflow, computet = i1*2^d
modulo \(p\). The result will not necessarily be fully reduced. The number of bitsd
must be nonnegative and less thanFLINT_BITS
. Aliasing is permitted.

void mpn_div_2expmod_2expp1(mp_limb_t *t, mp_limb_t *i1, mp_size_t limbs, flint_bitcnt_t d)¶
Given
i1
a signed integer oflimbs + 1
limbs in two’s complement format reduced moduloB^limbs + 1
up to some overflow, computet = i1/2^d
modulo \(p\). The result will not necessarily be fully reduced. The number of bitsd
must be nonnegative and less thanFLINT_BITS
. Aliasing is permitted.
Generic butterflies¶

void fft_adjust(mp_limb_t *r, mp_limb_t *i1, mp_size_t i, mp_size_t limbs, flint_bitcnt_t w)¶
Set
r
toi1
times \(z^i\) moduloB^limbs + 1
where \(z\) corresponds to multiplication by \(2^w\). This can be thought of as part of a butterfly operation. We require \(0 \leq i < n\) where \(nw =\)limbs*FLINT_BITS
. Aliasing is not supported.

void fft_adjust_sqrt2(mp_limb_t *r, mp_limb_t *i1, mp_size_t i, mp_size_t limbs, flint_bitcnt_t w, mp_limb_t *temp)¶
Set
r
toi1
times \(z^i\) moduloB^limbs + 1
where \(z\) corresponds to multiplication by \(\sqrt{2}^w\). This can be thought of as part of a butterfly operation. We require \(0 \leq i < 2\cdot n\) and odd where \(nw =\)limbs*FLINT_BITS
.

void butterfly_lshB(mp_limb_t *t, mp_limb_t *u, mp_limb_t *i1, mp_limb_t *i2, mp_size_t limbs, mp_size_t x, mp_size_t y)¶
We are given two integers
i1
andi2
moduloB^limbs + 1
which are not necessarily normalised. We computet = (i1 + i2)*B^x
andu = (i1  i2)*B^y
modulo \(p\). Aliasing between inputs and outputs is not permitted. We requirex
andy
to be less thanlimbs
and nonnegative.

void butterfly_rshB(mp_limb_t *t, mp_limb_t *u, mp_limb_t *i1, mp_limb_t *i2, mp_size_t limbs, mp_size_t x, mp_size_t y)¶
We are given two integers
i1
andi2
moduloB^limbs + 1
which are not necessarily normalised. We computet = (i1 + i2)/B^x
andu = (i1  i2)/B^y
modulo \(p\). Aliasing between inputs and outputs is not permitted. We requirex
andy
to be less thanlimbs
and nonnegative.
Radix 2 transforms¶

void fft_butterfly(mp_limb_t *s, mp_limb_t *t, mp_limb_t *i1, mp_limb_t *i2, mp_size_t i, mp_size_t limbs, flint_bitcnt_t w)¶
Set
s = i1 + i2
,t = z1^i*(i1  i2)
moduloB^limbs + 1
wherez1 = exp(Pi*I/n)
corresponds to multiplication by \(2^w\). Requires \(0 \leq i < n\) where \(nw =\)limbs*FLINT_BITS
.

void ifft_butterfly(mp_limb_t *s, mp_limb_t *t, mp_limb_t *i1, mp_limb_t *i2, mp_size_t i, mp_size_t limbs, flint_bitcnt_t w)¶
Set
s = i1 + z1^i*i2
,t = i1  z1^i*i2
moduloB^limbs + 1
wherez1 = exp(Pi*I/n)
corresponds to division by \(2^w\). Requires \(0 \leq i < 2n\) where \(nw =\)limbs*FLINT_BITS
.

void fft_radix2(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2)¶
The radix 2 DIF FFT works as follows:
Input:
[i0, i1, ..., i(m1)]
, for \(m = 2n\) a power of \(2\).Output:
[r0, r1, ..., r(m1)]
= FFT[i0, i1, ..., i(m1)]
.Algorithm:
\(\bullet\) Recursively compute[r0, r2, r4, ...., r(m2)]
= FFT[i0+i(m/2), i1+i(m/2+1), ..., i(m/21)+i(m1)]
\(\bullet\) Let[t0, t1, ..., t(m/21)]
= [i0i(m/2), i1i(m/2+1), ..., i(m/21)i(m1)]
\(\bullet\) Let[u0, u1, ..., u(m/21)]
= [z1^0*t0, z1^1*t1, ..., z1^(m/21)*t(m/21)]
wherez1 = exp(2*Pi*I/m)
corresponds to multiplication by \(2^w\).\(\bullet\) Recursively compute[r1, r3, ..., r(m1)]
= FFT[u0, u1, ..., u(m/21)]
The parameters are as follows:
\(\bullet\)
2*n
is the length of the input and output arrays \(\bullet\) \(w\) is such that \(2^w\) is an \(2n\)th root of unity in the ring \(\mathbf{Z}/p\mathbf{Z}\) that we are working in, i.e. \(p = 2^{wn} + 1\) (here \(n\) is divisible by
GMP_LIMB_BITS
) \(\bullet\)
ii
is the array of inputs (each input is an array of limbs of length
wn/GMP_LIMB_BITS + 1
(the extra limbs being a “carry limb”). Outputs are written inplace.
We require \(nw\) to be at least 64 and the two temporary space pointers to point to blocks of size
n*w + FLINT_BITS
bits.

void fft_truncate(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_size_t trunc)¶
As for
fft_radix2
except that only the firsttrunc
coefficients of the output are computed and the input is regarded as having (implied) zero coefficients from coefficienttrunc
onwards. The coefficients must exist as the algorithm needs to use this extra space, but their value is irrelevant. The value oftrunc
must be divisible by 2.

void fft_truncate1(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_size_t trunc)¶
As for
fft_radix2
except that only the firsttrunc
coefficients of the output are computed. The transform still needs all \(2n\) input coefficients to be specified.

void ifft_radix2(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2)¶
The radix 2 DIF IFFT works as follows:
Input:
[i0, i1, ..., i(m1)]
, for \(m = 2n\) a power of \(2\). Output:
[r0, r1, ..., r(m1)]
= IFFT[i0, i1, ..., i(m1)]
.
Algorithm:
 \(\bullet\) Recursively compute
[s0, s1, ...., s(m/21)]
= IFFT[i0, i2, ..., i(m2)]
 \(\bullet\) Recursively compute
[t(m/2), t(m/2+1), ..., t(m1)]
= IFFT[i1, i3, ..., i(m1)]
 \(\bullet\) Let
[r0, r1, ..., r(m/21)]
= [s0+z1^0*t0, s1+z1^1*t1, ..., s(m/21)+z1^(m/21)*t(m/21)]
wherez1 = exp(2*Pi*I/m)
corresponds to division by \(2^w\). \(\bullet\) Let
[r(m/2), r(m/2+1), ..., r(m1)]
= [s0z1^0*t0, s1z1^1*t1, ..., s(m/21)z1^(m/21)*t(m/21)]
The parameters are as follows:
 \(\bullet\)
2*n
is the length of the input and output arrays
 \(\bullet\) \(w\) is such that \(2^w\) is an \(2n\)th root of unity in the ring \(\mathbf{Z}/p\mathbf{Z}\) that we are working in, i.e. \(p = 2^{wn} + 1\) (here \(n\) is divisible by
GMP_LIMB_BITS
)
\(\bullet\)
ii
is the array of inputs (each input is an array of limbs of lengthwn/GMP_LIMB_BITS + 1
(the extra limbs being a “carry limb”). Outputs are written inplace.We require \(nw\) to be at least 64 and the two temporary space pointers to point to blocks of size
n*w + FLINT_BITS
bits. Output:

void ifft_truncate(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_size_t trunc)¶
As for
ifft_radix2
except that the output is assumed to have zeros from coefficient trunc onwards and only the first trunc coefficients of the input are specified. The remaining coefficients need to exist as the extra space is needed, but their value is irrelevant. The value oftrunc
must be divisible by 2.Although the implementation does not require it, we assume for simplicity that
trunc
is greater than \(n\). The algorithm begins by computing the inverse transform of the first \(n\) coefficients of the input array. The unspecified coefficients of the second half of the array are then written: coefficienttrunc + i
is computed as a twist of coefficienti
by a root of unity. The values of these coefficients are then equal to what they would have been if the inverse transform of the right hand side of the input array had been computed with full data from the start. The functionifft_truncate1
is then called on the entire right half of the input array with this auxiliary data filled in. Finally a single layer of the IFFT is completed on all the coefficients up totrunc
being careful to note that this involves doubling the coefficients fromtrunc  n
up ton
.

void ifft_truncate1(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_size_t trunc)¶
Computes the first
trunc
coefficients of the radix 2 inverse transform assuming the firsttrunc
coefficients are given and that the remaining coefficients have been set to the value they would have if an inverse transform had already been applied with full data.The algorithm is the same as for
ifft_truncate
except that the coefficients fromtrunc
onwards after the inverse transform are not inferred to be zero but the supplied values.

void fft_butterfly_sqrt2(mp_limb_t *s, mp_limb_t *t, mp_limb_t *i1, mp_limb_t *i2, mp_size_t i, mp_size_t limbs, flint_bitcnt_t w, mp_limb_t *temp)¶
Let \(w = 2k + 1\), \(i = 2j + 1\). Set
s = i1 + i2
,t = z1^i*(i1  i2)
moduloB^limbs + 1
wherez1^2 = exp(Pi*I/n)
corresponds to multiplication by \(2^w\). Requires \(0 \leq i < 2n\) where \(nw =\)limbs*FLINT_BITS
.Here
z1
corresponds to multiplication by \(2^k\) then multiplication by(2^(3nw/4)  2^(nw/4))
. We seez1^i
corresponds to multiplication by(2^(3nw/4)  2^(nw/4))*2^(j+ik)
.We first multiply by
2^(j + ik + wn/4)
then multiply by an additional2^(nw/2)
and subtract.

void ifft_butterfly_sqrt2(mp_limb_t *s, mp_limb_t *t, mp_limb_t *i1, mp_limb_t *i2, mp_size_t i, mp_size_t limbs, flint_bitcnt_t w, mp_limb_t *temp)¶
Let \(w = 2k + 1\), \(i = 2j + 1\). Set
s = i1 + z1^i*i2
,t = i1  z1^i*i2
moduloB^limbs + 1
wherez1^2 = exp(Pi*I/n)
corresponds to division by \(2^w\). Requires \(0 \leq i < 2n\) where \(nw =\)limbs*FLINT_BITS
.Here
z1
corresponds to division by \(2^k\) then division by(2^(3nw/4)  2^(nw/4))
. We seez1^i
corresponds to division by(2^(3nw/4)  2^(nw/4))*2^(j+ik)
which is the same as division by2^(j+ik + 1)
then multiplication by(2^(3nw/4)  2^(nw/4))
.Of course, division by
2^(j+ik + 1)
is the same as multiplication by2^(2*wn  j  ik  1)
. The exponent is positive as \(i \leq 2\cdot n\), \(j < n\), \(k < w/2\).We first multiply by
2^(2*wn  j  ik  1 + wn/4)
then multiply by an additional2^(nw/2)
and subtract.

void fft_truncate_sqrt2(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp, mp_size_t trunc)¶
As per
fft_truncate
except that the transform is twice the usual length, i.e. length \(4n\) rather than \(2n\). This is achieved by making use of twiddles by powers of a square root of 2, not powers of 2 in the first layer of the transform.We require \(nw\) to be at least 64 and the three temporary space pointers to point to blocks of size
n*w + FLINT_BITS
bits.

void ifft_truncate_sqrt2(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp, mp_size_t trunc)¶
As per
ifft_truncate
except that the transform is twice the usual length, i.e. length \(4n\) instead of \(2n\). This is achieved by making use of twiddles by powers of a square root of 2, not powers of 2 in the final layer of the transform.We require \(nw\) to be at least 64 and the three temporary space pointers to point to blocks of size
n*w + FLINT_BITS
bits.
Matrix Fourier Transforms¶

void fft_butterfly_twiddle(mp_limb_t *u, mp_limb_t *v, mp_limb_t *s, mp_limb_t *t, mp_size_t limbs, flint_bitcnt_t b1, flint_bitcnt_t b2)¶
Set
u = 2^b1*(s + t)
,v = 2^b2*(s  t)
moduloB^limbs + 1
. This is used to computeu = 2^(ws*tw1)*(s + t)
,v = 2^(w+ws*tw2)*(s  t)
in the matrix Fourier algorithm, i.e. effectively computing an ordinary butterfly with additional twiddles byz1^rc
for row \(r\) and column \(c\) of the matrix of coefficients. Aliasing is not allowed.

void ifft_butterfly_twiddle(mp_limb_t *u, mp_limb_t *v, mp_limb_t *s, mp_limb_t *t, mp_size_t limbs, flint_bitcnt_t b1, flint_bitcnt_t b2)¶
Set
u = s/2^b1 + t/2^b1)
,v = s/2^b1  t/2^b1
moduloB^limbs + 1
. This is used to computeu = 2^(ws*tw1)*s + 2^(ws*tw2)*t)
,v = 2^(ws*tw1)*s + 2^(ws*tw2)*t)
in the matrix Fourier algorithm, i.e. effectively computing an ordinary butterfly with additional twiddles byz1^(rc)
for row \(r\) and column \(c\) of the matrix of coefficients. Aliasing is not allowed.

void fft_radix2_twiddle(mp_limb_t **ii, mp_size_t is, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_size_t ws, mp_size_t r, mp_size_t c, mp_size_t rs)¶
As for
fft_radix2
except that the coefficients are spaced byis
in the arrayii
and an additional twist byz^c*i
is applied to each coefficient where \(i\) starts at \(r\) and increases byrs
as one moves from one coefficient to the next. Herez
corresponds to multiplication by2^ws
.

void ifft_radix2_twiddle(mp_limb_t **ii, mp_size_t is, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_size_t ws, mp_size_t r, mp_size_t c, mp_size_t rs)¶
As for
ifft_radix2
except that the coefficients are spaced byis
in the arrayii
and an additional twist byz^(c*i)
is applied to each coefficient where \(i\) starts at \(r\) and increases byrs
as one moves from one coefficient to the next. Herez
corresponds to multiplication by2^ws
.

void fft_truncate1_twiddle(mp_limb_t **ii, mp_size_t is, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_size_t ws, mp_size_t r, mp_size_t c, mp_size_t rs, mp_size_t trunc)¶
As per
fft_radix2_twiddle
except that the transform is truncated as perfft_truncate1
.

void ifft_truncate1_twiddle(mp_limb_t **ii, mp_size_t is, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_size_t ws, mp_size_t r, mp_size_t c, mp_size_t rs, mp_size_t trunc)¶
As per
ifft_radix2_twiddle
except that the transform is truncated as perifft_truncate1
.

void fft_mfa_truncate_sqrt2(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp, mp_size_t n1, mp_size_t trunc)¶
This is as per the
fft_truncate_sqrt2
function except that the matrix Fourier algorithm is used for the left and right FFTs. The total transform length is \(4n\) wheren = 2^depth
so that the left and right transforms are both length \(2n\). We requiretrunc > 2*n
and thattrunc
is divisible by2*n1
(explained below). The coefficients are produced in an order different fromfft_truncate_sqrt2
.The matrix Fourier algorithm, which is applied to each transform of length \(2n\), works as follows. We set
n1
to a power of 2 about the square root of \(n\). The data is then thought of as a set ofn2
rows each withn1
columns (so thatn1*n2 = 2n
).The length \(2n\) transform is then computed using a whole pile of short transforms. These comprise
n1
column transforms of lengthn2
followed by some twiddles by roots of unity (namelyz^rc
where \(r\) is the row and \(c\) the column within the data) followed byn2
row transforms of lengthn1
. Along the way the data needs to be rearranged due to the fact that the short transforms output the data in binary reversed order compared with what is needed.The matrix Fourier algorithm provides better cache locality by decomposing the long length \(2n\) transforms into many transforms of about the square root of the original length.
For better cache locality the sqrt2 layer of the full length \(4n\) transform is folded in with the column FFTs performed as part of the first matrix Fourier algorithm on the left half of the data.
The second half of the data requires a truncated version of the matrix Fourier algorithm. This is achieved by truncating to an exact multiple of the row length so that the row transforms are full length. Moreover, the column transforms will then be truncated transforms and their truncated length needs to be a multiple of 2. This explains the condition on
trunc
given above.To improve performance, the extra twiddles by roots of unity are combined with the butterflies performed at the last layer of the column transforms.
We require \(nw\) to be at least 64 and the three temporary space pointers to point to blocks of size
n*w + FLINT_BITS
bits.

void ifft_mfa_truncate_sqrt2(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp, mp_size_t n1, mp_size_t trunc)¶
This is as per the
ifft_truncate_sqrt2
function except that the matrix Fourier algorithm is used for the left and right IFFTs. The total transform length is \(4n\) wheren = 2^depth
so that the left and right transforms are both length \(2n\). We requiretrunc > 2*n
and thattrunc
is divisible by2*n1
.We set
n1
to a power of 2 about the square root of \(n\).As per the matrix fourier FFT the sqrt2 layer is folded into the final column IFFTs for better cache locality and the extra twiddles that occur in the matrix Fourier algorithm are combined with the butterflied performed at the first layer of the final column transforms.
We require \(nw\) to be at least 64 and the three temporary space pointers to point to blocks of size
n*w + FLINT_BITS
bits.

void fft_mfa_truncate_sqrt2_outer(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp, mp_size_t n1, mp_size_t trunc)¶
Just the outer layers of
fft_mfa_truncate_sqrt2
.

void fft_mfa_truncate_sqrt2_inner(mp_limb_t **ii, mp_limb_t **jj, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp, mp_size_t n1, mp_size_t trunc, mp_limb_t **tt)¶
The inner layers of
fft_mfa_truncate_sqrt2
andifft_mfa_truncate_sqrt2
combined with pointwise mults.

void ifft_mfa_truncate_sqrt2_outer(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp, mp_size_t n1, mp_size_t trunc)¶
The outer layers of
ifft_mfa_truncate_sqrt2
combined with normalisation.
Negacyclic multiplication¶

void fft_negacyclic(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp)¶
As per
fft_radix2
except that it performs a sqrt2 negacyclic transform of length \(2n\). This is the same as the radix 2 transform except that the \(i\)th coefficient of the input is first multiplied by \(\sqrt{2}^{iw}\).We require \(nw\) to be at least 64 and the two temporary space pointers to point to blocks of size
n*w + FLINT_BITS
bits.

void ifft_negacyclic(mp_limb_t **ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **temp)¶
As per
ifft_radix2
except that it performs a sqrt2 negacyclic inverse transform of length \(2n\). This is the same as the radix 2 inverse transform except that the \(i\)th coefficient of the output is finally divided by \(\sqrt{2}^{iw}\).We require \(nw\) to be at least 64 and the two temporary space pointers to point to blocks of size
n*w + FLINT_BITS
bits.

void fft_naive_convolution_1(mp_limb_t *r, mp_limb_t *ii, mp_limb_t *jj, mp_size_t m)¶
Performs a naive negacyclic convolution of
ii
withjj
, both of length \(m\), and sets \(r\) to the result. This is essentially multiplication of polynomials modulo \(x^m + 1\).

void _fft_mulmod_2expp1(mp_limb_t *r1, mp_limb_t *i1, mp_limb_t *i2, mp_size_t r_limbs, flint_bitcnt_t depth, flint_bitcnt_t w)¶
Multiply
i1
byi2
moduloB^r_limbs + 1
wherer_limbs = nw/FLINT_BITS
withn = 2^depth
. Uses the negacyclic FFT convolution CRT’d with a 1 limb naive convolution. We require thatdepth
andw
have been selected as per the wrapperfft_mulmod_2expp1
below.

slong fft_adjust_limbs(mp_size_t limbs)¶
Given a number of limbs, returns a new number of limbs (no more than the next power of 2) which will work with the Nussbaumer code. It is only necessary to make this adjustment if
limbs > FFT_MULMOD_2EXPP1_CUTOFF
.

void fft_mulmod_2expp1(mp_limb_t *r, mp_limb_t *i1, mp_limb_t *i2, mp_size_t n, mp_size_t w, mp_limb_t *tt)¶
As per
_fft_mulmod_2expp1
but with a tuned cutoff below which more classical methods are used for the convolution. The temporary space is required to fitn*w + FLINT_BITS
bits. There are no restrictions on \(n\), but iflimbs = n*w/FLINT_BITS
then iflimbs
exceedsFFT_MULMOD_2EXPP1_CUTOFF
the functionfft_adjust_limbs
must be called to increase the number of limbs to an appropriate value.
Integer multiplication¶

void mul_truncate_sqrt2(mp_ptr r1, mp_srcptr i1, mp_size_t n1, mp_srcptr i2, mp_size_t n2, flint_bitcnt_t depth, flint_bitcnt_t w)¶
Integer multiplication using the radix 2 truncated sqrt2 transforms.
Set
(r1, n1 + n2)
to the product of(i1, n1)
by(i2, n2)
. This is achieved through an FFT convolution of length at most2^(depth + 2)
with coefficients of size \(nw\) bits wheren = 2^depth
. We requiredepth >= 6
. The input data is broken into chunks of data not exceeding(nw  (depth + 1))/2
bits. If breaking the first integer into chunks of this size results inj1
coefficients and breaking the second integer results inj2
chunks thenj1 + j2  1 <= 2^(depth + 2)
.If
n = 2^depth
then we require \(nw\) to be at least 64.

void mul_mfa_truncate_sqrt2(mp_ptr r1, mp_srcptr i1, mp_size_t n1, mp_srcptr i2, mp_size_t n2, flint_bitcnt_t depth, flint_bitcnt_t w)¶
As for
mul_truncate_sqrt2
except that the cache friendly matrix Fourier algorithm is used.If
n = 2^depth
then we require \(nw\) to be at least 64. Here we also require \(w\) to be \(2^i\) for some \(i \geq 0\).

void flint_mpn_mul_fft_main(mp_ptr r1, mp_srcptr i1, mp_size_t n1, mp_srcptr i2, mp_size_t n2)¶
The main integer multiplication routine. Sets
(r1, n1 + n2)
to(i1, n1)
times(i2, n2)
. We requiren1 >= n2 > 0
.
Convolution¶

void fft_convolution(mp_limb_t **ii, mp_limb_t **jj, slong depth, slong limbs, slong trunc, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **s1, mp_limb_t **tt)¶
Perform an FFT convolution of
ii
withjj
, both of length4*n
wheren = 2^depth
. Assume that all but the firsttrunc
coefficients of the output (placed inii
) are zero. Each coefficient is taken moduloB^limbs + 1
. The temporary spacest1
,t2
ands1
must havelimbs + 1
limbs of space andtt
must have2*(limbs + 1)
of free space.
FFT Precaching¶

void fft_precache(mp_limb_t **jj, slong depth, slong limbs, slong trunc, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **s1)¶
Precompute the FFT of
jj
for use with precache functions. The parameters are as forfft_convolution
.

void fft_convolution_precache(mp_limb_t **ii, mp_limb_t **jj, slong depth, slong limbs, slong trunc, mp_limb_t **t1, mp_limb_t **t2, mp_limb_t **s1, mp_limb_t **tt)¶
As per
fft_convolution
except that it is assumedfft_precache
has been called onjj
with the same parameters. This will then run faster than iffft_convolution
had been run with the originaljj
.