Using ball arithmetic¶
This section gives an introduction to working with real numbers in Arb (see arb.h – real numbers for the API and technical documentation). The general principles carry over to complex numbers, polynomials and matrices.
Ball semantics¶
Let f : A → B be a mathematical function. A ball implementation of f is a function F mapping balls to balls with the following property:
for all x ∈ X, we have f(x) ∈ F(X).
In other words, F(X) is an enclosure of the range {f(x) : x ∈ X}.
Throughout the documentation (except where otherwise noted),
we will simply write f(x) instead of F(X) when describing ball implementations of pointwise defined functions, with the understanding that the input and output are balls.
General subsets of ℝ cannot be represented on a computer; instead, we work with balls [m ± r] = {x : |x − m| ≤ r}, where the midpoint m and the radius r are binary floating-point numbers.
Given a working precision of prec bits, operations that can be inexact (such as arb_add()) always round their output to prec bits.
Some functions are always exact (e.g. arb_neg()), and thus do not take a prec argument.
The programming interface resembles that of GMP.
Each arb_t variable must be initialized with arb_init() before use
(this also sets its value to zero), and deallocated with arb_clear() after use. Variables have pass-by-reference semantics.
In the list of arguments to a function, output variables come first,
followed by input variables, and finally the precision:
#include "arb.h"
int main()
{
arb_t x, y;
arb_init(x); arb_init(y);
arb_set_ui(x, 3); /* x = 3 */
arb_const_pi(y, 128); /* y = pi, to 128 bits */
arb_sub(y, y, x, 53); /* y = y - x, to 53 bits */
arb_clear(x); arb_clear(y);
}
Binary and decimal¶
While the internal representation uses binary floating-point numbers, it is usually preferable to print numbers in decimal. The binary-to-decimal conversion generally requires rounding. Three different methods are available for printing a number to standard output:
- arb_print() shows the exact internal representation of a ball, with binary exponents.
- arb_printd() shows an inexact view of the internal representation, approximated by decimal floating-point numbers.
- arb_printn() shows a decimal ball that is guaranteed to be an enclosure of the binary floating-point ball. By default, it only prints digits in the midpoint that are certain to be correct, up to an error of at most one unit in the last place. Converting from binary to decimal is generally inexact, and the output of this method takes this rounding into account when printing the radius.
This snippet computes a 53-bit enclosure of π and prints it with each of the three methods:
arb_const_pi(x, 53);
arb_print(x); printf("\n");
arb_printd(x, 20); printf("\n");
arb_printn(x, 20, 0); printf("\n");
The output is:
(884279719003555 * 2^-48) +/- (536870913 * 2^-80)
3.141592653589793116 +/- 4.4409e-16
[3.141592653589793 +/- 5.61e-16]
The arb_get_str() and arb_set_str() methods are useful for
converting rigorously between decimal strings and binary balls
(arb_get_str() produces the same string as arb_printn(),
and arb_set_str() can parse such strings back).
A potential mistake is to create a ball from a double constant
such as 2.3, when this constant actually represents the binary number
2.29999999999999982236431605997495353221893310546875.
To produce a ball containing the exact rational number 2.3 = 23/10, any of the following can be used instead:

arb_set_str(x, "2.3", prec);

arb_set_ui(x, 23);
arb_div_ui(x, x, 10, prec);

fmpq_set_si(q, 23, 10); /* q is a FLINT fmpq_t */
arb_set_fmpq(x, q, prec);
Quality of enclosures¶
The main problem when working with ball arithmetic (or interval arithmetic) is overestimation. In general, the enclosure of a value or set of values as computed with ball arithmetic will be larger than the smallest possible enclosure.
Overestimation results naturally from rounding errors and cancellations in the individual steps of a calculation. As a general principle, formula rewriting techniques that make floating-point code more numerically stable also make ball arithmetic code more numerically stable, in the sense of producing tighter enclosures.
As a result of the dependency problem, ball or interval
arithmetic can produce error
bounds that are much larger than the actual numerical errors
resulting from doing floating-point arithmetic.
Consider the expression x − x with x = [3 ± 1]: ball arithmetic evaluates it to [0 ± 2] rather than the exact value 0, since each occurrence of x is treated as an independent variable ranging over the whole ball.
If all inputs to a calculation are “point values”, i.e.
exact numbers and known mathematical constants that can
be approximated arbitrarily closely (such as π), then the overestimation can in principle be made arbitrarily small by increasing the working precision.
Therefore, ball arithmetic is not a silver bullet: there will always be situations where some amount of numerical or mathematical analysis is required. Some experimentation may be required to find whether (and how) it can be used effectively for a given problem.
Predicates¶
A ball implementation of a predicate, such as “x > 0”, returns true only if the predicate certainly holds for every point of the input ball; it returns false both when the predicate is certainly false and when its truth cannot be decided at the current precision. In particular, arb_is_positive(x) returning false does not prove that x ≤ 0.
For example, the following code would not be correct in general:
if (arb_is_positive(x))
{
... /* do things assuming that x > 0 */
}
else
{
... /* do things assuming that x <= 0 */
}
Instead, the following can be used:
if (arb_is_positive(x))
{
... /* do things assuming that x > 0 */
}
else if (arb_is_nonpositive(x))
{
... /* do things assuming that x <= 0 */
}
else
{
... /* do things assuming that the sign of x is unknown */
}
Likewise, we will use ordinary mathematical notation such as x > 0 when referring to the ball implementations of pointwise predicates.
Note that some predicates, such as arb_overlaps() and arb_contains(),
actually are predicates on balls viewed as sets, and not ball implementations
of pointwise predicates.
Some predicates come in complementary pairs.
For example, arb_contains_zero() tests whether the input ball
contains the point zero.
Its negation is equivalent to arb_is_nonzero(),
which in turn is complementary to arb_is_zero()
as a pointwise predicate:
if (arb_is_zero(x))
{
... /* do things assuming that x = 0 */
}
else if (arb_is_nonzero(x)) /* equivalently: !arb_contains_zero(x) */
{
... /* do things assuming that x != 0 */
}
else
{
... /* do things assuming that the sign of x is unknown */
}
A worked example: the sine function¶
We implement the function arb_t
arithmetic.
Since there are infinitely many terms, we need to split the series
in two parts: a finite sum that can be evaluated directly, and
a tail that has to be bounded.
We stop as soon as we reach a term
#include "arb.h"
void arb_sin_naive(arb_t res, const arb_t x, slong prec)
{
arb_t s, t, u, tol;
slong k;
arb_init(s); arb_init(t); arb_init(u); arb_init(tol);
arb_one(tol);
arb_mul_2exp_si(tol, tol, -prec); /* tol = 2^-prec */
for (k = 0; ; k++)
{
arb_pow_ui(t, x, 2 * k + 1, prec);
arb_fac_ui(u, 2 * k + 1, prec);
arb_div(t, t, u, prec); /* t = x^(2k+1) / (2k+1)! */
arb_abs(u, t);
if (arb_le(u, tol)) /* if |t| <= 2^-prec */
{
arb_add_error(s, u); /* add |t| to the radius and stop */
break;
}
if (k % 2 == 0)
arb_add(s, s, t, prec);
else
arb_sub(s, s, t, prec);
}
arb_set(res, s);
arb_clear(s); arb_clear(t); arb_clear(u); arb_clear(tol);
}
This algorithm is naive, because the Taylor series converges slowly
and suffers from catastrophic cancellation when |x| is large (we should at least have reduced the argument to a small interval before summing).
As a test, we compute sin(x) for x = 2016.1, doubling the working precision until the resulting enclosure excludes zero:
int main()
{
arb_t x, y;
slong prec;
arb_init(x); arb_init(y);
for (prec = 64; ; prec *= 2)
{
arb_set_str(x, "2016.1", prec);
arb_sin_naive(y, x, prec);
printf("Using %5ld bits, sin(x) = ", prec);
arb_printn(y, 10, 0); printf("\n");
if (!arb_contains_zero(y)) /* stopping condition */
break;
}
arb_clear(x); arb_clear(y);
}
The program produces the following output:
Using 64 bits, sin(x) = [+/- 2.67e+859]
Using 128 bits, sin(x) = [+/- 1.30e+840]
Using 256 bits, sin(x) = [+/- 3.60e+801]
Using 512 bits, sin(x) = [+/- 3.01e+724]
Using 1024 bits, sin(x) = [+/- 2.18e+570]
Using 2048 bits, sin(x) = [+/- 1.22e+262]
Using 4096 bits, sin(x) = [-0.7190842207 +/- 1.20e-11]
As an exercise, the reader may improve the naive algorithm by making it
subtract a well-chosen integer multiple of 2π from x before summing the series (hint: this can be accomplished using arb_const_pi(), arb_div() and arf_get_fmpz()).
If done correctly, 64 bits of precision should be more than enough to
compute an accurate enclosure of sin(2016.1).
This example illustrates how ball arithmetic can be used to perform
nontrivial calculations. To evaluate an infinite series, the user
needs to know how to bound the tail of the series, but everything
else is automatic.
When evaluating a finite formula that can be expressed
completely using built-in functions, all error bounding is automatic
from the point of view of the user.
In particular, the arb_sin() method should be used to compute the sine
of a real number; it uses a much more efficient algorithm
than the naive code above.
This example also illustrates the “guess-and-verify” paradigm: instead of determining a priori the floating-point precision necessary to get a correct result, we guess some initial precision, use ball arithmetic to verify that the result is accurate enough, and restart with higher precision (or signal failure) if it is not.
If we think of rounding errors as essentially random processes, then a floating-point computation is analogous to a Monte Carlo algorithm. Using ball arithmetic to get a verified result effectively turns it into the analog of a Las Vegas algorithm, which is a randomized algorithm that always gives a correct result if it terminates, but may fail to terminate (alternatively, instead of actually looping forever, it might signal failure after a certain number of iterations).
The loop will fail to terminate if we attempt to determine the sign of sin(π), setting x with arb_const_pi() instead of arb_set_str(). Since the exact value is zero, the enclosures shrink but never exclude zero:
Using 64 bits, sin(x) = [+/- 3.96e-18]
Using 128 bits, sin(x) = [+/- 2.17e-37]
Using 256 bits, sin(x) = [+/- 6.10e-76]
Using 512 bits, sin(x) = [+/- 5.13e-153]
Using 1024 bits, sin(x) = [+/- 4.01e-307]
Using 2048 bits, sin(x) = [+/- 2.13e-615]
Using 4096 bits, sin(x) = [+/- 6.85e-1232]
Using 8192 bits, sin(x) = [+/- 6.46e-2465]
Using 16384 bits, sin(x) = [+/- 5.09e-4931]
Using 32768 bits, sin(x) = [+/- 5.41e-9863]
...
The sign of a nonzero real number can be
decided by computing it to sufficiently high accuracy, but the sign
of an expression that is exactly equal to zero cannot be decided
by a numerical computation unless the entire computation
happens to be exact (in this example, we could use the arb_sin_pi()
function, which computes sin(πx); with the exact input x = 1, the computation becomes exact and yields zero).
It is up to the user to implement a stopping criterion appropriate for
the circumstances of a given application. For example, breaking out of the loop
when it is clear that |sin(x)| is tiny (say, below 10^(−100)), or when a maximum allowed precision has been reached, guarantees termination.
More on precision and accuracy¶
The relation between the working precision and the accuracy of the output is not always easy to predict. The following remarks might help to choose prec optimally.
For a ball [m ± r], we define:
- Absolute error: r
- Relative error: r/|m| (or 0 if m = r = 0, and infinite if m = 0 and r > 0)
- Absolute accuracy: 1/r
- Relative accuracy: |m|/r
Expressed in bits, one takes the corresponding log2 values.
Of course, if r = 0, the ball is exact and its accuracy is infinite.
The prec argument in Arb should be thought of as controlling the working precision. Generically, when evaluating a fixed expression (that is, when the sequence of operations does not depend on the precision), the absolute or relative error will be bounded by

C · 2^(−prec),

where the constant C depends on the expression and on the magnitudes of the inputs, but not on prec.
Sometimes, a partially accurate result can be used to estimate this constant: for example, if evaluation at some precision yields 20 bits less accuracy than requested, repeating it with prec increased by a little more than 20 bits will typically give a fully accurate result.
Built-in functions in Arb can roughly be characterized as belonging to one of two extremes (though there is actually a spectrum):
- Simple operations, including basic arithmetic operations and many elementary functions. In most cases, for an input [m ± r], the result is evaluated by computing f(m) and then separately bounding the propagated error, i.e. the maximum of |f(m) − f(m + ε)| over |ε| ≤ r. The working precision is automatically increased internally so that f(m) is computed to prec bits of relative accuracy with an error of at most a few units in the last place (perhaps with rare exceptions). The propagated error can generally be bounded quite tightly as well (see General formulas and bounds). As a result, the enclosure will be close to the best possible at the given precision, and the user can estimate the precision to use accordingly.
- Complex operations, such as certain higher transcendental functions (for example, the Riemann zeta function). The function is evaluated by performing a sequence of simpler operations, each using ball arithmetic with a working precision of roughly prec bits. The sequence of operations might depend on prec; for example, an infinite series might be truncated so that the remainder is smaller than 2^(−prec). The final result can be far from tight, and it is not guaranteed that the error converges to zero as prec → ∞, though in practice, it should do so in most cases.
In short, the inclusion principle is the fundamental contract in Arb. Enclosures computed by built-in functions may or may not be tight enough to be useful, but the hope is that they will be sufficient for most purposes. Tightening the error bounds for more complex operations is a long term optimization goal, which in many cases will require a fair amount of research. A tradeoff also has to be made for efficiency: tighter error bounds allow the user to work with a lower precision, but they may also be much more expensive to compute.
Polynomial time guarantee¶
Arb provides a soft guarantee that the time used to evaluate a ball function will depend polynomially on prec and the bit size of the input, uniformly regardless of the numerical value of the input.
The idea behind this soft guarantee is to allow Arb to be used as a black box to evaluate expressions numerically without potentially slowing down, hanging indefinitely or crashing because of “bad” input such as nested exponentials. By controlling the precision, the user can cancel a computation before it uses up an unreasonable amount of resources, without having to rely on other timeout or exception mechanisms. A result that is feasible but very expensive to compute can still be forced by setting the precision high enough.
As motivation, consider evaluating sin(x) for an input such as x = 2^(2^30): computing an accurate enclosure would require reducing x modulo 2π using more than 2^30 bits of internal precision, no matter how small prec is.
Therefore, Arb caps internal work parameters
(the internal working precision,
the number of terms of an infinite series to add, etc.) by polynomial,
usually linear, functions of prec.
When the limit is exceeded, the output is set to a crude bound.
For example, given an input of enormous magnitude relative to prec, arb_sin() will
simply return the trivial enclosure [0 ± 1], and arb_exp() will simply return an enclosure with an infinite radius.
This is not just a failsafe, but occasionally a useful optimization. It is not entirely uncommon to have formulas where one term is of modest size and another term decreases exponentially, such as

f(x) = g(x) + h(x) exp(−x).

For example, the reflection formula of the digamma function has
a similar structure.
When x is large, the exponentially small term only needs a crude enclosure rather than an accurate evaluation, and capping the internal work accomplishes exactly this.
The polynomial time guarantee is “soft” in that there are a few exceptions.
For example, the complexity of computing the Riemann zeta function
ζ(σ + it) grows roughly linearly with |t|, i.e. exponentially with the bit size of t.