After [9] and similar complexity bounds for integer multiplication
obtained by Toom and Cook, the improved estimate O(N · log N · loglog N)
from [16] remained the best known for many years, as far as circuit
complexity or the speed of multitape Turing machines is concerned --
until M. Fürer recently proved the better bound
O(N · log N · 2^(O(log* N))) for these models of computation,
cf. Proc. 39th ACM STOC (2007), 57-66.
Our studies of linking automata (on SMM's, see [13], [23], and in
particular Section 6 of the final paper [35]), however, have revealed
that such "pointer machines" are capable of performing integer
multiplication in linear time!
In 1981/82 we reconsidered these issues from a more practical point of
view, especially towards numerical computations with complex
polynomials. Section 1 of [39] generalizes the Fermat number method of
[16] to fast multiplication mod (2^N + 1) with so-called "suitable"
numbers N, and [44] reports our first experiences with a multitape
Turing machine implementation of that approach, which then became the
starting point of our TP Project.
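(For illustration only, not the actual routines of [39] or [44]: a minimal Python sketch of why moduli of the form 2^N + 1 are so convenient - since 2^N = -1 there, reducing a full product needs no division at all, just a split, a subtraction, and one modular correction.)

```python
def mul_mod_fermat(x, y, N):
    """Multiply residues x, y modulo 2^N + 1 (illustration only).

    The full product is formed with Python's built-in long multiplication;
    the point is that the reduction step needs no division, because
    2^N = -1 (mod 2^N + 1)."""
    m = (1 << N) + 1
    p = (x % m) * (y % m)
    lo = p & ((1 << N) - 1)    # p mod 2^N
    hi = p >> N                # p div 2^N
    return (lo - hi) % m       # p = hi*2^N + lo = lo - hi  (mod 2^N + 1)

# small check against direct computation
assert mul_mod_fermat(123456789, 987654321, 64) == \
       (123456789 * 987654321) % ((1 << 64) + 1)
```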
In algebraic complexity (counting additions, subtractions, and multiplications at unit cost), analogous methods apply to multiplying polynomials over rings, with [31] settling the particular case of characteristic 2. For (approximate) numerical multiplication of polynomials in R[x] or C[x], however, bit complexity models are more adequate. There, methods of binary segmentation as in [39] are preferable, saving a logarithmic factor. For the related case of numerical multiplication mod x^n, we can save another 25% of the running time by the methods of [53].
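(A hedged sketch of the binary segmentation idea in its simplest, exact form - nonnegative integer coefficients packed into one long number, one big multiplication, then unpacking; the numerical variants of [39] and [53] are considerably more delicate.)

```python
def poly_mul_by_segmentation(a, b, coeff_bound):
    """Multiply polynomials a, b (lists of nonnegative integer coefficients,
    a[i] the coefficient of x^i) via a single big-integer multiplication.

    Sketch under the assumption 0 <= coefficient <= coeff_bound; signed or
    approximate complex coefficients need the refinements of [39]."""
    n = min(len(a), len(b))
    # segment width large enough that product coefficients cannot overlap
    width = 2 * coeff_bound.bit_length() + n.bit_length() + 1
    pack = lambda p: sum(c << (i * width) for i, c in enumerate(p))
    prod = pack(a) * pack(b)                  # the one long multiplication
    mask = (1 << width) - 1
    return [(prod >> (i * width)) & mask for i in range(len(a) + len(b) - 1)]

# example: (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
assert poly_mul_by_segmentation([1, 2, 3], [4, 5], 5) == [4, 13, 22, 15]
```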
In 1987, while developing our fast routines for computing integer gcd's, we replaced the original divide-and-conquer method of [17] by a much better suited version called "Controlled Euclidean Descent", as briefly outlined in Section 7.2.1 of [51]. In [49] one finds the same ideas carried out for the analogous case of binary quadratic forms.
Polynomial gcd's in Z[x], and how to reduce them to integer gcd computations by means of binary segmentation, are the theme of [47], while the case of C[x], treated in [43], requires quite different methods, deeply connected with the infinitesimal features of complex numbers and polynomials.
Algorithmic issues of linear algebra have always been among our
favorite topics of research, starting with
[4] and [7]
on the quadratic convergence of Jacobi's method.
Then, after Strassen's seminal paper "Gaussian elimination is not
optimal", the challenge was, of course, to exploit his new matrix
multiplication method for other matrix computations as well, like our
blockwise orthogonalization methods in [19] and [21] (presented as a
tutorial in [22]), in particular yielding a fast (and stable) method
for the unitary transformation of complex matrices to upper Hessenberg
form. Methods and theorems from [36] have been used for deriving
smaller bounds for the exponent(s) of matrix multiplication.
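(For readers without Strassen's recursion at hand, here is a purely didactic Python sketch of its scheme of 7 block multiplications for sizes that are powers of two; the blockwise methods of [19], [21] build on fast matrix multiplication of this kind, not on this toy code.)

```python
def strassen(A, B):
    """Multiply two n x n matrices (lists of rows, n a power of two) with
    7 recursive multiplications instead of 8. Purely didactic sketch; real
    implementations switch to the classical method below some cutoff size."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def quad(M):   # split M into its four h x h blocks
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    def add(X, Y): return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def sub(X, Y): return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])

# 2 x 2 check
assert strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```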
To compute characteristic polynomials, we should distinguish two
cases. Over fields of finite characteristic, [50] offers a purely
algebraic approach via computing sufficiently many power sums of the
eigenvalues. For the numerical case (working approximately over C),
one can apply Keller-Gehrig's algebraic method to matrices in upper
Hessenberg form, as briefly outlined at the end of the preliminary
report [40] (now also available for download). Meanwhile P. Kirrinnis
has considerably extended and developed these initial ideas into an
impressive body of work on the fast computation of invariant subspaces.
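(Not the algorithm of [50] itself, but a small Python sketch of the underlying connection, here over the rationals: the power sums p_k = trace(A^k) determine the coefficients of the characteristic polynomial via Newton's identities. The divisions by k in these identities are exactly what fails in finite characteristic and calls for the different techniques of [50].)

```python
from fractions import Fraction

def charpoly_from_power_sums(A):
    """Characteristic polynomial of an n x n matrix A, computed from the
    power sums p_k = trace(A^k) via Newton's identities.

    Sketch over the rationals (characteristic 0) only; the identities divide
    by k, so positive characteristic needs the techniques of [50] instead."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    mul = lambda X, Y: [[sum(X[i][k] * Y[k][j] for k in range(n))
                         for j in range(n)] for i in range(n)]
    P, p = A, []
    for _ in range(n):                       # power sums p_1, ..., p_n
        p.append(sum(P[i][i] for i in range(n)))
        P = mul(P, A)
    e = [Fraction(1)]                        # elementary symmetric functions, e_0 = 1
    for k in range(1, n + 1):                # k*e_k = sum (-1)^(i-1) e_{k-i} p_i
        s = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(s / k)
    # det(xI - A) = x^n - e_1 x^(n-1) + e_2 x^(n-2) - ...
    return [(-1) ** k * e[k] for k in range(n + 1)]

# example: A = [[2, 1], [1, 2]] has characteristic polynomial x^2 - 4x + 3
assert charpoly_from_power_sums([[2, 1], [1, 2]]) == [1, -4, 3]
```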
In discussing algorithmic problems dealing with complex numbers, we
prefer a strictly constructive approach amenable to actual computation,
thus very different from alternative views like the perspective
propagated with the BSS model and in the book reviewed in [55].
Algorithms are to be specified for some realistic machine model (like
multitape Turing machines, or simple RAM's) with some fair cost measure
of bit complexity. Any input number z shall be given by some oracle
answering any query r by providing binary approximations x, y with
|(x + iy) - z| < 2^(-r).
Then it is fairly obvious how to compose two oracles for inputs
u and v to obtain a machine program serving as
a new oracle for simple functions like
w = u+v, or
w = u×v, the latter
yielding N-bit precision in running time
O(µ(N)), where µ(N) shall
denote a time bound for N-bit integer multiplication (such
that µ(N)/N is monotone), like
µ(N) = cN for pointer machines.
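(A hedged Python sketch of such oracle composition, with exact rational values standing in for the binary approximations; all names are ours, chosen for illustration. Observe that the product oracle needs an a-priori bound on |u| and |v| in order to know how accurately its inputs must be queried.)

```python
from fractions import Fraction

def const_oracle(x, y):
    """Oracle for the exact complex number x + i*y (x, y rational):
    every request r is answered by the number itself."""
    return lambda r: (Fraction(x), Fraction(y))

def add_oracle(u, v):
    """Oracle for u + v: query both summands one bit more accurately."""
    def oracle(r):
        (xu, yu), (xv, yv) = u(r + 1), v(r + 1)
        return (xu + xv, yu + yv)   # error < 2^-(r+1) + 2^-(r+1) = 2^-r
    return oracle

def mul_oracle(u, v, bound_bits):
    """Oracle for u * v, assuming |u|, |v| < 2^bound_bits (bound_bits >= 0);
    such an a-priori bound tells us how accurately to query the factors."""
    def oracle(r):
        q = r + bound_bits + 2
        (xu, yu), (xv, yv) = u(q), v(q)
        return (xu * xv - yu * yv, xu * yv + yu * xv)
    return oracle

# example: (3/4 + i) * (1/2 - i/8) queried at r = 10, here answered exactly
u = const_oracle(Fraction(3, 4), 1)
v = const_oracle(Fraction(1, 2), Fraction(-1, 8))
assert mul_oracle(u, v, 1)(10) == (Fraction(1, 2), Fraction(13, 32))
```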
The evaluation complexity of holomorphic functions f should be
considered locally. Restricted to any (closed) disk D in C with
convergent power series expansion, f_D is said to be computable in
time t(s) if there exists a machine M such that, on input of an
integer s > 0 and any z in D given by an oracle, M stops after at
most t(s) steps with output w such that |w - f(z)| < 2^(-s).
There is the beautiful result due to R. Brent (see the references in
[48]) that the elementary functions exp, ln, cos, sin, arctan, etc.
(on any regular disk) are all computable in time O(µ(s)·log s) - but
what about higher transcendental functions?
Interesting upper bounds have been found for the Gamma, zeta, and
error functions, to name a few, like evaluation of Gamma(z) in time
s^β for any β > 1.5 (see Borwein & Borwein, Pi and the AGM, page 332).
The methods in [46] have a different focus, aiming at efficient mass
production for multiple evaluations of Riemann's zeta function. The
last section of [48] contains an excursus on how to use these
techniques for computing a single value zeta(½ + it) for large t, up
to moderate precision t^(-c), within time O(t^β) for any β > 0.375.
With regard to analytic continuation, we have the following general
Theorem. If f_D admits analytic continuation along a curve C to some
other germ f_E: E -> C on a disk E, then there exists c > 0 such that
computability of f_D in time t(s) implies computability of f_E in time
O(s)·t(cs). With t(cs) = O(s^β), this implies computability of f_E in
time O(s^(β+1)).
Conjecture: In this polynomially bounded case a much sharper bound
should hold: f_E should then also be computable in time O(t(s)).
The evaluation complexity of any algebraic function is simply bounded by a constant multiple of integer multiplication time, which holds in the following strongly uniform sense, cf. Prop. (6.1) of [48]:
Consider A_0, ..., A_n in Z[z_1, ..., z_k] with complex inputs
z_1, ..., z_k varying in some compact subset K of C^k, there satisfying
max_j |A_j(z)| > ½, say. Then the branches w of the algebraic functions
defined by the equation
F(z,w) = A_0(z) + A_1(z)·w + ··· + A_n(z)·w^n = 0
are computable up to relative precision 2^(-s) in time c·µ(s), with an
estimation constant c merely depending on F and K.
This general result is implied by our splitting circle method, see
[45] and, in particular, the report [40]. Beyond that there remains,
of course, an abundance of more special problems of finding small
estimation factors c of this sort, depending on the degree n of the
defining equation and many other circumstances. Just think of little
examples like F(z,w) = z_1 - z_2·w, or z - w^3 = 0, leading to the
discussion of efficient algorithms for complex division, or for
complex cube roots.
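(A hedged illustration of the last example, the equation z - w^3 = 0: a small Python sketch of the standard device behind such c·µ(s) bounds, namely Newton's iteration with the working precision roughly doubled from step to step, so that the final full-precision steps dominate the total cost. Exact rational arithmetic with explicit truncation stands in for proper dyadic oracle arithmetic here, and the careful error analysis of [40] and [48] is omitted.)

```python
from fractions import Fraction

def trunc(x, bits):
    """Round the rational x to the dyadic grid of spacing 2^-bits."""
    return Fraction(round(x * 2**bits), 2**bits)

def complex_cbrt(re, im, s):
    """One branch of the cube root of z = re + i*im (rational, z not near 0),
    to roughly s bits: Newton's iteration w <- w - (w^3 - z)/(3*w^2), run with
    the working precision doubled in every step, so that the last steps at
    full precision dominate the cost. Didactic sketch only; the rigorous
    error analysis needed for guaranteed bounds is omitted here."""
    z_re, z_im = Fraction(re), Fraction(im)
    w0 = complex(float(z_re), float(z_im)) ** (1.0 / 3.0)   # crude start value
    wr = Fraction(w0.real).limit_denominator(1 << 20)
    wi = Fraction(w0.imag).limit_denominator(1 << 20)
    prec = 20
    while prec < s + 10:
        prec = min(2 * prec, s + 10)
        w2r, w2i = wr * wr - wi * wi, 2 * wr * wi              # w^2
        w3r, w3i = w2r * wr - w2i * wi, w2r * wi + w2i * wr    # w^3
        nr, ni = w3r - z_re, w3i - z_im                        # w^3 - z
        dr, di = 3 * w2r, 3 * w2i                              # 3*w^2
        den = dr * dr + di * di
        qr = (nr * dr + ni * di) / den                         # complex quotient
        qi = (ni * dr - nr * di) / den
        wr, wi = trunc(wr - qr, prec), trunc(wi - qi, prec)
    return wr, wi

# example: cube root of 2 to ~200 bits; its cube agrees with 2 to > 190 bits
wr, wi = complex_cbrt(2, 0, 200)
w2r, w2i = wr * wr - wi * wi, 2 * wr * wi
cr, ci = w2r * wr - w2i * wi, w2r * wi + w2i * wr
assert abs(cr - 2) < Fraction(1, 2**190) and abs(ci) < Fraction(1, 2**190)
```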
After the integer and gcd routines had been developed and implemented,
we started in 1988/89 to carry out all the detailed studies needed for
an efficient implementation of the splitting circle method, by and by
also implementing all the corresponding routines. Our "TP-book" [51]
from 1993/94 describes an intermediate stage of these efforts. The
routines for root finding, as the main application, were eventually
completed by the end of 1998. Meanwhile many enhancements have been
added, while the implementation of some items is still pending, e.g.
the improved polynomial division method from [56] by means of Graeffe
steps.
Turing Processing tells more about the
present state of all that software.