A Tour of NTL

Introduction

NTL is a high-performance, portable C++ library providing data structures and algorithms for arbitrary length integers; for vectors, matrices, and polynomials over the integers and over finite fields; and for arbitrary precision floating point arithmetic.

NTL provides high quality implementations of state-of-the-art algorithms for:

NTL's polynomial arithmetic is one of the fastest available anywhere, and has been used to set "world records" for polynomial factorization and determining orders of elliptic curves.

NTL's lattice reduction code is also one of the best available anywhere, in terms of both speed and robustness, and one of the few implementations of block Korkin-Zolotarev reduction with the Schnorr-Horner pruning heuristic. It has been used to "crack" several cryptosystems.

NTL can be easily installed in a matter of minutes on just about any platform, including PCs, and 32- and 64-bit workstations running Unix or Windows 95/NT. NTL achieves this portability by avoiding esoteric C++ features, and by avoiding assembly code; it should therefore remain usable for years to come with little or no maintenance, even as processors and operating systems continue to change and evolve. However, several compile-time flags can be set to tune the code for a particular type of platform.

NTL provides a clean and consistent interface to a large variety of classes representing mathematical objects. It provides a good environment for easily and quickly implementing new number-theoretic algorithms, without sacrificing performance.

NTL is free software that is intended for research and educational purposes only. It is written and maintained by Victor Shoup, with some code contributed by others (see Acknowledgements).

Some History

Work on NTL started around 1990, when I wanted to implement some new algorithms for factoring polynomials over finite fields. I found that none of the available software was adequate for this task, mainly because the code for polynomial arithmetic in the available software was too slow. So I wrote my own. My starting point was Arjen Lenstra's LIP package for long integer arithmetic, which was written in C. It soon became clear that using C++ instead of C would be much more productive and less prone to errors, mainly because of C++'s constructors and destructors, which allow memory management to be automated. Using C++ has other benefits as well, like function and operator overloading, which makes for more readable code.

One of the basic design principles of LIP was portability. I adopted this principle for NTL as well, for a number of reasons, not the least of which was that my computing environment kept changing whenever I changed jobs. Achieving portability is getting easier as standards, like IEEE floating point, get widely adopted, and as the definition of and implementations of the C++ language stabilize (which a few years ago was a huge headache, but is now only a big one, and in a few years will be a small one).

Since 1990, NTL has evolved in many ways, and it now provides a fairly polished and well-rounded programming interface.


Examples

Perhaps the best way to introduce the basics of NTL is by way of example.

Example 1

The first example makes use of the class ZZ, which represents "big integers": signed, arbitrary length integers. This program reads two big integers a and b, and prints a*a + b*b.

#include "ZZ.h"

int main()
{
   ZZ a, b, c; 

   cin >> a; 
   cin >> b; 
   c = a*a + b*b; 
   cout << c << "\n";
}

Example 2

Here's a program that reads a list of integers from standard input and prints the sum of their squares.

#include "ZZ.h"

int main()
{
   ZZ acc, val;

   acc = 0;
   while (SkipWhiteSpace(cin)) {
      cin >> val;
      acc += val*val;
   }

   cout << acc << "\n";
}
The function SkipWhiteSpace is defined by NTL. It skips over white space, and returns 1 if there is something following it. Note that NTL's input operators raise an error if an input is missing or ill-formed, unlike the standard I/O library which does not.

Example 3

Here's a simple modular exponentiation routine for computing a^e mod n. NTL already provides a more sophisticated one, though.

ZZ PowerMod(const ZZ& a, const ZZ& e, const ZZ& n)
{
   if (e == 0) return to_ZZ(1);

   long k = NumBits(e);

   ZZ res;
   res = 1;

   for (long i = k-1; i >= 0; i--) {
      res = (res*res) % n;
      if (bit(e, i) == 1) res = (res*a) % n;
   }

   if (e < 0)
      return InvMod(res, n);
   else
      return res;
}
Note that as an alternative, we could implement the inner loop as follows:
   res = SqrMod(res, n);
   if (bit(e, i) == 1) res = MulMod(res, a, n);
We could also write this as:
   SqrMod(res, res, n);
   if (bit(e, i) == 1) MulMod(res, res, a, n);
This illustrates an important point about NTL's programming interface. Generally, for every function in NTL, there is a procedural version that stores its result in its first argument. The reason for using the procedural variant is efficiency: on every iteration through the above loop, the functional form of SqrMod will cause a temporary ZZ object to be created and destroyed, whereas the procedural version will not create any temporaries. Where performance is critical, the procedural version is to be preferred. Although it is usually silly to get worked up about performance, it may be reasonable to argue that modular exponentiation is an important enough routine that it should be as fast as possible.

Note that when the functional version of a function can be naturally named with an operator, this is done. So for example, NTL provides a 3-argument mul routine for ZZ multiplication, and a functional version whose name is operator *, and not mul.

While we are talking about temporaries, consider the first version of the inner loop. Execution of the statement

   res = (res*res) % n;
will actually result in the creation of two temporary objects, one for the product, and one for the result of the mod operation, whose value is copied into res. Of course, the compiler automatically generates the code for cleaning up temporaries and other local objects at the right time. The programmer does not have to worry about this.
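
To make the left-to-right square-and-multiply loop of this example concrete without any NTL types, here is a minimal standalone sketch on machine words. The name power_mod_small and the restriction 0 < n < 2^32 (so that every product fits in 64 bits) are assumptions made for illustration; negative exponents are not handled, and the sketch assumes a C++11 compiler.

```cpp
#include <cstdint>

// Hypothetical helper (not part of NTL): a^e mod n for 0 < n < 2^32.
// The bound on n guarantees that res*res and res*a fit in 64 bits.
std::uint64_t power_mod_small(std::uint64_t a, std::uint64_t e, std::uint64_t n)
{
   std::uint64_t res = 1 % n;   // 1, or 0 in the degenerate case n == 1
   a %= n;

   // k = number of significant bits of e (plays the role of NumBits)
   int k = 0;
   while (k < 64 && (e >> k) != 0) k++;

   // scan the bits of e from the most significant one down to bit 0
   for (int i = k - 1; i >= 0; i--) {
      res = (res * res) % n;                   // square
      if ((e >> i) & 1) res = (res * a) % n;   // multiply when bit i is set
   }

   return res;
}
```

For instance, power_mod_small(2, 10, 1000) evaluates to 24, since 2^10 = 1024.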

Example 4

This example is a bit more interesting. The following program prompts the user for an input, and applies a simple probabilistic primality test. Note that NTL already provides a slightly more sophisticated prime test.

#include "ZZ.h"

long witness(const ZZ& n, const ZZ& x)
{
   ZZ m, y, z;
   long j, k;

   if (x == 0) return 0;

   // compute m, k such that n-1 = 2^k * m, m odd:

   k = 1;
   m = n/2;
   while (m % 2 == 0) {
      k++;
      m /= 2;
   }

   z = PowerMod(x, m, n); // z = x^m % n
   if (z == 1) return 0;

   j = 0;
   do {
      y = z;
      z = (y*y) % n; 
      j++;
   } while (j < k && z != 1);

   return z != 1 || y != n-1;
}


long PrimeTest(const ZZ& n, long t)
{
   if (n == 2) return 1;
   if (n <= 1 || n % 2 == 0) return 0;

   ZZ x;
   long i;

   for (i = 0; i < t; i++) {
      x = RandomBnd(n); // random number between 0 and n-1

      if (witness(n, x)) 
         return 0;
   }

   return 1;
}

int main()
{
   ZZ n;

   cout << "n: ";
   cin >> n;

   if (PrimeTest(n, 10))
      cout << n << " is probably prime\n";
   else
      cout << n << " is composite\n";
}
Note that in NTL, there are typically a number of ways to compute the same thing. For example, consider the computation of m and k in function witness. We could have written it thusly:
   k = 1;
   m = n >> 1;
   while (!IsOdd(m)) {
      k++;
      m >>= 1;
   }
It turns out that this is actually not significantly more efficient than the original version, because the implementation optimizes multiplication and division by 2.

The following is more efficient:

   k = 1;
   while (bit(n, k) == 0) k++;
   m = n >> k;
As it happens, there is a built-in NTL routine that does just what we want:
   m = n-1;
   k = MakeOdd(m);
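
The claim that the halving loop and the bit-scanning loop compute the same decomposition can be checked on machine words. The following is a standalone sketch, not NTL code: both function names are hypothetical, and both assume that n is odd and at least 3, which is what PrimeTest guarantees before witness is called.

```cpp
#include <cstdint>
#include <utility>

// Both helpers are hypothetical (not NTL routines). For odd n >= 3 they
// return (k, m) with n - 1 == 2^k * m and m odd.

// Mirrors the original loop: halve until the result is odd.
std::pair<int, std::uint64_t> split_by_halving(std::uint64_t n)
{
   int k = 1;
   std::uint64_t m = n / 2;   // equals (n-1)/2 because n is odd
   while (m % 2 == 0) {
      k++;
      m /= 2;
   }
   return std::make_pair(k, m);
}

// Mirrors the bit-scanning variant: since n is odd, n and n-1 agree on
// every bit above bit 0, so the lowest such set bit of n gives k.
std::pair<int, std::uint64_t> split_by_bit_scan(std::uint64_t n)
{
   int k = 1;
   while (((n >> k) & 1) == 0) k++;
   return std::make_pair(k, n >> k);
}
```

For n = 561 both return k = 4 and m = 35, matching 560 = 2^4 * 35.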

Example 5

The following routine sums up the numbers in a vector of ZZ's.

#include "vec_ZZ.h"

ZZ sum(const vec_ZZ& v)
{
   ZZ acc;

   acc = 0;

   for (long i = 0; i < v.length(); i++)
      acc += v[i];

   return acc;
}

The class vec_ZZ is a dynamic-length array of ZZs; more generally, NTL provides template-like macros to create dynamic-length vectors over any type T. By convention, NTL names these vec_T. The reason that macros are used instead of true templates is simple: at the present time, compiler support for templates is not entirely satisfactory, and their use would make NTL much more difficult to port. At some point in the future, a template-version of NTL may be made available.

Vectors in NTL are indexed from 0, but in many situations it is convenient or more natural to index from 1. The generic vector class allows for this; the above example could be written as follows.

#include "vec_ZZ.h"

ZZ sum(const vec_ZZ& v)
{
   ZZ acc;

   acc = 0;

   for (long i = 1; i <= v.length(); i++)
      acc += v(i); 

   return acc;
}
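
One way to picture the two indexing conventions is a container in which v(i) is simply the same element as v[i-1]. The class below is a hypothetical standalone sketch written with a template for brevity; it is not how NTL generates its vector classes, which use macros as explained above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class for illustration only.
template <class T>
class DualIndexVec {
public:
   explicit DualIndexVec(long n) : data_(static_cast<std::size_t>(n)) {}

   long length() const { return static_cast<long>(data_.size()); }

   // v[i]: indexing from 0
   T& operator[](long i) { return data_[static_cast<std::size_t>(i)]; }
   const T& operator[](long i) const { return data_[static_cast<std::size_t>(i)]; }

   // v(i): indexing from 1, i.e. v(i) is the same element as v[i-1]
   T& operator()(long i) { return (*this)[i - 1]; }
   const T& operator()(long i) const { return (*this)[i - 1]; }

private:
   std::vector<T> data_;
};

// The two summation loops from this example, one per convention.
long sum_from_zero(const DualIndexVec<long>& v)
{
   long acc = 0;
   for (long i = 0; i < v.length(); i++) acc += v[i];
   return acc;
}

long sum_from_one(const DualIndexVec<long>& v)
{
   long acc = 0;
   for (long i = 1; i <= v.length(); i++) acc += v(i);
   return acc;
}
```

Both loops visit exactly the same elements, so they return the same sum for any vector.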

Example 6

There is also basic support for matrices in NTL. In general, the class mat_T is a special kind of vec_vec_T, where each row is a vector of the same length. Row i of matrix M can be accessed as M[i] (indexing from 0) or as M(i) (indexing from 1). Column j of row i can be accessed as M[i][j] or M(i)(j); for notational convenience, the latter is equivalent to M(i,j).

Here is a matrix multiplication routine, which in fact is already provided by NTL.

#include "mat_ZZ.h"

void mul(mat_ZZ& X, const mat_ZZ& A, const mat_ZZ& B)
{
   long n = A.NumRows();
   long l = A.NumCols();
   long m = B.NumCols();

   if (l != B.NumRows())
      Error("matrix mul: dimension mismatch");

   X.SetDims(n, m); // make X have n rows and m columns

   long i, j, k;
   ZZ acc, tmp;

   for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
         acc = 0;
         for(k = 1; k <= l; k++) {
            mul(tmp, A(i,k), B(k,j));
            add(acc, acc, tmp);
         }
         X(i,j) = acc;
      }
   }
}

In case of a dimension mismatch, the routine calls the Error function, which is a part of NTL and which simply prints the message and aborts. That is generally how NTL deals with errors. Currently, NTL makes no use of exceptions (for the same reason it does not use templates--see above), but a future version may incorporate them.

This routine will not work properly if X aliases A or B. The actual matrix multiplication routine in NTL takes care of this.

To call the multiplication routine, one can write

   mul(X, A, B);
or one can also use the operator notation
   X = A * B;
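
The aliasing caveat noted above is commonly handled by computing into a temporary whenever the output is also an input. The following standalone sketch shows that idea on std::vector; it is an assumption about one reasonable approach, not a description of NTL's internals. The names Matrix, mul_noalias and mul_alias_safe are hypothetical, and the sketch throws an exception on a dimension mismatch where NTL would call Error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef std::vector<std::vector<long> > Matrix;   // row-major, rectangular

// Plain triple loop. Like the routine above, it is wrong if X aliases
// A or B, because resizing X destroys an input that is still being read.
void mul_noalias(Matrix& X, const Matrix& A, const Matrix& B)
{
   const std::size_t n = A.size();
   const std::size_t l = A.empty() ? 0 : A[0].size();
   const std::size_t m = B.empty() ? 0 : B[0].size();

   if (l != B.size())
      throw std::invalid_argument("matrix mul: dimension mismatch");

   X.assign(n, std::vector<long>(m, 0));   // n rows, m columns, all zero

   for (std::size_t i = 0; i < n; i++)
      for (std::size_t j = 0; j < m; j++)
         for (std::size_t k = 0; k < l; k++)
            X[i][j] += A[i][k] * B[k][j];
}

// Alias-safe wrapper: if the output is one of the inputs, compute into
// a temporary and swap it in at the end.
void mul_alias_safe(Matrix& X, const Matrix& A, const Matrix& B)
{
   if (&X == &A || &X == &B) {
      Matrix tmp;
      mul_noalias(tmp, A, B);
      X.swap(tmp);
   }
   else
      mul_noalias(X, A, B);
}
```

With this wrapper a call such as mul_alias_safe(A, A, A) squares A in place, at the cost of one extra matrix only in the aliased case.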

One thing you may have noticed by now is that NTL code generally avoids the type int, preferring instead to use long. This seems to go against what most "style" books preach, but nevertheless seems to make the most sense in today's world. Although int was originally meant to represent the "natural" word size, this seems to no longer be the case. On 32-bit machines, int and long are the same, but on 64-bit machines, they are often different, with int's having 32 bits and long's having 64 bits. Moreover, on such 64-bit machines, the "natural" word size is usually 64-bits; indeed, it is often more expensive to manipulate 32-bit integers. Thus, for simplicity, efficiency, and safety, NTL uses long for all integer values. If you are used to writing int all the time, it takes a little while to get used to this.

Example 7

NTL provides extensive support for very fast polynomial arithmetic. In fact, this was the main motivation for creating NTL in the first place, because existing computer algebra systems and software libraries had very slow polynomial arithmetic. The class ZZX represents univariate polynomials with integer coefficients. The following program reads a polynomial, factors it, and prints the factorization.

#include "ZZXFactoring.h"

int main()
{
   ZZX f;

   cin >> f;

   vec_pair_ZZX_long factors;
   ZZ c;

   factor(c, factors, f);

   cout << c << "\n";
   cout << factors << "\n";
}
When this program is compiled and run on input
   [2 10 14 6]
which represents the polynomial 2 + 10*X + 14*X^2 + 6*X^3, the output is
   2
   [[[1 3] 1] [[1 1] 2]]
The first line of output is the content of the polynomial, which is 2 in this case as each coefficient of the input polynomial is divisible by 2. The second line is a vector of pairs: the first member of each pair is an irreducible factor of the input, and the second is the exponent to which it appears in the factorization. Thus, all of the above simply means that
   2 + 10*X + 14*X^2 + 6*X^3 = 2 * (1 + 3*X) * (1 + X)^2
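
The printed factorization can be checked by multiplying it back out. The following standalone sketch does this on plain coefficient vectors, stored in the same low-to-high order as the [2 10 14 6] notation. The names Poly, poly_mul, content and expand_example are hypothetical and are not NTL routines; in particular, this content always returns a nonnegative value.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<long> Poly;   // Poly[i] is the coefficient of X^i

// Schoolbook product of two nonempty coefficient vectors.
Poly poly_mul(const Poly& a, const Poly& b)
{
   Poly c(a.size() + b.size() - 1, 0);
   for (std::size_t i = 0; i < a.size(); i++)
      for (std::size_t j = 0; j < b.size(); j++)
         c[i + j] += a[i] * b[j];
   return c;
}

// Greatest common divisor of the coefficients, taken as nonnegative.
long content(const Poly& f)
{
   long g = 0;
   for (std::size_t i = 0; i < f.size(); i++) {
      long a = g;
      long b = f[i] < 0 ? -f[i] : f[i];
      while (b != 0) { long t = a % b; a = b; b = t; }   // Euclid
      g = a;
   }
   return g;
}

// Multiplies out 2 * (1 + 3*X) * (1 + X)^2 from the printed factorization.
Poly expand_example()
{
   Poly f;  f.push_back(2);                    // the content, 2
   Poly p;  p.push_back(1);  p.push_back(3);   // [1 3] is 1 + 3*X
   Poly q;  q.push_back(1);  q.push_back(1);   // [1 1] is 1 + X

   f = poly_mul(f, p);   // exponent 1: multiply once
   f = poly_mul(f, q);   // exponent 2: multiply twice
   f = poly_mul(f, q);
   return f;
}
```

The expansion yields the coefficient vector [2 10 14 6], and content applied to it returns 2, in agreement with the program's output.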

Admittedly, I/O in NTL is not exactly user friendly, but then NTL has no pretensions about being an interactive computer algebra system: it is a library for programmers.

Example 8

Here is another example. The following program prints out the first 100 cyclotomic polynomials.


#include "ZZX.h"

int main()
{
   vec_ZZX phi(INIT_SIZE, 100);  

   for (long i = 1; i <= 100; i++) {
      ZZX t;
      t = 1;

      for (long j = 1; j <= i-1; j++)
         if (i % j == 0)
            t *= phi(j);

      phi(i) = (ZZX(i, 1) - 1)/t;  // ZZX(i, a) == X^i * a

      cout << phi(i) << "\n";
   }
}
Note how we declare and initialize t in this example. In general, the default initial value for any arithmetic object in NTL is zero. In this case, we want to initialize to 1. The following does not work:
   ZZX t = 1;  // error
because there is no constructor for a ZZX taking an int as an argument. This is intentional: if NTL did define such a constructor, this would act as an implicit conversion operator, and this would be undesirable for a number of reasons. Note that for convenience NTL does overload the assignment operator to act as an explicit conversion operator in this case.

So, one can initialize t to 1 as above, or as follows:

   ZZX t = to_ZZX(1);
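
The design choice described here (assignment from an integer is allowed, implicit conversion is not) can be reproduced in a few lines. The class below is a hypothetical stand-in, not NTL's ZZX, and the sketch assumes a C++11 compiler so that the two properties can be checked at compile time.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a polynomial class; not NTL's ZZX.
class Poly {
public:
   Poly() {}   // default value is the zero polynomial

   // Assignment from long is provided ...
   Poly& operator=(long c)
   {
      coeffs_.clear();
      if (c != 0) coeffs_.push_back(c);
      return *this;
   }

   // degree of the polynomial, with -1 for the zero polynomial
   long degree() const { return static_cast<long>(coeffs_.size()) - 1; }

private:
   std::vector<long> coeffs_;
};

// ... as is an explicit conversion function ...
Poly to_Poly(long c)
{
   Poly p;
   p = c;
   return p;
}

// ... but there is deliberately no constructor Poly(long), so
//    Poly t = 1;
// does not compile, while both of these do:
//    Poly t;  t = 1;
//    Poly t = to_Poly(1);
static_assert(!std::is_convertible<long, Poly>::value,
              "no implicit conversion from long to Poly");
static_assert(std::is_assignable<Poly&, long>::value,
              "assignment from long is allowed");
```

Because no converting constructor exists, a long can never silently turn into a Poly in a function call; the conversion happens only where the programmer writes an assignment or calls to_Poly.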

Example 9

NTL also supports modular integer arithmetic. The class ZZ_p represents the integers mod p. Despite the notation, p need not in general be prime, except in situations where this is mathematically required. The classes vec_ZZ_p, mat_ZZ_p, and ZZ_pX represent vectors, matrices, and polynomials mod p, and work much the same way as the corresponding classes for ZZ.

Here is a program that reads a prime number p, and a polynomial f modulo p, and factors it.

#include "ZZ_pXFactoring.h"

int main()
{
   ZZ p;
   cin >> p;
   ZZ_p::init(p);

   ZZ_pX f;
   cin >> f;

   cout << CanZass(f) << "\n";
   // calls "Cantor/Zassenhaus" algorithm 
}

As a program is running, NTL keeps track of a "current modulus" for the class ZZ_p, which can be initialized or changed using ZZ_p::init. This must be done before any variables are declared or computations are done that depend on this modulus.

Please note that for efficiency reasons, NTL does not make any attempt to ensure that variables declared under one modulus are not used under a different one. If that happens, the behavior of the program is completely unpredictable.

Example 10

There is a mechanism for saving and restoring a modulus, which the following example illustrates. This routine takes as input an integer polynomial and a prime, and tests if the polynomial is irreducible modulo the prime.

#include "ZZX.h"
#include "ZZ_pXFactoring.h"

long IrredTestMod(const ZZX& f, const ZZ& p)
{
   ZZ_pBak bak;  // save current modulus in bak
   bak.save();

   ZZ_p::init(p);  // set the current modulus to p

   return DetIrredTest(to_ZZ_pX(f));

   // old modulus is restored automatically when bak is destroyed
   // upon return
}

Consider the conversion function to_ZZ_pX in this example. This is of course the natural map reducing each coefficient mod p. NTL offers a plethora of conversion functions, in both functional and procedural form. In procedural form, they all are called simply conv. So for example, one could have written:

   ZZ_pX f1;
   conv(f1, f);
   return DetIrredTest(f1);
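
The save-and-restore behavior that IrredTestMod relies on is an instance of a scope guard: an object remembers a piece of global state and puts it back in its destructor. The following standalone sketch imitates the pattern with a plain long as the "current modulus". The names current_modulus, ModulusBak and reduce_under are hypothetical; this is not the implementation of ZZ_pBak.

```cpp
// Hypothetical stand-ins: a global "current modulus" and a guard object.
long current_modulus = 0;

class ModulusBak {
public:
   ModulusBak() : saved_(0), active_(false) {}
   ~ModulusBak() { restore(); }   // restore automatically on scope exit

   // remember the modulus that is current right now
   void save() { saved_ = current_modulus; active_ = true; }

   // put the remembered modulus back (does nothing if nothing was saved)
   void restore()
   {
      if (active_) {
         current_modulus = saved_;
         active_ = false;
      }
   }

private:
   ModulusBak(const ModulusBak&);              // copying is disabled
   ModulusBak& operator=(const ModulusBak&);

   long saved_;
   bool active_;
};

// Reduces x modulo p > 0 under a temporary modulus, leaving the caller's
// modulus untouched on every return path.
long reduce_under(long x, long p)
{
   ModulusBak bak;
   bak.save();

   current_modulus = p;

   long r = x % current_modulus;
   if (r < 0) r += current_modulus;
   return r;   // bak's destructor restores the old modulus here
}
```

If the caller had set current_modulus to 7, a call to reduce_under(10, 3) returns 1 and the modulus is 7 again afterwards.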

Example 11

Suppose in the above example that p is known in advance to be a small, single-precision prime. In this case, NTL provides a class zz_p, that acts just like ZZ_p, along with corresponding classes vec_zz_p, mat_zz_p, and zz_pX. The interfaces to all of the routines are generally identical to those for ZZ_p. However, the routines are much more efficient, in both time and space.

For small primes, the routine in the previous example could be coded as follows.

#include "ZZX.h"
#include "lzz_pXFactoring.h"
long IrredTestMod(const ZZX& f, long p)
{
   zz_pBak bak; 
   bak.save();

   zz_p::init(p);  

   return DetIrredTest(to_zz_pX(f));
}

Example 12

This example illustrates the GF2X and mat_GF2 classes with a simple routine to test if a polynomial over GF(2) is irreducible using linear algebra. NTL's built-in irreducibility test is to be preferred, however.


#include "GF2X.h"
#include "mat_GF2.h"

long MatIrredTest(const GF2X& f)
{
   long n = deg(f);

   if (n <= 0) return 0;
   if (n == 1) return 1;

   if (GCD(f, diff(f)) != 1) return 0;

   mat_GF2 M;

   M.SetDims(n, n);

   GF2X x_squared = GF2X(2, 1);

   GF2X g;
   g = 1;

   for (long i = 0; i < n; i++) {
      VectorCopy(M[i], g, n);
      M[i][i] += 1;
      g = (g * x_squared) % f;
   }

   long rank = gauss(M);

   if (rank == n-1)
      return 1;
   else
      return 0;
}

Note that the statement

   g = (g * x_squared) % f;
could be replaced by the more efficient code sequence
   MulByXMod(g, g, f);
   MulByXMod(g, g, f);
but this would not significantly impact the overall running time, since it is the Gaussian elimination that dominates the running time.
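
To see why two multiplications by X can stand in for one multiplication by X^2, it helps to write the operation out for polynomials over GF(2) packed into a machine word. The sketch below is standalone and hypothetical; it only imitates what a routine like MulByXMod computes. It assumes f has degree at least 1 and below 62, that deg g < deg f, and a C++11 compiler.

```cpp
#include <cstdint>

// Polynomials over GF(2) packed into a word: bit i is the coefficient of X^i.

// degree of f, or -1 for the zero polynomial
int degree(std::uint64_t f)
{
   int d = -1;
   while (f != 0) { d++; f >>= 1; }
   return d;
}

// g * X mod f, assuming deg g < deg f and deg f >= 1
std::uint64_t mul_by_x_mod(std::uint64_t g, std::uint64_t f)
{
   g <<= 1;                              // multiply by X
   if ((g >> degree(f)) & 1) g ^= f;     // degree reached deg f: subtract f
   return g;
}

// g * X^2 mod f, as two successive multiplications by X
std::uint64_t mul_by_x_squared_mod(std::uint64_t g, std::uint64_t f)
{
   return mul_by_x_mod(mul_by_x_mod(g, f), f);
}
```

With f = X^3 + X + 1 (the word 0xB) and g = X^2 (the word 4), mul_by_x_squared_mod returns 6, that is X^2 + X, which is X^4 reduced modulo f. Each step is a shift and at most one exclusive-or, with no general multiplication or division.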


Programming Interface

In this section, we give a general overview of NTL's programming interface.

Basic Ring Classes

The basic ring classes are:

All these classes support the basic arithmetic operators

   +, -, (unary) -, +=, -=, ++, --, 
   *, *=, /, /=, %, %=,
as well as procedural variants
   add(x, a, b); // x = a + b
   sub(x, a, b); // x = a - b
   negate(x, a); // x = - a
   mul(x, a, b); // x = a * b
   div(x, a, b); // x = a / b
   rem(x, a, b); // x = a % b
   DivRem(x, y, a, b); // x = a / b, y = a % b

However, the operations

   %, %=, rem, DivRem
do not exist for classes ZZ_p, zz_p, GF2, ZZ_pE, zz_pE, GF2E.

The standard equality operators (== and !=) are provided for each class. In addition, the class ZZ supports the usual inequality operators.

The integer and polynomial classes also support "shift operators" for left and right shifting. For polynomial classes, this means multiplication or division by a power of X.

Floating Point Classes

In addition to the above ring classes, NTL also provides three different floating point classes:

Vectors and Matrices

There are also vectors and matrices over

   ZZ ZZ_p zz_p ZZ_pE zz_pE GF2E RR
which support the usual arithmetic operations.

Functional and Procedural forms

Generally, for any function defined by NTL, there is a functional form, and a procedural form. For example:

   ZZ x, a, n;
   x = InvMod(a, n);  // functional form
   InvMod(x, a, n);   // procedural form

This example illustrates the normal way these two forms differ syntactically. However, there are exceptions. First, if there is an operator that can play the role of the functional form, that is the notation used:

   ZZ x, a, b;
   x = a + b;    // functional form
   add(x, a, b); // procedural form
Second, if the functional form's name would be ambiguous, the return type is simply appended to its name:
   ZZ_p x;
   x = random_ZZ_p();  // functional form
   random(x);          // procedural form
Third, there are a number of conversion functions (see below), whose name in procedural form is conv, but whose name in functional form is to_T, where T is the return type:
   ZZ x;  
   double a;

   x = to_ZZ(a);  // functional form
   conv(x, a);    // procedural form

The use of the procedural form may be more efficient, since it will generally avoid the creation of a temporary object to store its result. However, it is generally silly to get too worked up about such efficiencies, and the functional form is usually preferable because the resulting code is usually easier to understand.

The above rules concerning procedural and functional forms apply to essentially all of the arithmetic classes supported by NTL, with the exception of xdouble and quad_float. These two classes only support the functional/operator notation for arithmetic operations (but do support both forms for conversion).
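
The efficiency difference between the two forms can be made visible with an instrumented type that counts how many objects are constructed. The type Counted and the functions below are hypothetical and standalone; they only illustrate the layering of a functional form on top of a procedural one, not NTL's actual code.

```cpp
// Hypothetical instrumented type: counts how many objects get constructed.
struct Counted {
   static long constructions;
   long value;

   Counted() : value(0) { ++constructions; }
   Counted(const Counted& other) : value(other.value) { ++constructions; }
   Counted& operator=(const Counted& other) { value = other.value; return *this; }
};

long Counted::constructions = 0;

// Procedural form: writes into an existing object, constructs nothing.
void add(Counted& x, const Counted& a, const Counted& b)
{
   x.value = a.value + b.value;
}

// Functional form, layered on the procedural one: it must construct an
// object to hold the result before handing it back.
Counted operator+(const Counted& a, const Counted& b)
{
   Counted x;
   add(x, a, b);
   return x;
}
```

Calling add(x, a, b) leaves Counted::constructions unchanged, whereas x = a + b increases it by at least one; whether the return adds a further copy depends on the compiler.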

Conversions and Promotions

NTL does not provide automatic conversions from, say, int to ZZ. C++ experts generally consider such automatic conversions bad form in library design, and I would agree with them.

As mentioned above, there are numerous explicit conversion routines, which come in both functional and procedural forms. A complete list of these can be found in conversions.txt. This is the only place these are documented; they do not appear in the files ZZ.txt, etc.

Even though there are no automatic conversions, users of NTL can still have most of their benefits. This is because all of the basic arithmetic operations (in both their functional and procedural forms), comparison operators, and assignment are overloaded to get the effect of automatic "promotions". For example:

   ZZ x, a;

   x = a + 1;
   if (x < 0) 
      mul(x, 2, a);
   else
      x = -1;

In the documentation, not all of these promotions are documented explicitly. Doing so would make the documentation hard to read. Instead, the documentation files contain comments of the form:

ZZ operator+(const ZZ& a, const ZZ& b);

// PROMOTIONS: operator + promotes long to ZZ on (a, b).
This means that in addition to the declared function, there are two other functions that take arguments of type (long, ZZ) and (ZZ, long) respectively.

Moreover, it is generally more efficient to write

   x = y + 2;
than it is to write
   x = y + to_ZZ(2);
The former notation generally avoids the creation of a temporary ZZ object to hold the value 2.
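
The promotion mechanism amounts to providing extra overloads that take the long operand directly. The toy class below is a hypothetical standalone sketch of that pattern; it is not NTL's ZZ and stores only a long, so it shows the overload resolution and nothing about big-integer arithmetic.

```cpp
// Hypothetical toy integer type; only the overloading pattern matters.
class Int {
public:
   Int() : v_(0) {}

   long value() const { return v_; }

   // assignment "promotes" a long without any converting constructor
   Int& operator=(long a) { v_ = a; return *this; }

   // the declared function
   friend Int operator+(const Int& a, const Int& b)
   {
      Int r;
      r.v_ = a.v_ + b.v_;
      return r;
   }

   // the two promotion overloads: the long operand is used directly,
   // so no temporary Int is built just to hold it
   friend Int operator+(const Int& a, long b)
   {
      Int r;
      r.v_ = a.v_ + b;
      return r;
   }

   friend Int operator+(long a, const Int& b)
   {
      Int r;
      r.v_ = a + b.v_;
      return r;
   }

   // comparisons are promoted in the same way
   friend bool operator<(const Int& a, long b) { return a.v_ < b; }
   friend bool operator==(const Int& a, long b) { return a.v_ == b; }

private:
   long v_;
};
```

With these overloads, expressions such as x = a + 1, x = 1 + a and x < 0 all compile even though there is no implicit conversion from long to Int.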

Also, don't have any inhibitions about writing tests like

   if (x == 0) ...
and assignments like
   x = 1; 
These are all optimized, and do not execute significantly slower than the "lower level" (and much less natural-looking)
   if (IsZero(x)) ...
and
   set(x);


Summary of NTL's Main Modules

NTL consists of a number of software modules. Generally speaking, for each module foo, there is a header file foo.h, and an implementation file foo.c. There is also a documentation file foo.txt. This takes the form of a header file, but stripped of implementation details and declarations of some of the more esoteric routines and data structures, and it contains more complete and usually clearer documentation than in the header file.

The following is a summary of the main NTL modules. The corresponding ".txt" file can be obtained by clicking on the module name.

GF2 class GF2: integers mod 2

GF2X class GF2X: polynomials over GF(2) (much more efficient than using zz_pX with p=2); includes routines for GCDs and minimal polynomials

GF2XFactoring routines for factoring polynomials over GF(2); also includes routines for testing for and constructing irreducible polynomials

GF2XVec class GF2XVec: fixed-length vectors of fixed-length GF2Xs; less flexible, but more efficient than vec_GF2X

GF2E class GF2E: polynomial extension field/ring over GF(2), implemented as GF(2)[X]/(P).

GF2EX class GF2EX: polynomials over GF2E; includes routines for modular polynomial arithmetic, modular composition, minimal and characteristic polynomials, and interpolation.

GF2EXFactoring routines for factoring polynomials over GF2E; also includes routines for testing for and constructing irreducible polynomials

HNF routines for computing the Hermite Normal Form of a lattice

LLL routines for performing lattice basis reduction, including very fast and robust implementations of the Schnorr-Euchner LLL and Block Korkin-Zolotarev reduction algorithms, as well as an integer-only reduction algorithm.

RR class RR: arbitrary-precision floating point numbers.

ZZ class ZZ: arbitrary length integers; includes routines for GCDs, Jacobi symbols, modular arithmetic, and primality testing; also includes small prime generation routines and in-line routines for single-precision modular arithmetic

ZZVec class ZZVec: fixed-length vectors of fixed-length ZZs; less flexible, but more efficient than vec_ZZ

ZZX class ZZX: polynomials over ZZ; includes routines for GCDs, minimal and characteristic polynomials, norms and traces

ZZXFactoring routines for factoring univariate polynomials over ZZ

ZZ_p class ZZ_p: integers mod p

ZZ_pE class ZZ_pE: ring/field extension of ZZ_p

ZZ_pEX class ZZ_pEX: polynomials over ZZ_pE; includes routines for modular polynomial arithmetic, modular composition, minimal and characteristic polynomials, and interpolation.

ZZ_pEXFactoring routines for factoring polynomials over ZZ_pE; also includes routines for testing for and constructing irreducible polynomials

ZZ_pX class ZZ_pX: polynomials over ZZ_p; includes routines for modular polynomial arithmetic, modular composition, minimal and characteristic polynomials, and interpolation.

ZZ_pXFactoring routines for factoring polynomials over ZZ_p; also includes routines for testing for and constructing irreducible polynomials

lzz_p class zz_p: integers mod p, where p is single-precision

lzz_pE class zz_pE: ring/field extension of zz_p

lzz_pEX class zz_pEX: polynomials over zz_pE; provides the same functionality as class ZZ_pEX, but for single-precision p

lzz_pEXFactoring routines for factoring polynomials over zz_pE; provides the same functionality as module ZZ_pEXFactoring, but for single-precision p

lzz_pX class zz_pX: polynomials over zz_p; provides the same functionality as class ZZ_pX, but for single-precision p

lzz_pXFactoring routines for factoring polynomials over zz_p; provides the same functionality as module ZZ_pXFactoring, but for single-precision p

mat_GF2 class mat_GF2: matrices over GF2; includes basic matrix arithmetic operations, including determinant calculation, matrix inversion, solving nonsingular systems of linear equations, and Gaussian elimination

mat_GF2E class mat_GF2E: matrices over GF2E; includes basic matrix arithmetic operations, including determinant calculation, matrix inversion, solving nonsingular systems of linear equations, and Gaussian elimination

mat_RR class mat_RR: matrices over RR; includes basic matrix arithmetic operations, including determinant calculation, matrix inversion, and solving nonsingular systems of linear equations.

mat_ZZ class mat_ZZ: matrices over ZZ; includes basic matrix arithmetic operations, including determinant calculation, matrix inversion, and solving nonsingular systems of linear equations

mat_ZZ_p class mat_ZZ_p: matrices over ZZ_p; includes basic matrix arithmetic operations, including determinant calculation, matrix inversion, solving nonsingular systems of linear equations, and Gaussian elimination

mat_ZZ_pE class mat_ZZ_pE: matrices over ZZ_pE; includes basic matrix arithmetic operations, including determinant calculation, matrix inversion, solving nonsingular systems of linear equations, and Gaussian elimination

mat_lzz_p class mat_zz_p: matrices over zz_p; includes basic matrix arithmetic operations, including determinant calculation, matrix inversion, solving nonsingular systems of linear equations, and Gaussian elimination

mat_lzz_pE class mat_zz_pE: matrices over zz_pE; includes basic matrix arithmetic operations, including determinant calculation, matrix inversion, solving nonsingular systems of linear equations, and Gaussian elimination

mat_poly_ZZ routine for computing the characteristic polynomial of a mat_ZZ

mat_poly_ZZ_p routine for computing the characteristic polynomial of a mat_ZZ_p

mat_poly_lzz_p routine for computing the characteristic polynomial of a mat_zz_p

ntl_matrix template-like macros for dynamic-size 2-dimensional arrays

ntl_pair template-like macros for pairs

ntl_vector template-like macros for dynamic-size vectors

quad_float class quad_float: quadruple-precision floating point numbers.

tools some basic types and utility routines, including the timing function GetTime().

vec_GF2 class vec_GF2: vectors over GF2, with arithmetic

vec_GF2E class vec_GF2E: vectors over GF2E, with arithmetic

vec_RR class vec_RR: vectors over RR, with arithmetic

vec_ZZ class vec_ZZ: vectors over ZZ, with arithmetic

vec_ZZ_p class vec_ZZ_p: vectors over ZZ_p, with arithmetic

vec_ZZ_pE class vec_ZZ_pE: vectors over ZZ_pE, with arithmetic

vec_lzz_p class vec_zz_p: vectors over zz_p, with arithmetic

vec_lzz_pE class vec_zz_pE: vectors over zz_pE, with arithmetic

xdouble class xdouble: double-precision floating point numbers with extended exponent range.


Obtaining and Installing NTL for UNIX

To obtain the source code and documentation for NTL, download ntl-3.0.tar.gz, placing it in an empty directory, and then, working in this directory, execute the following commands:

gunzip ntl-3.0.tar.gz
tar xvf ntl-3.0.tar
There is a makefile, which you might need to edit just a little. You need to specify a C++ compiler, and optionally, a compatible C compiler. A few source files are written in pure C, and will compile under C or C++, but using a C compiler sometimes yields better code.

The default settings in the makefile use the GNU compilers g++ and gcc. In fact, if you are using the GNU compilers, you should not have to edit the makefile at all, except to fine-tune some settings affecting performance.

There are a variety of compiler flags that you can set to customize the compilation, affecting the quality of the compiled code.

After editing "makefile", just execute make. The first thing that the makefile does is to build the file mach_desc.h, which defines some machine characteristics such as word size and machine precision. This is done by compiling and running a C program called MakeDesc that figures out these characteristics on its own, and prints some diagnostics to the terminal.

After this, the makefile will compile all the source files, and then create the library ntl.a.

Finally, the makefile compiles and runs a series of test programs. The output generated should indicate if there are any problems.

Executing make clean will remove unnecessary object files.

Executing make clobber removes everything that was generated by a previous installation. Make sure you do this if you re-build NTL for a different architecture!

When linking a program, you need to include ntl.a and -lm as libraries. If you have a driver program foo.c, just execute make foo to build the program foo.

The compilation should run smoothly on just about any UNIX platform. The only real trouble I've run into is the GNU compiler for PowerPC. The release I have has a code generation bug for which I've found no easy work-around. I've been told that the bug has been fixed in an upcoming release. The IBM AIX compilers xlc/xlC, however, work fine.


Obtaining and Installing NTL for Windows

The WinNTL distribution of NTL can be used on any Windows 95 or NT platform (but not on Windows 3.11 or earlier). Actually, there is nothing Windows-specific about WinNTL. The source code is identical to the UNIX NTL distribution; only the packaging is slightly different, and no assumptions are made about the program development environment. Thus, it should be possible to install WinNTL on other operating systems (e.g., Macintosh, OS/2) with little difficulty.

To obtain the source code and documentation for NTL, download WinNTL-3_0.zip. Then "unzip" this file into a directory (folder). You will find several directories.

The directory "doc" contains all of NTL's documentation, including this "tour" (tour.html) and the ".txt" files explaining each module, which can be accessed directly, or through the "tour".

The directory "src" contains all of the source files for the library, all with ".cpp" extensions. The file "lip.cpp" can be compiled as a C source file (this can sometimes yield a marginal performance gain).

The directory "include" contains all of the ".h" files. In this directory is a file called "mach_desc.h", which contains all of the platform-dependent macro definitions. The default settings should be correct for any x86- or Pentium-based system running Windows; however, the correct definitions can depend on the compiler and run-time environment. Therefore, to be on the safe side, you might consider compiling and running the program "MakeDesc", whose source files are in directory "MakeDesc". This program will dynamically build a correct "mach_desc.h" for your platform (processor, compiler, run-time environment). To get accurate results, you must compile this program using the level of optimization (or higher) that you will use for NTL. The program will print some diagnostics to the screen, and create the file "mach_desc.h" in the current directory; this is not necessarily the "include" directory, where the file needs to go, so you may have to move it there yourself.

The directory "tests" contains several test programs. For each program FooTest, there is a source file FooTest.cpp, and optionally two files FooTestIn and FooTestOut. If the latter exist, then the program should be run with FooTestIn as standard input, and the output printed to standard output should match the contents of FooTestOut exactly. Note that these programs also print diagnostic output on the screen (through standard error output).

The directory "GetTime" contains several alternative definitions of the GetTime() function. The file "GetTime.cpp" in the "src" directory should be OK, but your compiler might like one of the definitions in the directory "GetTime" better.
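To give an idea of what such a definition looks like, here is a minimal sketch of a timing function built on the standard C clock() function. The name GetTimeClock is hypothetical and chosen only for this illustration; I am not claiming that this is the code in any of the files mentioned above.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for a GetTime()-style function: the CPU time used
// by the process so far, in seconds, as reported by std::clock().
double GetTimeClock()
{
    std::clock_t ticks = std::clock();
    // clock() returns (clock_t)(-1) if processor time is not available.
    assert(ticks != static_cast<std::clock_t>(-1));
    return static_cast<double>(ticks) / CLOCKS_PER_SEC;
}
```

A program would typically call such a function before and after a computation and report the difference.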

Since there are a number of incompatible compilers and program development environments available for Windows, no attempt has been made to provide automatic tools for building and testing, as is done for the Unix distribution. Nevertheless, it should be straightforward to install NTL (even if it involves a bit of pointing and clicking). First, compile all of the files in "src", and create a static library. Make sure the compiler knows where to find NTL's include files. Then, to compile a program using the library, make sure the compiler knows about the library and the directory of NTL's include files. You might also want to try out some of the compiler flags that NTL understands to customize the code generation. In any case, if you want to do any serious computations, you will certainly want to compile everything with your compiler's code optimizer on.

NTL has been successfully installed and tested on Windows 95 platforms with both the Microsoft and Borland compilers.

If you are using Windows but have some kind of Unix development toolkit, then you might try using the Unix distribution of NTL. I've never tried this, so I have no idea if it works.


NTL Implementation and Portability

NTL is designed to be portable, fast, and relatively easy to use and extend.

To make NTL portable, no assembly code is used. This is highly desirable, as architectures are constantly changing and evolving, and maintaining assembly code is quite costly. By avoiding assembly code, NTL should remain usable, with virtually no maintenance, for many years.

The main drawback of this philosophy is that without assembly code, one cannot use machine instructions to obtain double-word products, or perform double-word by single-word division. There are a number of possible strategies for dealing with this. NTL's basic strategy uses a combination of integer and floating-point instruction sequences, carefully crafted to exploit any pipelining or parallel instruction execution that the underlying processor may support. This strategy is much faster than the naive strategy of using a half-word radix.
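The flavor of this strategy can be seen in a small sketch of modular multiplication that never forms a double-word product. This is an illustration of the general technique only, not a copy of NTL's code: the name MulModSketch, the use of 64-bit unsigned words, and the bound n < 2^50 are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: compute (a * b) mod n using one floating-point quotient estimate
// and single-word integer arithmetic.
// Assumes 0 <= a, b < n < 2^50 and IEEE double precision arithmetic.
std::uint64_t MulModSketch(std::uint64_t a, std::uint64_t b, std::uint64_t n)
{
    assert(n > 0 && n < (std::uint64_t(1) << 50) && a < n && b < n);

    // Floating-point estimate of floor(a*b/n). Under the assumptions above,
    // it differs from the true quotient by at most 1.
    std::uint64_t q = static_cast<std::uint64_t>(
        static_cast<double>(a) * static_cast<double>(b) /
        static_cast<double>(n));

    // Unsigned arithmetic wraps modulo 2^64, which is well defined in C++.
    // The true value of a*b - q*n lies strictly between -n and 2n, so the
    // low word computed here determines it.
    std::uint64_t r = a * b - q * n;

    if (r >> 63)        // top bit set: the true value is negative
        r += n;
    else if (r >= n)    // the quotient estimate was one too small
        r -= n;
    return r;
}
```

For example, with n = 2^50 - 27, MulModSketch(n-1, n-1, n) yields 1, even though the full product (n-1)^2 is close to 2^100 and does not fit in a single word.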

To carry out this strategy, NTL makes two requirements of its platform, neither of which is guaranteed by the C++ language definition, but both of which nevertheless appear to be essentially universal:

  1. Integers are represented using 2's complement, and integer overflow is not trapped, but rather just wraps around.
  2. Double precision floating point conforms to the IEEE standard.

Actually, with some modification, NTL would not need the first requirement, by exploiting language definitions dealing with unsigned arithmetic. Future versions of NTL may incorporate this modification, if there is any need for it (but this seems unlikely at the moment).
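If you are curious whether your platform meets these requirements, a few of them can be probed directly. The following is a rough sketch; the function names are hypothetical and are not part of NTL. Note that the behavior of signed overflow cannot be tested portably, since it is undefined in C++, so the sketch checks only the representation, the floating-point claims made by the compiler, and the wraparound of unsigned arithmetic mentioned above.

```cpp
#include <cassert>
#include <limits>

// True if signed integers use 2's complement: in that representation the
// bit pattern of -1 is all ones, so its two low-order bits are both set.
bool UsesTwosComplement()
{
    return (-1L & 3L) == 3L;
}

// True if the compiler reports that double conforms to the IEEE (IEC 559)
// standard and has a 53-bit significand.
bool HasIeeeDouble()
{
    return std::numeric_limits<double>::is_iec559 &&
           std::numeric_limits<double>::digits == 53;
}

// Unsigned arithmetic wraps around modulo 2^N; this is guaranteed by the
// language definition, so this function always returns true.
bool UnsignedWraps()
{
    unsigned long zero = 0;
    return zero - 1 == std::numeric_limits<unsigned long>::max();
}

// Aborts (in a debug build) if either platform requirement is not met.
void CheckPlatform()
{
    assert(UsesTwosComplement());
    assert(HasIeeeDouble());
}
```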

Relying on floating point may seem prone to errors, but with the guarantees provided by the IEEE standard, one can prove the correctness of the NTL code that uses floating point. Actually, NTL is quite conservative, and substantially weaker conditions are sufficient for correctness. In particular, NTL works correctly with any rounding mode, and also with any mix of double precision and extended double precision operations (which arise, for example, with Intel x86 processors). One exception to this is the quad_float module (and by implication the LLL_QP and related routines), which requires something quite close to the IEEE standard (although a mix of double and extended double precision will still work).

With this strategy, NTL represents arbitrary length integers using a 30-bit radix on 32-bit machines, and a 50-bit radix on 64-bit machines.
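To make the idea of a 30-bit radix concrete, here is a small sketch that stores a number as a vector of 30-bit digits and adds two such numbers. The type and function names are hypothetical, and this is not NTL's actual internal representation. One visible benefit of leaving two spare bits in a 32-bit word is that the sum of two digits plus a carry always fits in a single word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const int kRadixBits = 30;  // digit size for a 32-bit machine
const std::uint32_t kRadixMask = (std::uint32_t(1) << kRadixBits) - 1;

// Little-endian digits: value = sum over i of digits[i] * 2^(30*i).
typedef std::vector<std::uint32_t> Digits;

// Split a 64-bit value into 30-bit digits (always at least one digit).
Digits ToDigits(std::uint64_t x)
{
    Digits d;
    do {
        d.push_back(static_cast<std::uint32_t>(x & kRadixMask));
        x >>= kRadixBits;
    } while (x != 0);
    return d;
}

// Digit-by-digit addition with carry propagation.
Digits Add(const Digits& a, const Digits& b)
{
    Digits sum;
    std::uint32_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry != 0; ++i) {
        std::uint32_t t = carry;
        if (i < a.size()) t += a[i];
        if (i < b.size()) t += b[i];
        assert(t <= 2 * kRadixMask + 1);  // at most 2^31 - 1: no overflow
        sum.push_back(t & kRadixMask);
        carry = t >> kRadixBits;
    }
    return sum;
}
```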

This general strategy is used in A. K. Lenstra's LIP library for arbitrary-length integer arithmetic. Indeed, NTL's integer arithmetic evolved from LIP, but over time almost all of this code has been rewritten to enhance performance. LIP's philosophy of "portability plus performance" carries on in NTL.

Long integer multiplication is implemented using the classical algorithm, crossing over to Karatsuba for very big numbers. Polynomial multiplication and division are carried out using a combination of the classical algorithm, Karatsuba, the FFT using small primes, and the FFT using the Schoenhage-Strassen approach. Also, many algorithms employed throughout NTL are recent inventions of the author (Victor Shoup) and his colleagues (Joachim von zur Gathen, Erich Kaltofen).
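The idea of crossing over from the classical algorithm to Karatsuba can be sketched as follows, here for polynomials with small machine-word coefficients. This is only an illustration under simplifying assumptions: the names are hypothetical, the operands must have equal length, the coefficients must be small enough that no overflow occurs, and the crossover value 16 is an arbitrary placeholder rather than a tuned constant from NTL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::int64_t> Poly;  // coefficient of X^i at index i

const std::size_t kCrossover = 16;       // hypothetical tuning threshold

// Classical quadratic-time product.
Poly MulClassical(const Poly& a, const Poly& b)
{
    if (a.empty() || b.empty()) return Poly();
    Poly c(a.size() + b.size() - 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            c[i + j] += a[i] * b[j];
    return c;
}

// c += sign * s * X^shift, where sign is +1 or -1.
static void AddShifted(Poly& c, const Poly& s, std::size_t shift,
                       std::int64_t sign)
{
    for (std::size_t i = 0; i < s.size(); ++i)
        c[shift + i] += sign * s[i];
}

// Karatsuba product of two polynomials of equal length, falling back to
// the classical algorithm at or below the crossover length.
Poly MulKaratsuba(const Poly& a, const Poly& b)
{
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    if (n <= kCrossover) return MulClassical(a, b);

    const std::size_t h = n / 2;  // low halves have h terms, high n - h
    Poly a0(a.begin(), a.begin() + h), a1(a.begin() + h, a.end());
    Poly b0(b.begin(), b.begin() + h), b1(b.begin() + h, b.end());

    // as = a0 + a1 and bs = b0 + b1, each of length n - h.
    Poly as(a1), bs(b1);
    for (std::size_t i = 0; i < h; ++i) {
        as[i] += a0[i];
        bs[i] += b0[i];
    }

    // Three half-size products instead of four.
    Poly p0 = MulKaratsuba(a0, b0);
    Poly p2 = MulKaratsuba(a1, b1);
    Poly pm = MulKaratsuba(as, bs);

    // a*b = p0 + (pm - p0 - p2) * X^h + p2 * X^(2h)
    Poly c(2 * n - 1, 0);
    AddShifted(c, p0, 0, 1);
    AddShifted(c, p2, 2 * h, 1);
    AddShifted(c, pm, h, 1);
    AddShifted(c, p0, h, -1);
    AddShifted(c, p2, h, -1);
    return c;
}
```

In practice the crossover point is found by experiment, since it depends on the machine and on the cost of coefficient arithmetic.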


Some Performance Data

NTL is high-performance software, offering high-quality implementations of the best algorithms. Here are some timing figures from the current version of NTL. The figures were obtained using an IBM RS6000 Workstation, Model 43P-133, which has a 133 MHz PowerPC Model 604 processor. The operating system is AIX and the compiler is xlC. The compiler options were -O2 -qarch=ppc -DNTL_AVOID_FLOAT -DNTL_TBL_REM.

The first problem considered is the factorization of univariate polynomials modulo a prime p. As test polynomials, we take the family of polynomials defined in [V. Shoup, J. Symb. Comp. 20:363-397, 1995]. For every n, we define p to be the first prime greater than 2^{n-2}*PI, and the polynomial is

\sum_{i=0}^n a_{n-i} X^i,

where a_0 = 1, and a_{i+1} = a_i^2 + 1. Here are some running times:

n         64  128   256    512     1024
hh:mm:ss   2   13  1:53  21:01  4:05:25

Also of interest is space usage. The n = 512 case used 4MB of main memory, and the n = 1024 case used 17MB of main memory.

Another test suite, this time using small primes, was used by Kaltofen and Lobo (Proc. ISSAC '94). One of their polynomials is a degree 10001 polynomial, modulo the prime 127. This polynomial was factored with NTL in just over 3 hours, using 17MB of memory.

The second problem considered is factoring univariate polynomials over the integers. We use two test suites. In the first, we factor F_n(X)*F_{n+1}(X), where

F_n(X) = \sum_{i=0}^n f_{n-i}*X^i,

and f_i is the i-th Fibonacci number (f_0 = 1, f_1 = 1, f_2 = 2, ...). Here are some running times:

n         100  200   300   400   500   1000
hh:mm:ss   11   34  1:44  2:35  3:35  15:20

The space in the n=500 case was under 5MB, and in the n=1000 case, under 13MB.
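For readers who want to reproduce this first test suite, the coefficient vector of F_n(X) as defined above can be built as in the following sketch. The function name is hypothetical, and the sketch uses 64-bit integers, so it only works for n up to 90; the values of n used in the timings above require arbitrary-length integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Coefficients of F_n(X) = sum_{i=0}^n f_{n-i} X^i, with the coefficient
// of X^i stored at index i, where f_0 = f_1 = 1 and f_k = f_{k-1} + f_{k-2}.
// Restricted to n <= 90 so that every f_k fits in 64 bits.
std::vector<std::uint64_t> FibonacciPoly(std::size_t n)
{
    assert(n <= 90);
    std::vector<std::uint64_t> f(n + 1);
    for (std::size_t k = 0; k <= n; ++k)
        f[k] = (k < 2) ? 1 : f[k - 1] + f[k - 2];
    // The coefficient of X^i is f_{n-i}, so reverse the sequence.
    return std::vector<std::uint64_t>(f.rbegin(), f.rend());
}
```

For n = 3 this gives the coefficients 3, 2, 1, 1 for X^0 through X^3, that is, F_3(X) = X^3 + X^2 + 2X + 3.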

The second test suite comes from Paul Zimmermann. The polynomial P1(X) has degree 156, coefficients up to 424 digits, and 36 factors (12 of degree 2, 15 of degree 4, 9 of degree 8). The polynomial P2(X) has degree 196, coefficients up to 419 digits, and 12 factors (2 of degree 2, 4 of degree 12, and 6 of degree 24). The polynomial P3(X) has degree 336, coefficients up to 597 digits, and 16 factors (4 of degree 12 and 12 of degree 24). The polynomial P4(X) has degree 462, coefficients up to 756 digits, and two factors of degree 66 and 396. More details on this test suite are available.

Our running times (hh:mm:ss) were as follows:

P1: 21, P2: 23, P3: 1:16, P4: 1:37:10.

In all cases less than 5MB of main memory was used.

NTL's lattice basis reduction code has been used to push the envelope on breaking new lattice-based cryptosystems. To date, NTL's lattice code has been used to break the GGH cryptosystem [Goldreich, Goldwasser, Halevi, Crypto '97] in dimension 200. This was done in about 3 days on a 140MHz UltraSparc. The experiments were designed and conducted by Phong Nguyen (Phong.Nguyen@ens.fr) using NTL 1.7.

Please don't get the impression that NTL requires 3 days to reduce all dimension 200 lattices. "Random" lattices of this dimension can be reduced in a matter of minutes. The lattices arising from GGH are particularly difficult to reduce, however.


Summary of Changes

Changes between NTL 2.0 and 3.0

Compatibility

Here is a detailed list of the changes to the programming interface.

Tips on making the transition

Changes between NTL 1.7 and 2.0

Changes between NTL 1.5 and NTL 1.7

Changes between NTL 1.0 and NTL 1.5


Acknowledgements

I'd like to thank Arjen Lenstra (arjen.lenstra@citicorp.com) and Keith Briggs (kmb28@cus.cam.ac.uk) for letting me steal their code and not complaining too much.

Arjen Lenstra wrote LIP, a long integer package, which formed the basis of NTL. Keith Briggs developed a quadruple precision package, which was incorporated into NTL 1.7.

Thanks also to Juergen Gerhard (jngerhar@plato.uni-paderborn.de) for pointing out the deficiency in the NTL-1.0 ZZX arithmetic, for contributing the Schoenhage/Strassen code to NTL 1.5, and for helping to track down some bugs.

Also, many thanks to Phong Nguyen (Phong.Nguyen@ens.fr) for putting the new LLL code (NTL 1.7) through a torture test of lattices arising from new lattice-based cryptosystems; this led to a number of significant improvements in the LLL code.

Thanks also to Dan Boneh for encouraging me to improve NTL's programming interface.

