Re: OONSTD: Abstract Data Types

Kent Budge (kgbudge@valinor.sandia.gov)
Wed, 8 Jul 1998 10:08:25 -0600

E. Robert Tisdale writes:
>
> Standards are necessary to support portable high-performance applications.
> I believe that there will be a standard for object oriented numerical computing.
> There will be defacto standards even if there is no formal standard.
> There is no need to impose object oriented numerical standards on anyone.
> Compiler vendors should NOT be required to provide numerical libraries.

I'd like my compiler vendor to supply the basic transcendental functions,
STL, and the complex number library. Beyond that, you may have a point.

I certainly have no interest in using a linear solver that is supplied
by a compiler vendor. I am perfectly happy using Sandia's AZTEC
package, which runs on every imaginable platform, can be called from
just about any language I'm interested in (it's written in C), is
efficient, and is well-documented. (And it's free, at least for me.)
I might wrap a set of C++ classes around it, but so far I haven't felt
a great need to do so.

I think what most of us have in mind by "numerical C++ library" is a set
of classes that support stuff numerical programmers do a lot, but no one
else is much interested in. I could compile a pretty respectable list
of what these things are by simply copying the table of contents of
the latest edition of "Numerical Recipes in C."

But how much of this stuff really needs class interfaces? Or use of
templates? I see no need to ++ code for which the existing C interface
is adequate. For that matter, how much of this stuff really needs to be
standardized? I could get an awful lot of mileage out of just two
++izations: a set of function classes to replace every function
pointer appearing in a Numerical Recipes argument list, and a simple
1-D array to be used with computed indices.

One thing that requires a standard class is complex numbers.
complex<T> is part of the proposed standard C++ library. And that's
where it belongs. Because, although no one who isn't a numerical
programmer is much interested in it, there's almost no disagreement
about what the class should look like. We just need to settle on the
spelling. And it's surprisingly easy for a tyro to botch the
implementation.

> (Perhaps valarray should be removed from the Standard Template Library.)

Valarray isn't part of the Standard Template Library. It's part of the
numerics library, along with complex numbers and a number of generic
algorithms. I discussed its history and motivation earlier in this
thread.

> Third party vendors should provide object oriented numerical libraries.

They won't unless there's money to be made in doing so. And the evidence
right now is that there is not. We're all having too much fun building
our own and arguing their merits.

There were two principal objections to valarray when it was first
proposed to X3J16. The first was a technical objection from the Rogue
Wave people that it didn't support the kind of math libraries they were
writing. This objection was answered by modifying and extending the
proposal. (In the process, the proposal was much improved.) The
second objection was an economic one from the Microsoft and Borland
representatives: Valarray costs money to implement, and at least 99%
of their customers will never use it.

This second objection, incidentally, is still the most persuasive
argument for killing valarray, if that's what you want to do.
Unfortunately, this is a razor that would shave complex<T> as well.
(Heck, there are still purists around who object to the decision to
support floating point numbers in C.)

> Compliance with standards would help third party vendors sell their libraries.
> Standards should NOT be designed by the standards committee.
> Competing standards should be proposed by individuals. The standards committee
> should simply select the best proposal and reject all other proposals.

This isn't how the standardization process works. The function of an
ANSI or ISO committee is to standardize existing practice. This does
*not* mean taking proposals and choosing the one that wins a plurality
on a committee vote. Rather, the committee is supposed to collect
information on what the existing practice is, identify inconsistencies,
and come to a consensus (e.g. almost a unanimous vote) on how the
inconsistencies should be resolved. If a consensus cannot be reached
on some topic, this is taken as an indication that there is not yet a
sufficient body of existing practice to justify standardizing that
particular area.

Of course, it is widely recognized that X3J16 has wildly exceeded this
charter. I believe the root cause of this was that the charter for
X3J16 was unusually broad to begin with. Templates and exceptions were
recognized as something that should be in the language, but prior art
(existing practice) simply didn't exist. (No one had a working
implementation except Stroustrup's group.) So the X3J16 charter
specifically noted that the portion of the standard dealing with
templates and exceptions would not be based on existing practice. This
implied a creative process for standardizing these parts of the
language. As the standard for templates (in particular) came together,
it was found that numerous adjustments were required to other portions
of the language to allow templates to do the things that people came to
expect from them. And once the creative juices began to flow, it was
hard to resist the temptation to add other things that solved important
problems and seemed oh so innocuous: run-time type identification,
new-style casts, and so on.

Another reason for X3J16 exceeding the original charter was
Stroustrup's oft-voiced opinion that the biggest mistake he made in
releasing C++ to the public as early as he did was that he had not yet
assembled an adequate class library to go with it. This was a strong
hint to those of us in the library working group that we ought to set
our sights high, and (you guessed it) be creative. The ultimate
expression of this was STL. STL is a brilliant concept. It belongs in
the standard library. But an adequate specification for STL required
extensive changes to the specification for templates, which impacted the
entire rest of the language. It also led to the unhappy precedent of
standardizing language features that had not yet been implemented by
anyone. (This may explain why so many current compilers gag on STL.)

In any case, the standardization process very nearly broke down. When
I first joined the committee, I witnessed occasions on which a straw
vote over a proposed resolution to an issue was 40 in favor, 3 against
-- and the three objections were enough to send the proposal back to
the working group for further study. The desire for consensus was that
strong. But two years later, at the last meeting I was able to attend,
a very important extension was approved -- and adopted -- by the barest
of majorities. The desire to get a standard out Real Soon Now had
become greater than the desire for consensus.

This should give you some idea why the committee didn't just vote on
proposals submitted by outsiders. The committee received many such
proposals -- each of us received six mailings a year, of at least three
hundred pages, a good share of which was proposals from people on or
off the committee. I daresay none of them was acceptable in the
original form. Even the few that were not rejected out of hand
required adjustments either to the proposal or to the rest of the
language -- and every such adjustment became controversial. STL was
such a proposal. It was so obviously brilliant that a majority of the
committee became committed to seeing it in the standard. But a large
minority fought it for the sole reason that it came late in the process
and required huge adjustments to everything else. (I don't recall
anyone objecting to it on its technical merits alone.)

...
>
> All of the existing vector, matrix and tensor libraries probably implement
> no more than a handful of ADTs which are both fundamentally different from
> each other and are useful to significant a number of application programmers.
> We could, if necessary, specify standards for each of them. Personally,
> I am not interested just now in convenient rapid prototyping systems
> like Matlab, Mathematica, Octave, MatClass, etc. or sparse matrix libraries.
> I am not particularly interested in any programming languages except C++.
> I am interested in a high-performance C++ class library which supports
> scalar, vector, matrix and tensor arithmetic on dense arrays of real numbers.

I, too, have only a limited interest in languages other than C++. Java
is C++ Lite without the ethanol. Eiffel is worthy of more serious
consideration, but I find it rather overconstrained at times, and it
doesn't support value types or genericity as well as I like. I don't
know anything about object-oriented ADA other than that it exists and
was standardized before C++.

Fortran-90 is not an object-oriented language, notwithstanding claims to
the contrary. I'd have a hard time taking it seriously, except that so
many others do.

Sparse matrices are of great interest to most numerical programmers.
They are the representation of local operators, which are awfully
important for (among other things) solving differential equations. I
think we could get an awful lot of mileage out of standardizing just
two layouts -- the layout of banded matrices, which should not be
controversial; and the layout of arbitrary sparse matrices, which is
more so. I would be happy to settle for modified sparse row (MSR)
format, which is recognizec by AZTEC and which has been successfully
generalized for distributed computing. It is more efficient (if
slightly less intuitive) than the Yale format used by ITPACK. But I
could probably live with two standard formats, so long as efficient
conversion routines exist. (Copying memory is relatively cheap.) I
believe both formats are described in "Numerical Recipes in C."

No doubt someone will want to add triangular matrices to the list, or
skyline matrices. I'm not particularly interested in the former and
I think a general sparse matrix layout is adequate for the latter.

The only thing controversial about the layout of dense arrays on single
processors is whether they should be row-dominant or column-dominant. It's
a toss-up, because Fortran chose one and C chose the other, so either way
you're equally screwed. (The "Numerical Recipes in C" approach is not
worthy of serious consideration.) The layout on parallel machines is
probably not controversial, either -- the only sane decomposition is by
block, which brings you back to the row/column dominance problem.

>
> The ADT that I have in mind would permit the application programmer
> to view a one-dimensional array of real numbers as a vector, matrix
> or [third order] tensor like many of the existing libraries
> except that the actual representation of the one-dimensional array
> would be hidden from the application programmer in order to permit
> library developers to implement it as they see fit. For example,
> the library developer might decide to store all or part of the 1D array
> in cache memory or distribute it across the nodes of a multi-processor.
> It may not possible to return a pointer or a reference to any element
> of the 1D array so special library functions are required to access elements.
> The C++ language binding would define SubScalar, SubVector, SubMatrix
> and SubTensor classes to reference all or part of the 1D array
> as a scalar, vector, matrix and tensor respectively.
> These classes would contain a "handle" to reference the 1D array,
> an offset from the beginning of the 1D array to the first element
> of the subscalar, subvector, submatrix or subtensor, and a stride and length
> for each dimension of the subvector, submatrix or subtensor.
> Vector, Matrix and Tensor classes would be derived
> from SubVector, SubMatrix and SubTensor classes respectively.
> SubScalar, SubVector, SubMatrix and SubTensor classes
> reference a 1D array created by Vector, Matrix or Tensor classes
> which allocate their own 1D array when they are created
> then deallocate their own 1D array when they are destroyed.
>
> This design is similar to MV++, Blitz++, etc. but is more general.
> If the library developer implemented the 1D array using an ordinary
> one dimensional C array of type float or double, it may be possible
> to use BLAS and LAPACK to implement some of the desired functionality.
> Otherwise, the library developer would need to provide functions
> compatible with the chosen representation. Bob Tisdale

What you are proposing sounds very much like valarray<T>, except that
you put greater emphasis on hiding the representation. Why? Programs
rarely have direct control over cache memory, and some exposure of the
implementation is required if you want to cross languages. (Like it or
not, most of us will be interfacing with FORTRAN and/or C for a long
time to come.) A distributed array doesn't make sense to me; the
communications cost would be prohibitive. Message passing is a more
sensible approach to distributed computing.

Kent G. Budge
Sandia National Laboratories