Hi Marc, there are several reasons for the poor results you get:
1. gcc is not able to optimize away all of the expression-template
overhead. This leaves a "start-up" cost for evaluating every
expression. If the expression involves big arrays, the cost is
negligible. However, you are benchmarking 4x4 matrix products,
so the overhead is quite noticeable.
2. Tensor notation is less efficient than other expressions.
The presence of index placeholders causes blitz to revert to
a less efficient evaluation routine -- evaluateWithIndexTraversalN()
instead of evaluateWithStackTraversalN(). (see blitz/array/eval.cc)
This causes a performance loss even for big arrays.
3. You will likely notice a BIG speed improvement if you use
TinyMatrix and TinyVector, e.g.
TinyMatrix<complex<double>,4,4> mat1;
TinyVector<complex<double>,4> vec1, vec2;
vec2 = product(mat1,vec1);
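
To illustrate why fixed-size types help, here is a minimal sketch in plain
standard C++. It does not use Blitz at all: Vec, Mat, product_fixed and
marc_example are hypothetical stand-ins made up for this example, not the
TinyVector/TinyMatrix classes or the Blitz product() routine. The point is
only that when the extents are template parameters, the compiler sees two
loops with constant trip counts and is free to unroll them, with no
per-expression set-up at run time. The data is the same as in your benchmark.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cplx = std::complex<double>;

// Hypothetical fixed-size stand-ins (NOT the Blitz++ classes).
template <std::size_t N>
using Vec = std::array<cplx, N>;

template <std::size_t R, std::size_t C>
using Mat = std::array<std::array<cplx, C>, R>;

// Matrix-vector product whose extents are known at compile time.
// Both loops have constant bounds, so an optimizer may unroll them fully.
template <std::size_t R, std::size_t C>
Vec<R> product_fixed(const Mat<R, C>& m, const Vec<C>& v) {
    Vec<R> out{};  // std::complex default-constructs to (0,0)
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j)
            out[i] += m[i][j] * v[j];
    return out;
}

// Builds the 4x4 matrix and 4-vector from the benchmark and multiplies them.
Vec<4> marc_example() {
    Mat<4, 4> mat1;
    for (auto& row : mat1) row.fill(cplx(2.0, 0.0));  // every entry 2 ...
    mat1[0][0] = cplx(1.0, 0.0);                      // ... except the diagonal
    mat1[1][1] = cplx(2.0, 0.0);
    mat1[2][2] = cplx(3.0, 2.0);
    mat1[3][3] = cplx(4.0, 0.0);

    Vec<4> vec1 = {cplx(1.0), cplx(2.0), cplx(3.0), cplx(4.0)};
    return product_fixed(mat1, vec1);
}
```

Calling marc_example() from main() should give (19,0), (20,0), (23,6), (28,0),
which is a handy check that the Fortran and Blitz versions compute the same
thing before you compare timings.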
Cheers,
Todd
Marc Wilken wrote:
>
> Have you done a benchmark of matrix-vector multiplication?
> I compared pure Fortran 77 code with the sum(a(i,j)*v(j),j) notation.
> The Fortran 77 code is approximately
> 7 times faster than the C++ code using the blitz library. Is this
> possible, or did I do something wrong?
> Here is the code and the compiler options I used:
> Fortran-code:
> PROGRAM matmult
>
> INTEGER*4 kk,ii,jj
> COMPLEX*16 mat1(4,4),vec1(4),vec2(4)
>
> DO kk=1,4
> DO ii=1,4
> mat1(kk,ii)=2.
> ENDDO
> ENDDO
>
> mat1(1,1)=1.
> mat1(2,2)=2.
> mat1(3,3)=cmplx(3.,2.)
> mat1(4,4)=4.
>
> vec1(1)=1.
> vec1(2)=2.
> vec1(3)=3.
> vec1(4)=4.
>
>
> DO kk=1,100000
> vec2(1)=0.
> vec2(2)=0.
> vec2(3)=0.
> vec2(4)=0.
> DO ii=1,4
> DO jj=1,4
> vec2(ii) = mat1(ii,jj)*vec1(jj)+vec2(ii)
> ENDDO
> ENDDO
> ENDDO
> END
>
> Fortran-makefile:
> OPTS=-O
>
> matmult: matmult.o
> f77 $^ -o $@
>
> matmult.o: matmult.f
> f77 $(OPTS) -c $<
>
>
>
> C++-code:
> #include <blitz/array.h>
> #include <complex>
>
> using namespace blitz;
> using namespace std;
>
> int main()
> {
> Array<complex<double>,1> dcv1(4);
> Array<complex<double>,1> dcv2(4);
> Array<complex<double>,2> dcm1(4,4);
>
> firstIndex i;
> secondIndex j;
> int kk;
>
> dcv1(0)=1.;
> dcv1(1)=2.;
> dcv1(2)=3.;
> dcv1(3)=4.;
>
> dcm1=2.;
> dcm1(0,0)=1.;
> dcm1(1,1)=2.;
> dcm1(2,2)=complex<double>(3.,2.);
> dcm1(3,3)=4.;
>
> for(int kk=0;kk<100000;kk++){
> dcv2=sum(dcm1(i,j)*dcv1(j),j);
> };
>
> return 0;
> }
>
> compiler gcc-options:
> CPPFLAGS = -O2 -ffast-math -ftemplate-depth-30 -funroll-loops
> -fstrict-aliasing -fno-gcse
>
> Thanks in advance
> Marc
>
> --------------------- blitz-support list --------------------------------
> * To subscribe/unsubscribe: use the handy web form at
> http://oonumerics.org/blitz/lists.html
>
>
--
Todd Veldhuizen               tveldhui@acm.org
Indiana Univ. Comp. Sci.      http://extreme.indiana.edu/~tveldhui/
This archive was generated by hypermail 2b29 : Wed Feb 20 2002 - 05:10:07 EST