
openmpi -abinit hangs on Tutorial t41 !?!

Posted: Thu Sep 30, 2010 10:43 am
by spamrefuse
Hi,

My PC has an Intel Core i7 CPU 950 @ 3.07GHz.
I use abinit 6.2.3 on an up-to-date Fedora system with openmpi 1.4.1.

I have configured and compiled as follows:

Code:

$ ./configure --prefix=/opt/abinit --program-suffix="_openmpi" \
            --disable-netcdf --disable-bigdft --disable-wannier90 --disable-libxc \
            --with-linalg-libs="-L/usr/lib -L/usr/lib/openmpi/lib -lblas -llapack -lscalapack -lmpiblacs" \
            --enable-mpi --enable-mpi-io --with-mpi-prefix="/usr/lib/openmpi" --enable-scalapack \
            LD_LIBRARY_PATH=/usr/lib/openmpi/lib > configure.out
$ make > make.out
# make install


You can view the contents of the files configure.out and make.out here:

http://skku.homeip.net/lahaye/abinit/

Then I ran the MPI executable on tutorial t41:

Code:

$ /usr/lib/openmpi/bin/mpirun -v -np 4 /opt/abinit/bin/abinit_openmpi < t4x.files > log &
[1] 19252


The top command shows that four instances of abinit_openmpi are indeed running, and the run directory contains:

Code:

$ ls -1
log
t41.in
t4x.files
t4x_LOG_0001
t4x_LOG_0002
t4x_LOG_0003
t4x.out
t4x_STATUS
t4x_STATUS_P-0001
t4x_STATUS_P-0002
t4x_STATUS_P-0003


These files are also available at http://skku.homeip.net/lahaye/abinit/

The log file seems to hang forever at:

Code:

$ tail log
  scfcv : before setvtr, energies%e_hartree=   0.0000000000000000     

 ewald : nr and ng are    3 and   11
  mklocl_recipspace : will add potential with strength vprtrb(:)=   0.0000000000000000        0.0000000000000000     
  setvtr : istep,n1xccc,moved_rhor=           1           0           0
  scfcv : after setvtr, energies%e_hartree=   0.0000000000000000     

 ITER STEP NUMBER     1
 vtorho : nnsclo_now=  2, note that nnsclo,dbl_nnsclo,istep=  0 0  1


Although 'top' shows the four MPI processes of abinit as running,
nothing is updated in the STATUS and LOG files, and no other
output files are created.
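
One way to check that these files really have stopped changing is sketched below. It assumes the t4x prefix from the listing above; show_progress is a hypothetical helper written for this post, not an abinit tool.

```shell
# Hypothetical helper (not part of abinit): print the last line of every
# STATUS and LOG file that starts with the given prefix.
show_progress() {
  prefix="$1"
  for f in "$prefix"_STATUS* "$prefix"_LOG_*; do
    [ -f "$f" ] || continue   # skip a pattern that matched no file
    printf '%s: %s\n' "$f" "$(tail -n 1 "$f")"
  done
}

# Usage: run it twice, a minute apart, and compare the two outputs.
# show_progress t4x; sleep 60; show_progress t4x
```

If both calls print identical lines while the processes still use CPU time, the run is most likely stuck rather than just slow.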

Any idea what the problem is with this OpenMPI build of abinit?
Note: the sequential version of abinit runs tutorial t41 without any problem!

Thank you,
Rob.

PS: I have compiled and run the typical 'helloworld.c' openmpi program
without any problem:

Code:

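#include <stdio.h>
#include <mpi.h>

/* Minimal MPI test: every process reports its rank and host name. */
int main(int argc, char *argv[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);                           /* start the MPI runtime */
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);         /* total number of processes */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);             /* rank of this process */
  MPI_Get_processor_name(processor_name, &namelen); /* host this process runs on */

  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

  MPI_Finalize();
  return 0;
}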


Code:

$ /usr/lib/openmpi/bin/mpiCC helloworld.c
$ /usr/lib/openmpi/bin/mpirun -np 6 ./a.out
Process 0 on condor.dns.org out of 6
Process 2 on condor.dns.org out of 6
Process 4 on condor.dns.org out of 6
Process 3 on condor.dns.org out of 6
Process 1 on condor.dns.org out of 6
Process 5 on condor.dns.org out of 6

Re: openmpi -abinit hangs on Tutorial t41 !?!

Posted: Mon Oct 11, 2010 11:23 am
by mverstra
You only have 2 k-points, and I believe this is the issue: abinit doesn't accept idle processors, and with 4 processes there are no k-points left for procs 3 and 4.
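
As a rough illustration of why two of the four processes end up with nothing to do, here is a small sketch. It assumes a plain round-robin assignment of k-points to processes; assign_kpoints is a made-up helper and is not abinit's actual distribution routine.

```shell
# Illustrative only: a simple round-robin assignment of k-points to MPI
# processes. This is NOT abinit's real distribution code, just a model.
assign_kpoints() {
  nkpt="$1"    # number of k-points (2 in this case)
  nproc="$2"   # number of processes passed to mpirun -np
  rank=0
  while [ "$rank" -lt "$nproc" ]; do
    mine=""
    k=$((rank + 1))
    while [ "$k" -le "$nkpt" ]; do   # every nproc-th k-point goes to this proc
      mine="$mine $k"
      k=$((k + nproc))
    done
    if [ -n "$mine" ]; then
      echo "proc $((rank + 1)): k-points$mine"
    else
      echo "proc $((rank + 1)): idle"
    fi
    rank=$((rank + 1))
  done
}

assign_kpoints 2 4
# proc 1: k-points 1
# proc 2: k-points 2
# proc 3: idle
# proc 4: idle
```

Under that assumption, starting the same command with mpirun -np 2 instead of -np 4 would give every process one k-point. Whether that is enough to clear the hang on this build has not been tested here.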

Matthieu