openmpi -abinit hangs on Tutorial t41 !?!

Post by spamrefuse » Thu Sep 30, 2010 10:43 am

Hi,

My PC has an Intel Core i7 CPU 950 @ 3.07GHz.
I use abinit 6.2.3 on an up-to-date Fedora system with openmpi 1.4.1.

I have configured and compiled as follows:

$ ./configure --prefix=/opt/abinit --program-suffix="_openmpi" \
      --disable-netcdf --disable-bigdft --disable-wannier90 --disable-libxc \
      --with-linalg-libs="-L/usr/lib -L/usr/lib/openmpi/lib -lblas -llapack -lscalapack -lmpiblacs" \
      --enable-mpi --enable-mpi-io --with-mpi-prefix="/usr/lib/openmpi" --enable-scalapack \
      LD_LIBRARY_PATH=/usr/lib/openmpi/lib > configure.out
$ make > make.out
# make install


You can view the contents of the files configure.out and make.out here:

http://skku.homeip.net/lahaye/abinit/

Then I run the MPI executable on tutorial t41:

$ /usr/lib/openmpi/bin/mpirun -v -np 4 /opt/abinit/bin/abinit_openmpi < t4x.files > log &
[1] 19252


The top command shows that four instances of abinit_openmpi are indeed running, and the working directory contains the following files:

$ ls -1
log
t41.in
t4x.files
t4x_LOG_0001
t4x_LOG_0002
t4x_LOG_0003
t4x.out
t4x_STATUS
t4x_STATUS_P-0001
t4x_STATUS_P-0002
t4x_STATUS_P-0003


These files are also available at http://skku.homeip.net/lahaye/abinit/

The log file seems to hang forever at:

$ tail log
  scfcv : before setvtr, energies%e_hartree=   0.0000000000000000     

 ewald : nr and ng are    3 and   11
  mklocl_recipspace : will add potential with strength vprtrb(:)=   0.0000000000000000        0.0000000000000000     
  setvtr : istep,n1xccc,moved_rhor=           1           0           0
  scfcv : after setvtr, energies%e_hartree=   0.0000000000000000     

 ITER STEP NUMBER     1
 vtorho : nnsclo_now=  2, note that nnsclo,dbl_nnsclo,istep=  0 0  1


Although the four MPI processes of abinit show up as running
in the 'top' command, nothing is updated in the STATUS and
LOG files, and no other output files are created.

Any idea what the problem is with this OpenMPI build of abinit?
Note: the sequential version of abinit has no problem with tutorial t41!

Thank you,
Rob.

PS: I have compiled and run the typical 'helloworld.c' openmpi program
without any problem:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  /* Start MPI, then query the communicator size and this process's rank */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);

  /* Each process reports its rank and the host it runs on */
  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

  MPI_Finalize();
  return 0;
}


$ /usr/lib/openmpi/bin/mpiCC helloworld.c
$ /usr/lib/openmpi/bin/mpirun -np 6 ./a.out
Process 0 on condor.dns.org out of 6
Process 2 on condor.dns.org out of 6
Process 4 on condor.dns.org out of 6
Process 3 on condor.dns.org out of 6
Process 1 on condor.dns.org out of 6
Process 5 on condor.dns.org out of 6

Re: openmpi -abinit hangs on Tutorial t41 !?!

Post by mverstra » Mon Oct 11, 2010 11:23 am

You only have 2 k-points. I believe this is the issue: abinit doesn't accept processes that are left with no work, and with -np 4 there are no k-points left for procs 3 and 4.

Matthieu
Matthieu Verstraete
University of Liege, Belgium
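
If that diagnosis is right, the practical consequence is to ask for no more MPI processes than there are k-points. A minimal sketch of that check follows, reusing the paths from the first post. The variable names (nkpt, requested, np, idle, cmd) are made up for illustration, the k-point count is typed in by hand rather than read from t41.in, and the script only prints the command instead of running it. It has not been verified against abinit 6.2.3.

```shell
#!/bin/sh
# Assumption: t41 has 2 k-points, as stated in the reply above.
nkpt=2
# Number of MPI processes used in the run that hung.
requested=4

# With parallelism over k-points only, each k-point goes to one process,
# so any process beyond nkpt would be left without work.
if [ "$requested" -gt "$nkpt" ]; then
  np=$nkpt
  idle=$((requested - nkpt))
else
  np=$requested
  idle=0
fi

echo "requested=$requested nkpt=$nkpt -> would leave $idle process(es) idle"

# Print (do not execute) the adjusted command line.
cmd="/usr/lib/openmpi/bin/mpirun -np $np /opt/abinit/bin/abinit_openmpi"
echo "$cmd < t4x.files > log"
```

With the values above this suggests rerunning with -np 2 instead of -np 4.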