[Solved] Parallelism of multi-datasets


Post by ljludwig » Fri Jun 22, 2012 11:11 pm

Hello All:

I have run into a peculiar problem: whenever I use multi-dataset mode (ndtset > 1) with parallel Abinit (launched through mpirun), the log file reports the following error:

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 14.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

On the other hand:
1) The .in and .files files are fine, since I can run them in serial with multiple datasets.
2) MPI itself seems fine too, since Abinit runs under mpirun when multi-dataset mode is not used.


The log file gives more information:
MPI_ERROR_STRING: MPI_ERR_UNKNOWN: unknown error
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 24963 on
node node7 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).


The message only says that every process must call "init", but it does not say how to call it. I tried to look up the answer but still do not understand it.

Could anyone shed some light on this problem? It is really frustrating. Thank you in advance.


Re: [Solved] Parallelism of multi-datasets

Post by ljludwig » Tue Jun 26, 2012 3:25 pm

For this issue, I may have to attribute the problem to the number of datasets being greater than the number of CPUs... It might also be due to a different configuration procedure when compiling Abinit.
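
To make the suspected condition easy to check before launching a job, here is a minimal sketch in Python. It is not part of Abinit: the helper names read_ndtset and check_process_count are hypothetical, and it only tests the hypothesis above (more datasets than MPI processes). It does not confirm that this is what actually triggers the MPI_ABORT. It assumes ndtset is written as "ndtset <integer>" on one line and that "#" and "!" start comments.

```python
import re
from typing import Optional


def read_ndtset(input_text: str) -> Optional[int]:
    """Return the ndtset value found in the input text, or None if it is not set.

    Hypothetical helper: assumes the form 'ndtset <integer>' on a single line.
    """
    for raw_line in input_text.splitlines():
        # Ignore everything after a comment character (assumed: '#' or '!').
        line = re.split(r"[#!]", raw_line, maxsplit=1)[0]
        match = re.search(r"\bndtset\s+(\d+)", line, flags=re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None


def check_process_count(ndtset: Optional[int], nproc: int) -> str:
    """Flag the suspected condition: more datasets than MPI processes."""
    if ndtset is not None and ndtset > nproc:
        return f"warning: ndtset={ndtset} exceeds nproc={nproc}"
    return f"ok: ndtset={ndtset}, nproc={nproc}"


if __name__ == "__main__":
    sample = "ndtset 4  # four datasets\necut 10\n"
    # nproc would be the value passed to 'mpirun -np'.
    print(check_process_count(read_ndtset(sample), nproc=2))
```

If the check prints a warning, one thing to try is rerunning with at least as many processes as datasets, to see whether the abort goes away.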
