Problem with parallelization during a run

Dear All,
I'm running a geometry optimization on 16x12 cores, but the code aborts after the first iteration (sometimes only after many iterations). I checked the log file and found that it ends with the following error after the first iteration:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ioarr: writing density data
ioarr: file name is tbase1_xo_TIM1_DEN
m_wffile.F90:327:COMMENT
MPI/IO accessing FORTRAN file header: detected record mark length=4
opened file : tbase1_xo_TIM1_DOS_AT0005 unit 10
opened file : tbase1_xo_TIM1_DOS_AT0010 unit 11
about to write to the DOS file
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 10 in communicator MPI_COMM_WORLD
with errorcode 14.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 167 with PID 13616 on
node n005-ib exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[n017:15322] 133 more processes have sent help message help-mpi-api.txt / mpi-abort
[n017:15322] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
"log" 5510L, 392185C 5510,1 Bot
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
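In case it is relevant, the job is launched roughly like this (a simplified sketch: the .files name is only my shorthand here, based on the tbase1_x output prefix seen in the log, and 192 = 16x12 MPI processes):

    mpirun -np 192 abinit < tbase1_x.files > log 2>&1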
I'm using ABINIT version 7.4.3 with MPI. I don't really know what this error means or what is causing it, especially since it happens so often.
I would appreciate any help.
Best regards,
conta ..