I'm trying to relax bulk zinc on an 11-node cluster. I have included the input file (zn.in). After the relaxation finishes, I get the following error:
----iterations are completed or convergence reached----
-P-0000 wffopen.F90:165:WffOpen
-P-0000 MPI_ERR_IO: input/output error
-P-0000
-P-0000 leave_new : decision taken to exit ...
as shown in the log file (log.in), which I've included. I also get the following MPI error:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 9 in communicator MPI_COMM_WORLD
with errorcode 14.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 10 with PID 9336 on
node XXX.XXX.XXX.XXX exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[azzaruh:03046] 10 more processes have sent help message help-mpi-api.txt / mpi-abort
[azzaruh:03046] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I've also found that the LOG file on every node contains the following error:
-P-0009 wffopen.F90:165:WffOpen
-P-0009 MPI_ERR_NO_SUCH_FILE: no such file or directory
-P-0009
-P-0009 leave_new : decision taken to exit ...
I've included the log file from one of the nodes (t4x_LOG_0009.in). However, I have checked, and the file wffopen.F90 does exist at abinit-6.12.3/src/59_io_mpi/wffopen.F90. I would appreciate any comments or help. Thanks.