MPI_ERR_NO_SUCH_FILE: no such file or directory  [SOLVED]

sidiq
Posts: 11
Joined: Tue Jul 24, 2012 11:04 am

MPI_ERR_NO_SUCH_FILE: no such file or directory

Post by sidiq » Thu Feb 21, 2013 6:21 am

Hi,

I'm trying to relax bulk zinc on an 11-node cluster. I have attached the input file (zn.in). After the relaxation finishes, I get the error

 ----iterations are completed or convergence reached----

-P-0000  wffopen.F90:165:WffOpen
-P-0000 MPI_ERR_IO: input/output error
-P-0000
-P-0000  leave_new : decision taken to exit ...


as shown in the log file (log.in) which I've included. I am also given the MPI error

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 9 in communicator MPI_COMM_WORLD
with errorcode 14.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 10 with PID 9336 on
node XXX.XXX.XXX.XXX exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[azzaruh:03046] 10 more processes have sent help message help-mpi-api.txt / mpi-abort
[azzaruh:03046] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


I've also found that every node reports the following error in its LOG file

-P-0009  wffopen.F90:165:WffOpen
-P-0009 MPI_ERR_NO_SUCH_FILE: no such file or directory
-P-0009
-P-0009  leave_new : decision taken to exit ...


I've attached the log file from one of the nodes (t4x_LOG_0009.in). However, I checked and I do have the file wffopen.F90, in abinit-6.12.3/src/59_io_mpi/wffopen.F90. I would appreciate any comments or help. Thanks.
Attachments
zn.in (352 Bytes)
log.in (249.14 KiB)
t4x_LOG_0009.in (44.74 KiB)

Alain_Jacques
Posts: 279
Joined: Sat Aug 15, 2009 9:34 pm
Location: Université catholique de Louvain - Belgium

Re: MPI_ERR_NO_SUCH_FILE: no such file or directory  [SOLVED]

Post by Alain_Jacques » Thu Feb 21, 2013 12:03 pm

In fact, the error message means that a particular node hasn't found a wavefunction file it needs.
Abinit parallelization is not a black box. First of all, I see in your input file that you only selected parallelization over k points. No need to complicate things: keep the default value of paral_kgb=0 (remove the npXXX variables) and adjust the number of processors to a relevant value. I would strongly suggest choosing a divisor of the number of k points (24 here, at first glance), so try mpirun -n 4 ... or mpirun -n 8 ..., but 11 is an odd choice.
And if you want to play with parallelization over bands or FFTs, let Abinit give you some clues about the right combination by adding paral_kgb=-8 to your input file, as explained in the tutorials.
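
To illustrate the divisor advice, here is a rough Python sketch (not part of Abinit; the function names divisors and kpt_efficiency are made up for this example). It assumes every k point costs the same and that each process handles whole k points, and it only models load balance. It does not explain the MPI_ERR_NO_SUCH_FILE error itself.

```python
import math


def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def kpt_efficiency(nkpt, nproc):
    """Fraction of process-time spent working when nkpt k points are
    spread as evenly as possible over nproc processes.

    Assumption (illustrative only): every k point costs the same and
    each process handles whole k points.
    """
    busiest = math.ceil(nkpt / nproc)  # k points on the most loaded process
    return nkpt / (nproc * busiest)


if __name__ == "__main__":
    nkpt = 24       # k-point count quoted in the post
    max_procs = 11  # processes available on the 11-node cluster
    for nproc in range(1, max_procs + 1):
        tag = "(divisor)" if nkpt % nproc == 0 else ""
        print(f"{nproc:2d} procs: {kpt_efficiency(nkpt, nproc):.0%} {tag}")
```

Under these assumptions, 4 or 8 processes keep everyone busy, while with 11 processes the busiest ones still have to handle 3 k points each, so the run is no faster than with 8 and part of the machine sits idle.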

btw ntime=1000 ... are you sure?

Kind regards,

Alain

sidiq
Posts: 11
Joined: Tue Jul 24, 2012 11:04 am

Re: MPI_ERR_NO_SUCH_FILE: no such file or directory

Post by sidiq » Mon Feb 25, 2013 12:45 pm

Hi Alain,

You were right. I set paral_kgb=0 and chose a number of nodes that is a factor of the number of k-points, and everything works great. I have to admit that I quit the parallel tutorial once I assumed that paral_kgb=-n would recommend the parallel variables for you, in the same way that prtkpt 1 recommends k-point variables. But I guess there's always something around the corner...

Thanks for the help..
