Problem with number of nodes during mpirun
Posted: Tue Jul 12, 2016 9:37 am
Dear all,
I have successfully installed OpenMPI-1.10.3 and Abinit-8.0.8 on my 5-node cluster. I then ran a job with this command:
mpirun -np 2 -machinefile cluster abinit < t4x.files
In the file cluster I list the node addresses as follows, with the last three commented out:
node1
node2
#node3
#node4
#node5
The job then runs as specified in my input file. However, if I change the command to
mpirun -np 3 -machinefile cluster abinit < t4x.files
and uncomment the third node in the file cluster, so that only the last two are commented out:
node1
node2
node3
#node4
#node5
I get the following error:
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
With -np 2, I can choose any two of the nodes and the parallel job still runs, but the job fails with -np 3, 4, or 5.
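The "Permission denied (publickey,...)" lines are printed by SSH, which mpirun uses by default to start its daemons on the other nodes, so one way to narrow this down may be to check which nodes accept a non-interactive SSH login. The following is only a sketch under that assumption, not a confirmed diagnosis; the function names active_hosts and check_ssh are hypothetical helpers, not part of OpenMPI or Abinit.

```shell
# Hypothetical helper: print the active (uncommented) hosts in a machinefile.
active_hosts() {
    # Strip comments, skip blank lines, keep the first field (the host name).
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# Hypothetical helper: try a password-less SSH login to every active host.
check_ssh() {
    for host in $(active_hosts "$1"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # -n keeps ssh from reading the terminal's standard input.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

After sourcing these functions, running check_ssh cluster from the directory that holds the machinefile reports one line per active node. Since OpenMPI may also start daemons from one compute node to another, repeating the same check while logged in to the other nodes might be informative as well.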
I'd appreciate any help provided. Thank you.