parallel configuration
Posted: Tue Oct 04, 2016 6:41 am
Hello dear ABINIT users,
I successfully compiled the parallel version of ABINIT 7.10.4 on a cluster: 32 nodes, dual-CPU Intel Xeon X5670 (2 x 6 cores @ 2.93 GHz), 24 GB.
I am trying to run a parallel geometry optimization of the orthorhombic phase of LaFeO3. The calculation stops after a few ntime steps. I need assistance to resolve this problem. Here is the error message at the end of the log file:
----------------------------------------------------------------------------------
At line 808 of file mover.F90
Fortran runtime error: End of file
mpirun has exited due to process rank 2 with PID 18979 on
node farabi17 exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
-------------------------------------------------------------------------------------------------------
and here is the submitted batch file:
------------------------------------------------------------------------------------------------
#!/bin/sh
#SBATCH --partition=materiaux
#SBATCH -A materiaux
#SBATCH --nodes=2
#SBATCH --tasks-per-node=8
#SBATCH --mail-type=ALL # Receive an email at the end of the job
#SBATCH --output=log-%j.out # Program output file
#SBATCH --error=log-%j.err # Program error file
#SBATCH --mail-user=n_ilesdz@yahoo.fr
#module load abinit/7.10.4
mpirun abinit < lafeo3.files >& lafeo3.log
---------------------------------------------------------------------------------------------
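For reference, the batch file above asks for 2 nodes with 8 tasks per node. A minimal sketch of that arithmetic follows, assuming that mpirun picks up the SLURM allocation when no process count is given on the command line; the variable names are illustrative only and are not part of the submitted script.

```shell
# Values copied from the #SBATCH lines of the batch file above.
nodes=2            # --nodes
tasks_per_node=8   # --tasks-per-node

# Total number of MPI processes requested from SLURM.
ntasks=$((nodes * tasks_per_node))
echo "MPI processes requested: $ntasks"
```

So the run that fails is expected to use 16 MPI processes, while the sequential run that succeeds uses one.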
and finally, the input file for LaFeO3:
---------------------------------------------------------------------------------------------
# LaFeO3, orthorhombic
# geometry optimization
spgroup 62
kptopt 1 # Option for the automatic generation of k points, taking
# into account the symmetry
nsppol 2
spinat 0. 0. 0.0
0. 0. 0.0
0. 0. 0.0
0. 0. 0.0
0. 0. 7.0
0. 0. 7.0
0. 0. -7.0
0. 0. -7.0
0. 0. 0
0. 0. 0
0. 0. 0
0. 0. 0
0. 0. 0
0. 0. 0
0. 0. 0.0
0. 0. 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
nspden 2
acell 1.02902893502723E+01 1.45257208767473E+01 1.02742096569928E+01
angdeg 90 90 90
nsym 0
tolsym 1.e-4
optcell 1
ionmov 2
ntime 30
ntypat 3 # There are three types of atoms (La, Fe, O)
znucl 57 26 8 # Atomic numbers of the three atom types: La, Fe, O
nband 92
occopt 1
#Definition of the atoms
natom 20 # There are 20 atoms in the cell
typat 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3
xred
2.54991329693666E-02 2.50000000000000E-01 9.94305905930338E-01
9.74500867030633E-01 7.50000000000000E-01 5.69409406966170E-03
5.25499132969367E-01 2.50000000000000E-01 5.05694094069662E-01
4.74500867030633E-01 7.50000000000000E-01 4.94305905930338E-01
0.00000000000000E+00 0.00000000000000E+00 5.00000000000000E-01
5.00000000000000E-01 0.00000000000000E+00 0.00000000000000E+00
5.00000000000000E-01 5.00000000000000E-01 0.00000000000000E+00
0.00000000000000E+00 5.00000000000000E-01 5.00000000000000E-01
4.92535721476408E-01 2.50000000000000E-01 6.66590204457015E-02
5.07464278523592E-01 7.50000000000000E-01 9.33340979554299E-01
9.92535721476408E-01 2.50000000000000E-01 4.33340979554299E-01
7.46427852359163E-03 7.50000000000000E-01 5.66659020445701E-01
2.24379670389852E-01 5.36034348079099E-01 2.24274009664965E-01
7.75620329610148E-01 4.63965651920901E-01 7.75725990335035E-01
7.24379670389852E-01 5.36034348079099E-01 2.75725990335035E-01
2.75620329610148E-01 4.63965651920901E-01 7.24274009664965E-01
2.75620329610148E-01 3.60343480790995E-02 7.24274009664965E-01
7.24379670389852E-01 9.63965651920901E-01 2.75725990335035E-01
7.75620329610148E-01 3.60343480790995E-02 7.75725990335035E-01
2.24379670389852E-01 9.63965651920901E-01 2.24274009664965E-01
#Definition of the planewave basis set
ecut 45
pawecutdg 90
ecutsm 0.5
pawovlp 0
#Definition of the SCF procedure
nstep 40 # Maximal number of SCF cycles
diemac 14 # Although this is not mandatory, it is worth to
diemix 0.5d0 # precondition the SCF cycle. The model dielectric
# function used as the standard preconditioner
# is described in the "dielng" input variable section.
toldff 5.0d-6
tolmxf 5.0d-6
ixc 11
# add to conserve old < 6.7.2 behavior for calculating forces at each SCF step
optforces 1
--------------------------------------------------------------------------------------------------------------------------
It should be noted that the sequential calculation terminated successfully, so the problem is perhaps due to MPI communication? I would appreciate your assistance in resolving it.
Respectfully
Iles Nadia
LPC2ME
Oran 1 University