Parallelisation run crashes - npband (solved)
Posted: Fri Dec 02, 2011 2:55 pm
Hi all,
I'm trying to parallelize my spin-orbit calculation over bands. As a first test, I use paral_kgb 1, npkpt 2 and npband 4.
All other variables are the same as when I was not forcing any band parallelization.
The number of bands is 48, which is a multiple of npband (4).
For some reason, the run crashes with no specific warning or error. Sometimes, depending on the number of processors I request, it does not actually crash, but the job hangs at the same place indefinitely.
I do request the amount of memory estimated at the beginning of the abinit run.
Here is the input file:
Code:
# SO calculation
nspinor 2 # spinor instead of scalar
nsppol 1 # mandatory if nspinor 2
nspden 4 # if magnetic system (1 would be useful for bulk Bi)
# Parallelization
paral_kgb 1
npkpt 2
npband 4
# Common data
ecut 60
kptrlatt 2 0 0
-2 2 0
0 0 1
shiftk 0 0 0
tsmear 0.001
intxc 1
ixc 1
kptopt 4 # 1 or 2 not allowed for nspden 4 (no time-reversal symmetry); 4 uses all symmetries except time reversal
# Relaxation
nstep 0
ionmov 2
ntime 10
optcell 0
#dilatmx 1.1
ecutsm 0.5
# Unit cell parameters of the 2x2x1 supercell
acell 9.2336153342E+00 9.2336153342E+00 2.0000000000E+01
rprim 0.000000000000000 1.000000000000000 0.000000000000000
0.866025403784439 0.500000000000000 0.000000000000000
0.000000000000000 0.000000000000000 -1.000000000000000
# Number and types of the atoms
ntypat 2
natom 9
typat 1 1 1 1 2 1 1 1 1
znucl 6 83
occopt 7
# Reduced coordinates of the atoms
xred 1.6667993666E-01 1.6667994614E-01 2.0984311651E-02
1.6667994614E-01 6.6664011720E-01 2.0984311651E-02
6.6664011720E-01 1.6667993666E-01 2.0984311651E-02
6.6666666667E-01 6.6666666667E-01 1.9803656205E-02
6.6666666667E-01 6.6666666667E-01 -2.8076970156E-01
3.3333333333E-01 3.3333333333E-01 2.0882092730E-02
3.3282070492E-01 8.3358965318E-01 2.0710339224E-02
8.3358964190E-01 3.3282070492E-01 2.0710339224E-02
8.3358965318E-01 8.3358964190E-01 2.0710339224E-02
tolvrs 1.0d-14
#~abinit/users/utils/AbinitStructureViewer.py
And here is the end of the log file before it crashes:
Code:
---SELF-CONSISTENT-FIELD CONVERGENCE--------------------------------------------
getcut: wavevector= 0.0000 0.0000 0.0000 ngfft= 72 72 144
ecut(hartree)= 60.000 => boxcut(ratio)= 2.06487
scfcv : before setvtr, energies%e_hartree= 0.000000000000000E+000
ewald : nr and ng are 4 and 21
mklocl_recipspace : will add potential with strength vprtrb(:)=
0.000000000000000E+000 0.000000000000000E+000
setvtr : istep,n1xccc,moved_rhor= 1 0 0
scfcv : after setvtr, energies%e_hartree= 0.000000000000000E+000
Thanks in advance for any help you can provide.
EDIT: Thanks to David, this is solved. I don't know when or why, but while setting up the input file for the parallel run, I changed the nstep variable to zero. A positive value solves the problem, so in the end it had nothing to do with the parallelization.
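For anyone who runs into the same thing, a quick pre-flight check on the input file along these lines might have caught it before the job was submitted. This is only a rough sketch in Python and not part of ABINIT: the helper names parse_scalar_ints and check_input are made up for illustration, the parser only understands simple "name integer" lines, and the rule that the number of bands should be a multiple of npband is taken from the post above. Since nband is not set explicitly in my input, it is passed in by hand.

```python
import re


def parse_scalar_ints(text):
    """Collect variables written as 'name <integer>' on a single line.

    Hypothetical helper: multi-value variables (kptrlatt, xred, ...) and
    non-integer values (tsmear, tolvrs, ...) are simply skipped.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"^([A-Za-z_]\w*)\s+(-?\d+)$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def check_input(text, nband=None):
    """Return a list of human-readable warnings for the two issues in this thread."""
    values = parse_scalar_ints(text)
    problems = []

    # The cause of the hang reported here: nstep had been set to zero.
    nstep = values.get("nstep")
    if nstep is not None and nstep <= 0:
        problems.append(f"nstep is {nstep}; use a positive value")

    # Assumption from the post: the band count should be a multiple of npband.
    npband = values.get("npband")
    nband = values.get("nband", nband)
    if npband and nband is not None and nband % npband != 0:
        problems.append(f"nband {nband} is not a multiple of npband {npband}")

    return problems


sample = """
paral_kgb 1
npkpt 2
npband 4
nstep 0   # the value that caused the problem in this thread
"""

for problem in check_input(sample, nband=48):
    print("WARNING:", problem)
```

With the values from my input, only the nstep warning is reported; the band check passes because 48 is a multiple of 4.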