Parallelisation run crashes - npband (solved)

Posted: Fri Dec 02, 2011 2:55 pm
by nleconte
Hi all,

I'm trying to parallelize a spin-orbit calculation over bands. As a first test, I use paral_kgb 1, npkpt 2 and npband 4.
All other variables are kept the same as when I was not forcing any band parallelization.

The number of bands is 48, which is a multiple of npband (4).

For some reason, it crashes with no specific warning or error. Depending on the number of processors I request, it sometimes does not actually crash; instead the job hangs indefinitely at the same place.

I do request the amount of memory estimated at the beginning of the abinit run.

Here is the input file:

Code: Select all

# SO calculation
 nspinor 2  # spinor instead of scalar
 nsppol 1   # mandatory if nspinor 2
 nspden 4  # if magnetic system (1 would be useful for bulk Bi)

# Parallelization
 paral_kgb 1
 npkpt 2
 npband 4

# Common data
 ecut 60
 kptrlatt  2   0   0
          -2   2   0
           0   0   1
 shiftk    0   0   0
 tsmear 0.001
 intxc 1     
 ixc 1
 kptopt 4  # 1 or 2 not allowed for nspden 4 (no time reversal symmetry), value of 4 all symmetries except time reversal

# Relaxation
 nstep 0
 ionmov 2
 ntime 10
 optcell 0
 #dilatmx 1.1
 ecutsm 0.5


# Unit cell parameters of the 2x2x1 supercell
 acell   9.2336153342E+00  9.2336153342E+00  2.0000000000E+01
 rprim   0.000000000000000  1.000000000000000  0.000000000000000
         0.866025403784439  0.500000000000000  0.000000000000000
         0.000000000000000  0.000000000000000 -1.000000000000000

# Number and types of the atoms
 ntypat  2
 natom 9
 typat 1 1 1 1 2 1 1 1 1
 znucl 6 83
 occopt  7

# Reduced coordinates of the atoms
 xred     1.6667993666E-01  1.6667994614E-01  2.0984311651E-02
          1.6667994614E-01  6.6664011720E-01  2.0984311651E-02
          6.6664011720E-01  1.6667993666E-01  2.0984311651E-02
          6.6666666667E-01  6.6666666667E-01  1.9803656205E-02
          6.6666666667E-01  6.6666666667E-01 -2.8076970156E-01
          3.3333333333E-01  3.3333333333E-01  2.0882092730E-02
          3.3282070492E-01  8.3358965318E-01  2.0710339224E-02
          8.3358964190E-01  3.3282070492E-01  2.0710339224E-02
          8.3358965318E-01  8.3358964190E-01  2.0710339224E-02

 tolvrs 1.0d-14
 
#~abinit/users/utils/AbinitStructureViewer.py


And here is the end of the log file before it crashes:

Code: Select all

---SELF-CONSISTENT-FIELD CONVERGENCE--------------------------------------------

 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  72  72 144
         ecut(hartree)=     60.000   => boxcut(ratio)=   2.06487
  scfcv : before setvtr, energies%e_hartree=  0.000000000000000E+000

 ewald : nr and ng are    4 and   21
  mklocl_recipspace : will add potential with strength vprtrb(:)=
  0.000000000000000E+000  0.000000000000000E+000
  setvtr : istep,n1xccc,moved_rhor=           1           0           0
  scfcv : after setvtr, energies%e_hartree=  0.000000000000000E+000


Thanks in advance for any help you can provide.

EDIT: Thanks to David, it is solved. I don't know when or why, but while adapting the input file for the parallel run, I changed the nstep variable to zero. Setting it to a positive value solves the problem. So in the end, it had nothing to do with the parallelization.
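
For anyone who lands here with the same symptom, this is a minimal sketch of the corrected relaxation block, assuming everything else in the input file above stays unchanged. The value 30 is only an illustrative choice and is not taken from the original run; the thread only establishes that nstep must be positive.

```
# Relaxation
 nstep 30   # assumed illustrative value; the original post had nstep 0, any positive value avoids the hang
 ionmov 2
 ntime 10
 optcell 0
 #dilatmx 1.1
 ecutsm 0.5
```

The parallelization variables (paral_kgb 1, npkpt 2, npband 4) can stay as they were, since they turned out not to be the cause.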