Parallelisation run crashes - npband (solved)

Post by nleconte » Fri Dec 02, 2011 2:55 pm

Hi all,

I'm trying to parallelize a spin-orbit calculation over bands. As a first test, I use paral_kgb 1, npkpt 2 and npband 4.
All other variables are kept the same as when I was not forcing any band parallelization.

The number of bands is 48, which is a multiple of npband (4).
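
For reference, the layout check described above can be sketched in Python. This is only an illustration, not part of ABINIT: `check_layout` is a hypothetical helper, and it assumes npfft and npspinor are both 1, since the input file does not set them.

```python
# Hypothetical helper (not an ABINIT tool): sanity-check a paral_kgb layout.
# Assumption: npfft and npspinor default to 1 because the input does not set them.
def check_layout(nband, npkpt, npband, npfft=1, npspinor=1):
    """Return the number of MPI processes implied by the layout.

    Raises ValueError if nband is not a multiple of npband.
    """
    if nband % npband != 0:
        raise ValueError(f"nband={nband} is not a multiple of npband={npband}")
    return npkpt * npband * npfft * npspinor


# Values taken from the post: 48 bands, npkpt 2, npband 4.
nproc = check_layout(nband=48, npkpt=2, npband=4)
print(nproc)  # 8
```

Under these assumptions the job would need 8 processes, and the band count passes the divisibility check, which matches the conclusion that the band layout itself was not the problem.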

For some reason, the run crashes with no specific warning or error. Depending on the number of processors I request, it sometimes does not crash outright; instead, the job hangs at the same place indefinitely.

I do request the amount of memory estimated at the beginning of the ABINIT run.

Here is the input file:

# SO calculation
 nspinor 2  # spinor instead of scalar
 nsppol 1   # mandatory if nspinor 2
 nspden 4  # if magnetic system (1 would be useful for bulk Bi)

# Parallelization
 paral_kgb 1
 npkpt 2
 npband 4

# Common data
 ecut 60
 kptrlatt  2   0   0
          -2   2   0
           0   0   1
 shiftk    0   0   0
 tsmear 0.001
 intxc 1     
 ixc 1
 kptopt 4  # 1 or 2 not allowed for nspden 4 (no time-reversal symmetry); 4 uses all symmetries except time reversal

# Relaxation
 nstep 0
 ionmov 2
 ntime 10
 optcell 0
 #dilatmx 1.1
 ecutsm 0.5


# Unit cell parameters of the 2x2x1 supercell
 acell   9.2336153342E+00  9.2336153342E+00  2.0000000000E+01
 rprim   0.000000000000000  1.000000000000000  0.000000000000000
         0.866025403784439  0.500000000000000  0.000000000000000
         0.000000000000000  0.000000000000000 -1.000000000000000

# Number and types of the atoms
 ntypat  2
 natom 9
 typat 1 1 1 1 2 1 1 1 1
 znucl 6 83
 occopt  7

# Reduced coordinates of the atoms
 xred     1.6667993666E-01  1.6667994614E-01  2.0984311651E-02
          1.6667994614E-01  6.6664011720E-01  2.0984311651E-02
          6.6664011720E-01  1.6667993666E-01  2.0984311651E-02
          6.6666666667E-01  6.6666666667E-01  1.9803656205E-02
          6.6666666667E-01  6.6666666667E-01 -2.8076970156E-01
          3.3333333333E-01  3.3333333333E-01  2.0882092730E-02
          3.3282070492E-01  8.3358965318E-01  2.0710339224E-02
          8.3358964190E-01  3.3282070492E-01  2.0710339224E-02
          8.3358965318E-01  8.3358964190E-01  2.0710339224E-02

 tolvrs 1.0d-14
 
#~abinit/users/utils/AbinitStructureViewer.py


And here is the end of the log file before the crash:

---SELF-CONSISTENT-FIELD CONVERGENCE--------------------------------------------

 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  72  72 144
         ecut(hartree)=     60.000   => boxcut(ratio)=   2.06487
  scfcv : before setvtr, energies%e_hartree=  0.000000000000000E+000

 ewald : nr and ng are    4 and   21
  mklocl_recipspace : will add potential with strength vprtrb(:)=
  0.000000000000000E+000  0.000000000000000E+000
  setvtr : istep,n1xccc,moved_rhor=           1           0           0
  scfcv : after setvtr, energies%e_hartree=  0.000000000000000E+000


Thanks in advance for any help you can provide.

EDIT: Thanks to David, this is solved. At some point while adding the parallelization variables to the input file, I set nstep to zero; I don't know when or why. A positive nstep value solves the problem, so in the end it had nothing to do with the parallelization.
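
To catch this kind of slip before submitting a job, a small pre-flight check can help. The sketch below is an assumption-laden illustration, not an ABINIT utility: `find_nstep` is a hypothetical helper that only handles the simple layout used in this input file (one variable per line, `#` comments) and ignores multi-dataset forms such as nstep1.

```python
import re


# Hypothetical helper (not an ABINIT tool): find the nstep value in an input file.
# Assumption: one variable per line and '#' comments, as in the input posted above.
def find_nstep(text):
    """Return the integer value of nstep, or None if it is not set."""
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = re.match(r"\s*nstep\s+(-?\d+)", line)
        if match:
            return int(match.group(1))
    return None


# Relaxation block copied from the input file in this post.
sample = """
# Relaxation
 nstep 0
 ionmov 2
 ntime 10
"""

nstep = find_nstep(sample)
if nstep is not None and nstep <= 0:
    print(f"nstep = {nstep}: check that this is intended")
```

Run against the relaxation block from the post, this flags the `nstep 0` line that caused the problem.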
Attachments: bismuth.out (output file, 11.98 KiB), log.out (log file, 26.65 KiB)
