Inconsistent results for parallel calculations  [SOLVED]

Total energy, geometry optimization, DFT+U, spin....

Moderator: bguster

Rishøj
Posts: 6
Joined: Sun Apr 06, 2014 1:47 pm
Location: Denmark

Inconsistent results for parallel calculations

Post by Rishøj » Sun Apr 06, 2014 4:02 pm

Hi everyone.

I am having difficulties running Abinit in parallel. All my calculations seem to take more SCF cycles when run in parallel, and some jobs even fail to converge! Sometimes, during structural relaxation, a job fails to converge at a specific time step when run in parallel, but finishes without problems when run in serial (more specifically, on one MPI core).

I only seem to see the problem for 2D materials. I have never encountered the problem for bulk materials. I have not tried 1D materials or molecules.

The problem occurs in many different scenarios, but here is an input file that reproduces it for me:

Code:

# Monolayer hcp Fe

# Output quantities
prtwf 0
prtdos 0
prtden 1


#spin related quantities
 spinat   0.0 0.0 2.0
 nsppol   2
 nband   14


# Convergence stuff
 tolvrs   1.0d-10
 ecut   20.0
 pawecutdg 40.
   ngkpt   2*15 1
nshiftk   1
 shiftk   0.0 0.0 0.0
 nstep   200
 occopt   3
 tsmear   0.01
 
# Pseudo related
#iscf   17
ixc 1

#Geometry
  acell 2*2.46 20 angstrom
rprim  sqrt(3/4)  0.5  0
       sqrt(3/4)  -0.5  0 
       0.0  0.0  -1
  natom   1
 ntypat   1
  typat   1
   xred   0.0  0.0  0.0
 znucl    26


When I run this input file, it converges in 65 SCF cycles on a single core, but fails to converge within 200 SCF cycles on both 2 and 4 cores.



I use Abinit 7.6.2 with OpenMPI, and I have tried several things to solve the problem, including:
  • Trying with another Abinit version (7.4.3)
  • Compiling with either Intel Fortran or Gnu Fortran compilers (both Abinit and MPI)
  • Compiling with either MKL or the netlib-fallback library.
  • Compiling with or without FFTW3.
  • Trying with another computer.
  • Trying with both OpenMPI and MPICH2


The upload system on this site does not allow me to include all of the test files, so I have uploaded them to my own website. The archive contains my input file, my files file, and the pseudo I used for the calculations, as well as the log files for all three runs (1, 2 and 4 cores).
http://ricehigh.dk/problem_example.zip


Can anyone help me on how to proceed?
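
For anyone who wants to compare the three runs quickly, here is a minimal Python sketch that reports the last SCF step found in each log. It assumes that each SCF iteration is printed on a line starting with "ETOT" followed by the step number; that line format is an assumption on my part and is not quoted in this thread, so adjust the pattern if your logs differ. For a relaxation run with several SCF cycles it only reports the largest step index seen.

```python
import re
import sys

# Assumed format of an SCF iteration line, e.g. " ETOT  3  -1.2345E+02 ...".
# This pattern is an assumption; it is not taken from the logs in this thread.
ETOT_LINE = re.compile(r"^\s*ETOT\s+(\d+)\s")


def count_scf_steps(log_text):
    """Return the highest SCF step index found in log_text (0 if none)."""
    last = 0
    for line in log_text.splitlines():
        match = ETOT_LINE.match(line)
        if match:
            last = max(last, int(match.group(1)))
    return last


def summarize(logs):
    """Map each run label to its SCF step count."""
    return {label: count_scf_steps(text) for label, text in logs.items()}


if __name__ == "__main__":
    # Usage: python scf_steps.py log_1core log_2cores log_4cores
    # (the file names are hypothetical)
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, count_scf_steps(handle.read()))
```

A run that stops at nstep (200 in the input above) in the parallel logs but well before it in the serial log would match the behaviour described in this post.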

gmatteo
Posts: 291
Joined: Sun Aug 16, 2009 5:40 pm

Re: Inconsistent results for parallel calculations

Post by gmatteo » Sun Apr 06, 2014 10:58 pm

It may be due to numerical instabilities that are amplified when you increase the number of processors.

The log files contain several warnings about negative densities:

--- !WARNING
message: |
Density went too small (lower than xc_denpos) at 82023 points
and was set to xc_denpos= 1.00E-14. Lowest was -0.11E-06.
Likely due to too low boxcut or too low ecut for pseudopotential core charge.
src_file: mkdenpos.F90
src_line: 176
...

Can you try to:

1) Increase pawecutdg and ecut
2) Test other PAW pseudopotentials
3) Test other values of iprcel (see http://www.abinit.org/documentation/hel ... tml#iprcel)
to see if one can accelerate the SCF cycle in the sequential case
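
To check whether the negative-density warnings change after raising ecut and pawecutdg, a small Python sketch like the following could be used. It relies only on the warning text quoted above; the function names are illustrative and not part of Abinit.

```python
import re

# Matches the message quoted above:
# "Density went too small (lower than xc_denpos) at 82023 points"
DENSITY_WARNING = re.compile(r"Density went too small .* at\s+(\d+)\s+points")


def count_warnings(log_text):
    """Count YAML-style warning blocks, which start with '--- !WARNING'."""
    return log_text.count("--- !WARNING")


def negative_density_points(log_text):
    """Return the number of affected grid points for each density warning."""
    return [int(m.group(1)) for m in DENSITY_WARNING.finditer(log_text)]
```

Running both functions on the logs before and after changing the cutoffs shows whether the number of affected points drops. This is only a diagnostic aid under the assumption that all such warnings use the wording shown above.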

Rishøj
Posts: 6
Joined: Sun Apr 06, 2014 1:47 pm
Location: Denmark

Re: Inconsistent results for parallel calculations

Post by Rishøj » Mon Apr 07, 2014 10:33 am

Thanks for the input gmatteo,

I have just tried larger values of ecut and pawecutdg (ecut=30 and pawecutdg=80), without any significant change in convergence (62 SCF cycles compared to 65).

I have also tried with other pseudos (for instance the JTH GGA pseudo).

And finally, if I try setting iprcel to a nonzero value (I have tried 25, 35, and 45), I get the following error:

Code:

--- !ERROR
message: |
    The distribfft passed was already allocated for fine grid
src_file: m_distribfft.F90
src_line: 219


Furthermore, I have a larger calculation that I have tried running on a colleague's computer, where it runs in parallel without problems, while it fails to converge on mine. (He uses Abinit 6.x with MPICH2.)

Do you have any other suggestions for solving my problem?

Rishøj
Posts: 6
Joined: Sun Apr 06, 2014 1:47 pm
Location: Denmark

Re: Inconsistent results for parallel calculations

Post by Rishøj » Mon Apr 07, 2014 6:01 pm

In fact, if I run the exact same input file on the University's cluster, I get no problems at all with 4 cores, also with Abinit 7.6.2 and OpenMPI.

I wonder if there is a problem with my computer or with some installed package?

How do I find out what causes the error?

Rishøj
Posts: 6
Joined: Sun Apr 06, 2014 1:47 pm
Location: Denmark

Re: Inconsistent results for parallel calculations  [SOLVED]

Post by Rishøj » Fri Apr 25, 2014 10:39 am

I have just recompiled Abinit with the gfortran compiler and now it seems to work.

ldamewood
Posts: 14
Joined: Tue Mar 09, 2010 11:39 pm

Re: Inconsistent results for parallel calculations

Post by ldamewood » Fri May 09, 2014 11:32 pm

I don't know if this is the same issue you were having, but if you were using the Intel compiler, I found that setting

Code:

FC="mpiifort -fp-model strict"
in abinit fixed my numerical instabilities when using >4 MPI processes. I described my struggles in this post: viewtopic.php?f=2&t=2651
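
As general background on why results can depend on the number of processes or on compiler floating-point settings: floating-point addition is not associative, so summing the same numbers in a different order can give a different result. The Python sketch below is only a toy illustration of that effect. It is not a diagnosis of the problem in this thread, and the values and function names are made up for the example.

```python
def serial_sum(values):
    """Add the values left to right, as a single process would."""
    total = 0.0
    for value in values:
        total += value
    return total


def chunked_sum(values, nchunks):
    """Sum each chunk separately, then add the partial sums.

    This mimics a reduction over nchunks processes in a simplified way.
    """
    size = (len(values) + nchunks - 1) // nchunks
    partials = [serial_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return serial_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]
print(serial_sum(values))      # 1.0
print(chunked_sum(values, 2))  # 0.0
```

The two sums differ only because of the order in which rounding happens. In a well-conditioned calculation such differences stay at round-off level, but an iterative procedure that is close to unstable may amplify them.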
