[SOLVED] Segfault / Invalid Pointer during SCF cycle

Steven Miller
Posts: 5
Joined: Fri Jul 30, 2010 10:39 pm

Post by Steven Miller » Sat Jul 31, 2010 12:05 am

Hello! I'm attempting an optimization run on tetragonal zirconia using a custom Zr PAW dataset, and I appear to have discovered a bug in the abinit code. The dataset gave good results in a cubic configuration, but it causes abinit to crash under certain conditions in the tetragonal arrangement. The crash appears to happen during the first SCF cycle, and it occurs in versions 6.0.4 and 6.2.1.

The behavior depends on ecut: the code appears to run correctly at about 16 Ha but crashes at 20 Ha. In past attempts it seemed to crash for ecut between 19 and 26 Ha, and to run for ecut between 12 and 18 Ha or above 26 Ha, although some runs produced warnings about negative charge densities.

Changing the dataset does not fix the problem. Modifying it with atompaw did not seem to help either, even when changing the type of basis functions or projectors (e.g. bloechl, vanderbilt, rrkj). According to the plots made when generating the PAW dataset, the Hamiltonians of the exact function and the PAW function agreed very well, and the wfn.i plots all looked reasonable.

Has anyone encountered such a problem before? If not, I would like to track down the bug myself, but could use some assistance in finding it. I have attached the abinit input file. At the point where abinit fails, the log shows the following:

ITER STEP NUMBER 1
vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
leave_test : synchronization done...
vtorho: loop on k-points and spins done in parallel
leave_test : synchronization done...

*********** RHOIJ (atom 1) **********
2.06391 -0.01834 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.02995 0.00000 ...
-0.01834 0.05303 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.00097 0.00000 ...
0.00000 0.00000 1.98097 0.00000 0.00000 0.00587 0.00000 0.00000 0.00000 0.00000 0.00000 0.14258 ...
0.00000 0.00000 0.00000 2.09851 0.00000 0.00000 -0.03432 0.00000 0.16676 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 1.98097 0.00000 0.00000 0.00587 0.00000 0.14258 0.00000 0.00000 ...
0.00000 0.00000 0.00587 0.00000 0.00000 0.02812 0.00000 0.00000 0.00000 0.00000 0.00000 -0.03304 ...
0.00000 0.00000 0.00000 -0.03432 0.00000 0.00000 0.03735 0.00000 -0.03763 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 0.00587 0.00000 0.00000 0.02812 0.00000 -0.03304 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.16676 0.00000 0.00000 -0.03763 0.00000 0.41508 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 0.14258 0.00000 0.00000 -0.03304 0.00000 0.39175 0.00000 0.00000 ...
-0.02995 -0.00097 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.30738 0.00000 ...
0.00000 0.00000 0.14258 0.00000 0.00000 -0.03304 0.00000 0.00000 0.00000 0.00000 0.00000 0.39175 ...
... only 12 components have been written...

*********** RHOIJ (atom 6) **********
2.11479 0.04015 0.00000 -0.01752 0.00000 0.00000 -0.00289 0.00000
0.04015 0.01195 0.00000 -0.00158 0.00000 0.00000 -0.00182 0.00000
0.00000 0.00000 2.03026 0.00000 0.03309 0.01767 0.00000 0.00198
-0.01752 -0.00158 0.00000 2.01532 0.00000 0.00000 0.03259 0.00000


At this point, abinit ends with a "Segmentation Fault" message. Does anyone have any ideas or hints on how to tackle this? If it helps, atoms 1 and 2 are Zr, and atoms 3-6 are oxygen. Thanks!

Also, FYI, I'm running it on a 64-bit Ubuntu 10.04 LTS workstation cluster, with abinit compiled using gcc. I'm using MPICH2, although the problem occurs whether running in parallel or not.
-Steve

Update: I compiled and reran abinit with the configure options --enable-debug=yes --enable-optim=no --enable-mpi=no, and the log file shows a little more diagnostic information. At the very end, after the "RHOIJ (atom 6)" lines, it ends with:

symrhoij.F90:745 : exit

pawmknhat.F90:117 : enter

pawmknhat.F90:359 : exit

transgrid.F90:108 : enter
Attachments
r01.in
Last edited by Steven Miller on Thu Aug 12, 2010 4:30 pm, edited 3 times in total.

Re: Segfault / Invalid Pointer during SCF cycle

Post by Steven Miller » Tue Aug 03, 2010 11:24 pm

OK, I found the problem. In the log file for the ecut=20 case, the coarse FFT mesh was actually finer than the fine FFT mesh along one dimension (40 versus 36 divisions along the second axis):

 getng is called for the coarse grid:
 For input ecut=  2.205000E+01 best grid ngfft=      20      30      40
       max ecut=  2.211017E+01
 However, must be changed due to symmetry =>      20      40      40
       with max ecut=  0.221102E+02

 ==== FFT mesh ====
  FFT mesh divisions ........................    20   40   40
  Augmented FFT divisions ...................    21   41   40
  FFT algorithm .............................   112
  FFT cache size ............................    16
 getmpw: optimal value of mpw=    1036

 getng is called for the fine grid:
 For input ecut=  3.528000E+01 best grid ngfft=      27      36      54
       max ecut=  3.654946E+01
 However, must be changed due to symmetry =>      36      36      54
       with max ecut=  0.365495E+02

 ==== FFT mesh ====
  FFT mesh divisions ........................    36   36   54
  Augmented FFT divisions ...................    37   37   54
  FFT algorithm .............................   112
  FFT cache size ............................    16
  getdim_nloc : enter
  pspheads(1)%nproj(0:3)=           2           2           2           0


It seems that the grid generation relied on the assumption that the fine FFT mesh is always at least as fine as the coarse mesh along every dimension. With that assumption violated, the indgrid subroutine built an incomplete coatofin vector, which eventually caused memory corruption in the transgrid subroutine when it attempted to remap the rhog array onto the finer mesh in vectg. This problem is solved in the latest version, ABINIT 6.2.2, and you can also get around it by making sure your fine grid is never coarser than your coarse grid along any dimension.
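For anyone who wants to check their own log for the same condition, here is a minimal Python sketch. It is not part of ABINIT, and the function names `fft_meshes` and `inverted_axes` are made up for illustration. It assumes the log prints the "FFT mesh divisions" line for the coarse grid first and for the fine grid second, as in the excerpt above.

```python
import re

# Matches lines such as
# "  FFT mesh divisions ........................    20   40   40"
# but not the "Augmented FFT divisions" lines.
MESH_RE = re.compile(r"FFT mesh divisions\s*\.+\s*(\d+)\s+(\d+)\s+(\d+)")


def fft_meshes(log_text):
    """Return every 'FFT mesh divisions' triple, in order of appearance."""
    return [tuple(int(n) for n in m.groups()) for m in MESH_RE.finditer(log_text)]


def inverted_axes(coarse, fine):
    """Return the 0-based axes along which the coarse mesh exceeds the fine mesh."""
    return [axis for axis, (c, f) in enumerate(zip(coarse, fine)) if c > f]


# Excerpt copied from the log above (assumed order: coarse grid, then fine grid).
log_excerpt = """
 ==== FFT mesh ====
  FFT mesh divisions ........................    20   40   40
  Augmented FFT divisions ...................    21   41   40
 ==== FFT mesh ====
  FFT mesh divisions ........................    36   36   54
  Augmented FFT divisions ...................    37   37   54
"""

coarse, fine = fft_meshes(log_excerpt)[:2]
bad = inverted_axes(coarse, fine)
print("coarse:", coarse, "fine:", fine, "inverted axes:", bad)
```

An empty list from `inverted_axes` means the fine mesh is at least as fine as the coarse mesh along every axis. For the excerpt above it flags the second axis (index 1), where the coarse mesh has 40 divisions and the fine mesh only 36.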