Hello! I'm trying to run an optimization on tetragonal zirconia using a custom Zr PAW dataset, and I appear to have found a bug in the abinit code. The dataset gave good results in a cubic configuration, but it causes abinit to crash under certain conditions in the tetragonal arrangement. Changing the dataset does not fix the problem. The code appears to crash during the first SCF cycle. The error occurs in versions 6.0.4 and 6.2.1.
The crash depends on ecut: the code appears to run correctly if ecut is about 16 Ha, but crashes at 20 Ha. In past attempts, it seemed to crash for ecut between 19 and 26 Ha, and would run for ecut between 12 and 18 Ha or above 26 Ha, though in some cases it would produce warnings about negative charge densities.
Making changes to the dataset using atompaw did not seem to help either, even when changing the type of basis functions or projectors (e.g. bloechl, vanderbilt, rrkj). According to the plots made when generating the PAW dataset, the hamiltonians of the exact function and the PAW function seemed to agree very well, and the wfn.i plots all looked reasonable.
Has anyone encountered such a problem before? If not, I would like to try to track down the problem, but could use some assistance in finding the bug. I attached the input file for abinit. At the point where abinit fails, the log contains the following:
ITER STEP NUMBER 1
vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
leave_test : synchronization done...
vtorho: loop on k-points and spins done in parallel
leave_test : synchronization done...
*********** RHOIJ (atom 1) **********
2.06391 -0.01834 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.02995 0.00000 ...
-0.01834 0.05303 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.00097 0.00000 ...
0.00000 0.00000 1.98097 0.00000 0.00000 0.00587 0.00000 0.00000 0.00000 0.00000 0.00000 0.14258 ...
0.00000 0.00000 0.00000 2.09851 0.00000 0.00000 -0.03432 0.00000 0.16676 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 1.98097 0.00000 0.00000 0.00587 0.00000 0.14258 0.00000 0.00000 ...
0.00000 0.00000 0.00587 0.00000 0.00000 0.02812 0.00000 0.00000 0.00000 0.00000 0.00000 -0.03304 ...
0.00000 0.00000 0.00000 -0.03432 0.00000 0.00000 0.03735 0.00000 -0.03763 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 0.00587 0.00000 0.00000 0.02812 0.00000 -0.03304 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.16676 0.00000 0.00000 -0.03763 0.00000 0.41508 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 0.14258 0.00000 0.00000 -0.03304 0.00000 0.39175 0.00000 0.00000 ...
-0.02995 -0.00097 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.30738 0.00000 ...
0.00000 0.00000 0.14258 0.00000 0.00000 -0.03304 0.00000 0.00000 0.00000 0.00000 0.00000 0.39175 ...
... only 12 components have been written...
*********** RHOIJ (atom 6) **********
2.11479 0.04015 0.00000 -0.01752 0.00000 0.00000 -0.00289 0.00000
0.04015 0.01195 0.00000 -0.00158 0.00000 0.00000 -0.00182 0.00000
0.00000 0.00000 2.03026 0.00000 0.03309 0.01767 0.00000 0.00198
-0.01752 -0.00158 0.00000 2.01532 0.00000 0.00000 0.03259 0.00000
At this point, abinit ends with a "Segmentation Fault" message. Does anyone have any ideas or hints on how to tackle this? If it helps, atoms 1 and 2 are Zr, and atoms 3-6 are oxygen. Thanks!
Also, FYI, I'm running it on an Ubuntu 10.04 LTS workstation cluster 64-bit edition using gcc to compile. I'm using MPICH2, although the problem occurs whether running in parallel or not.
-Steve
Update: I compiled and reran abinit with the configure options --enable-debug=yes --enable-optim=no --enable-mpi=no, and the log file shows a little more diagnostic information. At the very end, after the "RHOIJ (atom 6)" lines, it ends with:
symrhoij.F90:745 : exit
pawmknhat.F90:117 : enter
pawmknhat.F90:359 : exit
transgrid.F90:108 : enter
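In case it helps anyone reading a similar debug trace: the useful bit is the routine that logs "enter" with no matching "exit". Here is a minimal Python sketch that picks it out. It assumes the trace lines look exactly like the four above (`file.F90:line : enter|exit`); the function name `unfinished_routines` is just something made up for this example and is not part of abinit.

```python
import re

# Matches trace lines such as "transgrid.F90:108 : enter" (format assumed from the log above).
TRACE_LINE = re.compile(r"^(?P<file>\S+\.F90):(?P<line>\d+)\s*:\s*(?P<event>enter|exit)$")


def unfinished_routines(log_text):
    """Return source files that logged 'enter' without a later 'exit', in entry order."""
    open_calls = []
    for raw in log_text.splitlines():
        match = TRACE_LINE.match(raw.strip())
        if match is None:
            continue  # not a trace line, ignore it
        name = match.group("file")
        if match.group("event") == "enter":
            open_calls.append(name)
        elif name in open_calls:
            # Close the most recent still-open entry for this file.
            last = len(open_calls) - 1 - open_calls[::-1].index(name)
            del open_calls[last]
        # An 'exit' whose 'enter' lies before the excerpt is simply skipped.
    return open_calls


log_tail = """symrhoij.F90:745 : exit
pawmknhat.F90:117 : enter
pawmknhat.F90:359 : exit
transgrid.F90:108 : enter"""

print(unfinished_routines(log_tail))  # ['transgrid.F90']
```

For the tail of my log this leaves only transgrid.F90, so that is where the segfault most likely happens.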
[SOLVED] Segfault / Invalid Pointer during SCF cycle
Attachment: r01.in (704 Bytes)
Last edited by Steven Miller on Thu Aug 12, 2010 4:30 pm, edited 3 times in total.
Re: Segfault / Invalid Pointer during SCF cycle
Ok, I found the problem. In the log file for the ecut=20 case, the coarse FFT mesh was actually finer along one dimension than the fine FFT mesh (40 versus 36 divisions along the second axis):
Code:
getng is called for the coarse grid:
For input ecut= 2.205000E+01 best grid ngfft= 20 30 40
max ecut= 2.211017E+01
However, must be changed due to symmetry => 20 40 40
with max ecut= 0.221102E+02
==== FFT mesh ====
FFT mesh divisions ........................ 20 40 40
Augmented FFT divisions ................... 21 41 40
FFT algorithm ............................. 112
FFT cache size ............................ 16
getmpw: optimal value of mpw= 1036
getng is called for the fine grid:
For input ecut= 3.528000E+01 best grid ngfft= 27 36 54
max ecut= 3.654946E+01
However, must be changed due to symmetry => 36 36 54
with max ecut= 0.365495E+02
==== FFT mesh ====
FFT mesh divisions ........................ 36 36 54
Augmented FFT divisions ................... 37 37 54
FFT algorithm ............................. 112
FFT cache size ............................ 16
getdim_nloc : enter
pspheads(1)%nproj(0:3)= 2 2 2 0
It seems that the grid generation relied on the assumption that the fine FFT mesh is at least as fine as the coarse mesh along every dimension. When that assumption failed, the indgrid subroutine built an incomplete coatofin vector, which eventually caused memory corruption in the transgrid subroutine when it attempted to remap the rhog array onto the finer mesh in vectg. This problem is solved in the latest version, ABINIT 6.2.2, and you can also get around it by making sure your fine grid is never coarser than your coarse grid along any dimension.
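For anyone who wants to check their own log for the same condition, here is a rough Python sketch. It pulls the "FFT mesh divisions" lines out of a log and reports the axes where the coarse mesh has more divisions than the fine one. It assumes the coarse grid is printed before the fine grid, as in my log above. The helper names are invented for this example, and the frequency bookkeeping is only an illustration of why some coarse points end up with no partner on the fine mesh, not a copy of what indgrid actually does.

```python
import re


def mesh_divisions(log_text):
    """Extract every 'FFT mesh divisions' triple from a log, in order of appearance."""
    pattern = re.compile(
        r"^\s*FFT mesh divisions\s*\.+\s*(\d+)\s+(\d+)\s+(\d+)", re.MULTILINE
    )
    return [tuple(int(n) for n in m.groups()) for m in pattern.finditer(log_text)]


def axes_where_coarse_exceeds_fine(coarse, fine):
    """Return the 0-based axes on which the coarse mesh has more divisions."""
    return [axis for axis, (nc, nf) in enumerate(zip(coarse, fine)) if nc > nf]


def signed_frequencies(n):
    """Signed integer frequencies of an n-point FFT axis (same ordering as numpy.fft.fftfreq)."""
    return [i if i < (n + 1) // 2 else i - n for i in range(n)]


def unmapped_coarse_points(n_coarse, n_fine):
    """Count coarse-axis frequencies that do not exist on the fine axis."""
    fine = set(signed_frequencies(n_fine))
    return sum(1 for g in signed_frequencies(n_coarse) if g not in fine)


log_excerpt = """
 FFT mesh divisions ........................ 20 40 40
 Augmented FFT divisions ................... 21 41 40
 FFT mesh divisions ........................ 36 36 54
 Augmented FFT divisions ................... 37 37 54
"""

# Assumption: the first mesh in the log is the coarse one, the second the fine one.
coarse, fine = mesh_divisions(log_excerpt)
print(axes_where_coarse_exceeds_fine(coarse, fine))  # [1]
print(unmapped_coarse_points(coarse[1], fine[1]))    # 4
```

With the grids from my run, only the second axis is flagged, and four of its 40 coarse frequencies have nowhere to go on the 36-point fine axis. If an axis is flagged in your case, I would expect raising pawecutdg or setting ngfftdg by hand to be the things to try, but I have not tested those against the older versions.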