[SOLVED] Segfault / Invalid Pointer during SCF cycle
Posted: Sat Jul 31, 2010 12:05 am
Hello! I'm trying to run an optimization on tetragonal zirconia using a custom Zr PAW dataset, and I appear to have found a bug in the abinit code. The dataset gave good results in the cubic configuration, but causes abinit to crash under certain conditions in the tetragonal arrangement. Changing the dataset does not fix the problem. The crash appears to happen during the first SCF cycle, and it occurs in versions 6.0.4 and 6.2.1. The code appears to run correctly with ecut around 16 Ha, but crashes at 20 Ha. In past attempts it seemed to crash for ecut between 19 and 26 Ha, but would run for ecut between 12 and 18 Ha or above 26 Ha, although in some cases it produced warnings about negative charge densities.
Modifying the dataset with atompaw did not seem to help either, even when changing the type of basis functions or projectors (e.g. Bloechl, Vanderbilt, RRKJ). According to the plots made when generating the PAW dataset, the Hamiltonians of the exact function and the PAW function agreed very well, and the wfn.i plots all looked reasonable.
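To pin down the ecut window more systematically, something like the following could help. It is only a minimal sketch: it assumes a POSIX system, that abinit is started as `abinit < case.files > case.log`, and that one `.files` file per ecut value has already been prepared by hand. The helper names (`run_abinit`, `classify`, `scan`, `crash_window`) are made up for illustration and are not part of abinit.

```python
import signal
import subprocess


def run_abinit(files_path, log_path):
    """Hypothetical helper: run `abinit < files_path > log_path`, return the exit code."""
    with open(files_path) as fin, open(log_path, "w") as fout:
        proc = subprocess.run(["abinit"], stdin=fin, stdout=fout,
                              stderr=subprocess.STDOUT)
    return proc.returncode


def classify(returncode):
    """Map an exit code to a short label."""
    if returncode == 0:
        return "ok"
    # On POSIX, subprocess reports death by signal N as -N;
    # a shell or launcher wrapper usually reports it as 128 + N.
    if returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV):
        return "segfault"
    return f"failed ({returncode})"


def scan(ecuts, run_one):
    """Call run_one(ecut) for each value and collect the labels."""
    return {ecut: classify(run_one(ecut)) for ecut in ecuts}


def crash_window(results):
    """Return (lowest, highest) ecut that segfaulted, or None if none did."""
    bad = sorted(e for e, status in results.items() if status == "segfault")
    return (bad[0], bad[-1]) if bad else None
```

A call might look like `scan(range(12, 31, 2), lambda e: run_abinit(f"ecut{e}.files", f"ecut{e}.log"))`, where the file names are placeholders. Taking the runner as an argument keeps the scan logic separate from how abinit is actually launched (serial, MPICH2, and so on).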
Has anyone encountered such a problem before? If not, I would like to track down the bug myself, but could use some help finding it. I have attached the abinit input file. The log shows the following at the point where abinit fails:
ITER STEP NUMBER 1
vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
leave_test : synchronization done...
vtorho: loop on k-points and spins done in parallel
leave_test : synchronization done...
*********** RHOIJ (atom 1) **********
2.06391 -0.01834 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.02995 0.00000 ...
-0.01834 0.05303 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.00097 0.00000 ...
0.00000 0.00000 1.98097 0.00000 0.00000 0.00587 0.00000 0.00000 0.00000 0.00000 0.00000 0.14258 ...
0.00000 0.00000 0.00000 2.09851 0.00000 0.00000 -0.03432 0.00000 0.16676 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 1.98097 0.00000 0.00000 0.00587 0.00000 0.14258 0.00000 0.00000 ...
0.00000 0.00000 0.00587 0.00000 0.00000 0.02812 0.00000 0.00000 0.00000 0.00000 0.00000 -0.03304 ...
0.00000 0.00000 0.00000 -0.03432 0.00000 0.00000 0.03735 0.00000 -0.03763 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 0.00587 0.00000 0.00000 0.02812 0.00000 -0.03304 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.16676 0.00000 0.00000 -0.03763 0.00000 0.41508 0.00000 0.00000 0.00000 ...
0.00000 0.00000 0.00000 0.00000 0.14258 0.00000 0.00000 -0.03304 0.00000 0.39175 0.00000 0.00000 ...
-0.02995 -0.00097 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.30738 0.00000 ...
0.00000 0.00000 0.14258 0.00000 0.00000 -0.03304 0.00000 0.00000 0.00000 0.00000 0.00000 0.39175 ...
... only 12 components have been written...
*********** RHOIJ (atom 6) **********
2.11479 0.04015 0.00000 -0.01752 0.00000 0.00000 -0.00289 0.00000
0.04015 0.01195 0.00000 -0.00158 0.00000 0.00000 -0.00182 0.00000
0.00000 0.00000 2.03026 0.00000 0.03309 0.01767 0.00000 0.00198
-0.01752 -0.00158 0.00000 2.01532 0.00000 0.00000 0.03259 0.00000
At this point, abinit ends with a "Segmentation Fault" message. Does anyone have any ideas or hints on how to tackle this? If it helps, atoms 1 and 2 are Zr, and atoms 3-6 are oxygen. Thanks!
Also, FYI, I'm running on a 64-bit Ubuntu 10.04 LTS workstation cluster, with abinit compiled using gcc. I'm using MPICH2, although the problem occurs whether or not I run in parallel.
-Steve
Update: I compiled and reran abinit with the configure options --enable-debug=yes --enable-optim=no --enable-mpi=no, and the log file shows a little more diagnostic information. At the very end, after the "RHOIJ (atom 6)" lines, it ends with:
symrhoij.F90:745 : exit
pawmknhat.F90:117 : enter
pawmknhat.F90:359 : exit
transgrid.F90:108 : enter
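The enter/exit trace above can be checked mechanically for routines that were entered but never exited. The following is a rough sketch only: the line format is inferred from the four trace lines shown here, and `unfinished_routines` is a hypothetical helper, not an abinit tool.

```python
import re

# Matches debug trace lines such as "pawmknhat.F90:117 : enter".
TRACE = re.compile(r"^\s*(\S+\.F90):(\d+)\s*:\s*(enter|exit)\s*$")


def unfinished_routines(log_lines):
    """Return (source file, line) pairs that logged 'enter' without a later 'exit'."""
    stack = []
    for line in log_lines:
        match = TRACE.match(line)
        if not match:
            continue  # ordinary log output, not a trace line
        name, lineno, event = match.group(1), int(match.group(2)), match.group(3)
        if event == "enter":
            stack.append((name, lineno))
            continue
        # On 'exit', drop the most recent matching 'enter' and anything
        # recorded above it; an 'exit' with no matching 'enter' is ignored.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i][0] == name:
                del stack[i:]
                break
    return stack
```

Applied to the four lines above, only the `transgrid.F90:108` entry is left open, which suggests the crash happens somewhere after entering that routine, either in it or in something it calls that does not write trace lines.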