Berry's phase array bounds errors

Post by recohen » Sun Jun 11, 2017 10:03 pm

I have been trying to do constant-D computations with ABINIT and have had occasional success, but some structures always crash. I compiled initberry and berryphase_new with bounds checking and extra printouts, and it seems clear there is a problem. Usually the run just crashes with no useful traceback or error message, but now I get:

back trace terminated abnormally. forrtl: severe (408): fort: (2): Subscript #1 of the array INDKK_F2IBZ has value 26371 which is greater than the upper bound of 512

The upper bound of 512 may be a key clue. This is for an 8x8x8 k-point mesh on 128 processors. The run also crashes on 1 CPU (which is very slow), so it does not seem to be just a parallelization problem.

Also, when I run on one processor, nproc is correctly shown as 1, but Num_procs in the output is wrong. It shows 4 when I run on 128 processors and 256 when I run on 1! So with

mpirun -bootstrap slurm -n 128 abinit_check <files >&OUTFILE

I see, for nproc = 128:
==== OpenMP parallelism is ON ====
- Max_threads: 2
- Num_threads: 2
- Num_procs: 4
- Dynamic: F
- Nested: F
and for nproc = 1 I see:
==== OpenMP parallelism is ON ====
- Max_threads: 2
- Num_threads: 2
- Num_procs: 256
- Dynamic: F
- Nested: F

I get similar failures with export OMP_NUM_THREADS=1, so OpenMP threading is not the problem.
I am running on an Intel KNL cluster with 64 cores per node.
Maybe something like this is causing the dimensioning to go awry. The call used to obtain Num_procs above does not look right for this purpose, because it only reports the number of processors available "at that time on the device," whereas nproc is the correct number. Maybe this is a red herring. Anyway, any help would be appreciated.
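For what it's worth, the two Num_procs values are consistent with a per-rank count of logical CPUs rather than an MPI process count. The quoted phrase matches the OpenMP description of omp_get_num_procs, which returns the number of processors available to the device at the time of the call. When ranks are pinned, that call would see only the CPUs in the calling rank's affinity mask. The Python sketch below just does the arithmetic, under assumptions that are not stated above: KNL nodes with 64 cores and 4 hardware threads per core, and the 128 ranks spread evenly over 2 nodes. The function name is hypothetical.

```python
# Hypothetical sketch: logical CPUs each MPI rank would see if the scheduler
# divides a node's hardware threads evenly among the ranks pinned to it.
# ASSUMPTIONS (not from the log): 64 cores/node, 4 hardware threads/core,
# and 128 ranks spread evenly over 2 nodes.

def logical_cpus_per_rank(nodes, cores_per_node, threads_per_core, ranks):
    """Total logical CPUs across the nodes, divided evenly among the ranks."""
    total = nodes * cores_per_node * threads_per_core
    return total // ranks

one_rank = logical_cpus_per_rank(nodes=1, cores_per_node=64, threads_per_core=4, ranks=1)
many_ranks = logical_cpus_per_rank(nodes=2, cores_per_node=64, threads_per_core=4, ranks=128)
print(one_rank, many_ranks)  # 256 4
```

Under these assumptions the results match the 256 and 4 in the logs. That would support the idea that Num_procs describes the processors visible to each rank, not the MPI size.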

Sometimes, instead of an MPI crash or a bounds-check crash, I see the error "the determinant of the overlap matrix is found to be 0." Either way, I suspect one or more dimensioning problems in the Berry's phase routines.
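Here is a rough illustration of why the bound of 512 looks significant. An 8x8x8 grid with a single shift has 8 x 8 x 8 = 512 k-points in the full Brillouin zone, which matches the upper bound in the error. That suggests INDKK_F2IBZ is dimensioned by the full-zone k-point count, so a subscript of 26371 lies far outside it. The Python sketch below does only that arithmetic and a Fortran-style inclusive bounds check. The function names are hypothetical, not ABINIT routines.

```python
# Hypothetical sketch: check a subscript into an array dimensioned by the
# number of k-points in the full Brillouin zone.
# The 8x8x8 mesh and the value 26371 come from the error message above.

def full_bz_kpoints(ngkpt):
    """Number of k-points in an n1 x n2 x n3 grid with a single shift,
    before any symmetry reduction."""
    n1, n2, n3 = ngkpt
    return n1 * n2 * n3

def index_in_bounds(index, upper, lower=1):
    """Fortran-style check: valid subscripts run from lower to upper inclusive."""
    return lower <= index <= upper

nkpt_full = full_bz_kpoints((8, 8, 8))
print(nkpt_full)                          # 512
print(index_in_bounds(26371, nkpt_full))  # False
```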

Thanks!

Sincerely,

Ron Cohen
rcohen@carnegiescience.edu
Attachments
new.in (2.15 KiB)
OUTFILE.out (213.08 KiB)
files.in (80 Bytes)

Re: Berry's phase array bounds errors

Post by recohen » Sun Jun 11, 2017 10:18 pm

I built ABINIT 7.10.5 and ran it with the same input and environment, and it seems to run OK, so the problem does seem to be specific to ABINIT 8. I have attached the beginning of the 7.10.5 output, and I also include the abinit -b output from both builds. Thanks,

Ron
Attachments
OUTFILE7105.out (198.64 KiB)
b7105.out (3.33 KiB)
b.842.out (4.54 KiB)

Locked