I am running Abinit 6.12.3 on openSUSE 11.3. Abinit was compiled with the latest Intel Fortran/C/MPI software. I have been preparing to do Berry phase calculations in a finite electric field by practicing on GaAs before I move on to a more complicated structure. I have included my input and corresponding output files below for reference. Since I want to include spin-orbit coupling in my real structure, I am using the HGH pseudopotentials for GaAs. The input file runs an SCF calculation for GaAs on a shifted 8x8x8 Monkhorst-Pack grid, which gives a total of 28 k-points in the irreducible Brillouin zone, and I would like to parallelize the calculation over these k-points (the key input variables are also sketched at the end of this post). If I remove the final two lines
berryopt -1
rfdir 1 1 1
the calculation runs to completion under MPI on 28 cores with no errors. When I add the berryopt -1 input variable, the program crashes violently, as you can see below.
initberry: enter
Simple Lattice Grid
initberry: for direction 1, nkstr = 8, nstr = 256
initberry: for direction 2, nkstr = 8, nstr = 256
initberry: for direction 3, nkstr = 8, nstr = 256
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
abinit 000000000140301E Unknown Unknown Unknown
abinit 0000000000D01F30 Unknown Unknown Unknown
abinit 00000000004E7C5E Unknown Unknown Unknown
abinit 000000000041ECEE Unknown Unknown Unknown
abinit 00000000004101F8 Unknown Unknown Unknown
abinit 000000000040853A Unknown Unknown Unknown
abinit 00000000004070DC Unknown Unknown Unknown
libc.so.6 00002B735E744BFD Unknown Unknown Unknown
abinit 0000000000406FD9 Unknown Unknown Unknown
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
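For reference, the parallel job is launched with plain MPI over the 28 k-points, along the lines of the following (the launcher and file names here are illustrative):

  mpirun -np 28 abinit < gaasBerry.files > gaasBerry.log 2>&1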
Running the same input file in serial results in the calculation proceeding to completion; the same section as above reads
setup2: Arith. and geom. avg. npw (full set) are 3265.773 3265.765
symatm: atom number 1 is reached starting at atom
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
symatm: atom number 2 is reached starting at atom
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
initberry: enter
Simple Lattice Grid
initberry: for direction 1, nkstr = 8, nstr = 256
initberry: for direction 2, nkstr = 8, nstr = 256
initberry: for direction 3, nkstr = 8, nstr = 256
initro : for itypat= 1, take decay length= 1.1500,
initro : indeed, coreel= 28.0000, nval= 3 and densty= 0.0000E+00.
initro : for itypat= 2, take decay length= 1.0000,
initro : indeed, coreel= 28.0000, nval= 5 and densty= 0.0000E+00.
================================================================================
getcut: wavevector= 0.0000 0.0000 0.0000 ngfft= 45 45 45
ecut(hartree)= 35.000 => boxcut(ratio)= 2.23590
with a final result of:
DATA TYPE INFORMATION:
REAL: Data type name: REAL(DP)
Kind value: 8
Precision: 15
Smallest nonnegligible quantity relative to 1: 0.22204460E-15
Smallest positive number: 0.22250739-307
Largest representable number: 0.17976931+309
INTEGER: Data type name: INTEGER(default)
Kind value: 4
Bit size: 32
Largest representable number: 2147483647
LOGICAL: Data type name: LOGICAL
Kind value: 4
CHARACTER: Data type name: CHARACTER Kind value: 1
What can I do to get the MPI version working correctly? My actual structure is quite large (a slab calculation) and I really need to use more than one CPU for speed reasons. Have I encountered a bug, or am I doing something wrong? Any help offered would be gratefully received.
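For completeness, the Berry-phase-relevant part of the input is roughly the following sketch; the shift vectors shown are the standard fcc set and any values not quoted elsewhere in this post are illustrative, so see the attached gaasBerry.in for the exact input.

  ngkpt    8 8 8        # 8x8x8 Monkhorst-Pack grid
  nshiftk  4            # four shifts -> 28 k-points in the IBZ
  shiftk   0.5 0.5 0.5  # standard fcc shift set (illustrative)
           0.5 0.0 0.0
           0.0 0.5 0.0
           0.0 0.0 0.5
  ecut     35           # Ha
  berryopt -1           # Berry phase computation of the polarization
  rfdir    1 1 1        # along all three reciprocal lattice directions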
Paul Fons
- Attachments
- gaasBerry.in (Abinit input file)
- gaasBerry.out (log file for the gaasBerry.in run)
Re: Berry phase calculation crashing under MPI
Hi,
It's not an MPI problem, it's a Berry phase problem. In version 6.12.3 the spin-polarized and spinor versions of the Berry phase code were not very thoroughly implemented and debugged (especially the nspinor part). I have done that work, for both norm-conserving and PAW, since that version. I just tested your case, GaAs with HGH pseudopotentials parallelized over k-points with berryopt -1, and it ran fine on version 7.0.1 (the current development version). If you like, I can ask Xavier if it's OK to send you a snapshot of the current code base.
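For the spin-orbit runs you mention, the additions to an input like yours would be roughly along these lines (the nband value is just illustrative; it should be about double the scalar-relativistic count, and spin-orbit is then taken from the HGH pseudopotential if it contains the spin-orbit terms):

  nspinor  2    # two-component spinor wavefunctions
  nband    16   # roughly double the scalar band count (illustrative)
  berryopt -1
  rfdir    1 1 1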
Josef W. Zwanziger
Professor, Department of Chemistry
Canada Research Chair in NMR Studies of Materials
Dalhousie University
Halifax, NS B3H 4J3 Canada
jzwanzig@gmail.com