mpiio segfault band/k-point/FFT parallelization
Hello Everyone,
I am running a constrained LDA calculation (manually specifying occupation numbers) for a 128-atom supercell. Since convergence with k-point parallelization alone is quite slow, I would like to take advantage of band/k-point/FFT parallelization, so I used the following input variables in my input file:
#parallelization variables
paral_kgb 1
wfoptalg=4
nloalg=4
fftalg=401
intxc=0
fft_opt_lob=2
npkpt 4
npband 9
npfft 2
bandpp 2
#accesswff 1 1 1 1
istwfk 1 1 1 1
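(For reference, these values give npkpt x npband x npfft = 4 x 9 x 2 = 72 MPI processes, and the mband = 450 reported in the log below is a multiple of npband x bandpp = 18, which, if I remember the constraint correctly, is what the band parallelization requires.)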
I found that, since I compiled the code with the enable_mpi option, accesswff is automatically set to 1 even if I comment it out (configure options are listed below). The code seems to be doing well as far as electronic convergence is concerned (toldfe is set to 1e-6), as shown below:
......
ETOT 38 -1560.2634967021 -1.744E-02 2.442E-05 5.902E+02 8.009E-03 2.116E-02
ETOT 39 -1560.2728614790 -9.365E-03 1.369E-05 8.446E+00 1.519E-02 2.355E-02
ETOT 40 -1560.2710868467 1.775E-03 2.177E-05 1.455E+02 8.362E-03 2.002E-02
ETOT 41 -1560.2735058946 -2.419E-03 1.241E-05 5.197E+00 5.002E-03 2.201E-02
ETOT 42 -1560.2735661510 -6.026E-05 1.888E-05 2.061E+00 2.548E-03 2.093E-02
ETOT 43 -1560.2732355317 3.306E-04 8.588E-06 1.796E+01 3.270E-03 2.138E-02
ETOT 44 -1560.2736186433 -3.831E-04 9.703E-06 8.753E-01 2.021E-03 2.112E-02
ETOT 45 -1560.2736311134 -1.247E-05 8.009E-06 1.148E-01 3.549E-04 2.147E-02
ETOT 46 -1560.2736317200 -6.065E-07 1.044E-05 2.272E-02 2.073E-04 2.141E-02
ETOT 47 -1560.2736323600 -6.400E-07 8.383E-06 9.248E-03 1.722E-04 2.149E-02
but soon after, it crashes when it attempts to write the WFK file:
*********************************
At SCF step 47, etot is converged :
for the second time, diff in etot= 6.400E-07 < toldfe= 1.000E-06
forstrnps : usepaw= 0
forstrnps: loop on k-points and spins done in parallel
-P-0000 leave_test : synchronization done...
strhar : before mpi_comm, harstr= 1.072006351610150E-002
9.865018747734584E-003 9.871758113503866E-003 1.198027124816839E-007
1.494415344832276E-007 7.404431448065469E-004
strhar : after mpi_comm, harstr= 1.887503908462354E-002
1.887500817960579E-002 1.658736148652544E-002 1.388586001033960E-007
1.243844495641594E-007 -6.792742386575647E-008
strhar : ehart,ucvol= 232.336116098840 26865.6696106559
Cartesian components of stress tensor (hartree/bohr^3)
sigma(1 1)= -3.13906187E-04 sigma(3 2)= -1.47617615E-08
sigma(2 2)= -3.13907380E-04 sigma(3 1)= -3.67408460E-09
sigma(3 3)= -3.49901037E-04 sigma(2 1)= 5.10828585E-09
ioarr: writing density data
ioarr: file name is 4exciteo_DEN
m_wffile.F90:272:COMMENT
MPI/IO accessing FORTRAN file header: detected record mark length= 4
ioarr: data written to disk file 4exciteo_DEN
-P-0000 leave_test : synchronization done...
================================================================================
----iterations are completed or convergence reached----
outwf : write wavefunction to file 4exciteo_WFK
-P-0000 leave_test : synchronization done...
m_wffile.F90:272:COMMENT
MPI/IO accessing FORTRAN file header: detected record mark length= 4
***********************************************************************************************
A portion of the log file from the beginning of the run is listed below:
************** portion of output from log file *************************************
==== FFT mesh ====
FFT mesh divisions ........................ 216 216 240
Augmented FFT divisions ................... 217 217 240
FFT algorithm ............................. 401
FFT cache size ............................ 16
FFT parallelization level ................. 1
Number of processors in my FFT group ...... 2
Index of me in my FFT group ............... 0
No of xy planes in R space treated by me .. 108
No of xy planes in G space treated by me .. 120
MPI communicator for FFT .................. 0
Value of ngfft(15:18) ..................... 0 0 0 0
getmpw: optimal value of mpw= 33137
getdim_nloc : enter
pspheads(1)%nproj(0:3)= 0 1 1 1
getdim_nloc : deduce lmnmax = 15, lnmax = 3,
lmnmaxso= 15, lnmaxso= 3.
memory : analysis of memory needs
================================================================================
Values of the parameters that define the memory need of the present run
intxc = 0 ionmov = 0 iscf = 7 xclevel = 1
lmnmax = 3 lnmax = 3 mband = 450 mffmem = 1
P mgfft = 240 mkmem = 1 mpssoang= 4 mpw = 33137
mqgrid = 3001 natom = 128 nfft = 5598720 nkpt = 4
nloalg = 4 nspden = 1 nspinor = 1 nsppol = 1
nsym = 1 n1xccc = 2501 ntypat = 3 occopt = 0
================================================================================
P This job should need less than 1841.517 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 1820.272 Mbytes ; DEN or POT disk file : 42.717 Mbytes.
================================================================================
******************************************************
I have access to three different clusters with different file systems (RAID1, BlueArc FC, and Lustre) and I see the same problem everywhere. Another observation: if the estimated WF file is < 1 GB, then MPI I/O works fine (with similar parallelization variables) and the code is able to write the WFK file properly.
I even tried setting accesswff to 0 (but still using paral_kgb 1), which successfully completes the calculation, but the WFK file is apparently incomplete, since cut3d gives an error when reading it.
Can someone please suggest ways to solve this issue? I can post the input file if that will help (I am using TM pseudopotentials).
Thank you.
Anurag
I am running ABINIT 6.0.4, compiled on a Linux cluster with the Intel 10.0 Fortran compiler. Build information is listed below. ABINIT was configured as follows:
$ ./configure --prefix=/global/home/users/achaudhry --enable-mpi-fft=yes --enable-mpi-io=yes --enable-fttw=yes --enable-64bit-flags=yes FC=mpif90 F77=mpif90 --enable-scalapack CC=mpicc CXX=mpiCC --with-mpi-runner=/global/software/centos-5.x86_64/modules/openmpi/1.4.1-intel/bin/mpirun --with-mpi-level=2 CC_LIBS=-lmpi CXX_LIBS=-lmpi++ -lmpi --with-fc-vendor=intel --with-linalg-includes=-I/global/software/centos-5.x86_64/modules/mkl/10.0.4.023/include --with-linalg-libs=-L/global/software/centos-5.x86_64/modules/mkl/10.0.4.023/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_intel_solver_lp64 -lmkl_lapack -lmkl_core -lguide -lpthread --with-scalapack-libs=-L/global/software/centos-5.x86_64/modules/mkl/10.0.4.023/lib/em64t -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core -liomp5 -lpthread --disable-wannier90
=== Build Information ===
Version : 6.0.4
Build target : x86_64_linux_intel0.0
Build date : 20100520
=== Compiler Suite ===
C compiler : intel10.1
CFLAGS : -I/global/software/centos-5.x86_64/modules/openmpi/1.3.3-intel/include
C++ compiler : intel10.1
CXXFLAGS : -g -O2 -vec-report0
Fortran compiler : intel0.0
FCFLAGS : -g -extend-source -vec-report0
FC_LDFLAGS : -static-libgcc -static-intel
=== Optimizations ===
Debug level : yes
Optimization level : standard
Architecture : intel_xeon
=== MPI ===
Parallel build : yes
Parallel I/O : yes
=== Linear algebra ===
Library type : external
Use ScaLAPACK : yes
=== Plug-ins ===
BigDFT : yes
ETSF I/O : yes
LibXC : yes
FoX : no
NetCDF : yes
Wannier90 : no
=== Experimental features ===
Bindings : no
Error handlers : no
Exports : no
GW double-precision : no
Macroave build : yes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:
CC_INTEL CXX_INTEL FC_INTEL
HAVE_FC_EXIT HAVE_FC_FLUSH HAVE_FC_GET_ENVIRONMEN...
HAVE_FC_LONG_LINES HAVE_FC_NULL HAVE_MPI
HAVE_MPI2 HAVE_MPI_IO HAVE_SCALAPACK
HAVE_STDIO_H USE_MACROAVE
Re: mpiio segfault band/k-point/FFT parallelization
Hi, this is the same problem as viewtopic.php?f=9&t=356, but you have added the valuable information that it crashes on many other platforms and file systems. I have tried OpenMPI 1.4.2, with no improvement, as well as the latest development version of abinit (6.1.2).
Some things we could try:
* remove optimization during compilation
* find the precise file size at which abinit starts crashing (probably an MPI problem, actually; a standalone test could pin this down, see the sketch below)
* it could be that it does not crash with PAW; that would be an important piece of information
* I thought it might be an allocation of too much memory for the buffer arrays, but that is not what the code complains about, and it crashes deep inside openmpi
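For the file-size test, something like this standalone Fortran program (my sketch, not ABINIT code; the file name, chunk size, and chunk count are arbitrary choices) writes disjoint chunks through MPI-IO and can be scaled step by step past the suspected 1 GB threshold, to see where the MPI library itself gives up:

program mpiio_test
  ! Minimal MPI-IO stress test: each rank writes nblock chunks of
  ! blocksize doubles at disjoint 64-bit offsets. Increase nblock
  ! (or the rank count) to push the total file size past 1 GB.
  use mpi
  implicit none
  integer, parameter :: blocksize = 1000000   ! 8 MB per chunk
  integer, parameter :: nblock    = 20        ! ~160 MB per rank
  integer :: ierr, rank, fh, ib
  integer :: status(MPI_STATUS_SIZE)
  integer(kind=MPI_OFFSET_KIND) :: offset
  double precision :: buf(blocksize)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  buf = dble(rank)
  call MPI_File_open(MPI_COMM_WORLD, 'iotest.dat', &
       MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)
  do ib = 0, nblock - 1
    ! rank-major layout: rank r owns bytes [r*nblock*blocksize*8, ...)
    offset = (int(rank, MPI_OFFSET_KIND)*nblock + ib) &
             * int(blocksize, MPI_OFFSET_KIND) * 8
    call MPI_File_write_at(fh, offset, buf, blocksize, &
         MPI_DOUBLE_PRECISION, status, ierr)
    if (ierr /= MPI_SUCCESS) print *, 'rank', rank, 'failed at block', ib
  end do
  call MPI_File_close(fh, ierr)
  call MPI_Finalize(ierr)
end program mpiio_test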
If you have any more hints or information, please share them here!
Matthieu
Matthieu Verstraete
University of Liege, Belgium
Re: mpiio segfault band/k-point/FFT parallelization
Hello,
I have some experience with the points you mentioned.
>>> find the precise file size at which abinit starts crashing (probably an MPI problem, actually)
ABINIT prints an approximate size for the WF file in the log (I do not know exactly how it is estimated). I found that for estimated file sizes < 1 GB, MPI I/O works fine; I have successfully done calculations for a few systems with WFK files of ~900 MB. Unfortunately, I do not have a precise threshold.
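(My guess at how the estimate is made, an assumption on my part rather than something I have checked in the source: roughly

size ~ nkpt x nsppol x nband x npw x 16 bytes

for complex double-precision plane-wave coefficients, plus Fortran record markers; with nkpt = 4 and nband = 450, that is of the right order of magnitude for the 1820 MB quoted above.)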
>>> it could be that it does not crash with PAW; that would be an important piece of information
The same problem persists even with PAW pseudopotentials. I have tested both spin-polarised and unpolarised cases; every single time it is the same issue with MPI I/O.
I had posted another thread which seems to have been lost, so I will restate the question here, hopefully without annoying people.
Since the _DEN file is typically much smaller than the _WFK file, the paral_kgb run always succeeds in writing it (with MPI I/O). Is there a way to restart the calculation from this density file and run a non-SCF calculation that prints the WF file using regular k-point parallelization (paral_kgb 0)? Even when the WF file is big (> 1 GB), I have seen that the code writes it properly when only k-point parallelism is employed.
I do not know how to set up such a restart in ABINIT. I attempted a two-dataset calculation (the first an SCF run with paral_kgb 1 to print the _DEN file, the second a non-SCF run with paral_kgb 0 to print the WF file) but ran into a strange error: ABINIT complained that ndtset should be non-zero, even though I had specified ndtset 2.
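Something along these lines is what I have in mind (a sketch only; the exact values are illustrative, and I am assuming paral_kgb is allowed to differ between datasets in this version):

ndtset 2
# dataset 1: SCF with band/k-point/FFT parallelism, write only the density
paral_kgb1 1          # plus npkpt/npband/npfft/bandpp as above
prtden1 1
prtwf1 0
# dataset 2: non-SCF run reading the dataset-1 density, k-point parallelism only
iscf2 -2
getden2 -1            # read the _DEN produced by the previous dataset
tolwfr2 1.0d-12
paral_kgb2 0
prtwf2 1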
Any helpful comments are welcome.
thanks,
Anurag
Re: mpiio segfault band/k-point/FFT parallelization
Hi all,
This is an old thread, but I should point out that the WFK writing issue still holds in 6.12.1. I have been getting hangs exactly like this, and setting prtwf=0 stops them, although for multi-dataset runs that is annoying because you can't daisy-chain the wavefunctions from one dataset to the next. My system is a 90-atom molecule using the PAW method.
I'm on Abinit 6.12.1 built with gfortran 4.6.2 and linked against MVAPICH2 1.8, FFTW3 and OpenBLAS, on Intel x86_64 with InfiniBand.
Dr Kane O'Donnell
Postdoctoral Research Fellow
Australian Synchrotron
Re: mpiio segfault band/k-point/FFT parallelization
Old thread indeed, but I just got the same problem with 6.12.1...
Re: mpiio segfault band/k-point/FFT parallelization
It still happens in 6.12.3. As I've said in another thread, it's hard to show exactly what is going on, because the hangs/crashes only happen on certain systems and might be separate phenomena. We have a patched version of 6.12.3 here that seems to avoid the problem, but we're not sure yet why it works (!!!), so progress on a public version we trust has been slow. Working on it!
Dr Kane O'Donnell
Postdoctoral Research Fellow
Australian Synchrotron
Re: mpiio segfault band/k-point/FFT parallelization
A way to minimize the memory needed to write the WF file with MPI-IO is to decrease MAXBAND=500 in wffreadwrite_mpio.F90 (the number of bands written to the WF file in one shot).
The writing speed is linked to the number of bands written in one shot.
Set MAXBAND to a lower value, compile again, then try again.
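The change would look something like this (a paraphrase, assuming a plain integer parameter; check the exact declaration in your version of the source):

! In wffreadwrite_mpio.F90: bands are buffered and written MAXBAND at a
! time, so a smaller value shrinks each MPI-IO message proportionally.
integer, parameter :: MAXBAND = 50   ! default 500; lower = less memory per write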
M. Delaveau
Re: mpiio segfault band/k-point/FFT parallelization
A way of decreasing the memory needed to write the WF file is to decrease MAXBAND in rwwf.F90, then compile again.
MAXBAND is the number of bands written to the WF file in one shot; its value is important for the writing speed.
Muriel Delaveau
Re: mpiio segfault band/k-point/FFT parallelization
The crash might be caused by memory use in rwwf.F90. For performance reasons, the writing is done MAXBAND bands at a time, and MAXBAND is set, somewhat arbitrarily, to 500.
You can decrease this value to decrease the memory used.
Hoping it will help,
Muriel Delaveau
Re: mpiio segfault band/k-point/FFT parallelization
A possible cause of the crash in big cases is the size of the message to be written or read in one operation.
The message size is controlled in WffReadWrite_mpio.F90 by MAXBAND=500, and it can be reduced by decreasing MAXBAND.
MAXBAND=500 was chosen for performance reasons, because it minimizes the number of disk accesses.
I hope it will help,
Muriel Delaveau