I have been running paral_kgb with band and a bit of FFT parallelism for a largeish molecule in a box (abridged input file below) on the nic3 cluster here in Liège (Intel Xeon, Intel 11.1 compilers, OpenMPI 1.3.0). The speedup is appreciable, even if not linear. I notice that band parallelization does not reduce the memory use per core much, whereas FFT parallelism does (but it is less efficient, and ABINIT definitely underestimates its prediction of the memory usage, sometimes by half).
I now have a problem with the MPI-IO writing of the WFK file for the full case with many bands: I get a segfault inside wffreadwrite_mpio:
[node066:03178] *** Process received signal ***
[node066:03178] Signal: Segmentation fault (11)
[node066:03178] Signal code: Address not mapped (1)
[node066:03178] Failing at address: 0x11891a8c0
[node066:03176] *** Process received signal ***
[node066:03176] Signal: Segmentation fault (11)
[node066:03176] Signal code: (128)
[node066:03176] Failing at address: (nil)
[node066:03178] [ 0] /lib64/libpthread.so.0 [0x3246e0e4c0]
[node066:03178] [ 1] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so(ADIOI_Calc_my_req+0x15d) [0x2b2cf916d09d]
[node066:03178] [ 2] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so(ADIOI_GEN_WriteStridedColl+0x3f2) [0x2b2cf917fda2]
[node066:03178] [ 3] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so(MPIOI_File_write_all+0xc0) [0x2b2cf918a6f0]
[node066:03178] [ 4] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_write_all+0x23) [0x2b2cf918a623]
[node066:03178] [ 5] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so [0x2b2cf916c7f0]
[node066:03178] [ 6] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/libmpi.so.0(MPI_File_write_all+0x4a) [0x2b2cc9a62e0a]
[node066:03178] [ 7] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/libmpi_f77.so.0(mpi_file_write_all_f+0x8f) [0x2b2cc97e71ff]
[node066:03178] [ 8] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(wffreadwrite_mpio_+0x16f0) [0x1055380]
[node066:03178] [ 9] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(writewf_+0x1896) [0x104c3a6]
[node066:03178] [10] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(rwwf_+0x8e8) [0x104aaf8]
[node066:03178] [11] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(outwf_+0x21cd) [0xc3268d]
[node066:03178] [12] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(gstate_+0xc3b0) [0x502250]
[node066:03178] [13] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(gstateimg_+0x1409) [0x45bd89]
[node066:03178] [14] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(driver_+0x6ddb) [0x451dbb]
[node066:03178] [15] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(MAIN__+0x3b07) [0x448057]
[node066:03178] [16] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(main+0x3c) [0x44453c]
[node066:03178] [17] /lib64/libc.so.6(__libc_start_main+0xf4) [0x324621d974]
[node066:03178] [18] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(ctrtri_+0x41) [0x444449]
The density file is written correctly (also via MPI-IO).
If I lower ecut or make the box smaller, the writing works (and the file can be re-used as input), so it looks like it is linked to the file size. http://lists.mcs.anl.gov/pipermail/mpich-discuss/2007-May/002311.html is perhaps related, but that is for MPICH, and I have not found a solution in that thread.
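Back of the envelope, the file should indeed be huge: with the standard sphere-counting estimate npw ~ V*(2*ecut)^(3/2)/(6*pi^2), 600 bands, nsppol 2 and the 17x10x40 Angstrom box, the wavefunction payload comes out at several GB, well past the 2**31-1 bytes where any 32-bit offset arithmetic left inside ROMIO would wrap around. The overflow explanation is only my assumption (I have not dug into the ROMIO source), but here is a small standalone Fortran sketch of the estimate, using the values from the input below:

program check_wfk_size
  ! Rough estimate of the WFK payload for the run below; the overflow
  ! hypothesis (crossing 2**31-1 bytes) is an assumption, not a confirmed diagnosis.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  real(dp), parameter :: pi   = 3.141592653589793_dp
  real(dp), parameter :: bohr = 1.8897259886_dp        ! Bohr per Angstrom
  real(dp)    :: ecut, vol, npw_est
  integer     :: nband, nsppol, nkpt
  integer(i8) :: total_bytes

  ecut   = 35.0_dp                                     ! Ha, from the input
  vol    = (17*bohr)*(10*bohr)*(40*bohr)               ! box volume in Bohr^3
  nband  = 600
  nsppol = 2
  nkpt   = 1

  ! plane waves inside the cutoff sphere (sphere-counting estimate)
  npw_est = vol*(2.0_dp*ecut)**1.5_dp/(6.0_dp*pi**2)

  ! 16 bytes per complex(dp) coefficient, over all bands, spins and k-points
  total_bytes = int(npw_est, i8)*nband*nsppol*nkpt*16_i8

  print '(a,f12.0)', 'estimated npw per k point      : ', npw_est
  print '(a,i0)',    'estimated WFK payload in bytes : ', total_bytes
  print '(a,l1)',    'beyond 32-bit offset range     : ', total_bytes > int(huge(1), i8)
end program check_wfk_size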
I have tried printing out the sizes of the buffer arrays in wffreadwrite_mpio:
call to MPI_FILE_WRITE_ALL 8947488
and they look OK, as do the values. The buffer is not that big (a few hundred MB), and I did not get an actual complaint about memory allocation, so memory is probably not the problem.
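One thing I did check against the standard: the count handed to MPI_FILE_WRITE_ALL is a plain default integer, and ~9e6 elements is harmless; it is the file displacements that have to be integer(MPI_OFFSET_KIND). A minimal sketch of that pattern (this is not the actual wffreadwrite_mpio code, just an illustration; that ROMIO's collective machinery, ADIOI_Calc_my_req in the trace, still folds offsets into 32-bit integers somewhere is only a guess on my part):

subroutine write_block(fh, buf, nelem, byte_off, ierr)
  ! Sketch only: kind-correct MPI-IO collective write of nelem doubles at byte_off.
  use mpi
  implicit none
  integer,                       intent(in)  :: fh        ! handle from MPI_FILE_OPEN
  integer,                       intent(in)  :: nelem     ! default-integer count is fine here
  real(kind=8),                  intent(in)  :: buf(nelem)
  integer(kind=MPI_OFFSET_KIND), intent(in)  :: byte_off  ! 64-bit file displacement
  integer,                       intent(out) :: ierr
  integer :: mpistat(MPI_STATUS_SIZE)

  ! each rank sets its own view origin with a 64-bit displacement ...
  call MPI_FILE_SET_VIEW(fh, byte_off, MPI_DOUBLE_PRECISION, &
&   MPI_DOUBLE_PRECISION, 'native', MPI_INFO_NULL, ierr)
  ! ... so the collective write itself only ever sees the (small) element count
  call MPI_FILE_WRITE_ALL(fh, buf, nelem, MPI_DOUBLE_PRECISION, mpistat, ierr)
end subroutine write_block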
Does anyone have a suggestion? Has anyone tried OpenMPI 1.4 with better results? Or would downgrading to 1.2.6 help? I doubt it.
thanks,
Matthieu
nstep 100 # max number of SCF iterations
nband 600 # Number of (occ and empty) bands to be computed
paral_kgb 1
npkpt 2
npband 10 # Nr of Processors at the BAND level, default 1
# npband has to divide nband
npfft 3 # Nr of Processors at the FFT level, default nproc
# npfft should divide the number of FFT
# planes along the 2nd and 3rd dimensions
bandpp 1
wfoptalg 4 nloalg 4 fftalg 401 intxc 0 fft_opt_lob 2
accesswff 1
occopt 3 # 7 = Gaussian, 4 = cold smearing, 3 = Fermi-Dirac
tsmear 0.01 eV
optstress 0
optforces 0
kptopt 1
ngkpt 1 1 1
nshiftk 1
shiftk 0 0 0
ndtset 1
tolwfr 1.0d-12
nbdbuf 10 # buffer: the last 10 bands are not required to converge
istwfk 1*1
ixc 1
ecut 35
nline 6
diemac 1
diemix 0.7
#### Don't use iprcel 45
nsppol 2
spinat
0 0 0.5
0 0 -0.2
...
0 0 0
0 0 0
0 0 0
0 0 0.5
acell 17 10 40 Angstr # 10 Angstrom of vacuum
rprim 1.0000 0.0000 0.0000
0.0000 1.0000 0.0000
0.0000 0.0000 1.0000
natom 59
ntypat 2
typat ...
znucl 6 1
xcart
...