mpi-io output for large files

mverstra · Post by **mverstra** » Wed May 26, 2010 2:08 pm

Hello,

I have been running paral_kgb with bands and a bit of fft parallelism for a largeish molecule in a box (limited input file below) on xeon intel 11.1 openmpi 1.3.0 cluster nic3 here in Liege. The speedup is appreciated, even if not linear. I notice that band parallelization does not reduce memory use per core much, whereas fft parallelism does (but it is less efficient, and there is definitely an underestimation in the abinit prediction of memory usage - sometimes by half)

I now have a problem in the mpi-io writing of the WFK file for the full case with many bands: I get a segfault inside wffreadwrite_mpio:

Code: Select all

[node066:03178] *** Process received signal ***
[node066:03178] Signal: Segmentation fault (11)
[node066:03178] Signal code: Address not mapped (1)
[node066:03178] Failing at address: 0x11891a8c0
[node066:03176] *** Process received signal ***
[node066:03176] Signal: Segmentation fault (11)
[node066:03176] Signal code:  (128)
[node066:03176] Failing at address: (nil)
[node066:03178] [ 0] /lib64/libpthread.so.0 [0x3246e0e4c0]
[node066:03178] [ 1] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so(ADIOI_Calc_my_req+0x15d) [0x2b2cf916d09d]
[node066:03178] [ 2] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so(ADIOI_GEN_WriteStridedColl+0x3f2) [0x2b2cf917fda2]
[node066:03178] [ 3] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so(MPIOI_File_write_all+0xc0) [0x2b2cf918a6f0]
[node066:03178] [ 4] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_write_all+0x23) [0x2b2cf918a623]
[node066:03178] [ 5] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/openmpi/mca_io_romio.so [0x2b2cf916c7f0]
[node066:03178] [ 6] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/libmpi.so.0(MPI_File_write_all+0x4a) [0x2b2cc9a62e0a]
[node066:03178] [ 7] /cvos/shared/apps/openmpi/intel/64/1.3.0/lib64/libmpi_f77.so.0(mpi_file_write_all_f+0x8f) [0x2b2cc97e71ff]
[node066:03178] [ 8] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(wffreadwrite_mpio_+0x16f0) [0x1055380]
[node066:03178] [ 9] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(writewf_+0x1896) [0x104c3a6]
[node066:03178] [10] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(rwwf_+0x8e8) [0x104aaf8]
[node066:03178] [11] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(outwf_+0x21cd) [0xc3268d]
[node066:03178] [12] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(gstate_+0xc3b0) [0x502250]
[node066:03178] [13] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(gstateimg_+0x1409) [0x45bd89]
[node066:03178] [14] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(driver_+0x6ddb) [0x451dbb]
[node066:03178] [15] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(MAIN__+0x3b07) [0x448057]
[node066:03178] [16] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(main+0x3c) [0x44453c]
[node066:03178] [17] /lib64/libc.so.6(__libc_start_main+0xf4) [0x324621d974]
[node066:03178] [18] /u/mverstra/CODES/ABINIT/6.1.2-private/tmp-ifort/src/98_main/abinit(ctrtri_+0x41) [0x444449]

The density file is written correctly (in mpiio).

If I lower ecut or make the box smaller the writing works (and the file can be re-used as input). It looks like it is linked to the file size. http://lists.mcs.anl.gov/pipermail/mpich-discuss/2007-May/002311.html is perhaps related, but for mpich, and I have not found a solution to it.

I have tried printing out the size of the buffer arrays in wffreadwrite_mpio:

call to MPI_FILE_WRITE_ALL 8947488

but they look ok, and the values too. The size of the buffer is not that big (a few hundred Mb), and I did not get an actual complaint about memory allocation, so that is probably not the problem.

Does anyone have a suggestion? Has anyone tried openmpi 1.4 with better results? Or would downgrading to 1.2.6 help? I doubt it

thanks,

Matthieu

nstep 100 # max iteration scf cicle
nband 600 # Number of (occ and empty) bands to be computed
paral_kgb 1
npkpt 2
npband 10 # Nr of Processors at the BAND level, default 1
# npband has to be a divisor or equal to nband
npfft 3 # Nr of Processors at the FFT level, default nproc
# npfft should be a divisor or equal to the number
# of FFT planes along the 2nd and 3rd dimensions
bandpp 1
wfoptalg 4 nloalg 4 fftalg 401 intxc 0 fft_opt_lob 2
accesswff 1

occopt 3 # 7 = gaussian 4 = cold smearing 3 = FD
tsmear 0.01 eV

optstress 0
optforces 0
kptopt 1
ngkpt 1 1 1
nshiftk 1
shiftk 0 0 0

ndtset 1
tolwfr 1.0d-12
nbdbuf 10 # buffer: the last 10 wfns will not converge
istwfk 1*1
ixc 1
ecut 35
nline 6
diemac 1
diemix 0.7
#### Don't use iprcel 45

nsppol 2
spinat
0 0 0.5
0 0 -0.2
...
0 0 0
0 0 0
0 0 0
0 0 0.5

acell 17 10 40 Angstr # 10 angst vacuum
rprim 1.0000 0.0000 0.0000
0.0000 1.0000 0.0000
0.0000 0.0000 1.0000
natom 59
ntypat 2
typat ...
znucl 6 1
xcart
...

torrent · Post by **torrent** » Wed May 26, 2010 7:26 pm

Hi MJ,

Yes, you should have a better behaviour with openmpi 1.4.1 !
We had a lot of of difficulties with the 1.3 version of openmpi (related to mpi-io)... and they disappear with openmpi 1.4.1.
I don't know if it will solve you problem, but try it...

The estimation of memory usage within "kgb" parallelization has not been implemented (when you lanch a kgb run , you get the k-point parallelization estimation).

Marc

mverstra · Post by **mverstra** » Tue Jun 01, 2010 1:49 pm

Hi again,

no, unfortunately openmpi 1.4.2 segfaults in exactly the same way, when writing the wf file...

any other suggestions?

Matthieu

mverstra · Post by **mverstra** » Tue Jun 01, 2010 11:05 pm

Slight progress: ifort with debugging tells me it is trying to access cg_disk while it is not allocated. Unfortunately, the $#^@# compiler is incapable of telling me where and in which routine. Semi useless...

Matthieu

torrent · Post by **torrent** » Wed Jun 02, 2010 12:00 pm

OK Matthieu,

As I'm the one who recently modified (improved), mpi-io access, I feel obliged to help...

cg_disk should only be accessed when mkmem=0.
Is it the case (I guess no) ?

If you "grep" cg_disk in the source files, you find that the only routines concerned with mpi-io are outwf and wfsinp...

Marc

torrent · Post by **torrent** » Wed Jun 02, 2010 12:36 pm

If I run the automatic MPI-IO tests with -fbounds-check (gcc), everything is correct (no bounds overflow)...
??? where is the cg_disk access ?

mverstra · Post by **mverstra** » Thu Jun 03, 2010 12:10 am

my mistake - the complaint came from adding -C for the compilation - ifort complains because we pass unallocated arrays. We had seen this before. I'm now trying what you mentioned with gfortran fcheck-bounds - ifort with bounds checking is unusable for abinit unfortunately

Matthieu

ABINIT Discussion Forums

mpi-io output for large files

mpi-io output for large files

Re: mpi-io output for large files

Re: mpi-io output for large files

Re: mpi-io output for large files

Re: mpi-io output for large files

Re: mpi-io output for large files

Re: mpi-io output for large files