Hello,
I am trying to test band-FFT parallelization for an Au system with 648 bands on 8 processors. I am using version 6.0.1.
Here are snippets of the input file, the log file, and the output file. It looks like the run is not parallelizing over bands. Any input is hugely appreciated.
Thanks for your time,
Regards
Mohua
Input:
paral_kgb 1
npband 8 #27
npfft 1 #4
timopt -1
fftalg 401 wfoptalg 4 fft_opt_lob 2
iprcch 0 intxc 0
Output:
Symmetries : space group Pm -3 m (#221); Bravais cP (primitive cubic)
================================================================================
Values of the parameters that define the memory need of the present run
intxc = 0 ionmov = 0 iscf = 7 xclevel = 1
lmnmax = 2 lnmax = 2 mband = 648 mffmem = 1
P mgfft = 36 mkmem = 1 mpssoang= 3 mpw = 286
mqgrid = 3001 natom = 107 nfft = 46656 nkpt = 1
nloalg = 4 nspden = 1 nspinor = 1 nsppol = 1
nsym = 48 n1xccc = 2501 ntypat = 1 occopt = 3
================================================================================
P This job should need less than 32.684 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 2.830 Mbytes ; DEN or POT disk file : 0.358 Mbytes.
================================================================================
-outvars: echo values of preprocessed input variables --------
acell 2.3010000000E+01 2.3010000000E+01 2.3010000000E+01 Bohr
amu 1.96966540E+02
ecut 2.50000000E+00 Hartree
enunit 2
fftalg 401
fft_opt_lob 2
iprcch 0
kpt 2.50000000E-01 2.50000000E-01 2.50000000E-01
kptrlen 4.60200000E+01
kptrlatt 2 0 0 0 2 0 0 0 2
P mkmem 1
natom 107
nband 648
ngfft 36 36 36
nkpt 1
nline 1
npband 8
npulayit 10
nstep 1
nsym 48
ntypat 1
P newkpt: treating 648 bands with npw= 286 for ikpt= 1 by node 0
Log:
npfft, npband and npkpt 1 8 1
mpi_enreg%sizecart(1),np_fft 1 1
mpi_enreg%sizecart(2),np_band 8 8
mpi_enreg%sizecart(3),np_kpt 1 1
For dataset= 1 a possible choice for less than 0 processors is:
nproc npkpt npband npfft bandpp
chkinp: WARNING -
When k-points/bands/FFT parallelism is activated
(paral_kgb=1), only MPI-IO input/output is allowed !
accesswff/=1 in your input file
You will not be able to perform input/output !
kpgio: loop on k-points done in parallel
Re: band fft parallelization
Hello Mohua,
I think the run is parallelizing, but you have run without setting accesswff to 1 (MPI-IO), so the wavefunction file will be incomplete (actually corrupt) at the end. If you don't need the file, that is fine (that's what I am doing); this is just a warning. To use MPI-IO you have to enable it in the configure options and compile/link with it.
I am seeing poor behavior myself when trying to get band parallelization to work. I ran a simple test case with 64 Al atoms in FCC, turning on band parallelization only (no FFT parallelization; input details below), built with ifort 11.1 / openmpi 1.3.0 on a Xeon cluster (cvos)
with 512 bands:
nproc        cpu       wall     memory
    1     7044.3     7044.3    308.489
    2     3882.5     3882.5    248.184
    4    12394.3    12394.3    218.028
    8    13275.6    13275.6    202.954
   16     1467.3     1467.3    195.416
   32      961.6      961.6    182.102
So for 4 and 8 procs there is a problem (both runs are slower than the 1-processor run), and for more processors there is definitely a speedup, but far from linear. Is my problem still too small? The slope for large nproc is 0.16 instead of 1 (obviously it will deteriorate at some point). Any hints, or variables I have forgotten?
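For anyone who wants to recompute the scaling figures, here is a minimal Python sketch that derives speedup, parallel efficiency, and the speedup slope from the wall times in the table above. The function names are only illustrative, and the slope is assumed to mean the change in speedup per added processor between two runs, which may not be exactly how the 0.16 was obtained.

```python
# Wall times copied from the table above (nproc -> wall time, units as reported).
wall = {1: 7044.3, 2: 3882.5, 4: 12394.3, 8: 13275.6, 16: 1467.3, 32: 961.6}


def speedup(timings, base=1):
    """Return {nproc: T(base) / T(nproc)} for every measured nproc."""
    t0 = timings[base]
    return {n: t0 / t for n, t in sorted(timings.items())}


def efficiency(timings, base=1):
    """Return {nproc: speedup / (nproc / base)}; 1.0 would be ideal scaling."""
    s = speedup(timings, base)
    return {n: s[n] * base / n for n in s}


def slope(timings, n_lo, n_hi):
    """Change in speedup per added processor between two runs (assumed definition)."""
    s = speedup(timings)
    return (s[n_hi] - s[n_lo]) / (n_hi - n_lo)


if __name__ == "__main__":
    s, e = speedup(wall), efficiency(wall)
    print("nproc  speedup  efficiency")
    for n in s:
        print(f"{n:5d}  {s[n]:7.2f}  {e[n]:10.2f}")
    print(f"slope 16 -> 32: {slope(wall, 16, 32):.2f}")
```

With these numbers the 16 to 32 slope rounds to 0.16, and the speedup at 4 and 8 processors comes out below 1, which is the anomaly described above.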
Cheers
Matthieu
#64 Al atoms 192 electrons, atom 1 displaced to break symmetry
#band parallelization, with KGB algorithm
paral_kgb 1 # -32
wfoptalg 4
nloalg 4
fftalg 401
intxc 0
fft_opt_lob 2
npkpt 1
npfft 1
bandpp 4
npband 32 # nproc
#with 128 bands no speedup at all: total time for each cpu ~ constant
#note: accesswff not used, but no output is requested
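A quick sanity check of the processor layout can be scripted before launching a run. The sketch below is not part of ABINIT; `check_kgb_layout` is a hypothetical helper. It assumes two rules as I understand them from the ABINIT documentation: the product npkpt * npband * npfft should match the number of MPI processes, and nband should be divisible by npband * bandpp. Please verify both against the documentation for your version.

```python
def check_kgb_layout(nproc, nband, npkpt=1, npband=1, npfft=1, bandpp=1):
    """Return a list of problems found in a paral_kgb processor layout.

    Hypothetical helper; the two rules checked here are assumptions taken
    from my reading of the ABINIT documentation, not from the code itself.
    """
    problems = []
    grid = npkpt * npband * npfft
    if grid != nproc:
        problems.append(
            f"npkpt*npband*npfft = {grid}, but {nproc} MPI processes are used"
        )
    if nband % (npband * bandpp) != 0:
        problems.append(
            f"nband = {nband} is not a multiple of npband*bandpp = {npband * bandpp}"
        )
    return problems


if __name__ == "__main__":
    # 64-atom Al test: 512 bands on 32 processes, bandpp 4.
    print(check_kgb_layout(32, 512, npkpt=1, npband=32, npfft=1, bandpp=4))
    # Au test: 648 bands on 8 processes, npband 8, npfft 1.
    print(check_kgb_layout(8, 648, npband=8, npfft=1))
    # The commented-out alternative (npband 27, npfft 4) would need 108 processes.
    print(check_kgb_layout(8, 648, npband=27, npfft=4))
```

Under these assumptions both posted inputs pass, while the commented-out npband 27 / npfft 4 combination would only fit a run on 108 processes.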
Matthieu Verstraete
University of Liege, Belgium