paral_kgb options
Moderators: fgoudreault, mcote
Forum rules
Please have a look at ~abinit/doc/config/build-config.ac in the source package for detailed and up-to-date information about the configuration of Abinit 8 builds.
For a video explanation on how to build Abinit 7.x for Linux, please go to: http://www.youtube.com/watch?v=DppLQ-KQA68.
IMPORTANT: when an answer solves your problem, please check the little green V-like button on its upper-right corner to accept it.
paral_kgb options
I have very little experience with running abinit in the parallel mode. I did recently install 6.4.2 and did check one of the parallel jobs in the test suite and also successfully ran a parallel job with bccLi, but the input file for black phosphorus is giving the following error:
ITER STEP NUMBER 1
vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
*** glibc detected *** free(): invalid pointer: 0x0000002ab7b39010 ***
*** glibc detected *** free(): invalid pointer: 0x0000002ab7b39010 ***
*** glibc detected *** free(): invalid pointer: 0x0000002ab7b39010 ***
*** glibc detected *** free(): invalid pointer: 0x0000002ab7b39010 ***
mpiexec: Warning: tasks 0-3 died with signal 6 (Aborted).
I am thinking that I have not defined a parameter that is needed??? The input file is given below. I will be glad to provide any additional files if needed. I set up the pbs to run with 4 nodes and the execute command was:
mpiexec abinit <P.files
Thanks in advance for any suggestions,
Natalie Holzwarth
Department of Physics, Wake Forest University, Winston-Salem, NC 27106 USA
---P.in-----
ecut 32.00
pawecutdg 64.
#Structural relaxation
ionmov 2
optcell 2
ecutsm 0.5 Ha
dilatmx 1.8
ntime 100
spgroup 64
brvltt -1
acell 3.3117 10.158 4.243 angstrom
nstep 40
toldfe 1.0d-10
nband 40
occopt 7 tsmear 5.0d-4
#iscf 14
#Definition of the atom types
ntypat 1
znucl 15
#Definition of the atoms
natom 4
natrd 1
typat 1
xred 0.00000 0.10540 0.07470
#Definition of the k-point grid
kptopt 1
ngkpt 4 4 3
nshiftk 1
shiftk 0.5 0.5 0.5
prtwf 0
prtden 0
#parallel
paral_kgb 1
npband 1
npfft 1
npkpt 4
wfoptalg 4
nloalg 4
fftalg 401
intxc 0
fft_opt_lob 2
---------end of P.in------------------
- Alain_Jacques
- Posts: 279
- Joined: Sat Aug 15, 2009 9:34 pm
- Location: Université catholique de Louvain - Belgium
Re: paral_kgb options
Hello Natalie,
This sounds like a bug: Abinit seems to free memory that has already been released. So even if a parameter were wrong or missing (the input looks fine to me), it should not crash with such memory corruption. Depending on your glibc version, the rest may or may not work. I will try to reproduce this behavior on my system, but would you be so kind as to provide some extra debugging information?
Try to "prepend"
Code: Select all
MALLOC_CHECK_=0
to your parallel mpiexec launch and see if Abinit goes further (glibc should ignore heap corruption and let Abinit continue, with a risk of memory leakage). If you cannot ensure that all the parallel slots are located on the same node, add this variable to your parallel environment to have it propagated to all the running nodes.
If MALLOC_CHECK_ is set to 1 as in
Code: Select all
MALLOC_CHECK_=1 mpiexec ...
glibc should provide more information and let Abinit run up to the end (or segfault). Does it work? Any relevant info?
What Linux variant are you running (glibc version)? What versions of Fortran and MPICH2 were used to compile Abinit?
Kind regards,
Alain
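As a side note, the "prepend the variable" idea above can be sketched in Python: the variable is placed in the environment of the launched process, which then inherits it. This is only an illustration under stated assumptions. The helper name run_with_malloc_check is hypothetical, the child process here is a stand-in that merely prints the variable (not mpiexec or Abinit), and whether a given MPI launcher forwards the environment to remote nodes is not shown and depends on the launcher.

```python
import os
import subprocess
import sys


def run_with_malloc_check(command, level):
    """Run `command` with MALLOC_CHECK_ set to `level` in the child's environment.

    Hypothetical helper for illustration only; it does not know anything
    about MPI and does not forward the variable to remote nodes.
    """
    env = dict(os.environ)             # copy the current environment
    env["MALLOC_CHECK_"] = str(level)  # same effect as prepending MALLOC_CHECK_=<level>
    return subprocess.run(command, env=env, capture_output=True, text=True)


# Stand-in child process: it only reports the value it inherited.
probe = [sys.executable, "-c", "import os; print(os.environ['MALLOC_CHECK_'])"]

result = run_with_malloc_check(probe, 1)
print(result.stdout.strip())
```

For a real run, the probe command would be replaced by the actual launch line (for example the mpiexec command from the first post, with P.files on standard input).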
Re: paral_kgb options
Dear Alain,
I tried to rerun the job with the two different values of MALLOC_CHECK_=0 or MALLOC_CHECK_=1 and the results look identical to me.
I am using intel 11.1 for fortran and mpich2 and the compiler linked to the following libraries:
FC_LIBS=" -L/opt/intel111-libs/mpich2-1.0.8p1/lib -lmpichf90 -lmpich -lpthread -lrt -L/system0/opt/intel/Compiler/11.1/072/lib/intel64 -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -L/usrlib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64-L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../.. -L/lib64 -L/lib -L/usr/lib64 -L/usr/lib -lifport -lifcore -limf -lsvml -lm -lipgo -lirc -lirc_s -ldl"
This was generated automatically during the ./configure step. As I mentioned, the program does work in some cases so it cannot be completely wrong. On the other hand, I did notice that the optimization level was rather high -- FCFLAGS="-O3 -xW -vec-report0" -- perhaps that is a bad idea. Thank you for offering to try to reproduce the error. I will be glad to send you the PAW pseudopotential file if that would be helpful. Thanks, Natalie
- Alain_Jacques
- Posts: 279
- Joined: Sat Aug 15, 2009 9:34 pm
- Location: Université catholique de Louvain - Belgium
Re: paral_kgb options
Dear Natalie,
O3 should be all right - the routine test procedure uses O2, but I have had no trouble with O3 and previous Abinit releases from the accuracy and stability point of view.
Anything that helps me to be in the same conditions as you is welcome, so I'll gladly use your PAW pseudo - you don't need to upload the file if it is the same as the one on your pwpaw table.
Kind regards,
Alain
Re: paral_kgb options
The PAW function should be very similar to the one on the web page, but we changed a few parameters and have not updated the web page since Marc Torrent improved the code. I will upload a bzip2 file. Natalie
Re: paral_kgb options
I failed to upload the file to the forum. But it is available on the unlinked webpage http://www.wfu.edu/~natalie/papers/pwpa ... abinit.bz2
- Alain_Jacques
- Posts: 279
- Joined: Sat Aug 15, 2009 9:34 pm
- Location: Université catholique de Louvain - Belgium
Re: paral_kgb options
Hello Natalie,
Thanks for the pseudo. Uploading of large files is probably deactivated.
Just a small question: if you comment out the paral_kgb, npband, ... parallel options in your P.in, i.e. change the last section to
Code: Select all
#parallel
#paral_kgb 1
#npband 1
#npfft 1
#npkpt 4
wfoptalg 4
nloalg 4
fftalg 401
intxc 0
fft_opt_lob 2
and launch with mpiexec -np 4 abinit ..., does it run, in parallel, without crashing? I think it's paral_kgb 1 that is toxic here.
I admit that I never use it in this context. I use, for example, paral_kgb -4 during a sequential Abinit run to output suggestions about parallelism (here on up to 4 slots) and then comment it out during the actual parallel job. But I agree that paral_kgb 1 should work according to the documentation. I'll check further.
Kind regards,
Alain
Last edited by Alain_Jacques on Tue Dec 07, 2010 10:17 pm, edited 1 time in total.
Re: paral_kgb options
Dear Alain,
It does seem to be running in parallel with your most recent suggestion. It will take a while to finish, but it looks like this is a "fix"? Did I define the variables in the wrong order, or what do you suggest in general? Is nkpt=#processes the default? In any case, Thanks, Natalie
- Alain_Jacques
- Posts: 279
- Joined: Sat Aug 15, 2009 9:34 pm
- Location: Université catholique de Louvain - Belgium
Re: paral_kgb options
The idea is that if you run a - sequential or parallel - Abinit with paral_kgb -100, it will output
Code: Select all
WARNING in invars1m For dataset= 1 a possible choice for less than 100 processors is:
nproc npkpt npband npfft bandpp weight
   96    12      4     2      2   0.50
   96    12      8     1      1   1.00
   60    12      5     1      4   1.00
   48    12      2     2      4   0.25
   48    12      4     1      2   1.00
   24    12      2     1      4   1.00
invars1m : launch a parallel version of ABINIT with a number of processor among the above list, and the associated input variables
npkpt, npband, npfft and bandpp. The optimal weight is close to 1.
and then stop. Then npkpt, npband, ... and -np=nproc are to be adjusted considering the number of available CPUs and the optimal weight. Then I (luckily) get rid of that variable.
Alain
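The selection step described above (pick a row that fits the available CPUs and has a weight close to 1) can be sketched in Python. This is a minimal illustration, not part of Abinit: the function names are hypothetical, the rows are copied from the output quoted in this post, and the tie-break (prefer the larger nproc among equal weights) is an assumption made here. The check nproc = npkpt*npband*npfft simply reflects what every row of the quoted table satisfies.

```python
# Rows copied from the paral_kgb -100 output quoted in the thread.
TABLE = """\
96 12 4 2 2 0.50
96 12 8 1 1 1.00
60 12 5 1 4 1.00
48 12 2 2 4 0.25
48 12 4 1 2 1.00
24 12 2 1 4 1.00
"""

FIELDS = ("nproc", "npkpt", "npband", "npfft", "bandpp")


def parse_rows(text):
    """Turn each 6-column line into a dict; other lines are ignored."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue
        row = dict(zip(FIELDS, map(int, parts[:5])))
        row["weight"] = float(parts[5])
        rows.append(row)
    return rows


def is_consistent(row):
    """In the quoted table, nproc equals npkpt * npband * npfft for every row."""
    return row["nproc"] == row["npkpt"] * row["npband"] * row["npfft"]


def best_choice(rows, available_cpus):
    """Pick the usable row with weight closest to 1 (assumed tie-break: larger nproc)."""
    usable = [r for r in rows if r["nproc"] <= available_cpus]
    if not usable:
        return None
    return min(usable, key=lambda r: (abs(r["weight"] - 1.0), -r["nproc"]))


rows = parse_rows(TABLE)
choice = best_choice(rows, 64)  # 64 available CPUs is an arbitrary example value
print(choice)
```

With 64 CPUs this picks the nproc=60 row (npkpt 12, npband 5, npfft 1, bandpp 4); those values would then go into the input file and the job would be launched on 60 processes.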
Re: paral_kgb options
Natalie,
Just a little remark:
All these options:
wfoptalg 4
nloalg 4
fftalg 401
intxc 0
fft_opt_lob 2
should be set by default in the future 6.6 version of Abinit.
Also, if you put wfoptalg=14, you get more efficient runs (this will be the default in v6.6).
But this has no link with your initial problem.
Marc
Marc Torrent
CEA - Bruyères-le-Chatel
France
Re: paral_kgb options
Thanks Alain and Marc! Your help is very much appreciated. Natalie
Re: paral_kgb options
It seems to me that the reason for the observed behaviour is line 314 in 66_wfs/prep_getghc.F90 (I'm referring to the production 6.4.3 code):
allocate(swavef_alltoall_sym(2,(ndatarecv_tot*bandpp_sym)*iscalc))
with iscalc = 0 (set previously at line 186). This yields a zero-sized 2nd dimension, which seems to disturb intel compilers.
Changing this line to
allocate(swavef_alltoall_sym(2,ndatarecv_tot*bandpp_sym))
seems to fix the problem.
Cheers, BK
- Alain_Jacques
- Posts: 279
- Joined: Sat Aug 15, 2009 9:34 pm
- Location: Université catholique de Louvain - Belgium
Re: paral_kgb options
Dear BK,
Thanks for the debugging. Fixed in the upcoming release.
Alain
Re: paral_kgb options
Dear Alain,
(thanks to BK for the debugging)
I'm not sure this correction is the optimal one, because of memory considerations.
At that level of the code, we absolutely have to save memory, and the proposed code modification introduces an unused array which can be large.
Instead of
Code: Select all
allocate(swavef_alltoall_sym(2,(ndatarecv_tot*bandpp_sym)*iscalc))
...I would propose
Code: Select all
if (iscalc>0) then
allocate(swavef_alltoall_sym(2,ndatarecv_tot*bandpp_sym))
else
allocate(swavef_alltoall_sym(1,1))
endif
This is certainly not elegant at all, but it saves memory and avoids the use of the zero-sized array.
Do you agree with this?
A bientôt,
Marc
Marc Torrent
CEA - Bruyères-le-Chatel
France
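To make the memory argument concrete, here is a small Python sketch that only counts array elements for the three variants discussed in this thread: the original allocation, BK's change, and the conditional allocation proposed above. The function names and the example sizes (ndatarecv_tot = 50000, bandpp_sym = 4) are hypothetical and chosen purely for illustration; real values depend on the run.

```python
def elements_original(ndatarecv_tot, bandpp_sym, iscalc):
    """allocate(swavef_alltoall_sym(2,(ndatarecv_tot*bandpp_sym)*iscalc))"""
    return 2 * (ndatarecv_tot * bandpp_sym) * iscalc


def elements_bk(ndatarecv_tot, bandpp_sym, iscalc):
    """allocate(swavef_alltoall_sym(2,ndatarecv_tot*bandpp_sym)), regardless of iscalc"""
    return 2 * ndatarecv_tot * bandpp_sym


def elements_conditional(ndatarecv_tot, bandpp_sym, iscalc):
    """Full size only if iscalc>0, otherwise a dummy (1,1) array."""
    if iscalc > 0:
        return 2 * ndatarecv_tot * bandpp_sym
    return 1 * 1


# Hypothetical sizes, for illustration only.
n, b = 50000, 4
for iscalc in (0, 1):
    print(iscalc,
          elements_original(n, b, iscalc),
          elements_bk(n, b, iscalc),
          elements_conditional(n, b, iscalc))
```

For iscalc = 0 the original line requests a zero-sized array, BK's variant allocates the full array even though it is unused, and the conditional variant allocates a single element; for iscalc = 1 all three give the same size.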
- Alain_Jacques
- Posts: 279
- Joined: Sat Aug 15, 2009 9:34 pm
- Location: Université catholique de Louvain - Belgium
Re: paral_kgb options
Hello Marc,
I was glad to see that the 66_wfs/prep_getghc.F90 allocations were already modified in 6.6.1, but you're right about the memory size issue and your solution is definitely more efficient (in terms of economy and compliance with Intel's compiler - not ugly at all). I don't see any other gotcha in the rest of the routine.
I'm somewhat puzzled by this behavior, especially considering that zero-sized arrays are allowed, thanks to flexible array members within the ISO C99 standard, even with Intel's compilers. Anyway, there are several other places in Abinit with similar structures that could be problematic. I don't know if anyone has already tried to modify them - it's a bit awkward that it cannot be detected early. I'll have a look at the compiler's manual to see if there is an option that affects this behavior.
Amicalement,
Alain
Re: paral_kgb options
Btw there is a somehow similar problem in 79_seqpar_mpi/vtorho.F90.
Array buffer2 is allocated using the variable mb2dkpsp. The latter is initialized or not, depending on context. When it is used uninitialized to allocate buffer2, arbitrary effects are seen (SIGSEGV, unbalanced MPI barriers and the like). With Intel compilers, one gets pointed to the corresponding line after compiling with -ftrapuv.
The following patch (abinit 6.4.3) fixes this:
Code: Select all
Index: src/79_seqpar_mpi/vtorho.F90
===================================================================
RCS file: /gfs2/work/bzfbbk/CVS/ABINIT/abinit-6.4.3/src/79_seqpar_mpi/vtorho.F90,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 vtorho.F90
--- src/79_seqpar_mpi/vtorho.F90 2 Feb 2011 14:34:55 -0000 1.1.1.1
+++ src/79_seqpar_mpi/vtorho.F90 22 Feb 2011 13:59:22 -0000
@@ -1216,7 +1216,8 @@
! If needed, exchange the values of eigen,resid,eknk,enlnk,grnlnk
allocate(buffer1((4+3*natom*optforces-psps%usepaw)*mbdkpsp))
- allocate(buffer2(mb2dkpsp*paw_dmft%use_dmft))
+ if(paw_dmft%use_dmft==1) &
+& allocate(buffer2(mb2dkpsp*paw_dmft%use_dmft))
! Pack eigen,resid,eknk,enlnk,grnlnk in buffer1
buffer1(1 : mbdkpsp)=eigen(:)
buffer1(1+ mbdkpsp:2*mbdkpsp)=resid(:)
@@ -1287,6 +1288,7 @@
grnlnk(:,:)=reshape(buffer1(index1+1:index1+3*natom*mbdkpsp),&
& (/ 3*natom , mbdkpsp /) )
end if
+ if(allocated(buffer2)) deallocate(buffer2)
deallocate(buffer1)
call timab(29,2,tsec)
However, I did not check whether the problem still exists in newer abinit releases.
Cheers BK
Re: paral_kgb options
Thanks BK! This has been incorporated into 6.6 (soon to be released patch)
Matthieu
Matthieu Verstraete
University of Liege, Belgium