Dear all,
I have tried to calculate the GW band structure of the half-metallic alloy Ga4P3Ti using abinit-7.6.4. I split the job into four separate runs: ground-state density, WFK, SCR, and sigma. With ngkpt 4 4 4 and ecut 16, all runs completed. The input and log files for the completed sigma run are attached as input-1.in and log-1.in, respectively. However, when I increased ngkpt to 8 8 8 and ecut to 40 (as part of a convergence study), the first three runs completed without problems, but the sigma run terminated with this error:
>>>
Er%ID: 4
Er%Hscr%ID: 4
Memory needed for Er%epsm1 = 654.8 [Mb]
mkdump_Erread_screening with MPI_IO
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 24584 on node compute-0-0.local exited on signal 9 (Killed).
----------------------------------------------------------------------------------------------
<<<
The input and log files for the crashed job are attached as input-2.in and log-2.in, respectively.
There are a couple of posts in the forum concerning "mkdump_Erread_screening with MPI_IO" but I'm not sure if they are relevant to what I encountered.
Note that the size of the screening file grew from about 449 MB to 4.6 GB with the increase of ngkpt and ecut mentioned above. I wonder whether the file size could be the cause of the error. Is there a limit on the size of the file that MPI_IO can handle? That said, "mkdump_Erread_screening with MPI_IO" also appeared in the log file of the completed run.
Any advice or suggestions for solving this problem would be greatly appreciated.
Best regards,
Thanusit
Technical info:
- abinit-7.6.4 was built on Rocks Cluster 6.1.1. The frontend is an Intel Core i7 PC with 8 GB of RAM; the compute nodes are one Intel Core i7 PC and two Intel Core i5 PCs, each with 32 GB of RAM. The build configuration is as follows:
>>>
#enable_fallbacks="no"
enable_exports="yes"
enable_pkg_check="yes"
enable_64bit_flags="yes"
enable_gw_dpc="yes"
enable_mpi="yes"
enable_mpi_io="yes"
enable_clib="yes"
with_mpi_prefix="/opt/openmpi"
with_trio_flavor="netcdf+etsf_io+fox"
with_netcdf_incs="-I/usr/include -I/usr/lib64/gfortran/modules"
with_netcdf_libs="-L/usr/lib64 -lnetcdf -lnetcdff"
with_fft_flavor="fftw3"
with_fft_incs="-I/usr/include"
with_fft_libs="-L/usr/lib64 -lfftw3 -lfftw3f"
with_linalg_flavor="atlas"
with_linalg_incs="-I/usr/include"
with_linalg_libs="-L/usr/lib64/atlas -llapack -lf77blas -lcblas -latlas"
with_algo_flavor="levmar"
with_algo_incs="-I/usr/include"
with_algo_libs="-L/usr/lib64 -llevmar"
with_math_flavor="gsl"
with_math_incs="-I/usr/include"
with_math_libs="-L/usr/lib64 -lgsl -lgslcblas -lm"
with_dft_flavor="atompaw+bigdft+libxc+wannier90"
<<<
- The script used to submit the job is as follows:
>>>
#!/bin/sh
#$ -N Ga4P3Ti_GW_band_ecut40_nband64
#$ -l h_rt=240:00:00
#$ -cwd
#$ -S /bin/sh
#$ -R y
#$ -pe orte 8
#Enable abinit-7.6.4
export PATH=/home/thanusit/apps/gcc-4.4.7/abinit/7.6.4/bin:$PATH
export LD_LIBRARY_PATH=/home/thanusit/apps/gcc-4.4.7/abinit/7.6.4/lib:$LD_LIBRARY_PATH
#Sigma calculation(G0W0)
cd Ga4P3Ti_lda_gw_band-4_sigma
ln -s ../Ga4P3Ti_lda_gw_band-2_wfk/Ga4P3Ti_lda_gw_band-2o_WFK Ga4P3Ti_lda_gw_band-4i_WFK
ln -s ../Ga4P3Ti_lda_gw_band-3_scr/Ga4P3Ti_lda_gw_band-3o_SCR Ga4P3Ti_lda_gw_band-4i_SCR
mpirun -n $NSLOTS abinit < Ga4P3Ti_lda_gw_band-4.files > Ga4P3Ti_lda_gw_band-4.log 2>&1
<<<
- I also tried submitting the job with 16 processes; it ended with the same error.
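An exit on signal 9 (Killed) is often, though not always, the kernel's out-of-memory killer at work. The sketch below is one way to check for that; the helper name `killed_by_signal9` is made up for illustration, and the log file name is the one used in the job script above.

```shell
#!/bin/sh
# Sketch: report whether an MPI log shows a rank that exited on signal 9.
# killed_by_signal9 is a hypothetical helper, not part of ABINIT or Open MPI.
killed_by_signal9() {
    # $1 = path to the log file; returns 0 if the mpirun message is present
    grep -q 'exited on signal 9' "$1"
}

log=Ga4P3Ti_lda_gw_band-4.log   # log name taken from the job script
if [ -f "$log" ] && killed_by_signal9 "$log"; then
    echo "signal 9 found in $log: check the compute node for OOM-killer messages"
    # On the node named in the mpirun message (may need root on some systems):
    #   dmesg | grep -i -E 'out of memory|killed process'
    #   free -m
fi
```

If the kernel log on the compute node shows an out-of-memory message for abinit at the time of the crash, the problem is memory rather than MPI_IO.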
Sigma calculation: mkdump_Erread_screening with MPI_IO [SOLVED]
Attachments:
- input-1.in (4.99 KiB)
- log-1.in (179.86 KiB)
- input-2.in (6.89 KiB)
- log-2.in (154.02 KiB)
Re: Sigma calculation: mkdump_Erread_screening with MPI_IO [SOLVED]
Dear all,
The crash appeared to be due to insufficient memory and imposing "gwmem 01" fixed the error.
Regards,
Thanusit
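For readers wondering why memory could run out, here is a back-of-envelope estimate. It is only a sketch under two assumptions that are not confirmed anywhere in the thread: that every MPI rank loads the whole screening file, and that all 8 slots land on the same 32 GB compute node. The function name `estimate_gb` is made up for illustration.

```shell
#!/bin/sh
# Rough worst-case estimate; NOT taken from the ABINIT log.
# Assumption: each MPI rank holds the full screening data, all ranks on one node.
scr_file_gb=4.6     # SCR file size reported in the first post
ranks_per_node=8    # "-pe orte 8", assuming all slots share one node
node_ram_gb=32      # RAM of each compute node

estimate_gb() {
    # $1 = size per rank in GB, $2 = number of ranks
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.1f\n", s * n }'
}

total_gb=$(estimate_gb "$scr_file_gb" "$ranks_per_node")
echo "screening data, worst case: ${total_gb} GB vs ${node_ram_gb} GB of RAM"
```

Under these assumptions the screening data alone would exceed the node's RAM, which fits the fix reported above. As far as I understand the ABINIT documentation, the first digit of gwmem controls whether the screening matrices for all q-points are kept in memory (1) or read one q-point at a time (0), and the second digit controls whether real-space wavefunctions are stored; the default is 11, so "gwmem 01" trades some speed for a smaller memory footprint.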