Sigma calculation: mkdump_Erread_screening with MPI_IO
Posted: Thu May 29, 2014 1:15 pm
Dear all
I have been trying to calculate the GW band structure of the half-metallic alloy Ga4P3Ti with abinit-7.6.4. I split the job into four separate runs: GS density, WFK, SCR, and sigma calculations. With ngkpt = 4 4 4 and ecut = 16, all runs completed. The input and log files for the completed sigma run are attached as input-1.in and log-1.in, respectively. However, when I increased ngkpt to 8 8 8 and ecut to 40 (as part of a convergence study), the first three runs completed without problems, but the sigma run was terminated with this error:
>>>
Er%ID: 4
Er%Hscr%ID: 4
Memory needed for Er%epsm1 = 654.8 [Mb]
mkdump_Erread_screening with MPI_IO
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 24584 on node compute-0-0.local exited on signal 9 (Killed).
----------------------------------------------------------------------------------------------
<<<
The input and log files for the crashed job are attached as input-2.in and log-2.in, respectively.
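For context, the sigma step is a standard one-shot G0W0 run. The key variables in input-2.in look roughly like the outline below; only the values already quoted in this post are filled in, and the remaining variables (ecuteps, ecutsigx, nkptgw/kptgw, bdgw, ...) are in the attached file:
>>>
# Sigma (G0W0) step -- outline only; the complete input is attached as input-2.in
optdriver  4        # self-energy calculation
irdwfk     1        # read the WFK file linked from the WFK run
irdscr     1        # read the SCR file linked from the screening run
ngkpt      8 8 8
ecut       40
nband      64
# plus ecuteps, ecutsigx, nkptgw, kptgw and bdgw as given in input-2.in
<<<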
There are a couple of posts in the forum concerning "mkdump_Erread_screening with MPI_IO" but I'm not sure if they are relevant to what I encountered.
I should note that the size of the screening file increased from about 449 MB to 4.6 GB with the increase in ngkpt and ecut mentioned above. I wonder whether the file size could be the cause of the error. Is there a limit on the size of file that MPI_IO can handle? In any case, "mkdump_Erread_screening with MPI_IO" also appeared in the log file of the completed run.
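For what it is worth, a rough back-of-the-envelope estimate of the per-node memory for the screening matrix alone (assuming every MPI rank keeps its own full copy of Er%epsm1, which I believe is what happens with the default gwpara, though I may be wrong about that) is:
>>>
#!/bin/sh
# Rough per-node memory estimate for Er%epsm1 alone.
# Assumption (possibly wrong): each of the MPI ranks packed onto one
# compute node holds a full copy of the matrix reported in the log.
EPSM1_MB=654.8        # "Memory needed for Er%epsm1 = 654.8 [Mb]"
RANKS_PER_NODE=8      # all 8 ranks on a single compute node
echo "$EPSM1_MB * $RANKS_PER_NODE" | bc
# prints 5238.4 -> about 5.2 GB before wavefunctions and work arrays
<<<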
Any advice or suggestions on how to solve this problem would be greatly appreciated.
Best regards,
Thanusit
Technical info:
- Abinit-7.6.4 was built on a Rocks_clusters-6.1.1 system, with an Intel Core i7 PC with 8 GB of RAM as the frontend and one Intel Core i7 and two Intel Core i5 PCs, each with 32 GB of RAM, as compute nodes. The build configuration is as follows:
>>>
#enable_fallbacks="no"
enable_exports="yes"
enable_pkg_check="yes"
enable_64bit_flags="yes"
enable_gw_dpc="yes"
enable_mpi="yes"
enable_mpi_io="yes"
enable_clib="yes"
with_mpi_prefix="/opt/openmpi"
with_trio_flavor="netcdf+etsf_io+fox"
with_netcdf_incs="-I/usr/include -I/usr/lib64/gfortran/modules"
with_netcdf_libs="-L/usr/lib64 -lnetcdf -lnetcdff"
with_fft_flavor="fftw3"
with_fft_incs="-I/usr/include"
with_fft_libs="-L/usr/lib64 -lfftw3 -lfftw3f"
with_linalg_flavor="atlas"
with_linalg_incs="-I/usr/include"
with_linalg_libs="-L/usr/lib64/atlas -llapack -lf77blas -lcblas -latlas"
with_algo_flavor="levmar"
with_algo_incs="-I/usr/include"
with_algo_libs="-L/usr/lib64 -llevmar"
with_math_flavor="gsl"
with_math_incs="-I/usr/include"
with_math_libs="-L/usr/lib64 -lgsl -lgslcblas -lm"
with_dft_flavor="atompaw+bigdft+libxc+wannier90"
<<<
- The script used to submit the job is as follows:
>>>
#!/bin/sh
#$ -N Ga4P3Ti_GW_band_ecut40_nband64
#$ -l h_rt=240:00:00
#$ -cwd
#$ -S /bin/sh
#$ -R y
#$ -pe orte 8
#Enable abinit-7.6.4
export PATH=/home/thanusit/apps/gcc-4.4.7/abinit/7.6.4/bin:$PATH
export LD_LIBRARY_PATH=/home/thanusit/apps/gcc-4.4.7/abinit/7.6.4/lib:$LD_LIBRARY_PATH
#Sigma calculation(G0W0)
cd Ga4P3Ti_lda_gw_band-4_sigma
ln -s ../Ga4P3Ti_lda_gw_band-2_wfk/Ga4P3Ti_lda_gw_band-2o_WFK Ga4P3Ti_lda_gw_band-4i_WFK
ln -s ../Ga4P3Ti_lda_gw_band-3_scr/Ga4P3Ti_lda_gw_band-3o_SCR Ga4P3Ti_lda_gw_band-4i_SCR
mpirun -n $NSLOTS abinit < Ga4P3Ti_lda_gw_band-4.files > Ga4P3Ti_lda_gw_band-4.log 2>&1
<<<
- Submitting the job with nproc=16 was also tried and ended with the same error.
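- In case the problem is per-node memory rather than MPI_IO itself, I am also thinking of spreading the ranks over the compute nodes explicitly, along the lines sketched below (untested, and assuming the installed Open MPI accepts the -npernode option):
>>>
# Sketch only: run 4 ranks per node so that two nodes share the memory load,
# assuming the SGE allocation spans two nodes and Open MPI supports -npernode.
mpirun -npernode 4 -n $NSLOTS abinit < Ga4P3Ti_lda_gw_band-4.files > Ga4P3Ti_lda_gw_band-4.log 2>&1
<<<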