I am running some tests on a bulk water system (128 H2O molecules, cubic cell with a = 15.66 angstrom, structure obtained from classical MD at 300 K). I am only interested in single-point calculations to get energies and forces at the PBE level, at the gamma point only, so as far as I understand the only way to improve efficiency is to tune the parallelization over bands and FFT. I am running the tests on a Tier-0 supercomputer, but my calculations seem very slow compared to the same calculation with other codes (e.g. CP2K, Siesta...) on the same machine, so I suspect there is something quite wrong with my setup or compilation.
First, the output of abinit -b is this:
DATA TYPE INFORMATION:
REAL: Data type name: REAL(DP)
Kind value: 8
Precision: 15
Smallest nonnegligible quantity relative to 1: 0.22204460E-15
Smallest positive number: 0.22250739-307
Largest representable number: 0.17976931+309
INTEGER: Data type name: INTEGER(default)
Kind value: 4
Bit size: 32
Largest representable number: 2147483647
LOGICAL: Data type name: LOGICAL
Kind value: 4
CHARACTER: Data type name: CHARACTER Kind value: 1
==== Using MPI-2 specifications ====
MPI-IO support is ON
xmpi_tag_ub ................ 2147483647
xmpi_bsize_ch .............. 1
xmpi_bsize_int ............. 4
xmpi_bsize_sp .............. 4
xmpi_bsize_dp .............. 8
xmpi_bsize_spc ............. 8
xmpi_bsize_dpc ............. 16
xmpio_bsize_frm ............ 4
xmpi_address_kind .......... 8
xmpi_offset_kind ........... 8
MPI_WTICK .................. 1.000000000000000E-006
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:
CC_INTEL CXX_INTEL FC_INTEL
HAVE_FC_ALLOCATABLE_DT... HAVE_FC_ASYNC HAVE_FC_COMMAND_ARGUMENT
HAVE_FC_COMMAND_LINE HAVE_FC_CONTIGUOUS HAVE_FC_CPUTIME
HAVE_FC_ETIME HAVE_FC_EXIT HAVE_FC_FLUSH
HAVE_FC_GAMMA HAVE_FC_GETENV HAVE_FC_GETPID
HAVE_FC_IEEE_EXCEPTIONS HAVE_FC_IOMSG HAVE_FC_ISO_C_BINDING
HAVE_FC_ISO_FORTRAN_2008 HAVE_FC_LONG_LINES HAVE_FC_MOVE_ALLOC
HAVE_FC_PRIVATE HAVE_FC_PROTECTED HAVE_FC_STREAM_IO
HAVE_FC_SYSTEM HAVE_FFT HAVE_FFT_FFTW3_MKL
HAVE_FFT_MPI HAVE_FFT_SERIAL HAVE_LIBPAW_ABINIT
HAVE_LIBTETRA_ABINIT HAVE_LINALG HAVE_LINALG_AXPBY
HAVE_LINALG_GEMM3M HAVE_LINALG_MKL_IMATCOPY HAVE_LINALG_MKL_OMATADD
HAVE_LINALG_MKL_OMATCOPY HAVE_LINALG_MKL_THREADS HAVE_LINALG_SERIAL
HAVE_MPI HAVE_MPI2 HAVE_MPI_IALLREDUCE
HAVE_MPI_IALLTOALL HAVE_MPI_IALLTOALLV HAVE_MPI_INTEGER16
HAVE_MPI_IO HAVE_MPI_TYPE_CREATE_S... HAVE_OS_LINUX
HAVE_TIMER_ABINIT USE_MACROAVE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
=== Build Information ===
Version : 8.6.1
Build target : x86_64_linux_intel17.0
Build date : 20171130
=== Compiler Suite ===
C compiler : intel17.0
C++ compiler : intel17.0
Fortran compiler : intel17.0
CFLAGS : -mkl
CXXFLAGS : -g -O2 -vec-report0
FCFLAGS : -mkl
FC_LDFLAGS :
=== Optimizations ===
Debug level : basic
Optimization level : standard
Architecture : intel_xeon
=== Multicore ===
Parallel build : yes
Parallel I/O : yes
openMP support : no
GPU support : no
=== Connectors / Fallbacks ===
Connectors on : yes
Fallbacks on : yes
DFT flavor : none
FFT flavor : fftw3-mkl
LINALG flavor : mkl
MATH flavor : none
TIMER flavor : abinit
TRIO flavor : none
=== Experimental features ===
Bindings : @enable_bindings@
Exports : no
GW double-precision : no
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
--- None ---
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
And the input (irrelevant parts abbreviated) is:
natom 384
ntypat 2
znucl 8 1
typat 1 2 2 ....
ionmov 0
nstep 100
toldfe 1.0d-5
ecut 150 Ry
kptopt 0
nkpt 1
kpt 0 0 0
nband 528
paral_kgb 1
npband 48
npfft 6
fftalg 401
chksymbreak 0
acell 3*15.6626780696 angstrom
nsym 1
symrel 1 0 0 0 1 0 0 0 1
xangst ...
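As a rough sanity check on the sizes implied by this input, here is a small Python sketch. It is only an estimate under stated assumptions: 8 valence electrons per water molecule (6 from O and 1 from each H, which depends on the pseudopotentials actually used), a closed-shell system, and the textbook count of plane waves inside the cutoff sphere, N_pw ≈ V (2 E_cut)^(3/2) / (6 π²) in Hartree atomic units. The function names are mine and are not part of abinit.

```python
import math

ANGSTROM_TO_BOHR = 1.0 / 0.529177210903  # Bohr radius in angstrom
RY_TO_HA = 0.5                           # 1 Ry = 0.5 Ha


def occupied_bands(n_molecules, valence_electrons_per_molecule=8):
    """Doubly occupied bands for a closed-shell system.

    Assumption: 8 valence electrons per H2O (pseudopotential dependent).
    """
    return n_molecules * valence_electrons_per_molecule // 2


def estimate_plane_waves(acell_angstrom, ecut_ry):
    """Rough number of plane waves inside the cutoff sphere, cubic cell.

    Uses N_pw ~ V * (2 * E_cut)^(3/2) / (6 * pi^2) in atomic units.
    """
    a_bohr = acell_angstrom * ANGSTROM_TO_BOHR
    volume = a_bohr ** 3
    ecut_ha = ecut_ry * RY_TO_HA
    return volume * (2.0 * ecut_ha) ** 1.5 / (6.0 * math.pi ** 2)


n_occ = occupied_bands(128)
n_empty = 528 - n_occ  # nband from the input minus occupied bands
n_pw = estimate_plane_waves(15.6626780696, 150.0)

print("occupied bands:", n_occ)
print("empty bands with nband 528:", n_empty)
print("estimated plane waves in the cutoff sphere:", round(n_pw))
```

Under these assumptions, nband 528 leaves a small buffer of empty bands above the occupied ones, and the plane-wave basis at 150 Ry is several hundred thousand functions per band, which gives a feeling for the cost of each SCF step.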
Now, some observations:
- When I try using "fftalg 312", the calculation stops with "FFTW3 support not activated". This should be fixed by including the flags HAVE_FFT_FFTW3 and/or HAVE_FFT_FFTW3_THREADS when compiling, right?
- I tried with "autoparal 3", but it seems to me that it is better to set npband and npfft manually. It also seems obvious that they must satisfy npband*npfft = total no. of cores, at least in this calculation where there are no k-points or spin involved. Am I right or am I missing something obvious?
- You may think that ecut is too high. This is because I am testing the convergence of the forces with respect to the cutoff, which is not yet reached at 150 Ry. From what I know of other codes, convergence may be improved by applying some smoothing to the density for the xc calculation (e.g. see the Appendix in Jonchiere et al., JCP 135, 154503 (2011)). Is something similar implemented in abinit?
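To make the npband*npfft constraint from the second point concrete, here is a minimal Python sketch that lists the process grids compatible with a given core count. The core count of 288 is simply 48*6 from my input. The extra condition that nband be divisible by npband is an assumption based on the usual guidance for band parallelization, not something I have verified for every algorithm in abinit, and the helper name candidate_grids is hypothetical.

```python
def candidate_grids(n_cores, nband, max_npfft=None):
    """Return (npband, npfft) pairs with npband * npfft == n_cores.

    Assumption: only grids where nband is a multiple of npband are kept.
    max_npfft optionally caps the FFT parallelization level.
    """
    grids = []
    for npfft in range(1, n_cores + 1):
        if n_cores % npfft != 0:
            continue  # npfft must divide the core count
        if max_npfft is not None and npfft > max_npfft:
            continue
        npband = n_cores // npfft
        if nband % npband == 0:
            grids.append((npband, npfft))
    return grids


# 288 cores = npband 48 * npfft 6 from the input above, nband 528
for npband, npfft in candidate_grids(288, 528):
    print(f"npband {npband:4d}   npfft {npfft:4d}   bands per process {528 // npband}")
```

Each printed pair could be tried as an alternative to "npband 48, npfft 6" in a timing scan. The sketch says nothing about which grid is fastest; that depends on the machine and on the FFT grid size.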
I will be very thankful for any comment that may help me improve this calculation. Thanks a lot,
D.