Posted: Sat Sep 05, 2020 11:10 am
by minyez
Hi everyone,
I am having some problems testing abinit-8.10.3 built with Intel 2019 Update 3. I am wondering whether these failed cases make the build less trustworthy for practical use. I would very much appreciate it if someone could help me resolve the problem.

The packages used in building abinit are netCDF (4.7.4), netCDF-Fortran (4.5.3), LibXC (3.0.1), MKL (within Intel 19.3), FFTW3 (3.3.8) and Wannier90 (1.2). They were all built with the same ifort 19 compiler. Here is the configure command I used:

./configure --prefix="/opt/software/abinit/8.10.3/intel-2019.3" \
    CXX=mpiicpc FC=mpiifort CC=mpiicc MPI_RUNNER=mpirun \
    --enable-64-bit-flags \
    --with-mpi-incs="-I$MKLROOT/../mpi/intel64/include" --with-mpi-libs="-L$MKLROOT/../mpi/intel64/lib -lmpi" \
    --with-linalg-flavor="mkl+scalapack" --with-linalg-incs=-I$MKLROOT/include/intel64/lp64 \
    --with-linalg-libs="-L$MKLROOT/lib/intel64 -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_scalapack_lp64 -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -lpthread -lm -ldl" \
    --enable-gw-dpc  --with-fft-flavor="fftw3" --with-fft-libs="-L$FFTW3DIR/lib -lfftw3_mpi -lfftw3" --with-fft-incs=-I$FFTW3DIR/include/ \
    --with-dft-flavor="atompaw+libxc+wannier90" --with-libxc-incs="-I$LIBXCDIR/include/" --with-libxc-libs="-L$LIBXCDIR/lib -lxcf90 -lxc" \
    --with-trio-flavor="netcdf" --with-netcdf-incs="-I$NETCDF_HOME/include/ -I$NETCDF_F_HOME/include/" --with-netcdf-libs="-L$NETCDF_HOME/lib -L$NETCDF_F_HOME/lib -lnetcdff -lnetcdf" \
    --with-wannier-libs="$WANNIER90_HOME/libwannier90.a" --with-wannier90-bins="$WANNIER90_HOME/" --with-wannier90-incs="-I$WANNIER90_HOME/src"
Compilation succeeded. I ran the tests shortly afterwards to check the build, with the options -j8 --force-mpi. The overall report is below.

Suite            failed  passed  succeeded  skipped  disabled  run_etime  tot_etime
atompaw               2       0          0        0         0      12.26      12.48
bigdft                0       0          0       23         0       0.00       0.38
bigdft_paral          0       0          0        4         0       0.00       0.05
built-in              0       0          6        1         0       7.47       7.62
etsf_io               0       0          8        0         0      25.86      26.24
fast                  0       1         10        0         0      42.74      44.12
gpu                   0       0          0        7         0       0.00       0.01
libxc                 0       7         28        0         0     502.62     506.75
mpiio                 0       0          1       17         0       1.84       1.89
paral                 0       5         22       89         0     552.69     556.70
psml                  0       0          0       14         0       0.00       0.02
seq                   0       0          0       18         0       0.00       0.01
tutomultibinit        0       0          3        0         0      38.94      41.86
tutoparal             0       0          1        5         0       0.78       0.81
tutoplugs             0       4          0        0         0      13.25      13.42
tutorespfn            0       6         20        0         0    1368.07    1373.88
tutorial              0      13         46        0         0    1557.80    1562.85
unitary               2       0         19       13         0      96.64      97.32
v1                    0       1         75        0         0     673.91     679.98
v2                    0      14         65        0         0     939.31     945.70
v3                    1      13         64        0         0     707.81     717.84
v4                    0      11         50        0         0     683.25     691.34
v5                    0      13         60        0         0    1199.20    1213.47
v6                    1       8         52        0         0     771.25     781.04
v67mbpt               1       7         17        0         0     376.32     382.42
v7                    2      10         52        0         0    1587.33    1600.73
v8                    3       8         35        2         0    1567.68    1575.06
vdwxc                 0       0          1        0         0      14.04      14.09
wannier90             0       7          0        0         0      32.56      32.94

Completed in 1810.95 [s]. Average time for test=16.48 [s], stdev=29.64 [s]
Summary: failed=12, succeeded=635, passed=128, skipped=193, disabled=0
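To tie the failed count in the summary back to individual suites, a small awk filter over the table can list only the rows with failures. This is a minimal sketch, assuming the report has been saved to a text file (the name report.txt is hypothetical) and that each row has the eight whitespace-separated columns shown above.

```shell
# List suites with a nonzero "failed" count, followed by the total.
# Reads the report table on stdin; assumes 8 whitespace-separated columns
# (Suite failed passed succeeded skipped disabled run_etime tot_etime).
failed_suites() {
    awk '
        $1 == "Suite" { next }      # skip the header row
        NF != 8       { next }      # skip blank lines and the summary lines
        $2 > 0        { print $1, $2 }
                      { total += $2 }
        END           { print "total", total + 0 }
    '
}

# Usage (report.txt is a hypothetical file holding the table):
#   failed_suites < report.txt
```

On the table above this should list atompaw, unitary, v3, v6, v67mbpt, v7 and v8, with a total of 12.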
The 12 failed tests are listed below. Since I can't upload the HTML report files, I have briefly pasted the stderr output for each.

1. atompaw
[atompaw][t02][np=1]: failed: erroneous lines 28 > 25 [file=t02.out]
[atompaw][t04][np=1]: failed: absolute error 0.015 > 0.011 [file=t04.out]

2. unitary
Both the tfftfftw3_01 and tfftfftw3_02 tests print the following to stderr:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
                   000014B4C75E2522  for__signal_handl  Unknown  Unknown
libpthread-2.28.s  000014B4C9242070  Unknown            Unknown  Unknown
fftprof            00000000006F1DFA  Unknown            Unknown  Unknown
fftprof            00000000006F0501  fftw_destroy_plan  Unknown  Unknown
fftprof            000000000047CC1B  Unknown            Unknown  Unknown
fftprof            00000000004354D4  Unknown            Unknown  Unknown
fftprof            000000000043063E  Unknown            Unknown  Unknown
fftprof            000000000040BF12  Unknown            Unknown  Unknown
fftprof            0000000000409022  Unknown            Unknown  Unknown
                   000014B4C51BE413  __libc_start_main  Unknown  Unknown
fftprof            0000000000408F2E  Unknown            Unknown  Unknown

3. v3
For t88 and t89, the report says "The diff analysis cannot be pursued: the leading characters differ."

4. v7
t43 failed with absolute error 10.0 > 0.0011
t87 failed with "The diff analysis cannot be pursued: the leading characters differ."

5. v8
t43 and t63 failed with "The diff analysis cannot be pursued: the leading characters differ."
t45 failed with erroneous lines 35 > 33

Re: failed 12 tests

Posted: Mon Sep 07, 2020 9:59 am
by minyez
1. After using an external atompaw instead of the fallback atompaw, the atompaw errors disappear.
2. I found that the error information for v6-t45 was missing from my post. make check reports the error as:

--- !ERROR
src_file: m_energy.F90
src_line: 465
mpi_rank: 0
message: |

BUG: Migdal energy and LDA+U energy do not conincide 0.96416777E+01 0.00000000E+00nlda

Re: failed 12 tests

Posted: Tue Sep 08, 2020 10:04 pm
by gmatteo
minyez wrote: "The packages used in building abinit are netCDF [...] MKL (within Intel 19.3) FFTW3 (3.3.8)"
Avoid mixing FFTW3 with MKL: MKL provides fftw3 wrappers that call the Intel DFTI FFT library.
This means that the linker will receive multiple definitions of the same procedure, and the
results are unpredictable!
Either use MKL for both linalg and FFT, or use FFTW3 with e.g. OpenBLAS for linalg.

This should explain the segmentation fault in tfftfftw3_01 and tfftfftw3_02.
The errors in atompaw do not seem to be critical; they are perhaps just numerical noise.
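
One way to check for the duplicate-definition problem is to ask nm which of the libraries on the link line define an FFTW routine such as fftw_destroy_plan, the frame that shows up in the traceback. The following is a minimal sketch, not a definitive diagnostic: the helper name libs_defining is made up for this example, it assumes GNU nm (for --defined-only), and the library file names in the usage comment are guesses derived from the -l flags in the configure command.

```shell
# Print every given library that defines SYMBOL according to nm, one per line.
# Assumes GNU nm; unreadable or missing files are silently skipped.
libs_defining() (
    symbol=$1
    shift
    for lib in "$@"; do
        [ -r "$lib" ] || continue
        # Shared objects keep exported symbols in the dynamic table (-D);
        # static archives are read through the regular symbol table.
        case $lib in
            *.so|*.so.*) flags="-D --defined-only" ;;
            *)           flags="--defined-only" ;;
        esac
        if nm $flags "$lib" 2>/dev/null |
            awk -v s="$symbol" '$NF == s { found = 1 } END { exit !found }'; then
            printf '%s\n' "$lib"
        fi
    done
)

# Usage (paths are assumptions based on the configure line, adjust as needed):
#   libs_defining fftw_destroy_plan \
#       "$FFTW3DIR/lib/libfftw3.a" \
#       "$MKLROOT/lib/intel64/libmkl_intel_lp64.a"
```

If more than one library is printed, the symbol has competing definitions, and which one ends up in the executable depends on the link order.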

The other tests require some inspection of the output files with e.g. vimdiff to understand if the diffs are physically significant.
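
As a first pass before opening vimdiff, it can help to locate the first line where the output and the reference start with a different character. This is only a sketch: it assumes that the "leading characters differ" message refers to the first character of corresponding lines, the helper name first_leading_mismatch is made up, and the file paths in the usage comment are hypothetical.

```shell
# Print the number of the first line whose leading character differs
# between two files, or 0 if none does. If one file simply ends early,
# the first line missing from the shorter file is reported.
# Assumes the first file is not empty.
first_leading_mismatch() {
    awk '
        NR == FNR { lead[FNR] = substr($0, 1, 1); n = FNR; next }
        { m = FNR }
        m > n || substr($0, 1, 1) != lead[m] { print m; found = 1; exit }
        END { if (!found) print (m < n ? m + 1 : 0) }
    ' "$1" "$2"
}

# Usage (both paths are hypothetical):
#   first_leading_mismatch path/to/run/t88.out path/to/Refs/t88.out
```

The reported line number is then a good place to start reading in vimdiff.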

Also note that some tests are expected to fail with intel19.
The test info section of an input file has an `exclude_builders` entry that we use to disable the test for particular compilers/machines.

$ grep intel ../tests/v9/Input/*
../tests/v9/Input/ exclude_builders = .*_intel_19.0_.*, .*_nag_7.0_.*, buda2_intel_17.0_openmpi
../tests/v9/Input/ Temporarily disabled on higgs_intel_19.0_serial for strange segfault
../tests/v9/Input/ exclude_builders = .*_intel_19.0_.*, .*_nag_7.0_.*, buda2_intel_17.0_openmpi
../tests/v9/Input/ Temporarily disabled on higgs_intel_19.0_serial for strange segfault
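
To illustrate how such an entry could apply to a given machine, here is a minimal sketch that checks a builder name against a comma-separated list of patterns. It assumes that each pattern is a regular expression that must match the whole builder name; the test suite's own matching logic may differ. The function name is_excluded is made up for this example.

```shell
# Return 0 if BUILDER fully matches any pattern in the comma-separated
# PATTERNS list, 1 otherwise. Patterns are treated as extended regular
# expressions (an assumption about how exclude_builders is interpreted).
is_excluded() (
    builder=$1
    set -f          # keep '*' in the patterns from being glob-expanded
    IFS=','
    for pat in $2; do
        # Trim the whitespace that follows each comma in the list.
        pat=$(printf '%s' "$pat" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        if printf '%s\n' "$builder" | grep -Eqx -- "$pat"; then
            exit 0
        fi
    done
    exit 1
)

# Usage, with the pattern list from the grep output above:
#   patterns='.*_intel_19.0_.*, .*_nag_7.0_.*, buda2_intel_17.0_openmpi'
#   is_excluded 'higgs_intel_19.0_serial' "$patterns" && echo "test disabled"
```

Under this reading, any builder whose name contains _intel_19.0_ would skip the test, which is consistent with some tests being disabled for intel19.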