Hi,
I tried to compile abinit 7.8.1 with intel 13.1.1 (+MKL) and openmpi 1.6.5. Serial jobs run fine, but parallel runs sometimes fail with a segmentation fault. Switching to abinit 7.10.1 gave the same problem, and so did switching from openmpi 1.6.5 to mvapich2 1.8.1.
Configuration for intel 13.1.1 and mvapich2 1.8.1 (with increased log level):
./configure CC=mpicc CXX=mpicxx FCFLAGS_EXTRA="-g -O0 -check all -traceback" \
  --prefix=/cluster/apps/abinit/test7.10.1/x86_64 \
  --enable-debug=naughty --enable-openmp \
  --with-wannier90-bins=/cluster/apps/abinit/test7.10.1/wannier90 \
  --with-wannier90-libs="-L/cluster/apps/abinit/test7.10.1/wannier90 -lwannier" \
  --enable-64bit-flags --enable-mpi --enable-fast-check --enable-mpi-io \
  --with-mpi-prefix="$MPI_ROOT" \
  --with-fft-flavor="fftw3-mpi" \
  --with-fft-incs="-I/cluster/apps/mvapich2/1.8.1/x86_64/intel_13.1.1/include" \
  --with-fft-libs="-L/cluster/apps/mvapich2/1.8.1/x86_64/intel_13.1.1/lib64 -lfftw3 -lfftw3_mpi" \
  --with-dft-flavor="wannier90" --with-timer-flavor="abinit" \
  --with-linalg-flavor="mkl+scalapack" \
  --with-linalg-incs="-I/$MKLROOT/include" \
  --with-linalg-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_sequential -lmkl_core" \
  --enable-optim CFLAGS_OPTIM="-O1" CXXFLAGS_OPTIM="-O1" FCFLAGS_OPTIM="-O1"
Compilation never reported any errors. Any idea what the problem could be?
Logs from the failed run, starting at the point where MPI is initialized:
mpi_setup@mpi_setup.F90:111 >>>>> ENTER
initmpi_seq@initmpi_seq.F90:65 >>>>> ENTER
initmpi_seq@initmpi_seq.F90:129 >>>>> EXIT
initmpi_img@initmpi_img.F90:85 >>>>> ENTER
initmpi_img@initmpi_img.F90:348 >>>>> EXIT
initmpi_seq@initmpi_seq.F90:65 >>>>> ENTER
initmpi_seq@initmpi_seq.F90:129 >>>>> EXIT
finddistrproc@finddistrproc.F90:144 >>>>> ENTER
kpgcount@m_fftcore.F90:3717 >>>>> ENTER
kpgcount@m_fftcore.F90:3763 >>>>> EXIT
getmpw sequential formula gave: 70247
Computing all possible proc distrib for this input with nproc less than 4
npimage| npkpt| npspinor| npfft| npband| bandpp | nproc| weight|
1 -> 1| 1 -> 4| 1 -> 1| 1 -> 4| 1 -> 4| 1 -> 12| 2 -> 4| 1 -> 4|
1| 4| 1| 1| 1| 1| 4| 3.91 |
1| 2| 1| 2| 1| 1| 4| 3.55 |
1| 2| 1| 1| 2| 1| 4| 3.55 |
1| 1| 1| 4| 1| 1| 4| 3.31 |
1| 1| 1| 2| 2| 1| 4| 3.31 |
Values below have been tested with respect to Linear Algebra performance;
Weights below are corrected according:
npimage| npkpt| npspinor| npfft| npband| bandpp | nproc| weight|new weight|
compute_kgb_indicator@compute_kgb_indicator.F90:95 >>>>> ENTER
compute_kgb_indicator : (bpp,npb,npf) = 1 1 2
init_scalapack@m_slk.F90:401 >>>>> ENTER
build_grid_scalapack@m_slk.F90:266 >>>>> ENTER
build_grid_scalapack@m_slk.F90:283 >>>>> EXIT
build_processor_scalapack@m_slk.F90:331 >>>>> ENTER
build_processor_scalapack@m_slk.F90:353 >>>>> EXIT
init_scalapack@m_slk.F90:410 >>>>> EXIT
init_matrix_scalapack@m_slk.F90:514 >>>>> ENTER
init_matrix_scalapack@m_slk.F90:574 >>>>> EXIT
init_matrix_scalapack@m_slk.F90:514 >>>>> ENTER
init_matrix_scalapack@m_slk.F90:574 >>>>> EXIT
init_matrix_scalapack@m_slk.F90:514 >>>>> ENTER
init_matrix_scalapack@m_slk.F90:574 >>>>> EXIT
Boundary Run-Time Check Failure for variable 'm_slk_mp_compute_generalized_eigen_problem_$RWORK_TMP'
Boundary Run-Time Check Failure for variable 'm_slk_mp_compute_generalized_eigen_problem_$RWORK_TMP'
forrtl: error (76): Abort trap signal
Image PC Routine Line Source
libc.so.6 00002AC271DFE625 Unknown Unknown Unknown
libc.so.6 00002AC271DFFE05 Unknown Unknown Unknown
libirc.so 00002AC271B89D2F Unknown Unknown Unknown
abinit 000000000F9FF976 m_slk_mp_compute_ 2714 m_slk.F90
abinit 000000000FA05FC6 m_slk_mp_compute_ 2977 m_slk.F90
abinit 000000000F77AB42 m_abi_linalg_mp_a 122 abi_xhegv.f90
abinit 000000000F77C248 m_abi_linalg_mp_a 221 abi_xhegv.f90
abinit 00000000098F8EDE compute_kgb_indic 209 compute_kgb_indicator.F90
abinit 000000000963E2E1 finddistrproc_ 793 finddistrproc.F90
abinit 0000000008FD072C mpi_setup_ 213 mpi_setup.F90
abinit 000000000041043D MAIN__ 330 abinit.F90
abinit 000000000040D38C Unknown Unknown Unknown
libc.so.6 00002AC271DEAD5D Unknown Unknown Unknown
abinit 000000000040D289 Unknown Unknown Unknown
forrtl: error (76): Abort trap signal
Image PC Routine Line Source
libc.so.6 00002B3E3F78A625 Unknown Unknown Unknown
libc.so.6 00002B3E3F78BE05 Unknown Unknown Unknown
libirc.so 00002B3E3F515D2F Unknown Unknown Unknown
abinit 000000000F9FF976 m_slk_mp_compute_ 2714 m_slk.F90
abinit 000000000FA05FC6 m_slk_mp_compute_ 2977 m_slk.F90
abinit 000000000F77AB42 m_abi_linalg_mp_a 122 abi_xhegv.f90
abinit 000000000F77C248 m_abi_linalg_mp_a 221 abi_xhegv.f90
abinit 00000000098F8EDE compute_kgb_indic 209 compute_kgb_indicator.F90
abinit 000000000963E2E1 finddistrproc_ 793 finddistrproc.F90
abinit 0000000008FD072C mpi_setup_ 213 mpi_setup.F90
abinit 000000000041043D MAIN__ 330 abinit.F90
abinit 000000000040D38C Unknown Unknown Unknown
libc.so.6 00002B3E3F776D5D Unknown Unknown Unknown
abinit 000000000040D289 Unknown Unknown Unknown
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
mkl13+mvapich2 1.8.1 segmentation fault
Forum rules
Please have a look at ~abinit/doc/config/build-config.ac in the source package for detailed and up-to-date information about the configuration of Abinit 8 builds.
For a video explanation on how to build Abinit 7.x for Linux, please go to: http://www.youtube.com/watch?v=DppLQ-KQA68.
IMPORTANT: when an answer solves your problem, please check the little green V-like button on its upper-right corner to accept it.
Re: mkl13+mvapich2 1.8.1 segmentation fault
If I were you, I would try with far fewer options:
-> remove wannier90
-> remove fftw3-mpi and use the FFTW3 interface from MKL instead
-> remove OpenMP
-> try without ScaLAPACK, just MKL
Once you have something working, add the features above back one at a time if you need them.
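As a rough starting point, the original configure line with those four features stripped out might look like the sketch below. It is an untested assumption, not a verified recipe: it keeps the original prefix, MPI, debug and optimization options, switches to the plain "fftw3" flavor linked against MKL's built-in FFTW3 wrappers (the $MKLROOT/include/fftw include path and the MKL link line are assumptions about the local MKL install), and uses the "mkl" linalg flavor without the ScaLAPACK/BLACS libraries. The accepted flavor names for 7.10.1 should be checked in the build-config.ac file shipped with the source.

```shell
# Sketch only (untested): the original configure line minus wannier90,
# fftw3-mpi, OpenMP and ScaLAPACK. FFTW3 is assumed to come from MKL's
# built-in wrappers; adjust paths to your MKL installation.
./configure CC=mpicc CXX=mpicxx FCFLAGS_EXTRA="-g -O0 -check all -traceback" \
  --prefix=/cluster/apps/abinit/test7.10.1/x86_64 \
  --enable-debug=naughty \
  --enable-64bit-flags --enable-mpi --enable-fast-check --enable-mpi-io \
  --with-mpi-prefix="$MPI_ROOT" \
  --with-timer-flavor="abinit" \
  --with-fft-flavor="fftw3" \
  --with-fft-incs="-I$MKLROOT/include/fftw" \
  --with-fft-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core" \
  --with-linalg-flavor="mkl" \
  --with-linalg-incs="-I$MKLROOT/include" \
  --with-linalg-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core" \
  --enable-optim CFLAGS_OPTIM="-O1" CXXFLAGS_OPTIM="-O1" FCFLAGS_OPTIM="-O1"
```

If this build runs the same parallel input cleanly, re-enable one feature per rebuild. ScaLAPACK is a reasonable first candidate to test (switch back to --with-linalg-flavor="mkl+scalapack" with the ScaLAPACK/BLACS libraries from the original line), since the traceback in the log passes through the ScaLAPACK routines in m_slk.F90.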
Cheers,
Jordan