SEGFAULT in large calculations
Posted: Fri Feb 06, 2015 5:05 pm
Dear ABINIT Community,
I would like to share my experience with running a large-scale (~150-atom) ABINIT calculation, a problem I encountered, and my proposed solution. Of course, your comments and suggestions are highly welcome.
I ran a calculation with ABINIT 7.10.2 (the latest version) across 32 cores on 2 nodes using MVAPICH2-1.9. The code is compiled with the Intel compilers and MKL (details are provided below). The SEGFAULT occurs in 56_xc/rhohxc.F90 at line 440:
Code:
rhor_(:,:)=rhor(:,:)-nhat(:,:)
A similar structure with ~70 atoms works fine. It turns out that nhat has a size of about (2500000,1) in the 150-atom case; for 70 atoms it is about half of that. I should also mention that the stack size is set to "unlimited". I resolved the problem by replacing this array assignment with an explicit loop. I had to do the same in 42_libpaw/m_pawdij.F90. Here are the details of the code modifications.
edit (line 262): .../src/56_xc/rhohxc.F90
Code:
!Local variables-------------------------------
!scalars
...
integer :: jfft, jspin ! Oleg added
...
edit (line 440): .../src/56_xc/rhohxc.F90
Code:
...
! rhor_(:,:)=rhor(:,:)-nhat(:,:) ! there is a segfault here
do jspin = 1, nspden ! Oleg added begin
do jfft = 1, nfft
rhor_(jfft,jspin)=rhor(jfft,jspin)-nhat(jfft,jspin)
end do
end do ! Oleg added end
...
edit (line 219): /gs/project/fhu-132-aa/abinit-7.10.2-mvapich2-intel-dbg/src/42_libpaw/m_pawdij.F90
Code:
!Local variables ---------------------------------------
!scalars
...
integer :: ioleg, joleg ! Oleg added
...
edit (line 345): /gs/project/fhu-132-aa/abinit-7.10.2-mvapich2-intel-dbg/src/42_libpaw/m_pawdij.F90
Code:
! v_dijhat=vtrial-vxc ! Segfault
do joleg = 1, size(vxc,2) ! Oleg added start
do ioleg = 1, size(vxc,1)
v_dijhat(ioleg,joleg) = vtrial(ioleg,joleg) - vxc(ioleg,joleg)
end do
end do ! Oleg added end
...
This must be related to memory handling, which becomes problematic for large cases. Perhaps it is possible to fix the problem at compile time, without modifying the code? There are some discussions of stack vs. heap memory on the Intel forums (https://software.intel.com/en-us/forums/topic/327647).
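To illustrate what I suspect is happening (this is only my guess, and the sketch below is a standalone toy program, not the ABINIT code): a whole-array assignment can be compiled through a temporary array, and ifort places such temporaries on the stack by default, so a ~20 MB right-hand side could overflow it. The explicit loops avoid the temporary entirely.
Code:
! Minimal standalone sketch of the suspected stack-temporary issue.
! Assumption on my part: the compiler stages the right-hand side in a
! temporary when it cannot rule out overlap between the arrays
! (e.g. assumed-shape dummies or pointers in the real code).
program stack_temporary_sketch
  implicit none
  integer, parameter :: nfft = 2500000, nspden = 1
  real(kind=8), allocatable :: rhor(:,:), nhat(:,:), rhor_(:,:)
  integer :: jfft, jspin

  allocate(rhor(nfft,nspden), nhat(nfft,nspden), rhor_(nfft,nspden))
  rhor = 1.0d0
  nhat = 0.5d0

  ! Whole-array form: may be evaluated through a compiler temporary,
  ! ~20 MB per spin component here (2500000 * 8 bytes), on the stack.
  rhor_(:,:) = rhor(:,:) - nhat(:,:)

  ! Loop form: element-by-element, no array temporary needed.
  do jspin = 1, nspden
    do jfft = 1, nfft
      rhor_(jfft,jspin) = rhor(jfft,jspin) - nhat(jfft,jspin)
    end do
  end do

  write(*,*) 'check value:', rhor_(1,1)
end program stack_temporary_sketch
If a stack temporary is indeed the cause, adding ifort's -heap-arrays option to FCFLAGS (it tells the compiler to allocate such temporaries on the heap) might avoid the SEGFAULT without any source changes; I have not yet tested this on this particular build.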
Thank you
Oleg
P.S.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
=== Build Information ===
Version : 7.10.2
Build target : x86_64_linux_intel14.0
Build date : 20150205
=== Compiler Suite ===
C compiler : intel14.0
C++ compiler : gnu14.0
Fortran compiler : intel14.0
CFLAGS : -g -O2 -vec-report0
CXXFLAGS : -g -O2 -mtune=native -march=native
FCFLAGS : -g -extend-source -vec-report0 -noaltparam -nofpscomp
FC_LDFLAGS : -static-intel -static-libgcc
=== Optimizations ===
Debug level : basic
Optimization level : standard
Architecture : intel_xeon
=== Multicore ===
Parallel build : yes
Parallel I/O : auto
openMP support : no
GPU support : no
=== Connectors / Fallbacks ===
Connectors on : yes
Fallbacks on : yes
DFT flavor : libxc-fallback
FFT flavor : none
LINALG flavor : netlib-fallback
MATH flavor : none
TIMER flavor : abinit
TRIO flavor : none
=== Experimental features ===
Bindings : @enable_bindings@
Exports : no
GW double-precision : no
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
-O2 -xHost
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:
CC_INTEL CXX_GNU FC_INTEL
HAVE_DFT_LIBXC HAVE_FC_ALLOCATABLE_DT... HAVE_FC_ASYNC
HAVE_FC_COMMAND_ARGUMENT HAVE_FC_CONTIGUOUS HAVE_FC_CPUTIME
HAVE_FC_ETIME HAVE_FC_EXIT HAVE_FC_FLUSH
HAVE_FC_GAMMA HAVE_FC_GETENV HAVE_FC_GETPID
HAVE_FC_IEEE_EXCEPTIONS HAVE_FC_IOMSG HAVE_FC_ISO_C_BINDING
HAVE_FC_LONG_LINES HAVE_FC_MOVE_ALLOC HAVE_FC_PRIVATE
HAVE_FC_PROTECTED HAVE_FC_STREAM_IO HAVE_FC_SYSTEM
HAVE_LIBPAW_ABINIT HAVE_MPI HAVE_MPI2
HAVE_MPI_IALLREDUCE HAVE_MPI_IALLTOALL HAVE_MPI_IALLTOALLV
HAVE_MPI_IO HAVE_MPI_TYPE_CREATE_S... HAVE_NUMPY
HAVE_OS_LINUX HAVE_TIMER HAVE_TIMER_ABINIT
HAVE_TIMER_MPI HAVE_TIMER_SERIAL USE_MACROAVE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++