[SOLVED] segmentation fault
Moderators: fgoudreault, mcote
Forum rules
Please have a look at ~abinit/doc/config/build-config.ac in the source package for detailed and up-to-date information about the configuration of Abinit 8 builds.
For a video explanation on how to build Abinit 7.x for Linux, please go to: http://www.youtube.com/watch?v=DppLQ-KQA68.
IMPORTANT: when an answer solves your problem, please check the little green V-like button on its upper-right corner to accept it.
Hi, can someone help me?
Our group recently bought a parallel computer with 18 nodes and 216 cores (not counting the management node, node 19). The machine
is made by the Dawning company of China and uses Intel Xeon X5660 CPUs at 2.8 GHz.
I have compiled abinit 6.6.3 successfully on this machine. The configure options are:
./configure --enable-mpi-io --enable-mpi --with-mpi-prefix="/public/soft/mpi/openmpi/1.4.2/icc.ifort/" --enable-64bit-flags --with-fft-flavor="fftw3" \
  --with-fft-incs="-I/public/libs/fftw3.2.3/include/" --with-fft-libs="-L/public/libs/fftw3.2.3/lib -lfftw3"
When I used abinit to compute my first job, it worked very well and very fast.
But when I use abinit to compute another job, it does not work. I don't think there are any problems in my input files, because I have computed the same job on a
small workstation, a Dell T5400 (Xeon, 8 cores, 8 GB memory), where it works, although very slowly.
On the parallel machine I always get an error reported by PBS, something like:
--------------------------------------------------------------------------
mpirun noticed that process rank 28 with PID 14477 on node node6 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
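Signal 11 is SIGSEGV on Linux: the process touched memory it was not allowed to access and the kernel killed it. mpirun only relays that a rank died this way; it does not say why. As a small, non-Abinit illustration, the Python sketch below (assuming a POSIX system) starts a child process that deliberately sends itself SIGSEGV and shows how such a death is reported: `subprocess` encodes "killed by signal N" as a negative return code. The helper name `died_from_segfault` is hypothetical.

```python
import signal
import subprocess
import sys

# Child program that sends SIGSEGV to itself, imitating a crashing MPI rank.
CRASHING_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def died_from_segfault(returncode):
    """On POSIX, subprocess reports 'killed by signal N' as returncode == -N."""
    return returncode == -signal.SIGSEGV


if __name__ == "__main__":
    result = subprocess.run([sys.executable, "-c", CRASHING_CHILD])
    print("return code:", result.returncode)  # -11 on Linux
    print("segfault:", died_from_segfault(result.returncode))
```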
I wrote a very short Fortran program to test "malloc": it could allocate 1.5 GB of memory and never produced a segmentation fault. In fact, the memory the
job needs is very small compared with the total memory of each node, 24 GB.
I don't know how to deal with this problem.
PS: the system is SUSE.
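A test that allocates 1.5 GB with malloc (or a Fortran ALLOCATE) only exercises the heap. The Intel Fortran compiler places automatic arrays and array temporaries on the stack by default, and the stack has its own, much smaller limit (on many Linux systems the default soft limit shown by `ulimit -s` is 8 MB). A job can therefore have plenty of free RAM and still die with signal 11 when a large temporary overflows the stack. The Python sketch below, assuming a POSIX system, prints the stack limits of the current process; processes started by PBS or mpirun on compute nodes may see different limits than an interactive login shell, so it is worth checking from inside a job.

```python
import resource


def describe_limit(value):
    """Format an rlimit value given in bytes; RLIM_INFINITY means no limit."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def stack_limits():
    """Return the (soft, hard) stack-size limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    soft, hard = stack_limits()
    # The soft limit is the one a growing stack actually hits (like `ulimit -s`).
    print("stack soft limit:", describe_limit(soft))
    print("stack hard limit:", describe_limit(hard))
```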
Re: segmentation fault
A segfault alone is not enough information to tell what is wrong. You could check the actual memory usage with "top" while abinit runs: it could be using too much RAM. Try a small run with lower ecut or nband; if it runs fine, there is a good chance you are running out of memory. Remember that each core uses the amount of memory announced in the log.
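The remark that each core needs the full amount announced in the log turns into a quick per-node check. The sketch below assumes every rank on a node has the same footprint and uses a hypothetical 90% headroom for the OS and MPI buffers; the 31.932 MB figure is the estimate the original poster quotes from the log in the follow-up, and 12 ranks per node with 24 GB of RAM matches the cluster described in this thread. The function names are illustrative only.

```python
def node_memory_mb(per_process_mb, ranks_per_node):
    """Rough memory demand on one node: every rank holds its own copy."""
    return per_process_mb * ranks_per_node


def fits_on_node(per_process_mb, ranks_per_node, node_ram_mb, headroom=0.9):
    """True if the estimate stays below a fraction of the node's RAM.

    `headroom` is a hypothetical safety factor for the OS and MPI buffers.
    """
    return node_memory_mb(per_process_mb, ranks_per_node) <= headroom * node_ram_mb


if __name__ == "__main__":
    per_rank = 31.932        # MB, from "This job should need less than ..."
    ranks = 12               # cores per node on this cluster
    node_ram = 24 * 1024     # 24 GB per node, expressed in MB
    print(f"{node_memory_mb(per_rank, ranks):.3f} MB per node")  # 383.184 MB per node
    print("fits:", fits_on_node(per_rank, ranks, node_ram))      # fits: True
```

For this job the total is far below 24 GB, which suggests that running out of RAM is unlikely to be the cause here.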
Otherwise, look around for other error messages and files. Run abinit in parallel without redirecting the output to a log file. The same applies to a batch job: inside the batch file use
mpirun ... abinit < files
instead of the same command with "> log" at the end. This may leave you some additional lines of error messages that show what is wrong.
Matthieu
Matthieu Verstraete
University of Liege, Belgium
Re: segmentation fault
Dear Matthieu,
Following your suggestion, I tried again. My job has 43 atoms in the cell and 201 bands.
At first I used ecut = 50.0, which really does need more memory. However, even after reducing ecut to 1.0, it still stops.
Checking the log file, I get:
================================================================================
Values of the parameters that define the memory need for DATASET 1.
intxc = 0 ionmov = 2 iscf = 5 xclevel = 2
lmnmax = 4 lnmax = 4 mband = 201 mffmem = 1
P mgfft = 24 mkmem = 9 mpssoang= 4 mpw = 163
mqgrid = 3001 natom = 43 nfft = 3456 nkpt = 108
nloalg = 4 nspden = 1 nspinor = 1 nsppol = 1
nsym = 2 n1xccc = 0 ntypat = 2 occopt = 3
================================================================================
P This job should need less than 31.932 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 53.994 Mbytes ; DEN or POT disk file : 0.028 Mbytes.
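When comparing runs (for example job 1 against job 2), it can help to pull these numbers out of the log programmatically. The sketch below parses the `name = value` pairs of the memory table and the "should need less than" line from a copy of the excerpt above, then cross-checks the WF file size under the assumption that every k-point stores `mpw` complex plane-wave coefficients (two 8-byte reals each) per band. The regular expressions are tailored to this excerpt and may need adjusting for other Abinit versions; the function names are hypothetical.

```python
import re

LOG_EXCERPT = """\
 intxc = 0 ionmov = 2 iscf = 5 xclevel = 2
 lmnmax = 4 lnmax = 4 mband = 201 mffmem = 1
P mgfft = 24 mkmem = 9 mpssoang= 4 mpw = 163
 mqgrid = 3001 natom = 43 nfft = 3456 nkpt = 108
 nloalg = 4 nspden = 1 nspinor = 1 nsppol = 1
 nsym = 2 n1xccc = 0 ntypat = 2 occopt = 3
P This job should need less than 31.932 Mbytes of memory.
"""

# "name = value" pairs; note that "mpssoang= 4" has no space before "=".
PARAM_RE = re.compile(r"(\w+)\s*=\s*(-?\d+)")
MEMORY_RE = re.compile(r"should need less than\s+([\d.]+)\s+Mbytes")


def parse_memory_table(text):
    """Return the integer parameters of the memory table as a dict."""
    return {name: int(value) for name, value in PARAM_RE.findall(text)}


def parse_memory_estimate(text):
    """Return the per-process memory estimate in Mbytes, or None if absent."""
    match = MEMORY_RE.search(text)
    return float(match.group(1)) if match else None


def rough_wf_file_mib(params):
    """Rough WF size, assuming mpw plane waves at every k-point and spin."""
    n_coeff = (params["nkpt"] * params["mband"] * params["mpw"]
               * params["nspinor"] * params["nsppol"])
    return n_coeff * 16 / 1024**2  # 16 bytes per complex double


if __name__ == "__main__":
    params = parse_memory_table(LOG_EXCERPT)
    print(params["natom"], params["mband"], params["nkpt"])  # 43 201 108
    print(parse_memory_estimate(LOG_EXCERPT))                 # 31.932
    print(f"{rough_wf_file_mib(params):.2f} MiB")             # 53.99 MiB
```

The rough figure lands close to the 53.994 Mbytes that Abinit itself reports for the WF file, which is a useful sanity check that the parameters were read correctly.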
Re: segmentation fault
The build information is:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
=== Build Information ===
Version : 6.6.3
Build target : x86_64_linux_intel11.1
Build date : 20110602
=== Compiler Suite ===
C compiler : gnu4.1
CFLAGS : -m64 -g -O2
C++ compiler : gnu4.1
CXXFLAGS : -m64 -g -O2
Fortran compiler : intel11.1
FCFLAGS : -g -extend-source -vec-report0 -noaltparam -nofpscomp
FC_LDFLAGS : -static-intel -static-libgcc
=== Optimizations ===
Debug level : basic
Optimization level : standard
Architecture : intel_xeon
=== MPI ===
Parallel build : yes
Parallel I/O : yes
Time tracing : no
GPU support : no
=== Connectors / Fallbacks ===
Connectors on : yes
Fallbacks on : yes
DFT flavor : libxc-fallback+atompaw-fallback+bigdft-fallback+wannier90-fallback
FFT flavor : none
LINALG flavor : netlib-fallback
MATH flavor : none
TIMER flavor : abinit
TRIO flavor : netcdf-fallback+etsf_io-fallback
=== Experimental features ===
Bindings : no
Exports : no
GW double-precision : no
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: segmentation fault
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
-O3 -xHost
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:
CC_GNU CXX_GNU FC_INTEL
HAVE_DFT_ATOMPAW HAVE_DFT_BIGDFT HAVE_DFT_LIBXC
HAVE_DFT_WANNIER90 HAVE_FC_ALLOCATABLE_DT... HAVE_FC_ETIME
HAVE_FC_EXIT HAVE_FC_FLUSH HAVE_FC_GETENV
HAVE_FC_GETPID HAVE_FC_ISO_C_BINDING HAVE_FC_NULL
HAVE_MPI HAVE_MPI2 HAVE_MPI_IO
HAVE_OS_LINUX HAVE_STDIO_H HAVE_TIMER
HAVE_TIMER_ABINIT HAVE_TIMER_MPI HAVE_TIMER_SERIAL
HAVE_TRIO_ETSF_IO HAVE_TRIO_NETCDF USE_MACROAVE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: segmentation fault
The first job, which I computed successfully, has 4 atoms/cell and 60 bands.
Re: segmentation fault
I compiled a sequential abinit with --enable-mpi=no
and found that the sequential abinit computes job 2 successfully.
When I use the parallel build of abinit to do a sequential calculation (without mpirun), the calculation stops at the first iteration.
For job 1 there is no problem: all versions of abinit calculate it successfully.
PS:
My system:
SUSE Linux Enterprise 11.0; Intel ifort 11.1, icc 11.0, Intel MKL; Open MPI 1.4.2;
The computer is a cluster (1 master node and 18 compute nodes with 2 x Xeon 5560, 2.8 GHz, 12 cores and 24 GB RAM each), with InfiniBand.
Jobs:
job 1: 12 atoms, nband=60, ngkpt (8,8,14), ecut=50.0.
job 2: 43 atoms, nband=201, ngkpt (8,8,6), ecut=5.0 (ecut is small, just for testing).
I really don't understand what's going on.
Help!
Re: segmentation fault
Ha ha!
I have solved the problem!
Add FCFLAGS="-heap-arrays 64" to the configure options!
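The Intel Fortran option `-heap-arrays [size]` makes ifort allocate automatic arrays and array temporaries larger than `size` kilobytes on the heap instead of the stack (arrays whose size cannot be determined at compile time always go on the heap), so they are no longer bounded by the stack limit. A commonly used alternative that avoids recompiling is to raise the stack limit before starting the program, e.g. `ulimit -s unlimited` in the batch script, where the site's hard limit allows it. The Python sketch below does the equivalent for a locally launched command by raising the soft stack limit to the hard limit in the child process. It is an illustration under assumptions (POSIX system, local launch only): with mpirun across several nodes, remote ranks inherit their limits from the remote launch daemons, not from this script. The name `run_with_max_stack` is hypothetical.

```python
import resource
import subprocess
import sys


def _raise_stack_soft_limit():
    """Runs in the child just before exec: set soft stack limit = hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))


def run_with_max_stack(cmd):
    """Run `cmd` (a list of arguments) with the largest permitted stack size."""
    return subprocess.run(cmd, preexec_fn=_raise_stack_soft_limit,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Probe: the child reports the soft stack limit it actually received.
    probe = [sys.executable, "-c",
             "import resource; print(resource.getrlimit(resource.RLIMIT_STACK)[0])"]
    print(run_with_max_stack(probe).stdout.strip())
```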