[SOLVED] segmentation fault

Posted: Sun Jun 05, 2011 10:11 am
by hhwj340
Hi, can someone help me?
Our group recently bought a parallel computer with 18 compute nodes and 216 cores (not counting the management node, node 19). The machine is
made by Dawning Company of China. It uses Intel Xeon X5660 CPUs at 2.8 GHz.
I have compiled abinit 6.6.3 successfully on this machine. The configure options are:
./configure --enable-mpi-io --enable-mpi --with-mpi-prefix="/public/soft/mpi/openmpi/1.4.2/icc.ifort/" --enable-64bit-flags --with-fft-flavor="fftw3"
--with-fft-incs="-I/public/libs/fftw3.2.3/include/" --with-fft-libs="-L/public/libs/fftw3.2.3/lib -lfftw3"

When I used abinit for my first job, it worked very well and ran very fast.
But when I use abinit for another job, it doesn't work. I don't think there is any problem with my input files, because I have already run the same job on a
small workstation, a Dell T5400 (Xeon, 8 cores and 8 GB memory). It works there, although very slowly.
On the parallel machine, I always get an error reported through PBS, something like:
--------------------------------------------------------------------------
mpirun noticed that process rank 28 with PID 14477 on node node6 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I wrote a very short Fortran program to test "malloc". It could allocate 1.5 GB of memory and never produced a "segmentation fault". In fact, the memory the
job needs is very small compared with the total memory of each node, 24 GB.
I don't know how to deal with this problem. :cry:
PS: the system is SUSE.

Re: segmentation fault

Posted: Sun Jun 05, 2011 9:08 pm
by mverstra
The segfault message alone is not enough information to tell what is wrong. You could check the actual memory usage with "top" while abinit runs - it could be using too much RAM. Try a small run with lower ecut or nband - if it runs fine, there is a good chance you are running out of memory. Remember that each core uses the amount of memory announced in the log.

Otherwise, look around for other error messages and files. Run abinit in parallel without re-directing output to a log file. Same thing for a batch job: inside the batch file use

mpirun ... abinit < files

instead of the same command with > log at the end. This might leave you some additional lines of error messages so you can see what is wrong.

Matthieu
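
As a complement to watching "top", here is a minimal sketch of how the resident memory of all abinit processes on one node could be summed. The helper name sum_rss_mb is hypothetical, and it assumes the executable is named abinit and that a Linux-style ps is available.

```shell
# Hypothetical helper: read "RSS_in_KB command_name" pairs on stdin and
# print the total resident memory (in MB) of processes with the given name.
sum_rss_mb() {
    awk -v name="$1" '$2 == name { kb += $1 } END { printf "%.1f\n", kb / 1024 }'
}

# Usage on a compute node while the job runs (assumes the binary is "abinit"):
#   ps -e -o rss= -o comm= | sum_rss_mb abinit
```

If the total stays far below the RAM of the node, running out of memory becomes a less likely explanation for the crash.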

Re: segmentation fault

Posted: Mon Jun 06, 2011 4:36 am
by hhwj340
Dear Matthieu,
Following your suggestion, I tried again. About my job: the cell has 43 atoms, and nband is 201.
First I used ecut = 50.0, and it really does need more memory. However, even after changing ecut to 1.0, it still stops.
Checking the log file, I get:
================================================================================
Values of the parameters that define the memory need for DATASET 1.
intxc = 0 ionmov = 2 iscf = 5 xclevel = 2
lmnmax = 4 lnmax = 4 mband = 201 mffmem = 1
P mgfft = 24 mkmem = 9 mpssoang= 4 mpw = 163
mqgrid = 3001 natom = 43 nfft = 3456 nkpt = 108
nloalg = 4 nspden = 1 nspinor = 1 nsppol = 1
nsym = 2 n1xccc = 0 ntypat = 2 occopt = 3
================================================================================
P This job should need less than 31.932 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 53.994 Mbytes ; DEN or POT disk file : 0.028 Mbytes.
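
To put that estimate in context, here is a rough sketch of the per-node arithmetic. It assumes 12 MPI processes per node (216 cores over 18 nodes) and that each process needs the amount announced in the log. The helper name estimate_node_memory and the log file name are hypothetical.

```shell
# Hypothetical helper: multiply the per-process estimate printed by abinit
# ("This job should need less than X Mbytes of memory") by the number of
# processes per node. $1 = log file, $2 = processes per node.
estimate_node_memory() {
    awk -v n="$2" '/This job should need less than/ {
        for (i = 1; i <= NF; i++)
            if ($i == "Mbytes") { printf "%.3f\n", $(i - 1) * n; exit }
    }' "$1"
}

# Usage (file name is an assumption):
#   estimate_node_memory log 12
```

With 31.932 Mbytes per process and 12 processes this gives about 383 Mbytes per node, far below the 24 GB available, which suggests the crash is not a simple shortage of RAM.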

Re: segmentation fault

Posted: Mon Jun 06, 2011 4:39 am
by hhwj340
The build information is:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

=== Build Information ===
Version : 6.6.3
Build target : x86_64_linux_intel11.1
Build date : 20110602

=== Compiler Suite ===
C compiler : gnu4.1
CFLAGS : -m64 -g -O2
C++ compiler : gnu4.1
CXXFLAGS : -m64 -g -O2
Fortran compiler : intel11.1
FCFLAGS : -g -extend-source -vec-report0 -noaltparam -nofpscomp
FC_LDFLAGS : -static-intel -static-libgcc

=== Optimizations ===
Debug level : basic
Optimization level : standard
Architecture : intel_xeon

=== MPI ===
Parallel build : yes
Parallel I/O : yes
Time tracing : no
GPU support : no

=== Connectors / Fallbacks ===
Connectors on : yes
Fallbacks on : yes
DFT flavor : libxc-fallback+atompaw-fallback+bigdft-fallback+wannier90-fallback
FFT flavor : none
LINALG flavor : netlib-fallback
MATH flavor : none
TIMER flavor : abinit
TRIO flavor : netcdf-fallback+etsf_io-fallback

=== Experimental features ===
Bindings : no
Exports : no
GW double-precision : no

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: segmentation fault

Posted: Mon Jun 06, 2011 4:41 am
by hhwj340
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
-O3 -xHost


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:

CC_GNU CXX_GNU FC_INTEL

HAVE_DFT_ATOMPAW HAVE_DFT_BIGDFT HAVE_DFT_LIBXC

HAVE_DFT_WANNIER90 HAVE_FC_ALLOCATABLE_DT... HAVE_FC_ETIME

HAVE_FC_EXIT HAVE_FC_FLUSH HAVE_FC_GETENV

HAVE_FC_GETPID HAVE_FC_ISO_C_BINDING HAVE_FC_NULL

HAVE_MPI HAVE_MPI2 HAVE_MPI_IO

HAVE_OS_LINUX HAVE_STDIO_H HAVE_TIMER

HAVE_TIMER_ABINIT HAVE_TIMER_MPI HAVE_TIMER_SERIAL

HAVE_TRIO_ETSF_IO HAVE_TRIO_NETCDF USE_MACROAVE

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: segmentation fault

Posted: Mon Jun 06, 2011 4:48 am
by hhwj340
The first job, which I computed successfully, has 4 atoms per cell and 60 bands.

Re: segmentation fault

Posted: Tue Jun 07, 2011 3:58 pm
by hhwj340
I compiled a sequential abinit with --enable-mpi=no
and found that the sequential abinit can compute job 2 successfully.
When I use the parallel build of abinit for a sequential calculation (without mpirun), the calculation stops at the first iteration.
But for job 1 there is no problem: all of the abinit builds calculate it successfully.
PS:
My system:
SUSE Linux Enterprise 11.0; Intel ifort 11.1, icc 11.0, Intel MKL; Open MPI 1.4.2.
The machine is a cluster (1 master node and 18 compute nodes, each with 2 x Xeon 5560, 2.8 GHz, 12 cores, 24 GB RAM), connected by InfiniBand.
Jobs:
job 1: 12 atoms, nband=60, ngkpt (8,8,14), ecut=50.0.
job 2: 43 atoms, nband=201, ngkpt (8,8,6), ecut=5.0 (ecut is small, just for testing).
I really don't understand what's going on.
Help!

Re: segmentation fault

Posted: Wed Jun 08, 2011 9:51 am
by hhwj340
Ha ha!
I have solved the problem!
I added FCFLAGS="-heap-arrays 64" to the configure options!
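
For anyone hitting the same crash, here is a sketch of what the full configure call might look like with that flag added. It combines the options quoted in the first post with the reported fix. The wrapper name configure_abinit and the variable HEAP_FLAGS are hypothetical. As far as I know, ifort's -heap-arrays option places automatic and temporary arrays above the given size (in KB) on the heap instead of the stack, which would be consistent with a stack overflow showing up as signal 11.

```shell
# Flag reported in this thread as the fix (the size is understood to be in KB).
HEAP_FLAGS="-heap-arrays 64"

# Hypothetical wrapper; the options are the ones quoted in the first post.
configure_abinit() {
    ./configure --enable-mpi-io --enable-mpi \
        --with-mpi-prefix="/public/soft/mpi/openmpi/1.4.2/icc.ifort/" \
        --enable-64bit-flags --with-fft-flavor="fftw3" \
        --with-fft-incs="-I/public/libs/fftw3.2.3/include/" \
        --with-fft-libs="-L/public/libs/fftw3.2.3/lib -lfftw3" \
        FCFLAGS="$HEAP_FLAGS"
}

# Print the current stack size limit (in KB, or "unlimited") for comparison.
ulimit -s
```

Running configure_abinit from the abinit source directory and then rebuilding should reproduce the setup described above. Note that passing FCFLAGS this way may replace the default Fortran flags shown in the build information, so it is worth checking the new build summary.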