[SOLVED] segmentation fault

hhwj340
Posts: 20
Joined: Mon Jan 04, 2010 10:42 am

[SOLVED] segmentation fault

Post by hhwj340 » Sun Jun 05, 2011 10:11 am

Hi, can someone help me?
Our group recently bought a parallel computer with 18 compute nodes and 216 cores in total (not counting the management node, node 19). The machine is made by the Dawning company of China and uses Intel Xeon X5660 CPUs at 2.8 GHz.
I have compiled abinit 6.6.3 successfully on this machine. The configure options were:
./configure --enable-mpi-io --enable-mpi --with-mpi-prefix="/public/soft/mpi/openmpi/1.4.2/icc.ifort/" --enable-64bit-flags --with-fft-flavor="fftw3" --with-fft-incs="-I/public/libs/fftw3.2.3/include/" --with-fft-libs="-L/public/libs/fftw3.2.3/lib -lfftw3"

When I used abinit to compute my first job, it worked very well and ran very fast.
But when I used abinit for another job, it failed. I don't think there is any problem with my input files, because I have run the same job on a small workstation, a Dell T5400 (Xeon, 8 cores and 8 GB of memory), where it worked, although very slowly.
On the parallel machine I always get an error reported through PBS, something like:
--------------------------------------------------------------------------
mpirun noticed that process rank 28 with PID 14477 on node node6 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I wrote a very short Fortran program to test "malloc"; it could use 1.5 GB of memory and never produced a segmentation fault. In fact, the memory the job needs is very small compared with the total memory of each node, 24 GB.
I don't know how to deal with this problem. :cry:
PS: the system is SUSE.

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: segmentation fault

Post by mverstra » Sun Jun 05, 2011 9:08 pm

A segfault on its own is not enough information to tell what is wrong. You could check the actual memory usage with "top" while abinit runs: it could be using too much RAM. Try a small run with lower ecut or nband. If that runs fine, there is a good chance you are running out of memory. Remember that each core uses the amount of memory announced in the log.

Otherwise, look around for other error messages and files. Run abinit in parallel without re-directing output to a log file. Same thing for a batch job: inside the batch file use

mpirun ... abinit < files

instead of the same with > log on the end. This might leave you some additional lines of error messages so you can see what is wrong.
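
As a minimal sketch of that advice, a batch-script fragment might look like the following. The process count and the input file name "files" are placeholders, not values from this thread, and the guard around the launch is only there so the fragment does nothing harmful on a machine without mpirun or abinit.

```shell
#!/bin/sh
# Sketch of a batch-script fragment. NPROCS and FILES are placeholder values.
NPROCS=4        # number of MPI processes (assumption, adjust to your job)
FILES=files     # the abinit "files" file that is fed on standard input

# No "> log" at the end: standard output and standard error both go to the
# scheduler's job output, so any extra error lines remain visible.
RUN_CMD="mpirun -np $NPROCS abinit"
echo "command: $RUN_CMD < $FILES"

# Launch only when the tools and the input file are actually available.
if command -v mpirun >/dev/null 2>&1 \
   && command -v abinit >/dev/null 2>&1 \
   && [ -r "$FILES" ]; then
    $RUN_CMD < "$FILES"
fi
```

While the job runs, "top" on a compute node shows the resident memory of each abinit process, which can be compared with the estimate printed in the log.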

Matthieu
Matthieu Verstraete
University of Liege, Belgium

hhwj340

Re: segmentation fault

Post by hhwj340 » Mon Jun 06, 2011 4:36 am

Dear Matthieu,
Following your suggestion, I tried again. About my job: the cell has 43 atoms and there are 201 bands.
First I used ecut = 50.0, which really does need more memory. However, even after changing ecut to 1.0, the run still stops.
Checking the log file, I get:
================================================================================
Values of the parameters that define the memory need for DATASET 1.
intxc = 0 ionmov = 2 iscf = 5 xclevel = 2
lmnmax = 4 lnmax = 4 mband = 201 mffmem = 1
P mgfft = 24 mkmem = 9 mpssoang= 4 mpw = 163
mqgrid = 3001 natom = 43 nfft = 3456 nkpt = 108
nloalg = 4 nspden = 1 nspinor = 1 nsppol = 1
nsym = 2 n1xccc = 0 ntypat = 2 occopt = 3
================================================================================
P This job should need less than 31.932 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 53.994 Mbytes ; DEN or POT disk file : 0.028 Mbytes.
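
A rough check of what this estimate means per node, as a sketch only. It uses the 31.932 MB figure from the log above together with the 12 cores and 24 GB per node mentioned elsewhere in this thread, and it assumes one MPI process per core and 1 GB = 1024 MB.

```shell
#!/bin/sh
# Figures quoted in the thread; the one-process-per-core layout is an assumption.
PER_PROC_MB=31.932     # "This job should need less than 31.932 Mbytes"
PROCS_PER_NODE=12      # cores per compute node
NODE_RAM_MB=24576      # 24 GB per node, taking 1 GB = 1024 MB

# Each MPI process needs roughly the announced amount, so multiply by the
# number of processes that share one node.
NODE_TOTAL_MB=$(awk -v m="$PER_PROC_MB" -v n="$PROCS_PER_NODE" \
    'BEGIN { printf "%.3f", m * n }')

echo "estimated use per node: $NODE_TOTAL_MB MB of $NODE_RAM_MB MB"
```

With these numbers the estimate comes to about 383 MB per node, far below the 24 GB available, so the announced memory need alone does not explain the crash.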

hhwj340

Re: segmentation fault

Post by hhwj340 » Mon Jun 06, 2011 4:39 am

The build information is:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

=== Build Information ===
Version : 6.6.3
Build target : x86_64_linux_intel11.1
Build date : 20110602

=== Compiler Suite ===
C compiler : gnu4.1
CFLAGS : -m64 -g -O2
C++ compiler : gnu4.1
CXXFLAGS : -m64 -g -O2
Fortran compiler : intel11.1
FCFLAGS : -g -extend-source -vec-report0 -noaltparam -nofpscomp
FC_LDFLAGS : -static-intel -static-libgcc

=== Optimizations ===
Debug level : basic
Optimization level : standard
Architecture : intel_xeon

=== MPI ===
Parallel build : yes
Parallel I/O : yes
Time tracing : no
GPU support : no

=== Connectors / Fallbacks ===
Connectors on : yes
Fallbacks on : yes
DFT flavor : libxc-fallback+atompaw-fallback+bigdft-fallback+wannier90-fallback
FFT flavor : none
LINALG flavor : netlib-fallback
MATH flavor : none
TIMER flavor : abinit
TRIO flavor : netcdf-fallback+etsf_io-fallback

=== Experimental features ===
Bindings : no
Exports : no
GW double-precision : no

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


hhwj340

Re: segmentation fault

Post by hhwj340 » Mon Jun 06, 2011 4:41 am

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
-O3 -xHost
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:

CC_GNU CXX_GNU FC_INTEL
HAVE_DFT_ATOMPAW HAVE_DFT_BIGDFT HAVE_DFT_LIBXC
HAVE_DFT_WANNIER90 HAVE_FC_ALLOCATABLE_DT... HAVE_FC_ETIME
HAVE_FC_EXIT HAVE_FC_FLUSH HAVE_FC_GETENV
HAVE_FC_GETPID HAVE_FC_ISO_C_BINDING HAVE_FC_NULL
HAVE_MPI HAVE_MPI2 HAVE_MPI_IO
HAVE_OS_LINUX HAVE_STDIO_H HAVE_TIMER
HAVE_TIMER_ABINIT HAVE_TIMER_MPI HAVE_TIMER_SERIAL
HAVE_TRIO_ETSF_IO HAVE_TRIO_NETCDF USE_MACROAVE

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

hhwj340

Re: segmentation fault

Post by hhwj340 » Mon Jun 06, 2011 4:48 am

The first job, which I computed successfully, has 4 atoms per cell and 60 bands.

hhwj340

Re: segmentation fault

Post by hhwj340 » Tue Jun 07, 2011 3:58 pm

I compiled a sequential abinit with --enable-mpi=no and found that it can compute job 2 successfully.
When I use the parallel build of abinit for a sequential calculation (without mpirun), the calculation stops at the first iteration.
For job 1 there is no problem: every abinit build computes it successfully.
PS:
My system:
SUSE Linux Enterprise 11.0; Intel ifort 11.1, icc 11.0, Intel MKL; Open MPI 1.4.2.
The machine is a cluster (1 master node and 18 compute nodes, each with 2 x Xeon X5660, 2.8 GHz, 12 cores, 24 GB RAM) with InfiniBand.
Jobs:
job 1: 12 atoms, nband=60, ngkpt(8,8,14), ecut=50.0.
job 2: 43 atoms, nband=201, ngkpt(8,8,6), ecut=5.0 (ecut is small, just for testing).
I really don't understand what's going on.
Help!!

hhwj340

Re: segmentation fault

Post by hhwj340 » Wed Jun 08, 2011 9:51 am

Ha ha!
I have solved the problem!
I added FCFLAGS="-heap-arrays 64" to the configure options!
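
For reference, a sketch of how the full configure call could look with this flag added to the options from the first post. The -heap-arrays option of ifort places automatic and temporary arrays above the given size (in kilobytes) on the heap rather than on the stack. Whether passing FCFLAGS this way replaces or extends the build system's default Fortran flags is not stated in this thread, so it is worth checking the configure summary afterwards. The guard around the call is only there to keep the fragment harmless outside an Abinit source tree.

```shell
#!/bin/sh
# Sketch: the configure options from the first post plus the extra Fortran flag.
# The paths are the ones quoted in the thread; adjust them for your machine.
HEAP_FLAG="-heap-arrays 64"   # ifort: large automatic/temporary arrays go on the heap

# Collect the arguments as positional parameters so that quoting is preserved.
set -- \
    --enable-mpi --enable-mpi-io \
    --with-mpi-prefix="/public/soft/mpi/openmpi/1.4.2/icc.ifort/" \
    --enable-64bit-flags \
    --with-fft-flavor="fftw3" \
    --with-fft-incs="-I/public/libs/fftw3.2.3/include/" \
    --with-fft-libs="-L/public/libs/fftw3.2.3/lib -lfftw3" \
    FCFLAGS="$HEAP_FLAG"

CONFIGURE_ARGS="$*"
echo "configure arguments: $CONFIGURE_ARGS"

# Run configure only from the top of an Abinit source tree.
if [ -x ./configure ]; then
    ./configure "$@"
fi
```

After reconfiguring, abinit has to be rebuilt so that the Fortran sources are compiled with the new flag.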
