Problem with t41.file test

Post by **sdwang** » Fri Sep 24, 2010 9:56 am

Dear all,
When I ran test/t41.file, I got the following error:

*** glibc detected *** free(): invalid pointer: 0x0000002a99d35010 ***
p0_15043: p4_error: interrupt SIGx: 6
forrtl: error (69): process interrupted (SIGINT)
rm_l_1_29792: (2.921875) net_send: could not write to fd=5, errno = 32
forrtl: error (69): process interrupted (SIGINT)
p0_15043: (5.964844) net_send: could not write to fd=4, errno = 32

What is the problem?
SDwang

Post by **mverstra** » Mon Oct 11, 2010 12:18 pm

1) Read the netiquette in viewtopic.php?f=20&t=251.
2) This is probably linked to your build of ABINIT; in particular, p4_error probably means a parallelization error.
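The numbers in such a log can be decoded before digging further. On Linux, signal 6 is SIGABRT (glibc aborts the process after "free(): invalid pointer"), signal 11 is SIGSEGV, and errno 32 is EPIPE ("Broken pipe"), so the net_send failures most likely just mean the peer process had already died, not that the network itself failed. The Python sketch below is not part of ABINIT or MPICH; the regular expressions are assumptions inferred from the log lines in this thread, and names are decoded with the numbering of the machine running the script.

```python
import errno
import os
import re
import signal

# Assumed line formats, copied from the logs in this thread (not from MPICH docs).
P4_SIGNAL = re.compile(r"p4_error: interrupt SIG\w*: (\d+)")
NET_ERRNO = re.compile(r"errno = (\d+)")


def signal_name(num: int) -> str:
    """Symbolic name of a signal number, using the local OS numbering."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"


def errno_name(num: int) -> str:
    """Symbolic name plus message for an errno value, using the local OS numbering."""
    return f"{errno.errorcode.get(num, f'errno {num}')} ({os.strerror(num)})"


def explain(log: str) -> list[str]:
    """Annotate each p4_error or net_send line with the decoded signal/errno name."""
    notes = []
    for line in log.splitlines():
        if m := P4_SIGNAL.search(line):
            notes.append(f"{signal_name(int(m.group(1)))} <- {line.strip()}")
        elif m := NET_ERRNO.search(line):
            notes.append(f"{errno_name(int(m.group(1)))} <- {line.strip()}")
    return notes


if __name__ == "__main__":
    log = """\
p0_15043: p4_error: interrupt SIGx: 6
forrtl: error (69): process interrupted (SIGINT)
rm_l_1_29792: (2.921875) net_send: could not write to fd=5, errno = 32"""
    for note in explain(log):
        print(note)
```

The forrtl line is skipped on purpose: it reports the SIGINT that the surviving processes receive after the first one fails.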

You really can't expect us to explain your crash on so little information...

Matthieu

Post by **sdwang** » Tue Oct 12, 2010 5:41 am

I tested a parallel calculation with ./tests/tparal_1.in, but it stops at:
================================================================================

getcut: wavevector= 0.0000 0.0000 0.0000 ngfft= 36 36 36
ecut(hartree)= 30.000 => boxcut(ratio)= 2.06487
scfcv : before setvtr, energies%e_hartree= 0.000000000000000E+000

ewald : nr and ng are 3 and 11
mklocl_recipspace : will add potential with strength vprtrb(:)=
0.000000000000000E+000 0.000000000000000E+000
setvtr : istep,n1xccc,moved_rhor= 1 0 0
scfcv : after setvtr, energies%e_hartree= 0.000000000000000E+000

ITER STEP NUMBER 1
vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
p1_32116: p4_error: interrupt SIGSEGV: 11
p0_32111: p4_error: interrupt SIGSEGV: 11
forrtl: error (69): process interrupted (SIGINT)
rm_l_1_32174: (2.261719) net_send: could not write to fd=5, errno = 32
p1_32116: (2.261719) net_send: could not write to fd=5, errno = 32
p0_32111: (4.523438) net_send: could not write to fd=4, errno = 32
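To see which processes actually crashed (as opposed to being interrupted afterwards), the p4_error lines can be pulled out and sorted. This is a hypothetical helper, not an ABINIT or MPICH tool; it assumes the `p<N>_<PID>` prefix means MPI process N running as Unix PID PID, which is how the lines in this thread appear to be labelled.

```python
import re

# Assumed format: "p<rank>_<pid>: p4_error: interrupt <SIGNAME>: <signum>"
P4_ERROR = re.compile(r"^p(\d+)_(\d+): p4_error: interrupt (SIG\w*): (\d+)")


def crashed_ranks(log: str) -> list[tuple[int, int, str, int]]:
    """Return (rank, pid, signal name, signal number) for each p4_error line, sorted by rank."""
    hits = []
    for line in log.splitlines():
        m = P4_ERROR.match(line.strip())
        if m:
            rank, pid, name, num = m.groups()
            hits.append((int(rank), int(pid), name, int(num)))
    return sorted(hits)


if __name__ == "__main__":
    log = """\
p1_32116: p4_error: interrupt SIGSEGV: 11
p0_32111: p4_error: interrupt SIGSEGV: 11
forrtl: error (69): process interrupted (SIGINT)
rm_l_1_32174: (2.261719) net_send: could not write to fd=5, errno = 32"""
    for rank, pid, name, num in crashed_ranks(log):
        print(f"rank {rank} (pid {pid}): {name} ({num})")
```

For the log above, both process 0 and process 1 report SIGSEGV during the first SCF step, so the crash is not confined to a single process.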

I do not know why. Below is part of my log file:
=== Build Information ===
Version : 6.2.2
Build target : x86_64_linux_intel9.0
Build date : 20101012

=== Compiler Suite ===
C compiler : gnu3.4
CFLAGS : -g -O3 -fschedule-insns2 -march=nocona -mmmx -msse -msse2 -msse3 -mfpmath=sse
C++ compiler : gnu3.4
CXXFLAGS : -g -O3 -fschedule-insns2 -march=nocona -mmmx -msse -msse2 -msse3 -mfpmath=sse
Fortran compiler : intel9.0
FCFLAGS : -g -extend-source -vec-report0
FC_LDFLAGS : -static-libgcc -static-intel

=== Optimizations ===
Debug level : yes
Optimization level : standard
Architecture : intel_xeon

=== MPI ===
Parallel build : yes
Parallel I/O : yes

=== Linear algebra ===
Library flavor : @linalg_flavor@
Use ScaLAPACK : no

=== Plug-ins ===
BigDFT : no
ETSF I/O : no
LibXC : no
FoX : no
NetCDF : no
Wannier90 : no

=== Experimental features ===
Bindings : no
Exports : no
GW double-precision : no
Macroave build : yes

Post by **mverstra** » Sat Oct 16, 2010 11:53 am

Your code is segfaulting, but there's no way to tell why at this distance. These input files have run on dozens of reference architectures every night for years, so the problem lies in your build or your hardware, or you have modified the input file. Your compilers are quite old, but this should not be the problem.
- Check that your parallel mpif90/mpicc wrappers were built with the same compiler versions you use for ABINIT.
- Compile without optimizations or (first) run under a debugger:

* read the gdb manual or a howto

* mpirun -np 4 abinit < etc.etc.etc. > &

* top gives you the PIDs of the abinit instances; for each one you can then run

* gdb $ABINITPATH/abinit <pid1>

* inside gdb, type cont to continue execution, and see where it crashes.
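The "find the PIDs, then attach gdb" steps above can be scripted. This is a minimal sketch, assuming `pgrep` is installed and the executable is named exactly `abinit`; `ABINITPATH` is the same variable used in the steps above, and the function names are hypothetical. It only prints the gdb commands; run each one in its own terminal and type `cont` at the prompt.

```python
import os
import shlex
import shutil
import subprocess


def find_pids(name: str = "abinit") -> list[int]:
    """PIDs of processes whose name is exactly `name` (via `pgrep -x`)."""
    result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    # pgrep exits with 1 when nothing matches; anything above 1 is a real error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return [int(tok) for tok in result.stdout.split()]


def gdb_attach_commands(abinit_path: str, pids: list[int]) -> list[list[str]]:
    """One `gdb <program> <pid>` command per running abinit process."""
    return [["gdb", abinit_path, str(pid)] for pid in pids]


if __name__ == "__main__":
    abinit = os.path.join(os.environ.get("ABINITPATH", "."), "abinit")
    if shutil.which("pgrep") is None:
        print("pgrep not found; read the PIDs from top instead")
    else:
        for cmd in gdb_attach_commands(abinit, find_pids()):
            print(shlex.join(cmd))
```

Printing the commands rather than launching gdb directly keeps each debugger session interactive in its own terminal.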

Also, does it run sequentially?

matthieu

Post by **Naina** » Tue Sep 27, 2016 1:29 pm

Hi,

I am running into a similar error and I am not able to figure out why my jobs are crashing. Any help will be greatly appreciated.

Requested basis set is non-standard
Compound shells will be simplified
There are 30 shells and 82 basis functions
A cutoff of 1.0D-12 yielded 442 shell pairs
There are 3388 function pairs ( 4202 Cartesian)
Smallest overlap matrix eigenvalue = 4.51E-03
p0_947: p4_error: interrupt SIGSEGV: 11

Below is what my Q-Chem input looks like:
$molecule
0 5
S
Fe 1 2.030996
$end

$rem
BASIS gen
ECP gen
EXCHANGE PBE
CORRELATION PBE
MAX_SCF_CYCLES 200
SCF_ALGORITHM DIIS_GDM
INCDFT FALSE
VARTHRESH FALSE
SYMMETRY FALSE
JOBTYPE freq
MEM_TOTAL = 4000
MEM_STATIC = 256
$end

ABINIT Discussion Forums