My KSS calculation stopped while writing the WFK file. This is the tail of the .log file:
----iterations are completed or convergence reached----
outwf : write wavefunction to file pSSxo_DS1_WFK
-P-0000 leave_test : synchronization done...
-----
The system admin told me that memory usage increased dramatically on one processor while the file was being written, while the other processors of the node remained unused (this is normal, since I use one process per node). This is the error message from the system:
[[32890,1],0][btl_openib_component.c:2948:handle_wc] from compute-0-6.local to: compute-0-7 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 374141704 opcode 0 vendor error 105 qp_idx 3
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 27682 on
node compute-0-6.local exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
I am trying to understand what happened and how to fix it, since I think it also causes problems for my SCR calculation, which has an "open g-shell". I think anybody who knows how abinit writes this file could help.
Thanks,
SL
Calculation details:
- run on Intel Xeon E5462 quad-core processors (3.0 GHz) with 16 GB of memory per node, using 6 nodes and one process per node to maximise the total memory,
- with abinit-5.9.1 and Troullier-Martins pseudopotentials,
- here is the .in file (without xcoord, znucl, etc.):
ndtset 1
acell 15.6 16 8.08613283 angstrom
dilatmx 1.20000000E+00
ecut 20 Hartree
kptopt 1
nband 1300
nstep 500
ngkpt 1 1 10
shiftk 0. 0. 0.
nshiftk 1
getden 1
kssform 3
nbandkss 1300
symmorphi 0
istwfk *1
iscf -2
tolwfr 1.0d-5
zcut 0.0037
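
To put the memory spike in perspective, here is a rough back-of-the-envelope estimate of the wavefunction data size for the input above. It is only a sketch under several assumptions: an orthorhombic cell (rprim is not shown), the usual plane-wave count for a sphere of radius sqrt(2*ecut) in reciprocal space, and 16 bytes per complex double-precision coefficient. The function names are made up for illustration and are not part of abinit, and this does not describe how abinit actually buffers the data when writing. The number of irreducible k-points depends on symmetries that are not shown, so the estimate is given per k-point.

```python
import math

# Conversion factor: 1 Bohr = 0.529177210903 Angstrom
BOHR_PER_ANGSTROM = 1.0 / 0.529177210903


def estimate_npw(acell_angstrom, ecut_hartree):
    """Approximate number of plane waves per k-point.

    Counts reciprocal-lattice points inside a sphere of radius
    kmax = sqrt(2*ecut): npw ~ volume * kmax**3 / (6*pi**2).
    Assumes an orthorhombic cell (assumption: rprim is not in the post).
    """
    a, b, c = (x * BOHR_PER_ANGSTROM for x in acell_angstrom)
    volume = a * b * c  # cell volume in Bohr^3
    kmax = math.sqrt(2.0 * ecut_hartree)
    return volume * kmax**3 / (6.0 * math.pi**2)


def estimate_wf_bytes(npw, nband, nkpt=1, bytes_per_coeff=16):
    """Bytes needed to hold npw * nband * nkpt complex double coefficients."""
    return npw * nband * nkpt * bytes_per_coeff


# Values taken from the input file above
npw = estimate_npw((15.6, 16.0, 8.08613283), ecut_hartree=20.0)
bytes_per_kpoint = estimate_wf_bytes(npw, nband=1300)

print(f"estimated plane waves per k-point: {npw:.0f}")
print(f"estimated wavefunction size per k-point: {bytes_per_kpoint / 1e9:.2f} GB")
```

Multiplying the per-k-point figure by the number of k-points abinit reports in the .out file gives an idea of how much data would have to pass through a single process if it gathers all the wavefunctions before writing.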