Hi Forum,
I am using abinit 6.0.3 and openmpi 1.4.1 (both compiled with ifort 11) for parallel calculations of a large molecule near an Au(111) surface. However, I have been noticing some problems and was hoping to get advice.
For one thing, the crashes are not reproducible: sometimes a parallel job makes it through about 100 SCF cycles before crashing, and when I rerun the exact same job it crashes after only 10 SCF cycles. Any idea what the problem could be in this case?
As a more specific problem, I have recently been trying to obtain the WFK file for a large molecule near an Au(111) surface for use with the STM capabilities of abinit, but my parallel runs on 130 processors keep crashing after only a few SCF iterations. I have attached the input file and the tail of the log file. I also include the error message sent by LSF below:
Job <pam -g 1 mympirun_wrapper abinit < gold4x5whybrid.files >& log> was submitted from host <simes0001> by user <asorini>.
Job was executed on host(s) <8*simes0011>, in queue <simesq>, as user <asorini>.
<8*simes0044>
<8*simes0032>
<8*simes0039>
<8*simes0053>
<8*simes0018>
<8*simes0064>
<8*simes0008>
<8*simes0014>
<8*simes0063>
<8*simes0048>
<8*simes0002>
<8*simes0059>
<8*simes0030>
<8*simes0022>
<8*simes0020>
<2*simes0027>
</u/xl/asorini> was used as the home directory.
</nfs/slac/g/simes/asorini/Gold4x5wHybrid/lessgoldlayers/STM_Stuff/again> was used as the working directory.
Started at Tue May 25 20:20:17 2010
Results reported at Wed May 26 10:16:10 2010
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
pam -g 1 mympirun_wrapper abinit < gold4x5whybrid.files >& log
------------------------------------------------------------
Exited with exit code 134.
Resource usage summary:
CPU time :14710985.00 sec.
Max Memory : 253466 MB
Max Swap : 561752 MB
Max Processes : 130
Max Threads : 130
The output (if any) follows:
/nfs/farm/lsb_spool/1274839309.199744: line 8: 10628 Aborted (core dumped) pam -g 1 mympirun_wrapper abinit <gold4x5whybrid.files >&log
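As a side note on the report above: the exit code 134 appears consistent with the "Aborted (core dumped)" line, assuming the usual shell convention that an exit status of 128 + N means the process was killed by signal N (here N = 6, SIGABRT on Linux). The short Python sketch below only illustrates that arithmetic; the helper name `decode_exit_code` is made up for this example and is not part of abinit, openmpi, or LSF.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe a shell-style exit status.

    Assumption: statuses above 128 follow the common convention
    "128 + signal number", as used by bash and most batch systems.
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with status {code}"


# Exit code reported by LSF for this job.
print(decode_exit_code(134))
```

This only says that the program called abort (or a runtime library did so on its behalf); it does not say why, so the tail of the log file is still the place to look for the actual cause.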
Any help with this would be very much appreciated. Cheers,
Adam
paral_kgb crash on 130 processors
Attachments
- gold4x5whybrid.in: input file (8.52 KiB)
- log.log: log file (47.86 KiB)