Job exiting after P000

Total energy, geometry optimization, DFT+U, spin....

Moderator: bguster

JEJohns
Posts: 55
Joined: Sun May 02, 2010 5:30 pm

Job exiting after P000

Post by JEJohns » Thu Aug 05, 2010 11:15 pm

While running abinit, my larger jobs consistently exit early (either during the first iteration or before it) just after the following message.

"-P-0000 leave_test : synchronization done...
-P-0000 leave_test : exiting..."

This happened when I went from a 2x2 unit cell of graphene to a 3x3 unit cell, and again when I increased the vacuum spacing on a silver slab, which gave a large unit cell. I would have thought it was a memory usage issue, but it doesn't go away if I increase the number of nodes and processors (and thus the available memory). Am I doing something dumb? I've attached the input, output, and log files for the graphene unit cell where this happens (labelled graphod.* for my own naming reasons). I've queued this job on 2 nodes with 20 GB on each node.
Attachments
graphod.in (871 Bytes)
graphod.log (16.9 KiB)
graphod.out (3.05 KiB)

Re: Job exiting after P000

Post by JEJohns » Thu Aug 05, 2010 11:22 pm

I'm so sorry, this is clearly in the wrong forum. I meant to place it in the Input file or Platform specific forum. I'm running these jobs on carver.nersc.gov, one of the Lawrence Berkeley National Lab computers, using abinit 6.0.3.

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: Job exiting after P000

Post by mverstra » Sat Sep 04, 2010 9:31 am

Hello JEJohns,

In the latest versions (6.2 I think, and certainly the upcoming 6.4) I have made this error message more verbose: it arises because some of your processors are not responding. This can be due to:

1) Some of the processors didn't manage to allocate memory. Check the individual *LOG* file for each one. This is the most common reason.
2) You have chosen a processor distribution which is not consistent (e.g. more processors than you have k-points), in which case some processors end up empty and complain.
3) Some other, real, error on the nodes. If possible, check which nodes are complaining.
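
The first two checks above can be roughly automated. What follows is only a sketch in Python, not anything shipped with abinit: the function names, the `*LOG*` glob pattern and the keyword list are assumptions for illustration, since the exact wording of the messages depends on the abinit version. The idle-processor estimate assumes plain k-point parallelism, where at most nkpt*nsppol processors can be given work.

```python
from pathlib import Path

# Assumed keywords (hypothetical): the real wording of abinit's
# messages varies by version, so adjust this list as needed.
KEYWORDS = ("error", "allocat", "memory")


def scan_logs(directory, pattern="*LOG*", keywords=KEYWORDS):
    """Return {file name: [matching lines]} for per-processor log files."""
    hits = {}
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        # Tolerate stray bytes in logs from crashed processes.
        with path.open(errors="replace") as handle:
            matches = [
                line.rstrip("\n")
                for line in handle
                if any(key in line.lower() for key in keywords)
            ]
        if matches:
            hits[path.name] = matches
    return hits


def idle_processors(nproc, nkpt, nsppol=1):
    """Processors left without work, assuming pure k-point parallelism."""
    return max(0, nproc - nkpt * nsppol)
```

Calling `scan_logs(".")` in the run directory lists the log files that mention an error or a memory problem, and `idle_processors(nproc, nkpt)` returning a nonzero value suggests the job was requested with more processors than the k-point distribution can use.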


Matthieu
Matthieu Verstraete
University of Liege, Belgium

Re: Job exiting after P000

Post by JEJohns » Wed Sep 08, 2010 7:22 am

Thanks for the reply. Unfortunately, I've switched positions and am now working at Northwestern & argon, but I hear that carver.nersc.gov has been upgraded to 6.2.1, so I'll pass the info along. Thanks,
James
