Job exiting after P000

Post by **JEJohns** » Thu Aug 05, 2010 11:15 pm

While running abinit, my larger jobs consistently exit early (either during the first iteration or before it), just after the following message:

"-P-0000 leave_test : synchronization done...
-P-0000 leave_test : exiting..."

This happened when I went from a 2x2 unit cell of graphene to a 3x3 unit cell, and again when I increased the vacuum spacing on a silver slab, which made the unit cell large. I would have thought it was a memory usage issue, but it doesn't go away if I increase the number of nodes and processors (and thus the available memory). Am I doing something dumb? I've attached the input, output, and log files for the graphene unit cell where this happens (labelled graphod.* for my own personal naming reasons). I've queued this job on 2 nodes with 20GB on each node.

Post by **JEJohns** » Thu Aug 05, 2010 11:22 pm

I'm so sorry, this is clearly in the wrong forum. I meant to place it in the Input file or Platform specific forum. I'm running these jobs on carver.nersc.gov, one of the Lawrence Berkeley National Lab computers, using abinit 6.0.3.

Post by **mverstra** » Sat Sep 04, 2010 9:31 am

Hello JEJohns,

In the latest versions (6.2, I think, and certainly the upcoming 6.4) I have made this error message more verbose. It arises because some of your processors are not responding. This can be due to:

1) Some of them didn't manage to allocate memory. Check the individual *LOG* files for each one. This is the most common reason.
2) You have chosen a processor distribution which is not consistent (e.g. more processors than you have k-points), in which case some processors end up empty and complain.
3) Some other, real error on the nodes. If possible, check which nodes are complaining.
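
For the first two checks, a small script can save some manual digging. This is only a rough sketch, not part of abinit: the function names, the `*LOG*` glob pattern and the marker strings ("error", "allocat") are assumptions, so adjust them to what your version actually writes. `idle_processes` also assumes plain k-point parallelism, where each processor needs at least one k-point.

```python
from pathlib import Path

# Assumed markers: substrings that tend to show up next to a failed run or a
# failed allocation. They are guesses, not an official list of abinit messages.
SUSPECT_MARKERS = ("error", "allocat")


def scan_logs(directory=".", pattern="*LOG*", markers=SUSPECT_MARKERS):
    """Return {file name: [suspicious lines]} for the per-process log files."""
    hits = {}
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        matches = []
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as handle:
            for line in handle:
                lowered = line.lower()
                if any(marker in lowered for marker in markers):
                    matches.append(line.rstrip())
        if matches:
            hits[path.name] = matches
    return hits


def idle_processes(nproc, nkpt):
    """Processors left without a k-point, assuming k-point-only parallelism."""
    return max(0, nproc - nkpt)


if __name__ == "__main__":
    # Run from the job directory; prints up to five flagged lines per log.
    for name, lines in scan_logs().items():
        print(f"{name}: {len(lines)} suspicious line(s)")
        for line in lines[:5]:
            print("   ", line)
```

If `idle_processes` returns something other than zero for your processor count and number of k-points, the second cause above is worth ruling out before looking at memory.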

Matthieu

Post by **JEJohns** » Wed Sep 08, 2010 7:22 am

Thanks for the reply. Unfortunately, I've switched positions and am now working at Northwestern and Argonne, but I hear that carver.nersc.gov has been upgraded to 6.2.1, so I'll pass the info along. Thanks,
James

ABINIT Discussion Forums
