Dear developers,
I find that the Berry phase calculation, the 3rd perturbation in the spin-polarized Raman calculation, and the sigma calculation in SCF-GW have some parallelization limitations that are not mentioned in the official documentation.
(1) For the Berry phase calculation, one can test tffield_6.in. If it is run on more than 7 CPU cores, the calculation never finishes, while the CPUs and memory stay fully occupied.
(2) For the 3rd perturbation in the spin-polarized Raman calculation, only one core can be used; otherwise abinit stops with the following message. (I have attached a test file.)
-P-0000 leave_test : synchronization done...
kpgio: loop on k-points done in parallel
-P-0000
-P-0000 leave_new : decision taken to exit ...
(3) In the sigma calculation of SCF-GW, there is also a restriction on the number of CPU cores. For the attached test file, one can use at most 6 cores; otherwise abinit fails with the following printout.
m_wfs.F90:3674:ERROR
Nobody has (band1, ik_ibz) (band2, ikp_ibz) spin: 29 1 44 1 1
-P-0000
-P-0000 leave_new : decision taken to exit ...
Sincerely,
Guangfu Luo
Parallel limitation in three kinds of calculation
Attachments:
- spin-polarized-Raman.zip.in (120.26 KiB; remove the .in extension)
- sigma-in-SCF-GW.zip.in (80.68 KiB; remove the .in extension)
Re: Parallel limitation in three kinds of calculation
Hi Guangfu, thanks for checking.
You are quite correct: there should be more intrinsic checks and more explicit error messages. The hangs you saw are fixed in recent versions: you are now told that certain processors are not answering, which is usually because they have no k-points at all.
This can happen if you give too many processors, or if the distribution of k-points per processor is, e.g., 2 2 2 1 0 for 7 k-points on 5 processors: mkmem = 2, so you could do the same calculation in the same time on 4 processors. The processor with 0 k-points basically hangs.
To make sure you are seeing real bugs, you should first eliminate these cases of parallelization beyond the number of k-points available.
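As a rough illustration of the 2 2 2 1 0 example above, here is a small Python sketch that checks whether a given processor count leaves some processors without k-points. It is not ABINIT code: the function names are hypothetical, and it assumes k-points are handed out in contiguous blocks of size ceil(nkpt / nproc), which reproduces the example but may not match what abinit does in every case.

```python
import math


def kpt_distribution(nkpt, nproc):
    """Number of k-points per processor.

    Assumption (not taken from ABINIT sources): k-points are assigned in
    contiguous blocks of size ceil(nkpt / nproc), the per-processor maximum.
    """
    block = math.ceil(nkpt / nproc)
    counts = []
    remaining = nkpt
    for _ in range(nproc):
        take = min(block, remaining)
        counts.append(take)
        remaining -= take
    return counts


def idle_procs(nkpt, nproc):
    """Ranks that receive no k-points at all under the assumption above."""
    counts = kpt_distribution(nkpt, nproc)
    return [rank for rank, n in enumerate(counts) if n == 0]


print(kpt_distribution(7, 5))  # [2, 2, 2, 1, 0]
print(idle_procs(7, 5))        # [4]
print(kpt_distribution(7, 4))  # [2, 2, 2, 1]
```

If idle_procs returns a non-empty list for your nkpt and processor count, reduce the number of processors before treating a hang as a genuine bug.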
For the other cases we'll have to look more closely.
Matthieu
Matthieu Verstraete
University of Liege, Belgium