[Q]parallel run with gamma point calculations

Post by bjeon » Wed Dec 28, 2011 9:07 pm

Hi, I am testing surface problems with a large vacuum region.
For a gamma-point run (kpt 0 0 0), I am trying to set up a parallel calculation as follows:

# for 2-CPU run
paral_kgb 1
npkpt 1
npband 2
npfft 1
bandpp 1

Because there is a single k-point, npkpt is set to 1, so I am trying to configure npband or npfft instead. But this run is slower than a single-CPU run (77 -> 103 sec for 10 SCF loops). The test problem has 58 bands and a 40x40x160 FFT grid.
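For readers trying other npband/npfft combinations, a small sanity check of the process layout can help before launching a run. The sketch below is only illustrative: the function name `check_kgb_layout` is made up, and the two rules it tests (the product npkpt x npband x npfft should match the number of MPI processes, and nband should be a multiple of npband x bandpp) are my understanding of the paral_kgb constraints, so please verify them against the ABINIT documentation for your version.

```python
def check_kgb_layout(nproc, nband, npkpt, npband, npfft, bandpp=1):
    """Return a list of problems found in a k-point/band/FFT process layout.

    Assumed rules (check against the ABINIT docs for your version):
      * npkpt * npband * npfft should equal the number of MPI processes
      * nband should be a multiple of npband * bandpp
    """
    problems = []
    if npkpt * npband * npfft != nproc:
        problems.append(
            f"npkpt*npband*npfft = {npkpt * npband * npfft}, but nproc = {nproc}"
        )
    if nband % (npband * bandpp) != 0:
        problems.append(
            f"nband = {nband} is not a multiple of npband*bandpp = {npband * bandpp}"
        )
    return problems


# Layout from the post: 2 processes, 58 bands, band parallelism only.
issues = check_kgb_layout(nproc=2, nband=58, npkpt=1, npband=2, npfft=1, bandpp=1)
print(issues if issues else "layout looks consistent")
```

Under these assumed rules the 2-process layout in the post is consistent, so the slowdown would have to come from something else (problem size, communication, process placement).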

Are these options right? Or is the test problem too small to check parallel performance? I tried different npband/npfft values with different numbers of CPUs, but the results are still not good (they get worse with more CPUs). Any comments would be much appreciated.

ByoungSeon

Re: [Q]parallel run with gamma point calculations

Post by nleconte » Mon Jan 02, 2012 3:14 pm

When parallelising over bands, make sure the communication between CPUs is efficient, or run the job on a single node so that the processes share memory. For instance, I had very bad efficiency (almost a factor of ten) when a job requiring 8 CPUs was distributed over different nodes instead of running on a single node...

Re: [Q]parallel run with gamma point calculations

Post by mverstra » Sun Jan 22, 2012 4:10 pm

You are probably right that the problem is small. It is also hard to see a speedup on only 2 processes. Above all, the execution time is short, so a large fraction of it will be system time (startup, file I/O, etc.), plus some noise if other processes are running on the machine.

Have you checked what Nicolas mentioned, i.e. that the processes really are being distributed as intended by your MPI implementation and batch system?

Matthieu
Matthieu Verstraete
University of Liege, Belgium

Re: [Q]parallel run with gamma point calculations

Post by bjeon » Tue Jan 24, 2012 2:55 am

Hi.

Thanks for the comments and help.
Yes, I have been testing the jobs on SMP or multi-core machines, not in a distributed environment.
But the efficiency is still not good. Do I need any other compile options? I used the following command for configuration.

./configure --enable-mpi --with-mpi-level=2 --prefix=/home/aaa/abinit FC=/opt/ompi143/bin/mpif90

Do I have to configure OpenMP or Pthreads?

As mentioned before, parallel runs perform very well with multiple k-points. The problem is that parallel performance for a gamma-point
calculation (a single k-point) is not good.

Best regards,

ByoungSeon

Re: [Q]parallel run with gamma point calculations

Post by bjeon » Tue May 01, 2012 11:33 pm

I am leaving some observations regarding gamma-point parallelism.

1. On a Xeon desktop (6 cores x hyperthreading = 12 threads) with the Intel Fortran compiler, gamma-point parallelism did not seem to yield any speedup.

2. On an AMD Opteron(tm) Processor 6176 machine (48 cores) with Open64 5.0, for a problem with ngfft 45x45x216, the parallel options were:
paral_kgb 1
npkpt 1
npband 8 # varied over 1, 2, 4, 8
nband 120
npfft 1
bandpp 1

20 SCF loops were run, and the wall times are:
1 CPU = 908.4 sec
2 CPUs = 743.4 sec
4 CPUs = 385.6 sec
8 CPUs = 245.1 sec

There may be some overhead, but gamma-point parallelism seems to work well here. The hardware and compiler might also be worth double-checking.
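To put numbers on the overhead, the wall times above can be converted into speedup and parallel efficiency. This is just a quick sketch using the four timings listed in this post; the helper name `scaling_table` is made up for illustration.

```python
# Wall times (seconds) for 20 SCF loops, keyed by number of MPI processes.
wall_times = {1: 908.4, 2: 743.4, 4: 385.6, 8: 245.1}


def scaling_table(times):
    """Return (nproc, speedup, efficiency) rows relative to the 1-process run."""
    base = times[1]
    rows = []
    for nproc in sorted(times):
        speedup = base / times[nproc]
        rows.append((nproc, speedup, speedup / nproc))
    return rows


for nproc, speedup, efficiency in scaling_table(wall_times):
    print(f"{nproc:2d} procs: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

With these timings the 8-process run is roughly 3.7 times faster than the serial one, i.e. a parallel efficiency a bit under 50%.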

B.
