
Automatic parallelising over datasets (jdtset)

Posted: Tue Nov 29, 2011 12:05 am
by JackMedley
Hello,
I have been running a loop over various positions of atoms in a crystal (the input file can be found here: http://dl.dropbox.com/u/21305328/fe-pnic.in), and I have access to many hundreds of processors. However, I noticed in the log file after one of the runs that it was only using as many processors as there were k-points.
Is there a way I can make ABINIT run some of the different datasets (with different values of jdtset) at the same time? This would massively cut down on the runtime. Any suggestions on how to do this would be greatly appreciated. Thanks
Jack

Re: Automatic parallelising over datasets (jdtset)

Posted: Tue Nov 29, 2011 12:37 am
by david.waroquiers
Hello,

Look at the new tutorials on parallelism, in particular the one on "ground state with plane waves" (http://www.abinit.org/documentation/helpfiles/for-v6.10/tutorial/lesson_paral_gspw.html) and the one on "images", which you might use for the different positions of the atoms in your crystal instead of using datasets (http://www.abinit.org/documentation/helpfiles/for-v6.10/tutorial/lesson_paral_string.html). Note that images do not necessarily have to be used with the string method, as they are in this tutorial.

David

Re: Automatic parallelising over datasets (jdtset)

Posted: Tue Nov 29, 2011 1:23 am
by JackMedley
I can't believe I didn't find that myself; I've been trying to figure this out for hours. Thank you!

Re: Automatic parallelising over datasets (jdtset)

Posted: Tue Nov 29, 2011 2:58 am
by JackMedley
Oh yes, one other quick question: you don't happen to know if there are any GPU flags that need to be included in the input file to force the code to use a graphics card? I compiled ABINIT correctly on a GPU machine, but when I ran a test it took exactly the same time. Thanks in advance
Jack

Re: Automatic parallelising over datasets (jdtset)

Posted: Tue Nov 29, 2011 10:49 am
by david.waroquiers
Hello,

The GPU-enabled part will only become available in the next few weeks/months.

David

Re: Automatic parallelising over datasets (jdtset)

Posted: Thu Dec 01, 2011 6:41 pm
by JackMedley
Hello,
OK, I've read through all that material and have started trying to run it in the more parallel way. However, when I try to run the code with the following added to my input file:

paral_kgb 24
npkpt 3
npband 8
npfft 1
bandpp 1

(running it on 6 nodes each with four processors) I get errors like:


Subroutine Unknown:0:WARNING
The second and third dimension of the FFT grid, 0 0 were imposed to be multiple of the number of processors for the FFT, 24
For input ecut= 2.500000E+01 best grid ngfft= 36 48 120
max ecut= 2.753148E+01
However, must be changed due to symmetry => 48 48 120
with max ecut= 0.325545E+02

==== FFT mesh ====
FFT mesh divisions ........................ 48 48 120
Augmented FFT divisions ................... 49 49 120
FFT algorithm ............................. 401
FFT cache size ............................ 16
getmpw: optimal value of mpw= 709

getdim_nloc : deduce lmnmax = 15, lnmax = 3,
lmnmaxso= 15, lnmaxso= 3.
setmqgrid : COMMENT -
The number of points "mqgrid" in reciprocal space used for the
description of the pseudopotentials has been set automatically
by abinit to : 5258.
memory : analysis of memory needs
================================================================================
Values of the parameters that define the memory need for DATASET 1.
intxc = 0 ionmov = 0 iscf = 7 xclevel = 1
lmnmax = 3 lnmax = 3 mband = 52 mffmem = 1
P mgfft = 120 mkmem = 1 mpssoang= 4 mpw = 709
mqgrid = 5258 natom = 16 nfft = 11520 nkpt = 3
nloalg = 4 nspden = 1 nspinor = 1 nsppol = 1
nsym = 32 n1xccc = 2501 ntypat = 4 occopt = 3
================================================================================
P This job should need less than 19.391 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 40.506 Mbytes ; DEN or POT disk file : 0.090 Mbytes.
================================================================================

Biggest array : f_fftgr(disk), with 1.4083 MBytes.
-P-0000 leave_test : synchronization done...
memana : allocated an array of 1.408 Mbytes, for testing purposes.
memana : allocated 19.391 Mbytes, for testing purposes.
The job will continue.
[node064:3504] *** An error occurred in MPI_Comm_free
[node064:3504] *** on communicator MPI_COMM_WORLD
[node064:3504] *** MPI_ERR_COMM: invalid communicator
[node064:3504] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 13 with PID 18197 on
node node065.ic.cluster exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[node059:25307] 23 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[node059:25307] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


I have also tried having paral_kgb -24 (as seems to be suggested by the variable list) but then I get:

symkpt : found identity, with number 1
getkgrid : length of smallest supercell vector (bohr)= 2.155448E+01
Simple Lattice Grid
symkpt : found identity, with number 1
getkgrid : length of smallest supercell vector (bohr)= 4.310897E+01
Simple Lattice Grid
symkpt : found identity, with number 1
getkgrid : length of smallest supercell vector (bohr)= 6.466345E+01
Simple Lattice Grid
symkpt : found identity, with number 1
getkgrid : length of smallest supercell vector (bohr)= 3.048264E+01
Simple Lattice Grid
symkpt : found identity, with number 1
mpi_enreg%sizecart(1),np_fft 1 1
mpi_enreg%sizecart(2),np_band 8 8
mpi_enreg%sizecart(3),np_kpt 3 3
in initmpi_grid : me_fft, me_band, me_kpt are 0 0 0
invars1: mkmem undefined in the input file. Use default mkmem = nkpt
invars1: With nkpt_me= 1 and mkmem = 3, ground state wf handled in core.
Resetting mkmem to nkpt_me to save memory space.
invars1: mkqmem undefined in the input file. Use default mkqmem = nkpt
invars1: With nkpt_me= 1 and mkqmem = 3, ground state wf handled in core.
Resetting mkqmem to nkpt_me to save memory space.
invars1: mk1mem undefined in the input file. Use default mk1mem = nkpt
invars1: With nkpt_me= 1 and mk1mem = 3, ground state wf handled in core.
Resetting mk1mem to nkpt_me to save memory space.

COMMENT in invars1m For dataset= 10 a possible choice for less than 24 processors is:
nproc  npkpt  npband  npfft  bandpp  weight
   24      3       8      1       1    1.00
   24      3       4      2       2    0.50
   18      3       3      2       4    0.25
   18      3       6      1       4    1.00
   12      3       4      1       2    1.00
   12      3       2      2       4    0.25
    9      3       3      1       4    1.00
    6      3       2      1       4    1.00

invars1m : launch a parallel version of ABINIT with a number of processors among the above list, and the associated input variables npkpt, npband, npfft and bandpp. The optimal weight is close to 1.
-P-0000
-P-0000 leave_new : decision taken to exit ...

Am I doing something wrong here? Cheers
Jack

Re: Automatic parallelising over datasets (jdtset)

Posted: Fri Dec 02, 2011 3:16 pm
by nleconte
From what I understand, if you type paral_kgb -24, it will give you the different combinations you can use to parallelize with up to 24 processors. That is the list at the end of your file (you should choose one with a weight close to 1).
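
To make the "weight close to 1" advice concrete, here is a rough Python sketch (not part of ABINIT) that picks a row from the table in the log above. The rows are copied from that log; the helper name best_choice and the tie-breaking rule (prefer the row that uses more processors) are my own assumptions.

```python
# Rows copied from the "possible choice" table in the log above:
# (nproc, npkpt, npband, npfft, bandpp, weight)
ROWS = [
    (24, 3, 8, 1, 1, 1.00),
    (24, 3, 4, 2, 2, 0.50),
    (18, 3, 3, 2, 4, 0.25),
    (18, 3, 6, 1, 4, 1.00),
    (12, 3, 4, 1, 2, 1.00),
    (12, 3, 2, 2, 4, 0.25),
    (9, 3, 3, 1, 4, 1.00),
    (6, 3, 2, 1, 4, 1.00),
]


def best_choice(rows, max_procs):
    """Pick the row whose weight is closest to 1.

    Assumption (not from ABINIT): ties are broken in favour of the row
    that uses the most processors.
    """
    usable = [r for r in rows if r[0] <= max_procs]
    if not usable:
        raise ValueError("no combination fits in %d processors" % max_procs)
    return min(usable, key=lambda r: (abs(r[5] - 1.0), -r[0]))


nproc, npkpt, npband, npfft, bandpp, weight = best_choice(ROWS, 24)
print(nproc, npkpt, npband, npfft, bandpp, weight)  # prints: 24 3 8 1 1 1.0
```

For 24 processors this selects the first row of the table, i.e. the same npkpt, npband, npfft and bandpp values Jack already had in his input file.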

Once you actually want to do the parallelization, you should put paral_kgb 1, which activates the parallelization over bands, fft, kpt and spinor. You should not put paral_kgb 24.
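
As a small sketch of what the corrected input would look like, the Python snippet below prints the lines with paral_kgb set to 1. It also checks that npkpt * npband * npfft equals the number of processors, which is an assumption on my part based on the fact that every row of the table in the log above satisfies it (spinor parallelism is ignored here). The function name kgb_input is made up for illustration.

```python
def kgb_input(nproc, npkpt, npband, npfft, bandpp=1):
    """Return the paral_kgb-related input lines as one string.

    Assumption: the processor grid must satisfy
    npkpt * npband * npfft == nproc, as every row of the table
    printed by paral_kgb -24 in the log above does.
    """
    if npkpt * npband * npfft != nproc:
        raise ValueError(
            "npkpt*npband*npfft = %d does not match nproc = %d"
            % (npkpt * npband * npfft, nproc)
        )
    return "\n".join([
        "paral_kgb 1",  # 1 switches the parallelization on; it is not a processor count
        "npkpt %d" % npkpt,
        "npband %d" % npband,
        "npfft %d" % npfft,
        "bandpp %d" % bandpp,
    ])


# 6 nodes with 4 processors each = 24 processes, first row of the table
print(kgb_input(24, 3, 8, 1, 1))
```

The only change compared with the input file in the post above is the first line: paral_kgb 1 instead of paral_kgb 24.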

But I may not be the reference on this, as I have some problems doing something similar myself; cf. another thread in this forum...