Reducing memory needs

Total energy, geometry optimization, DFT+U, spin....

Moderator: bguster

ruslan
Posts: 7
Joined: Tue Sep 07, 2010 5:19 pm

Reducing memory needs

Post by ruslan » Mon Oct 04, 2010 10:40 am

Hello, abinit users!

I am trying to calculate phonons using the PAW formalism and abinit. Because this is not implemented [yet], I am falling back to the phonopy code http://phonopy.sourceforge.net/ . The method is relatively straightforward: the program generates supercells with displaced atom positions and takes the calculated forces as input to produce phonon spectra.

My problem arises when I try to simulate a 3x3x3 supercell with 8 atoms in each unit cell. This results in roughly 1440 bands, and I was unable to get such a system to run due to out-of-memory errors. How can one reduce the memory demands of the simulation? The things I tried:
  • Just increasing the number of processors does not help, because the total number of k-points with ngkpt=4x4x4 is 16 and most of the processors are idle.
  • Activating the paral_kgb option. Apart from the standard settings wfoptalg 4, nloalg 4, fftalg 401, intxc 0 and fft_opt_lob 2, I tried enabling parallelization over k-points (npkpt 16, npband 2, npfft 1, bandpp 1); the full set of settings is written out as an input fragment after this list. I tried various combinations of options, but I am missing some hint like "the memory used by each processor gets divided by npband (npfft or something else)". Is it at all possible to reduce memory by parallelization?
  • Activating mkmem=0. Some data gets written to disk, but I am unsure how much slower the simulation gets (can one provide a rough estimate of the slowdown factor?). The problem in this case is that I cannot use band/FFT parallelization, and the simulation lasts for ages.
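
For reference, here is the same attempted distribution written out as an input fragment (the variable names and values are exactly those quoted above; this is only a restatement of the attempt, not a recommended setting):

Code: Select all

paral_kgb 1
wfoptalg 4
nloalg 4
fftalg 401
intxc 0
fft_opt_lob 2
npkpt 16    # one group per irreducible k-point (16 in total)
npband 2    # split the bands over 2 processors within each k-point group
npfft 1     # no plane-wave/FFT distribution
bandpp 1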

I am using a Cray XT6 machine with 24 cores and 32 GB of memory per node. I tried reducing the number of MPI tasks per node, with little success. With 3 MPI tasks per node I could start the simulation, but this seems to be a very inefficient way of operating a parallel machine.

Kind regards,
Ruslan Zinetullin

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: Reducing memory needs

Post by mverstra » Mon Oct 11, 2010 12:08 pm

ruslan wrote:Hello, abinit users!

I am trying to calculate phonons using the PAW formalism and abinit. Because this is not implemented [yet], I am falling back to the phonopy code http://phonopy.sourceforge.net/ . The method is relatively straightforward: the program generates supercells with displaced atom positions and takes the calculated forces as input to produce phonon spectra.

My problem arises when I try to simulate a 3x3x3 supercell with 8 atoms in each unit cell. This results in roughly 1440 bands, and I was unable to get such a system to run due to out-of-memory errors. How can one reduce the memory demands of the simulation? The things I tried:
  • Just increasing the number of processors does not help, because the total number of k-points with ngkpt=4x4x4 is 16 and most of the processors are idle.

4x4x4 = 64 - how many processors do you want to use? In most cases as you displace atoms you will break symmetry and use all of the k-points (of course nkpt will change for each displacement case...)

  • Activating the paral_kgb option. Apart from the standard settings wfoptalg 4, nloalg 4, fftalg 401, intxc 0 and fft_opt_lob 2, I tried enabling parallelization over k-points (npkpt 16, npband 2, npfft 1, bandpp 1). I tried various combinations of options, but I am missing some hint like "the memory used by each processor gets divided by npband (npfft or something else)". Is it at all possible to reduce memory by parallelization?

Yes, search the forum for paral_kgb and check the web site and the variable definitions. Parallelizing over g-vectors is even more efficient in reducing memory, but it can be much worse for parallel scaling if you have a slow network (ethernet).
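
As an illustration of that suggestion (the values below are purely illustrative, not a tested recommendation for this system), shifting part of the distribution onto the g-vectors/FFTs means raising npfft, for example:

Code: Select all

paral_kgb 1
npkpt 16
npband 2
npfft 2     # also distribute the plane-wave (g-vector) / FFT data, which is what reduces memory per task
bandpp 1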

  • Activating mkmem=0. Some data gets written to disk, but I am unsure how much slower the simulation gets (can one provide a rough estimate of the slowdown factor?). The problem in this case is that I cannot use band/FFT parallelization, and the simulation lasts for ages.

It will get slower. The factor will depend on your disk I/O, so there is no rule (some systems are fine with this mode). And indeed there are a number of options you cannot use with mkmem 0 - there is no guarantee that all processors will have access to the scratch file, and some combinations are just not coded.



    I am using a Cray XT6 machine with 24 cores and 32 GB of memory per node. I tried reducing the number of MPI tasks per node, with little success. With 3 MPI tasks per node I could start the simulation, but this seems to be a very inefficient way of operating a parallel machine.

    Kind regards,
    Ruslan Zinetullin

Matthieu Verstraete
University of Liege, Belgium

    ruslan
    Posts: 7
    Joined: Tue Sep 07, 2010 5:19 pm

    Re: Reducing memory needs

    Post by ruslan » Wed Oct 13, 2010 12:33 pm

    mverstra wrote:4x4x4 = 64 - how many processors do you want to use? In most cases as you displace atoms you will break symmetry and use all of the k-points (of course nkpt will change for each displacement case...)

    Let's consider the following example: a 3x3x3 supercell of Nb3Sn. First, I tried to make a run with ideal atomic positions and then load this wavefunction into the perturbed simulations to ease convergence. I have the following setup:

    Code: Select all

    acell 29.6035314709658941 29.6035314709658941 29.6035314709658941
    ntypat 2
    znucl 41 50
    natom 216
    typat 162*1 54*2
    nband 1440

    ecut 20
    pawecutdg 20
    kptopt 1
    ngkpt 4 4 4

    paral_kgb 1
    wfoptalg 4
    nloalg=4
    fftalg=401
    intxc=0
    fft_opt_lob=2

    npkpt 4
    npband 12
    npfft  2
    bandpp  4

    prtposcar 1

    chkprim 0
    maxnsym 1296 # supercell has more symmetries

    prtden 1
    prtwf 1
    prteig 0
    tolvrs 1.0d-16
    diemac 1000000
    xred
    ... positions here

    This case results in nkpt=4, which is fairly small. Memory needs seem to be within limits:
    P This job should need less than 1057.967 Mbytes of memory.
    Rough estimation (10% accuracy) of disk space for files :
    WF disk file : 811.760 Mbytes ; DEN or POT disk file : 6.594 Mbytes.

    Biggest array : pawfgrtab%gylm(gr), with 302.9008 MBytes.


    But my problem is that I cannot run even such a simple system on 96 processors! Log:
    ITER STEP NUMBER 1
    vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
    starting lobpcg, with nblockbd,mpi_enreg%nproc_band 30 12
    [NID 00040] 2010-10-13 01:01:40 Apid 23269: initiated application termination
    [NID 00040] 2010-10-13 01:01:48 Apid 23269: OOM killer terminated this process.
    Application 23269 exit signals: Killed
    Application 23269 resources: utime 0, stime 0


    Each node has 32 GB of memory, which seems to be enough for this job. I ran abinit with paral_kgb=-96 and got:

    Code: Select all

      nproc    npkpt    npband    npfft    bandpp    weight
         96        4         6        4         4      0.25
         96        4         8        3         4      0.50
         96        4        12        2         4      1.50
         96        4        24        1         4      1.00


    Obviously it makes no sense to increase npkpt, because there are only 4 k-points. Which parameter could be adjusted (if any) to get the simulation to run? I have already reduced ecut/pawecutdg to the rather small value of 20.
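
    For concreteness, the table entry with the largest npfft (which, following the advice above, should presumably be the most memory-friendly distribution, although it has the lowest predicted weight) would correspond to an input fragment roughly like this:

    Code: Select all

    paral_kgb 1
    npkpt 4
    npband 6
    npfft 4     # more g-vector/FFT distribution, less band distribution
    bandpp 4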

    Yes, search the forum for paral_kgb and check the web site and the variable definitions. Parallelizing over g-vectors is even more efficient in reducing memory, but it can be much worse for parallel scaling if you have a slow network (ethernet).


    What do you mean by parallelizing over g-vectors? npkpt? As you can see, I set the maximal allowable value of npkpt and it does not really help.

    Kind regards,
    Ruslan Zinetullin
