Parallelization and convergence

erichmond · Post by **erichmond** » Sun Apr 19, 2020 11:48 pm

I am trying to get the attached input file to run faster, converge better, and finish without an error. I tried mpirun, which works but only improves the speed by 35% and does not improve the convergence. I also tried many combinations of values for npband and bandpp, based on the output from a run with autoparal = 1, but in every case I get the following error message after 5 iterations:

Src_file: m_lobpcg2.F90
Src_line: 525
Mpi_rank: 0

Message: I'm so sorry I couldn't make it, I did my best but I failed. Sorry. I am gonna suicide.

I attached the error file, logs041720-par21 for your information.

Can you please tell me what I am doing wrong in my attempt to parallelize the run? (The server I am using has 24 cores and 2 threads per core.)

About convergence: another version of the input running on just 1 CPU reaches tolvrs ~1e-10 after about 120 steps, but then oscillates in the 1e-10 to 1e-11 range for the remaining 360 steps (nstep = 480). The accuracy criterion is set to tolvrs 1e-12. I was hoping that using a large block size when parallelizing the input file would improve the convergence, but I get the above error instead.

Attachments: input file.docx (11.94 KiB), logs041720-par21.docx (103.84 KiB)

Please help!!
Eliezer

ebousquet · Post by **ebousquet** » Mon Apr 20, 2020 8:02 am

Dear Eliezer,
For the convergence of the SCF you have to play with preconditioning and mixing parameters: see diemix, diemixmag (if magnetic), nline, diemac as a 1st attempt.
I see you have diemix=1, but setting it to 1 is strongly discouraged! Reducing it should probably help your convergence problem.
Best wishes,
Eric

erichmond · Post by **erichmond** » Wed Apr 22, 2020 7:25 am

Eric,

Thank you for your response.

I should mention that these calculations are on a slab of 3 layers of Pd atoms and 7 layers of vacuum. I am not yet treating the system for its magnetic properties.

My input file already has the following statements
iprcel 45
iscf 7
npulayit 30
diemac 1e+8
diemix 1
nnsclo 3
nband 48
nline 6
optforces 0
nbdbuf 6

in order to improve the convergence and speed, and it works to a point.

I will try diemix = 0.5 in a run. I also included dielng = 0.25.

But I still have the problem parallelizing my program. The parallelization is complicated by the error I get.

Eliezer

ebousquet · Post by **ebousquet** » Wed Apr 22, 2020 9:13 am

Dear Eliezer,
Another problem I did not notice before is that the ecut you use is way too small (you start with ecut=6).
For norm-conserving pseudopotentials from the pseudodojo project, ecut can be between 25 and 50. For PAW it can be between 14 and 30.
If ecut is too small, the program can have trouble converging the SCF.
Best wishes,
Eric

erichmond · Post by **erichmond** » Wed Apr 22, 2020 11:29 pm

Eric,
The object of my computations is to develop high accuracy workfunctions for transition metals as a function of ecut = 6 - 60, spin, surface relaxation, Fermi energy, and slab thickness. So far I have only been working with variations as a function of the number of plane waves for Ni and Pd. When I relaxed a bulk structure as a function of ngkpt, tsmear, and acell ABINIT chose a value of ecut ~ 60.

Up to this point, I have been using the pseudopotentials from the ONCVPSP-PBE-PDv0.4 library. Is pseudodojo the same as ONCVPSP-PBE-PDv0.4? If so, I have a problem: many of my results are unreliable. So far the computations on the slab for ecut = 6 - 40 have converged to tolvrs 1e-12.

The addition of diemix 0.5 and dielng 0.25 did improve the convergence slightly where tolvrs ~ 5e-12 after 240 iterations instead of tolvrs ~ 1e-11, but still not at tolvrs 1e-12. What other preconditioning variables can I try that I am not already using?

Eliezer

PS: I computed ETOT for diemix = .2, .4, .6, .8, 1.0 with all the other parameters the same. For all values of diemix, at sufficiently high iteration counts, ETOT oscillated in the range 3E-12 < ETOT < 2e-11 Ha. For 0.6, 0.8, 1.0 the oscillations started around 40 iterations. For 0.2 and 0.4, the oscillations started around 100 and 60 iterations respectively.

ebousquet · Post by **ebousquet** » Tue Apr 28, 2020 8:40 am

erichmond wrote: ↑
Wed Apr 22, 2020 11:29 pm
The object of my computations is to develop high accuracy workfunctions for transition metals as a function of ecut = 6 - 60, spin, surface relaxation, Fermi energy, and slab thickness. So far I have only been working with variations as a function of the number of plane waves for Ni and Pd. When I relaxed a bulk structure as a function of ngkpt, tsmear, and acell ABINIT chose a value of ecut ~ 60.

What do you mean by "Abinit chose ecut~60"? Do you mean that your convergence tests with ecut give a good resolution on the property you are looking at for ecut=60?

erichmond wrote: ↑
Wed Apr 22, 2020 11:29 pm
Up to this point, I have been using the pseudopotentials from the ONCVPSP-PBE-PDv0.4 library. Is pseudodojo the same as ONCVPSP-PBE-PDv0.4? If so, I have a problem: many of my results are unreliable. So far the computations on the slab for ecut = 6 - 40 have converged to tolvrs 1e-12.

Yes, ONCVPSP-PBE-PDv0.4 is the same as what you have on pseudodojo.
What do you mean unreliable, is it because you cannot go below 1E-12 on the density/potential resolution?

erichmond wrote: ↑
Wed Apr 22, 2020 11:29 pm
The addition of diemix 0.5 and dielng 0.25 did improve the convergence slightly where tolvrs ~ 5e-12 after 240 iterations instead of tolvrs ~ 1e-11, but still not at tolvrs 1e-12. What other preconditioning variables can I try that I am not already using?

Eliezer

PS: I computed ETOT for diemix = .2, .4, .6, .8, 1.0 with all the other parameters the same. For all values of diemix, at sufficiently high iteration counts, ETOT oscillated in the range 3E-12 < ETOT < 2e-11 Ha. For 0.6, 0.8, 1.0 the oscillations started around 40 iterations. For 0.2 and 0.4, the oscillations started around 100 and 60 iterations respectively.

Maybe you are in a more pathological case...?
OK for diemix: it sounds like too small a value just shifts the problem to a higher number of SCF steps.
You used iprcel 45; did you test whether this is indeed the best choice? What about without it?
You can increase nline to 10 if needed; it can be less computationally demanding than increasing nnsclo (leave nnsclo at its default and increase nline first).

Another remark: how many CPUs are you using? In your input you put npfft 4 npband 24 npkpt 4 bandpp 2, meaning that you want to run on npfft*npband*npkpt = 384 CPUs times two threads per CPU (bandpp). Is that correct?
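The arithmetic in that remark can be checked in a few lines. This is only a sketch: the product rule is the one stated in the post, the core count comes from the first post in the thread, and the function name is made up for illustration.

```python
# Sketch: compare the MPI process count implied by the input with the hardware.
# The product rule is the one quoted in the post above; the function name is
# hypothetical and the argument names simply mirror the ABINIT input variables.

def requested_processes(npkpt: int, npband: int, npfft: int) -> int:
    """Number of MPI processes implied by the three distribution levels."""
    return npkpt * npband * npfft

physical_cores = 24  # from the first post: 24 cores, 2 threads per core

requested = requested_processes(npkpt=4, npband=24, npfft=4)
print(f"requested {requested} processes on {physical_cores} cores")
if requested > physical_cores:
    print(f"oversubscribed by a factor of {requested / physical_cores:g}")
```

Running a check like this before launching makes it obvious when the distribution in the input file asks for far more processes than the machine can provide.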

Best wishes,
Eric

erichmond · Post by **erichmond** » Fri May 01, 2020 4:00 am

Eric,
I appreciate your detailed response. I'll respond item by item.

In the output of a relaxation run where I varied ngkpt [2 2 2, 4 4 4, 6 6 6, 8 8 8, 10 10 10] and tsmear [0.0001 to 0.0005 in steps of 0.0001], the following appeared at the beginning of a dataset, right before the pseudopotential description:
Angles(23, 13, 12)= 90 90 120
getcut: wavevector= 0.0 0.0 0.0 ngfft= 36 36 120
ecut(hartree)= 58.190 => boxcut(ratio)= 2.01652
Normally, the value of ecut here just equals the value I put in the input file. This is where I obtained the value of ecut for further computations. The input value of ecut was 44.

I jumped to the conclusion that my computations for the 1DM files [ecut = 6, 8, 10, 12, 14, 16, 18, 20, 22, 24 and 52, 54, 56, 58, and 60] were unreliable because I was using the pseudopotential in a forbidden range.

I might mention that for bulk computations (a unit cell without vacuum), all computations for ecut = 6 - 60 converged, and for slab calculations (3 atom layers and 7 vacuum layers), all those with ecut <= 40 converged to tolvrs = 1e-12. So I don't know why slab computations don't converge for ecut = 42-60.

With regard to iprcel = 45, I found that idea in the base tutorial tbase4_7, where the author recommended using this input variable over dielng and diemac to precondition the computation when dealing with Al layers and vacuum. The write-up on iprcel is too brief to give me an intelligent way to vary iprcel. In that tutorial, the author also suggests using diemac = 3-5 and dielng = 0.2 for the Al-vacuum system. I tried this latter computation with diemac = 4 and dielng = 0.2, and after 240 iterations the last five iterations of tolvrs gave an average of 1.2e-11, which is higher than for the variations of diemix, which were in the range of 7-9e-12.

My early computations did not include iprcel, diemix, diemac, npulayit, nline, optforces, nbdbuf, or dielng, and the convergence for some slab runs, which didn't converge to tolvrs = 1e-12, was orders of magnitude worse.

I am currently running computations where I vary diemac (1e5 to 1e9 by factors of 10) and nline (4, 5, 6, 7, 8, 9, 10). Each computation takes ~9 s/iteration, so the runs are taking days. The results so far are:
diemac = 1e+5: average of last 5 iterations of tolvrs = 1.28e-11; iteration where tolvrs ~e-10 = 85
nline = 4: average of last 5 iterations of tolvrs = 1.15e-11; iteration where tolvrs ~e-10 = 40
The runs for these tests are still going and will take days, if not weeks, to complete.

The server I am using has 24 CPUs with 2 threads each. Therefore I set npband = 24 and bandpp = 2 to generate the largest block. I ran an autoparal run with the following results:
npband 6 8 6 6 6
npfft 8 6 6 8 6
bandpp 1 1 1 2 2
where the weights decreased from left to right. I am not sure where I obtained the values for npfft and npkpt that I used.

Thanks again for your help.
Eliezer

ebousquet · Post by **ebousquet** » Tue May 05, 2020 9:36 am

Dear Eliezer,
OK, it sounds like this is a somewhat pathological case (metallic multilayers in vacuum are often harder to converge), though reaching tolvrs of 1E-12 is not bad at all! Which property are you looking at in this system that needs to go beyond that?

Can you show what a grep ETOT gives in one of your cases that are not OK, so that I can see whether it shows some strange behavior earlier in the run?
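For readers who want more than a grep, the same lines can be pulled into numbers with a short script. This is a minimal sketch that assumes each SCF line has the form "ETOT, iteration, total energy, energy change, wavefunction residual, potential residual"; that column layout is an assumption to check against your own log, and the sample text below is synthetic, not taken from the attached files.

```python
import re
from typing import NamedTuple

class ScfStep(NamedTuple):
    iteration: int
    etot: float      # total energy (Ha)
    delta_e: float   # change in total energy
    residm: float    # wavefunction residual
    vres2: float     # potential residual

# Assumed layout: "ETOT <iter> <Etot> <deltaE> <residm> <vres2>"
_ETOT_LINE = re.compile(r"^\s*ETOT\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")

def parse_etot_lines(text: str) -> list[ScfStep]:
    """Collect the SCF iteration lines from a log, skipping anything else."""
    steps = []
    for line in text.splitlines():
        match = _ETOT_LINE.match(line)
        if match is None:
            continue
        iteration, *fields = match.groups()
        try:
            # Accept Fortran-style exponents such as 1.0D-10 as well.
            numbers = [float(f.replace("D", "E").replace("d", "e")) for f in fields]
        except ValueError:
            continue  # not a numeric SCF line after all
        steps.append(ScfStep(int(iteration), *numbers))
    return steps

# Synthetic example text, for illustration only.
sample = (
    " iter   Etot(hartree)      deltaE(h)  residm     vres2\n"
    " ETOT  1  -1.0000000000000    -1.000E+00 1.000E-03 1.000E-02\n"
    " ETOT  2  -1.0000000001000    -1.000E-10 1.000E-09 1.000E-11\n"
)
steps = parse_etot_lines(sample)
for step in steps:
    print(step.iteration, step.delta_e, step.residm, step.vres2)
```

Having the residuals as a list makes it easier to spot where the oscillation starts than scanning the raw output by eye.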

Otherwise, playing with the different preconditioners/mixing parameters is the way to go...
iprcel is indeed not described in much detail, but you can find a more technical discussion in the following paper:
https://journals.aps.org/prb/abstract/1 ... .78.045126

Best wishes,
Eric

erichmond · Post by **erichmond** » Wed May 06, 2020 3:46 am

Eric,
I am sorry to hear that you might think that my problem is pathological, which sounds like a black hole for unsolvable problems. Nonetheless, I will continue my preconditioning investigations.

Attached, find the file "PreConditioner parameter evaluation", which gives the results I have found so far. To give you a frame of reference, the constant parameters for each run are iprcel 45, dielng .25, diemac 1e+8, diemix .5, iscf 7, npulayit 30, nnsclo 3, nline 6, optforces 0, nbdbuf 20, nband 144. Each table in the attached file has 4 columns: the first gives the value of the parameter being tested; the second gives the average and standard deviation of the vres values for iterations 236 to 240; the third gives the 240th vres value; and the last gives the iteration where vres first reaches the e-10 range. I consider the 2nd and 4th columns significant because the standard deviation of vres gives some idea of how noisy vres is at the 240th iteration, and the 4th column gives some idea of how fast that particular run is converging. Based on these criteria I chose nline 7, diemix 0.6, and diemac 1e+6. The experiment trying diemac 4 and dielng 0.2 instead of iprcel 45 gave about the same results as iprcel 45.
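The four table columns described above can be computed from a list of vres values along the following lines. This is a sketch, not the script used for the attachment: the function name is hypothetical, the sample standard deviation is an assumption (the post does not say which estimator was used), "reaches the e-10 range" is read as "first value below 1e-9", and the numbers in the example are placeholders.

```python
from statistics import mean, stdev

def tail_summary(vres, tail=5, threshold=1e-9):
    """Summarise an SCF residual history.

    Returns (mean of the last `tail` values, their sample standard deviation,
    the final value, the 1-based iteration where vres first drops below
    `threshold`, or None if it never does).
    """
    last = vres[-tail:]
    first_hit = next(
        (i for i, value in enumerate(vres, start=1) if value < threshold), None
    )
    return mean(last), stdev(last), vres[-1], first_hit

# Placeholder residual history, for illustration only.
history = [1e-3, 1e-6, 5e-10, 2e-11, 4e-11, 3e-11, 1e-11]
avg, spread, final, first_hit = tail_summary(history)
print(avg, spread, final, first_hit)
```

A large spread relative to the mean indicates that the residual is still noisy at the end of the run, which is the criterion used in the post to compare parameter choices.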

I am going to test iprcel for values 0, 25, 35, 45, 55, 65, 145 and dielng from 0.2 to 1.0 in steps of 0.2.

Also, as you requested, I have attached the ETOT file "ETOTnline042820.out" for a test of nline 6 and 7. For the second dataset (nline 7), the program quit running, for some unknown reason, after 103 iterations. There were no comments in the log file.

Eliezer

ebousquet · Post by **ebousquet** » Wed May 06, 2020 5:00 pm

Dear Eliezer,
Thank you for your feedback. Regarding the convergence tests of diemac, diemix, etc., it looks like nline and diemac do not change anything and diemix=0.8 is the best value. The SCF is quite smooth, without bumps, which is quite good.
For iprcel, I don't have much experience with it, just one inhomogeneous dielectric system (BaO/BaTiO3 superlattice) which was not converging with the most usual parameters (400 SCF steps to converge) but worked fantastically well with iprcel=41 (10 SCF steps to converge).
Now, I come back to my question: do you really need to go below 1E-10 or 1E-11 on tolvrs? Looking at your grep ETOT file, we can see that tolvrs is indeed stuck at 1E-11, but the residual on the energy (first column) is very well converged (1E-10/1E-11), as is the residual on the wavefunction (second column, 1E-27). So, did you test the physical properties you are interested in vs tolvrs? Maybe 1E-11 is more than enough.
Best wishes,
Eric

erichmond · Post by **erichmond** » Thu May 14, 2020 1:57 am

Eric,

I completed the tests on iprcel and dielng, which are included in an updated version of "PreConditioner parameter evaluation" (attached). I also put the optimum choices of the preconditioning parameters in a slab run with nstep = 2000. The results were better (iterations 1996-2000 => 4.72e-12 +- 2.69e-12), but vres had still not reached 1e-12. Perhaps 4000 iterations would work, but that seems excessive.

I also included two plots in the "PreConditioner ..." attachment. The first is a plot of the workfunction vs. ecut and the second is a plot of the Fermi energy vs. ecut. It is obvious that for ecut < 25 there is a sharp decrease in the workfunction and Fermi energy. This is why I need to know whether the results are valid for ecut < 25. If you don't know the answer, could you please point me to someone or some paper that could help?

Lastly, I have not yet compared, but will try tonight to compare, the workfunction vs. potential residual for vres = 1e-9, 1e-10, 1e-11, 1e-12 and 1e-13 with, say, nstep = 500. ETOT converges for ecut < 42, so I will set ecut = 38 or 40.

Thank you for your continuing assistance,

Eliezer

ebousquet · Post by **ebousquet** » Fri May 15, 2020 11:05 am

Dear Eliezer,

erichmond wrote: ↑
Thu May 14, 2020 1:57 am
I completed the tests on iprcel and dielng, which are included in an updated version of "PreConditioner parameter evaluation" (attached). I also put the optimum choices of the preconditioning parameters in a slab run with nstep = 2000. The results were better (iterations 1996-2000 => 4.72e-12 +- 2.69e-12), but vres had still not reached 1e-12. Perhaps 4000 iterations would work, but that seems excessive.

More than a few hundred steps is already excessive, and a few thousand does not seem to help much here (I mean, the precision gained is not worth the CPU cost).

erichmond wrote: ↑
Thu May 14, 2020 1:57 am
I also included two plots in the "PreConditioner ..." attachment. The first is a plot of the workfunction vs. ecut and the second is a plot of the Fermi energy vs. ecut. It is obvious that for ecut < 25 there is a sharp decrease in the workfunction and Fermi energy. This is why I need to know whether the results are valid for ecut < 25. If you don't know the answer, could you please point me to someone or some paper that could help?

To help in visualizing your convergence tests, you could plot the residual instead of the total value vs ecut (by residual I mean subtracting the best converged case, i.e. the highest ecut you reached, from all results). A log scale on the y axis might be helpful for this convergence test vs ecut.
You could also make a second plot with [(value(ecut)-value(ecut_max))/value(ecut_max)]*100 vs ecut, to express it as a percentage error, so that you can say you have a precision of 1% at ecut=XX, 10% at ecut=YY, etc. This will help you decide which ecut to use in the end.
The same applies vs tolvrs.
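The two quantities suggested above can be tabulated as follows. This is a sketch under stated assumptions: the function name is hypothetical and the numbers are placeholders chosen for easy arithmetic, not results from this thread. Since a log axis can only show positive numbers, plotting the absolute value of the difference is one way to keep points where the difference changes sign.

```python
def convergence_errors(ecuts, values):
    """Return rows of (ecut, value - reference, percent error).

    The reference is the value at the highest ecut, following the formula
    [(value(ecut) - value(ecut_max)) / value(ecut_max)] * 100 from the post.
    """
    pairs = sorted(zip(ecuts, values))
    reference = pairs[-1][1]
    return [
        (ecut, value - reference, 100.0 * (value - reference) / reference)
        for ecut, value in pairs
    ]

# Placeholder data, for illustration only (not results from the thread).
ecuts = [10, 20, 30, 40]
workfunctions = [4.0, 4.5, 5.5, 5.0]

rows = convergence_errors(ecuts, workfunctions)
for ecut, diff, percent in rows:
    # abs(diff) is what one would put on a log-scale y axis.
    print(f"ecut={ecut:3d}  diff={diff:+.3f}  |diff|={abs(diff):.3f}  error={percent:+.1f}%")
```

The same function works for a sweep over tolvrs: pass the tolvrs values in place of ecuts, keeping in mind that the best-converged case is then the smallest tolvrs rather than the largest.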

erichmond wrote: ↑
Thu May 14, 2020 1:57 am
Lastly, I have not yet compared, but will try tonight to compare, the workfunction vs. potential residual for vres = 1e-9, 1e-10, 1e-11, 1e-12 and 1e-13 with, say, nstep = 500. ETOT converges for ecut < 42, so I will set ecut = 38 or 40.

You could test the convergence with tolvrs starting at 1E-05 or 1E-04 and see how it goes as you tighten it by a factor of 10 at each step.
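One way to set up such a sweep is to generate one dataset per tolvrs value. The sketch below only prints the ndtset line and the per-dataset tolvrs lines; the start at 1d-4 follows the suggestion above, while the end point of 1d-13 and the function name are assumptions made for the example. The rest of the input file is left untouched.

```python
def tolvrs_sweep(start_exp: int = 4, stop_exp: int = 13) -> list[str]:
    """Build input lines for datasets with tolvrs = 1.0d-start_exp ... 1.0d-stop_exp."""
    exponents = range(start_exp, stop_exp + 1)
    lines = [f"ndtset {len(exponents)}"]
    for dataset, exponent in enumerate(exponents, start=1):
        # One tolvrs value per dataset, each a factor of 10 tighter than the last.
        lines.append(f"tolvrs{dataset} 1.0d-{exponent}")
    return lines

sweep = tolvrs_sweep()
print("\n".join(sweep))
```

The printed lines can be pasted into the input in place of the single tolvrs line, so that the property of interest is reported once per tolerance.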

erichmond wrote: ↑
Thu May 14, 2020 1:57 am
Thank you for your continuing assistance,

You're welcome

Best wishes,
Eric

erichmond · Post by **erichmond** » Wed May 20, 2020 6:24 am

Eric,

I finished calculating the workfunction for tolvrs values from 1d-4 to 1d-13. The results are at the end of the attached document. Included is a plot of the difference between the largest value of the workfunction, which occurs at tolvrs = 1d-10, and each of the others. The workfunction appears to be independent of the value of tolvrs, unlike its behavior with respect to the number of plane waves. Tomorrow I will plot the difference, and its percentage, for the workfunction vs. ecut.

Eliezer

ebousquet · Post by **ebousquet** » Mon May 25, 2020 7:06 am

Dear Eliezer,
OK, we can thus see that you don't need such a strict tolvrs to get the correct precision on the property you want to compute (WF here), and asking for more would be a waste of CPU time. Now you can safely test the convergence w.r.t. ecut and k-points.
Best wishes,
Eric

erichmond · Post by **erichmond** » Tue May 26, 2020 9:54 am

Eric,

I'm sorry that I did not get back to you earlier but I spent the last week recalculating the workfunction of Pd so that there is no systematic error. The result is shown in Figure 1 of the attachment.

Unfortunately, the workfunction does not converge monotonically to some limit as a function of ecut, as the Fermi energy does. My original goal was to determine this limit. Taking your suggestion, I computed the difference between the value of the workfunction at ecut = 60 and all the other values, and plotted the result on a log-linear plot in Figure 2. You might notice that the values in Figure 2 for ecut = 26 and 28 are missing because they are negative. Further, I computed the percent difference of each value of the workfunction from the value at ecut = 60 and plotted it in Figure 3.

The general characteristic in Figures 2 and 3 is that the difference in the workfunction values, as ecut increases, decreases.

Thanks again for your assistance and guidance.
Eliezer

ebousquet · Post by **ebousquet** » Thu Jun 04, 2020 3:17 pm

Dear Eliezer,
OK, it sounds like you can easily get an error bar of 1%, but going lower is demanding in CPU time. At this stage you need to decide whether 1% precision on the WF is enough for you. If so, ecut=16-20 looks good.
There is something causing a noisy oscillation, but I don't know what its origin could be. Which pseudopotentials are you using?
Best wishes,
Eric
PS: If you are using PAW, did you make pawecutdg evolve with ecut in your convergence tests?

erichmond · Post by **erichmond** » Tue Jun 09, 2020 8:56 am

Eric
I was hoping to get an accuracy of 0.2% ~ 0.01eV, but this may not be practical.

I am using PBE GGA xc, ixc = 11 and pseudodojo pseudopotentials.

There is an oscillation in the tolvrs and toldfe values in the output file. Also, there is an oscillation in the vacuum portion of the 1DM files which decreases with increasing ecut (see the last two graphs in the attachment). I thought that the vacuum portion should rise to some flat region and then decrease.

Eliezer

ebousquet · Post by **ebousquet** » Wed Jun 10, 2020 11:49 am

erichmond wrote: ↑
Tue Jun 09, 2020 8:56 am
I was hoping to get 0.2% ~ 0.01eV accuracy, but this may not be practical.

Then, according to your convergence test, you have to use a large enough ecut to reach this precision...

erichmond wrote: ↑
Tue Jun 09, 2020 8:56 am
There is an oscillation in the tolvrs and toldfe values in the output file. Also, there is an oscillation in the vacuum portion of the 1DM files which decreases with increasing ecut. I thought that the vacuum portion should be flat.

Here it will be the limit of my help because I've never calculated work functions...
Did you see this document that might help you:
https://docs.abinit.org/guide/work_func ... orial.tex

Best wishes,
Eric

erichmond · Post by **erichmond** » Fri Jun 12, 2020 2:37 am

Eric

Yes. This was the article along with a Phys Rev B article by Verstraete that allowed me to get started calculating work functions.

Many thanks for all your help and guidance!!

Sayonara
Eliezer
