I am working with 108 atoms. I ran the response-function calculation displacing all atoms in all directions (rfatpol 1 108). Since the job stopped after a few hours, having calculated up to 20 atoms, I resubmitted it with rfatpol 20 108. I got two trf2_1.out files, one for each job. Later, when I used the mrgddb and anaddb tools, I got an error message that information was missing in the DDBs for my atom 1, x direction. I scanned through the DDB files. DS1_DDB was fine, but the other DDB files only contained information (i.e. the derivatives) starting from the 20th atom. Is there a way to get around this?
I did not come across this problem when I did all the displacements in one job.
Uma
Response calculations in 2 steps create problems in DDBs
Re: Response calculations in 2 steps create problems in DDB
Hello
Did your first job produce a DDB file?
In the second run you specified rfatpol = 20 108, so it is normal that your second DDB (DS2_DDB) only contains the information starting from the 20th atom.
You need to merge the 2 DDB's with mrgddb, then you should have a complete DDB with all atoms.
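For reference, mrgddb reads its instructions from standard input: the name of the merged DDB to create, a one-line description, the number of DDB files to merge, and then their names. A minimal sketch (the file names below are placeholders for your actual DS1/DS2 DDB files):

  trf2_merged_DDB
  Merged DDB for the 108-atom cell
  2
  trf2_1o_DS1_DDB
  trf2_1o_DS2_DDB

You can then run it as, for example, mrgddb < mrgddb.in > mrgddb.log, and feed the merged DDB to anaddb.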
Boris
----------------------------------------------------------
Boris Dorado
Atomic Energy Commission
France
----------------------------------------------------------
Re: Response calculations in 2 steps create problems in DDB
Dear Boris,
Many thanks for your response to my query and initiative to help.
I am working on an amorphous system of 108 atoms. The job with trf2_1.in had two datasets. Dataset 1 did the SCF calculation; it completed and produced DS1_DDB. The second dataset involved displacing all 108 atoms, and the job stopped midway through it, so no DS2_DDB was available at that point. I resubmitted the job with just the second dataset, displacing atoms starting from the 20th, since I already had all the wavefunction files up to the 19th atom. As a result, DS2_DDB only contains information from the 20th atom onward. I am never able to run dataset 2 completely in a single job: the wall time is insufficient. When I increase the number of processors, the job still takes a very long time and stops midway. I think using too many processors doesn't really help; most of the time they are just communicating among themselves. Can you suggest a way to get around this problem?
Thanking you in advance,
Uma
Re: Response calculations in 2 steps create problems in DDB
Hello Uma
I see what you mean. If you really cannot run the second dataset completely, you need to split it into more datasets. For supercells, here is what I personally do:
- 1st dataset : the ground state
- 2nd dataset : I displace only the first ten atoms, using rfatpol2 = 1 10
- 3rd dataset : I displace another set of 10 atoms, using rfatpol3 = 11 20
etc.
Then you can run each dataset separately, using ndtset = 1 and jdtset = X. The datasets will run in "parallel", and each one will produce a DDB file that you then merge with all the others (see the input sketch below).
This works well if you have a single q-point (which I believe is your case, since you have a 108-atom cell). If you have more q-points, you need to tweak this a little, but the idea is the same.
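To make this concrete, here is a minimal input sketch of such a splitting. Only the ground-state dataset and the first two response datasets are written out; the dataset count, tolerances and q-point are placeholders to adapt to your case (the variables themselves, ndtset, jdtset, rfphon, rfatpol, rfdir, nqpt, qpt, getwfk and tolvrs, are the standard ones from the ABINIT phonon tutorials):

  ndtset 3              ! in practice 1 + 11 = 12 datasets for 108 atoms in groups of 10

  ! Dataset 1: ground state
  tolvrs1  1.0d-18

  ! Dataset 2: phonon perturbations of atoms 1 to 10
  rfphon2   1
  rfatpol2  1 10
  rfdir2    1 1 1
  nqpt2     1
  qpt2      0.0 0.0 0.0
  getwfk2   1
  tolvrs2   1.0d-8

  ! Dataset 3: phonon perturbations of atoms 11 to 20
  rfphon3   1
  rfatpol3  11 20
  rfdir3    1 1 1
  nqpt3     1
  qpt3      0.0 0.0 0.0
  getwfk3   1
  tolvrs3   1.0d-8

  ! ... and so on up to atom 108

To run only one response dataset in a given job, you would then use, for example, "ndtset 1  jdtset 3", keeping the ground-state wavefunction file from dataset 1 available on disk so that the response dataset can read it.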
Hope this helps
Boris
EDIT: As for the CPU distribution, what I usually do is parallelize on bands only. It works quite well, though I advise you to use the latest ABINIT version (7.8.x), because we have recently made some major modifications to the band parallelization.
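(For illustration only: Boris does not show his exact input, but a band-only distribution with the KGB scheme for the ground-state dataset could look like the lines below. paral_kgb, npkpt, npband, npfft and bandpp are standard ABINIT variables; the process counts are placeholders to adapt to your machine, and the total number of MPI processes should match npkpt x npband x npfft. In recent versions you can also set autoparal 1 and let ABINIT propose a distribution.)

  paral_kgb 1      ! enable the k-point / band / FFT parallelization scheme
  npkpt   1        ! no k-point parallelism (large supercell, few k-points)
  npband  16       ! distribute the bands over 16 MPI processes
  npfft   1        ! no FFT-level parallelism
  bandpp  1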
----------------------------------------------------------
Boris Dorado
Atomic Energy Commission
France
----------------------------------------------------------
Re: Response calculations in 2 steps create problems in DDB
Thanks Boris. I wonder why it didn't occur to me.
Uma