faster indgrid() (PAW)

Documentation, Web site and code modifications

Moderators: baguetl, routerov

Locked
chen
Posts: 13
Joined: Mon Apr 19, 2010 4:58 am

faster indgrid() (PAW)

Post by chen » Tue May 04, 2010 2:03 am

Hi there,

the subroutine indgrid() under 66_paw is very slow, it seems that the loops are wasted since every time the fine grid will loop from q_max to 0 and then from -q_max to 0. Therefore the computational cost is N_fine*N_coarse. Is there any plan to make it faster?

here I just tested this modification which is much faster now (linear scaling with the # of grid points), it takes only several seconds with my PAW test, and the old version takes 8 mins. BTW, all the leading spaces in the code has been removed after submitting the post, due to some reason......

!{\src2tex{textfont=tt}}
!!****f* ABINIT/indgrid
!!
!! NAME
!! indgrid
!!
!! FUNCTION
!! Calculate the correspondance between the coarse grid and
!! the fine grid for PAW calculations.
!!
!! COPYRIGHT
!! Copyright (C) 1998-2010 ABINIT group (FJ, MT)
!! This file is distributed under the terms of the
!! GNU General Public License, see ~abinit/COPYING
!! or http://www.gnu.org/copyleft/gpl.txt .
!! For the initials of contributors, see ~abinit/doc/developers/contributors.txt.
!!
!! INPUTS
!! nfftc=total number of FFt grid=n1*n2*n3 for the coarse grid
!! nfftf=total number of FFt grid=n1*n2*n3 for the fine grid
!! ngfftc(18)=contain all needed information about 3D FFT, for the coarse grid,
!! see ~abinit/doc/input_variables/vargs.htm#ngfft
!! ngfftf(18)=contain all needed information about 3D FFT, for the fine grid,
!! see ~abinit/doc/input_variables/vargs.htm#ngfft
!!
!! OUTPUT
!! coatofin(nfftc)= index of the points of the coarse grid on the fine grid
!! fintocoa(nfftf)=index of the points of the fine grid on the
!! coarse grid (=0 if the point of the fine grid does not belong to
!! the coarse grid).
!!
!! PARENTS
!! gstate,gstateimg,gw_tools,m_paw_toolbox,m_rec,respfn
!!
!! CHILDREN
!!
!! SOURCE

#if defined HAVE_CONFIG_H
#include "config.h"
#endif

#include "abi_common.h"

subroutine indgrid(coatofin,fintocoa,nfftc,nfftf,ngfftc,ngfftf)

use defs_basis
use m_errors

implicit none

!Arguments ------------------------------------
!scalars
integer,intent(in) :: nfftc,nfftf
!arrays
integer,intent(in) :: ngfftc(18),ngfftf(18)
integer,intent(out) :: coatofin(nfftc),fintocoa(nfftf)

!Local variables-------------------------------
!scalars
integer :: i1,i2,i3,if1,if2,if3,ii,ing,n1c,n1f,n2c,n2f,n3c,n3f,narg1,narg2
!arrays
integer :: id(3)
integer,allocatable :: gc(:,:),gf(:,:)
integer :: backup_f1,backup_f2,backup_f3

! *************************************************************************
!

DBG_ENTER("COLL")

n1c=ngfftc(1);n2c=ngfftc(2);n3c=ngfftc(3)
n1f=ngfftf(1);n2f=ngfftf(2);n3f=ngfftf(3)

allocate(gc(3,max(n1c,n2c,n3c)))
do ii=1,3
id(ii)=ngfftc(ii)/2+2
do ing=1,ngfftc(ii)
gc(ii,ing)=ing-(ing/id(ii))*ngfftc(ii)-1
end do
end do

allocate(gf(3,max(n1f,n2f,n3f)))
do ii=1,3
id(ii)=ngfftf(ii)/2+2
do ing=1,ngfftf(ii)
gf(ii,ing)=ing-(ing/id(ii))*ngfftf(ii)-1
end do
end do

! coatofin=0;fintocoa=0
! do i1=1,n1c
! do if1=1,n1f
! if(gc(1,i1)==gf(1,if1)) then
! do i2=1,n2c
! do if2=1,n2f
! if(gc(2,i2)==gf(2,if2)) then
! do i3=1,n3c
! do if3=1,n3f
! if(gc(3,i3)==gf(3,if3)) then
! narg1=i1+n1c*(i2-1+n2c*(i3-1))
! narg2=if1+n1f*(if2-1+n2f*(if3-1))
! coatofin(narg1)=narg2
! fintocoa(narg2)=narg1
! exit
! end if
! end do
! end do
! end if
! end do
! end do
! end if
! end do
! end do

coatofin=0;fintocoa=0
backup_f1 = 1
backup_f2 = 1
backup_f3 = 1

do i1=1,n1c
do if1=backup_f1,n1f
if(gc(1,i1)==gf(1,if1)) then
backup_f1 = if1+1
backup_f2 = 1
do i2=1,n2c
do if2=backup_f2,n2f
if(gc(2,i2)==gf(2,if2)) then
backup_f2 = if2+1
backup_f3 = 1
do i3=1,n3c
do if3=backup_f3,n3f
if(gc(3,i3)==gf(3,if3)) then
backup_f3=if3+1
narg1=i1+n1c*(i2-1+n2c*(i3-1))
narg2=if1+n1f*(if2-1+n2f*(if3-1))
coatofin(narg1)=narg2
fintocoa(narg2)=narg1
exit
end if
end do
end do
exit
end if
end do
end do
exit
end if
end do
end do

deallocate(gf,gc)

! open(unit=111,action='write')
! write(111,*)coatofin
! close(111)
! open(unit=112,action='write')
! write(112,*)fintocoa
! close(112)
! print *, 'stop in 66_paw/indgrid.F90'
! stop

DBG_EXIT("COLL")

end subroutine indgrid
!!***


=======================================

chen

User avatar
pouillon
Posts: 651
Joined: Wed Aug 19, 2009 10:08 am
Location: Spain
Contact:

Re: faster indgrid() (PAW)

Post by pouillon » Tue May 04, 2010 1:14 pm

Could you submit the source as an attachment? Thank you in advance.
Yann Pouillon
Simune Atomistics
Donostia-San Sebastián, Spain

chen
Posts: 13
Joined: Mon Apr 19, 2010 4:58 am

Re: faster indgrid() (PAW)

Post by chen » Tue May 04, 2010 2:05 pm

thanks for the prompt reply, and here is the attachment. please let me know how do you think. I tried many times, however cannot attach the file, the error message is "The extension F90 is not allowed." so weired .... I have re-posted the full indgrdi() code.


chen huang

Code: Select all

!{\src2tex{textfont=tt}}
!!****f* ABINIT/indgrid
!!
!! NAME
!! indgrid
!!
!! FUNCTION
!! Calculate the correspondance between the coarse grid and
!! the fine grid for PAW calculations.
!!
!! COPYRIGHT
!! Copyright (C) 1998-2010 ABINIT group (FJ, MT)
!! This file is distributed under the terms of the
!! GNU General Public License, see ~abinit/COPYING
!! or http://www.gnu.org/copyleft/gpl.txt .
!! For the initials of contributors, see ~abinit/doc/developers/contributors.txt.
!!
!! INPUTS
!! nfftc=total number of FFt grid=n1*n2*n3 for the coarse grid
!! nfftf=total number of FFt grid=n1*n2*n3 for the fine grid
!! ngfftc(18)=contain all needed information about 3D FFT, for the coarse grid,
!!        see ~abinit/doc/input_variables/vargs.htm#ngfft
!! ngfftf(18)=contain all needed information about 3D FFT, for the fine grid,
!!        see ~abinit/doc/input_variables/vargs.htm#ngfft
!!
!! OUTPUT
!! coatofin(nfftc)= index of the points of the coarse grid on the fine grid
!! fintocoa(nfftf)=index of the points of the fine grid on the
!!   coarse grid (=0 if the point of the fine grid does not belong to
!!   the coarse grid).
!!
!! PARENTS
!!      gstate,gstateimg,gw_tools,m_paw_toolbox,m_rec,respfn
!!
!! CHILDREN
!!
!! SOURCE

#if defined HAVE_CONFIG_H
#include "config.h"
#endif

#include "abi_common.h"

subroutine indgrid(coatofin,fintocoa,nfftc,nfftf,ngfftc,ngfftf)

 use defs_basis
 use m_errors

 implicit none

!Arguments ------------------------------------
!scalars
 integer,intent(in) :: nfftc,nfftf
!arrays
 integer,intent(in) :: ngfftc(18),ngfftf(18)
 integer,intent(out) :: coatofin(nfftc),fintocoa(nfftf)

!Local variables-------------------------------
!scalars
 integer :: i1,i2,i3,if1,if2,if3,ii,ing,n1c,n1f,n2c,n2f,n3c,n3f,narg1,narg2
!arrays
 integer :: id(3)
 integer,allocatable :: gc(:,:),gf(:,:)
 integer :: backup_f1,backup_f2,backup_f3

! *************************************************************************
!

 DBG_ENTER("COLL")

 n1c=ngfftc(1);n2c=ngfftc(2);n3c=ngfftc(3)
 n1f=ngfftf(1);n2f=ngfftf(2);n3f=ngfftf(3)

 allocate(gc(3,max(n1c,n2c,n3c)))
 do ii=1,3
   id(ii)=ngfftc(ii)/2+2
   do ing=1,ngfftc(ii)
     gc(ii,ing)=ing-(ing/id(ii))*ngfftc(ii)-1
   end do
 end do

 allocate(gf(3,max(n1f,n2f,n3f)))
 do ii=1,3
   id(ii)=ngfftf(ii)/2+2
   do ing=1,ngfftf(ii)
     gf(ii,ing)=ing-(ing/id(ii))*ngfftf(ii)-1
   end do
 end do

!====================================
! The code below takes time
! proportional to N_fine*N_coarse
! very slow,
! I re-write a faster code
!
!
! coatofin=0;fintocoa=0
! do i1=1,n1c
!   do if1=1,n1f
!     if(gc(1,i1)==gf(1,if1)) then
!       do i2=1,n2c
!         do if2=1,n2f
!           if(gc(2,i2)==gf(2,if2)) then
!             do i3=1,n3c
!               do if3=1,n3f
!                 if(gc(3,i3)==gf(3,if3)) then
!                   narg1=i1+n1c*(i2-1+n2c*(i3-1))
!                   narg2=if1+n1f*(if2-1+n2f*(if3-1))
!                   coatofin(narg1)=narg2
!                   fintocoa(narg2)=narg1
!                   exit
!                 end if
!               end do
!             end do
!           end if
!         end do
!       end do
!     end if
!   end do
! end do

 coatofin=0;fintocoa=0
 backup_f1 = 1
 backup_f2 = 1
 backup_f3 = 1

 do i1=1,n1c
   do if1=backup_f1,n1f
     if(gc(1,i1)==gf(1,if1)) then
       backup_f1 = if1+1
       backup_f2 = 1
       do i2=1,n2c
         do if2=backup_f2,n2f
           if(gc(2,i2)==gf(2,if2)) then
             backup_f2 = if2+1
             backup_f3 = 1
             do i3=1,n3c
               do if3=backup_f3,n3f
                 if(gc(3,i3)==gf(3,if3)) then
                   backup_f3=if3+1
                   narg1=i1+n1c*(i2-1+n2c*(i3-1))
                   narg2=if1+n1f*(if2-1+n2f*(if3-1))
                   coatofin(narg1)=narg2
                   fintocoa(narg2)=narg1
                   exit
                 end if
               end do
             end do
             exit
           end if
         end do
       end do
       exit
     end if
   end do
 end do

 deallocate(gf,gc)

! open(unit=111,action='write')
! write(111,*)coatofin
! close(111)
! open(unit=112,action='write')
! write(112,*)fintocoa
! close(112)
! print *, 'stop in  66_paw/indgrid.F90'
! stop

 DBG_EXIT("COLL")

end subroutine indgrid
!!***


User avatar
gmatteo
Posts: 291
Joined: Sun Aug 16, 2009 5:40 pm

Re: faster indgrid() (PAW)

Post by gmatteo » Tue May 04, 2010 3:56 pm

Dear Huang,

there's something not clear to me in the new implementation.
You are changing the extrema of the do loops *inside* the loop
but, if I remember well, the values of start, stop and increment are computed once;
i.e. changing them inside the loop doesn't affect the loop.
Did you test the new piece of code and with which compiler?

BTW: Another possible solution might be ordering the
points in the two meshes according to the their length (almost linear wrt nfft).
The sorting algorithm will produce a table that can be used to loop over the set of
points belonging to shell of the FFT-point that has to be located.

M

chen
Posts: 13
Joined: Mon Apr 19, 2010 4:58 am

Re: faster indgrid() (PAW)

Post by chen » Tue May 04, 2010 4:48 pm

Hi there,

I am using ifort9, here is the info:
which mpif90 ===> /usr/local/openmpi/1.3.3/intel101/x86_64/bin/mpif90
the code has been tested to get the same coatofin and fintocoa arrays and the final energies are the same as the original code.

The original code will be slow, since the inner loops (loops of if1, if2, and if3) will try to go from 1 to n1f, n2f, and n3f whenever the coarse indexes i1,i2, and i3 proceed by 1. Therefore the fine mesh is swept for each point in the coarse mesh. The computational cost is therefore N_fine*N_coarse. So i added another two "exit" statements and reset the start indexes for if1 if2 and if3.

Your consideration for ordering the points is very nice. By coding in this way, I find that the points in the coarse mesh are already sorted. For example, since i1=1,nc1, which means the coarse grid's x-dimension goes from 0 to qx_max, and then from -qx_max to 0. And we know that if1 always lags behind i1, therefore we do not need to sort the x-dimension. The same reason for the y and z dimensions.

best,
chen

Locked