ACML or ACML_MP ?

yurtesen
Posts: 15
Joined: Sun Nov 11, 2012 9:42 pm

ACML or ACML_MP ?

Post by yurtesen » Thu Nov 15, 2012 10:08 am

Is there any difference between using ACML and ACML_MP?

The reason I am asking: if I start 4 MPI processes on a machine with 4 cores and link against ACML_MP, will each MPI process create 4 threads (up to 16 in total)?

Thanks,
Evren

Alain_Jacques
Posts: 279
Joined: Sat Aug 15, 2009 9:34 pm
Location: Université catholique de Louvain - Belgium

Re: ACML or ACML_MP ?

Post by Alain_Jacques » Fri Nov 16, 2012 11:32 am

Dear Evren,

You may get better performance with the ACML_MP library thanks to the multithreading of some of its functions - on large linear algebra problems and assuming you have cores available. Performance comparisons are somewhat complicated by the fact that recent multicore processors automatically adjust their core frequencies depending on the number of cores that are busy.
Anyway, Abinit and ACML_MP use different parallelization techniques - Abinit uses a mixture of MPI and OpenMP, while ACML_MP is OpenMP only, so it is possible to overload the cores. It is up to you - and your specific study (many k points? many bands? ...) - to find the right balance between the two schemes. Furthermore, the number of threads opened by OpenMP routines can be adjusted with the OMP_NUM_THREADS environment variable, so you can compile a parallel Abinit and decide at runtime.
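To make the overload concrete, here is a minimal Python sketch of the arithmetic, using the numbers from the question (4 MPI processes on a 4-core machine). The function names are hypothetical, and it assumes the worst case in which every MPI process is inside a threaded library call at the same time.

```python
def total_threads(mpi_procs_per_node: int, omp_threads_per_proc: int) -> int:
    """Worst-case number of simultaneously busy threads on one node."""
    return mpi_procs_per_node * omp_threads_per_proc


def oversubscription(cores: int, mpi_procs_per_node: int, omp_threads_per_proc: int) -> float:
    """Ratio of busy threads to cores; a value above 1.0 means overloaded cores."""
    return total_threads(mpi_procs_per_node, omp_threads_per_proc) / cores


# 4 MPI processes, each letting the threaded library open 4 threads, on 4 cores
print(total_threads(4, 4))        # 16
print(oversubscription(4, 4, 4))  # 4.0
# Same run with OMP_NUM_THREADS=1
print(oversubscription(4, 4, 1))  # 1.0
```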

Kind regards,

Alain

yurtesen
Posts: 15
Joined: Sun Nov 11, 2012 9:42 pm

Re: ACML or ACML_MP ?

Post by yurtesen » Mon Nov 19, 2012 11:29 am

I understand that. I am building packages for a cluster, so I wanted to make sure that the settings were not badly chosen. It is difficult to teach people how to run their programs. It is difficult to make people set environment variables :)

As far as I understand, it would be best to set OMP_NUM_THREADS to 1 when running MPI tasks on a cluster, as long as each core on a node gets an MPI task. However, if MPI runs 1 process per node, then it is better to unset OMP_NUM_THREADS. Am I understanding correctly?
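One way to avoid relying on users to set the variable is to have a launch wrapper derive it from the node layout. The Python sketch below is only an illustration of that rule, not something provided by Abinit or ACML: `threads_per_rank` and `launch_env` are hypothetical helpers, and the core and rank counts are assumed to come from the scheduler.

```python
import os


def threads_per_rank(cores_per_node: int, ranks_per_node: int) -> int:
    """Split the cores of a node evenly among its MPI ranks (at least 1 thread each)."""
    if cores_per_node < 1 or ranks_per_node < 1:
        raise ValueError("core and rank counts must be positive")
    return max(1, cores_per_node // ranks_per_node)


def launch_env(cores_per_node: int, ranks_per_node: int, base_env=None) -> dict:
    """Return a copy of the environment with OMP_NUM_THREADS set for this layout."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads_per_rank(cores_per_node, ranks_per_node))
    return env


# One MPI rank per core: keep the threaded library single-threaded.
print(launch_env(4, 4, base_env={})["OMP_NUM_THREADS"])  # 1
# One MPI rank per node: let the library use all 4 cores.
print(launch_env(4, 1, base_env={})["OMP_NUM_THREADS"])  # 4
```

Setting the value explicitly rather than unsetting it is a design choice in this sketch: with the variable unset, the thread count is left to the OpenMP runtime's default, which typically is the number of available cores but is implementation defined.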

yurtesen
Posts: 15
Joined: Sun Nov 11, 2012 9:42 pm

Re: ACML or ACML_MP ?

Post by yurtesen » Tue Nov 20, 2012 11:09 am

It appears that when ATLAS is configured to run with a specific number of threads, it ignores the OMP_NUM_THREADS variable. :(

Alain_Jacques
Posts: 279
Joined: Sat Aug 15, 2009 9:34 pm
Location: Université catholique de Louvain - Belgium

Re: ACML or ACML_MP ?

Post by Alain_Jacques » Tue Nov 20, 2012 3:47 pm

Right. Most of the time I build an MPI Abinit with sequential BLAS/LAPACK and a sequential Abinit with multithreaded BLAS/LAPACK libraries. And I'm lucky enough to have small unit cells and many k points, so my studies run efficiently with MPI parallelism. If the case is sequential, I give it a few cores to please BLAS/LAPACK ... but don't expect a linear performance gain.
I also suggest splitting the different parts of an input file - most of the time their parallelization requirements are very different, so there is no need to waste CPUs.
IMHO ACML performance is so-so ... if you have some time to spare for benchmarks, I suggest trying OpenBLAS or even MKL on AMD CPUs.

Kind regards,

Alain


... and yes, ATLAS hardcodes the number of threads at compile time ... most Linux packages default to 2 threads - pretty arbitrary

yurtesen
Posts: 15
Joined: Sun Nov 11, 2012 9:42 pm

Re: ACML or ACML_MP ?

Post by yurtesen » Tue Nov 20, 2012 7:02 pm

Actually, I sort of forgot that the thread I made was about ACML :) While ATLAS does not obey the OpenMP environment variables, ACML actually is able to run with 1 thread. The question now is whether there is a performance penalty for running threaded ACML with 1 thread compared to serial ACML :) I will let you know after running some tests; it is in my task queue :)
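A comparison like this could be timed with a small harness along the following lines. This is a rough Python sketch under stated assumptions, not a benchmark of ACML: `time_command` is a hypothetical helper, and the placeholder command only keeps the example runnable anywhere; it would be replaced by the actual runs of the serial and threaded builds on the same input.

```python
import os
import statistics
import subprocess
import sys
import time


def time_command(cmd, env_overrides=None, repeats=3):
    """Median wall-clock time in seconds of running cmd `repeats` times."""
    env = dict(os.environ)
    env.update(env_overrides or {})
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Placeholder command (an assumption for illustration): swap in the command
# lines of the serial-library build and the threaded-library build.
placeholder = [sys.executable, "-c", "pass"]
serial = time_command(placeholder)
threaded_one = time_command(placeholder, {"OMP_NUM_THREADS": "1"})
print(f"serial: {serial:.3f} s, threaded with 1 thread: {threaded_one:.3f} s")
```

Taking the median of a few repeats is meant to damp run-to-run noise; on a shared node the numbers would still need to be read with care.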
