
ACML or ACML_MP ?

Posted: Thu Nov 15, 2012 10:08 am
by yurtesen
Is there any difference between using ACML and ACML_MP?

I am asking because of the following case: if I start 4 MPI jobs on a machine with 4 cores and use ACML_MP, will each MPI job spawn 4 threads (16 in total)?

Thanks,
Evren

Re: ACML or ACML_MP ?

Posted: Fri Nov 16, 2012 11:32 am
by Alain_Jacques
Dear Evren,

You may get better performance with the ACML_MP library because some of its functions are multithreaded, at least on large linear algebra problems and provided you have free cores. Performance comparisons are somewhat complicated by the fact that recent multicore processors automatically tune their core frequencies depending on how many cores are busy.
Anyway, Abinit and ACML_MP use different parallelization techniques: Abinit uses a mixture of MPI and OpenMP, while ACML_MP is OpenMP only, so it is possible to oversubscribe the cores. It is up to you, depending on your specific study (many k-points? many bands? ...), to find the right balance between the two schemes. Furthermore, the number of threads started by OpenMP routines can be adjusted with the OMP_NUM_THREADS environment variable, so you can compile a parallel Abinit and decide at runtime.
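Deciding the thread count at runtime can be scripted, so users do not have to remember to export the variable themselves. Below is a minimal Python sketch (not from the thread) that runs a command with OMP_NUM_THREADS set only for that child process. It assumes the threaded library honours OMP_NUM_THREADS; the command passed in, e.g. an mpirun/abinit invocation, is a hypothetical placeholder.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def env_with_omp_threads(threads: int, base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy `base` (default: the current environment) and set OMP_NUM_THREADS.

    The caller's own environment is left untouched, so each run can use a
    different thread count.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


def run_with_threads(cmd: Sequence[str], threads: int) -> int:
    """Run `cmd` with OMP_NUM_THREADS set and return its exit code.

    `cmd` is hypothetical, e.g. ["mpirun", "-np", "4", "abinit"]; adapt it
    to your own launcher and input files.
    """
    completed = subprocess.run(list(cmd), env=env_with_omp_threads(threads))
    return completed.returncode
```

On multi-node runs the MPI launcher may also need to be told to forward the variable to remote nodes (Open MPI's `mpirun -x OMP_NUM_THREADS`, for example).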

Kind regards,

Alain

Re: ACML or ACML_MP ?

Posted: Mon Nov 19, 2012 11:29 am
by yurtesen
I understand that. I am building packages for a cluster, so I wanted to make sure the settings were not badly chosen. It is difficult to teach people how to run their programs. It is difficult to make people set environment variables :)

As far as I understand, it would be best to set OMP_NUM_THREADS to 1 when running MPI tasks on a cluster, as long as every core on a node gets its own MPI task. However, if MPI runs only 1 process per node, then it is better to leave OMP_NUM_THREADS unset. Am I understanding correctly?
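The rule described above is simple arithmetic: the threads competing for a node are ranks per node times threads per rank, and keeping that product at or below the core count avoids oversubscription (4 ranks × 4 threads = 16 threads on 4 cores in the original question). A minimal Python sketch, assuming one thread per physical core and a library that honours OMP_NUM_THREADS (not ATLAS, whose thread count is fixed at build time); both function names are hypothetical.

```python
def omp_threads_per_rank(cores_per_node: int, ranks_per_node: int) -> int:
    """Threads each MPI rank can use without oversubscribing the node.

    4 cores, 4 ranks -> 1 thread each; 4 cores, 1 rank -> 4 threads.
    Leftover cores (e.g. 8 cores, 3 ranks) are simply left idle.
    """
    if cores_per_node < 1 or ranks_per_node < 1:
        raise ValueError("cores_per_node and ranks_per_node must be >= 1")
    return max(1, cores_per_node // ranks_per_node)


def total_threads(ranks_per_node: int, threads_per_rank: int) -> int:
    """Number of threads competing for the node's cores."""
    return ranks_per_node * threads_per_rank
```

Leaving OMP_NUM_THREADS unset instead lets the OpenMP runtime pick its own default, which is implementation-defined but usually the number of available cores; setting it explicitly from a helper like this makes the choice visible.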

Re: ACML or ACML_MP ?

Posted: Tue Nov 20, 2012 11:09 am
by yurtesen
It appears that when ATLAS is configured to run with a specific number of threads, it ignores the OMP_NUM_THREADS variable. :(

Re: ACML or ACML_MP ?

Posted: Tue Nov 20, 2012 3:47 pm
by Alain_Jacques
Right. Most of the time I build an MPI Abinit with sequential BLAS/LAPACK and a sequential Abinit with multithreaded BLAS/LAPACK libraries. And I'm lucky enough to have small unit cells and many k-points, so my studies run efficiently with MPI parallelism. If the case is sequential, I give it a few cores to please BLAS/LAPACK ... but don't expect a linear performance gain.
I also suggest splitting the different parts of an input file: most of the time their parallelization requirements are very different, and there is no need to waste CPUs.
IMHO, ACML performance is so-so ... if you have some time to spare on benchmarks, I suggest trying OpenBLAS or even MKL on AMD CPUs.

Kind regards,

Alain


... and yes, ATLAS hardcodes the number of threads at compile time ... most Linux packages default to 2 threads, which is pretty arbitrary.

Re: ACML or ACML_MP ?

Posted: Tue Nov 20, 2012 7:02 pm
by yurtesen
Actually, I sort of forgot that the thread I started was about ACML :) While ATLAS does not obey the OpenMP environment variables, ACML is actually able to run with 1 thread. The question now is whether there is a performance penalty for running threaded ACML with 1 thread compared to serial ACML :) I will let you know after running some tests; it is in my task queue :)
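For a comparison like threaded ACML with 1 thread versus serial ACML, repeating each run and keeping the fastest time reduces noise from other jobs and from the automatic frequency scaling mentioned earlier in the thread. A minimal Python timing harness, offered as a sketch; the two benchmark executables in the usage note are hypothetical builds of the same test program linked against each library.

```python
import time
from typing import Callable, Dict


def best_of(fn: Callable[[], object], repeat: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeat` calls of fn."""
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(candidates: Dict[str, Callable[[], object]], repeat: int = 5) -> Dict[str, float]:
    """Time every named candidate the same way and return name -> best time."""
    return {name: best_of(fn, repeat) for name, fn in candidates.items()}
```

Usage might look like `compare({"serial": lambda: subprocess.run(["./bench_serial"]), "mp_1thread": lambda: subprocess.run(["./bench_mp"], env=dict(os.environ, OMP_NUM_THREADS="1"))})`, where both binaries are hypothetical. Taking the minimum rather than the mean is a judgment call: it reports what the code can achieve on an otherwise quiet node.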