Discussion:
nbnxn generation
Roland Schulz
2014-08-22 04:39:28 UTC
Hi,

I vaguely remember someone saying there are plans to generate the Verlet
kernels in some new way. Do I remember that correctly, or do no such
plans exist? If they do, what are they?

Erik, you mentioned previously that there are potential problems with
intrinsics and C++. Do we already have any examples of that?

If C++ and intrinsics don't cause any problems, one option to replace the
preprocessor would be using templated functions (for an example see here:
http://stackoverflow.com/questions/6179295/if-statement-inside-a-cuda-kernel/6179580#6179580
- this is for CUDA but that doesn't affect the idea).

Roland
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309
Mark Abraham
2014-08-22 12:07:03 UTC
Post by Roland Schulz
Hi,
I vaguely remember someone saying there are plans to generate the Verlet
kernels in some new way. Do I remember that correctly, or do no such
plans exist? If they do, what are they?
Erik has some plans - the general idea is that we use a Python script to
generate flat source files with no conditionality of any sort, along similar
lines to the current group-scheme kernel generator.

Post by Roland Schulz
Erik, you mentioned previously that there are potential problems with
intrinsics and C++. Do we already have any examples of that?
If C++ and intrinsics don't cause any problems, one option to replace the
preprocessor would be using templated functions (for an example see here:
http://stackoverflow.com/questions/6179295/if-statement-inside-a-cuda-kernel/6179580#6179580
- this is for CUDA but that doesn't affect the idea).
Yes, that's another way we could metaprogram. I suggested Christian try it
out for combining the FFT grids for better LJPME performance (see draft at
https://gerrit.gromacs.org/#/c/3266/11 in fft5p.cpp for those with access).
Since constant propagation and dead-code elimination probably work fine
within a function, even a templated one, I think that's an approach we could
use in places. It's definitely better than duplication, and often more
readable than preprocessor-based techniques.

The downside of template metaprogramming for non-bonded kernels is that
the kernel source still contains every possible code path, whereas with a
generator script doing the metaprogramming, you can read whichever generated
kernel suits your current purpose. Being able to see the metaprogramming
output can be useful when developing new kernels. I imagine compile time for
~100 kernels might be a little faster overall with the generator script, and
debuggers might have a better time, too.

The downside of generation is having lots of generated code. For the
release tarball, we should ship the generated code, but for the repo perhaps
we should generate it at configure time.

On Anton, they metaprogram even harder - the executables for NPT and NVT
are different, for example.

Mark
--
Gromacs Developers mailing list
* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
posting!
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers