Remember? Last year I held a talk at Karlsruhe Institute of Technology’s GridKa Summer School titled GPUs: Platform, Programming, Pitfalls introducing GPUs and stuff.

As we all know, the topic is still interesting. Also, my evaluation was quite alright, so the organizers invited me this year again to introduce GPUs at 2017’s make science && run-titled school. In addition, they asked if I wanted to hold a tutorial on OpenACC programming. I was happy to oblige.

The talk, GPU Programming 101, is basically last year’s talk with some tweaks and updates (Volta, …). But instead of running through the pitfalls – which was hard to do in the time constraints anyway – I decided rather to present some alternative GPU programming models. That is fitting with respect to the OpenACC tutorial, but also gives me a chance to present something really current in GPU programming: Higher-level, portable models.1

In the afternoon I have about five hours to host an OpenACC introduction. I present a boiled-down version of our JSC OpenACC course. I need to create a lot of slides, since I’ve never held the OpenACC introduction part of the course myself. Since I’m at it, I refine the curriculum a bit. Also, I use slightly new programming examples.2 During train travels to Karlsruhe (time is short, as usual), I add a OpenACC↔OtherGPU interoperability section to the tutorial; just in case we are finishing early. Turns out, the participants in the course were all quite apt and we indeed went through the interoperability stuff.

Again, the evaluation of talk and tutorial was quite good. The talk got a 9/10; all four tutorial-feedbacks were 10/10. Nice! Makes the effort worth it.

I made some time-consuming LaTeX fanciness in the slides, which are going into a dedicated posting.

Materials:

• Slides of the GPU-introducing talk are on JuSER, here (handout version), or embedded further down.
• Slides of the OpenACC tutorial are on JuSER, here (handout version), or embedded below the GPU slides. It’s 110 slides, so beware.3 If you’re interested in the source code of the tasks, contact me.

### Slides of Tutorial

1. I still need a chance to properly test-run Kokkos, though.

2. I decide to go all-in in the method I introduced for our JSC courses for generating code. There, I have a master source code file from which the individual tasks are generated. This is done by appending the master source code with pre-processor macros which are resolved using the C partial pre-processor, cppp. When doing only one part of the tutorial, only a few tasks are generated from one master source code. But in a five-hour tutorial, where all tasks build up on each other, I ended up with eight tasks. Each accompanied by a solution version of the source code. So many #ifdefs. So many.

3. Which take like 10 minutes to typeset from scratch. Or 3 minutes for my faster computer…