• Clang's Optimization Levels

    Clang is a neat compiler. I like using it.

    While hand-optimizing some source code, I wanted to know the exact difference between the automatic optimization levels -O1 to -O3. What do they actually do?

    It turns out this information is not so easy to come by.

    The official Clang documentation describes the different levels only coarsely:

    -O2: Moderate level of optimization;
    -O1: Somewhere between -O0 and -O2


    Luckily there’s StackOverflow.

    This answer by Antoine gives the two commands needed to print the optimization passes:

    llvm-as < /dev/null | opt -O1 -disable-output -debug-pass=Arguments
    echo 'int;' | clang -xc -O1 - -o /dev/null -\#\#\#

    The first line uses opt, the modular LLVM optimizer and analyzer, which runs on LLVM source files and is, I reckon, independent of the actual programming language.¹

    The second command prints the optimization passes that clang, the C/C++ driver of LLVM, adds on top of opt.

    Neither command explains anything; they only print the switches used. To understand what lies beneath each switch, LLVM has an explanatory website about the passes (opt --help will also print them, apparently).²

    Luckily, Antoine has compiled the passes Clang uses in the above posting. (At least until Clang 3.8.)
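
    To get back to the original question, what one level adds over another, the flag lists of two levels can be compared. Below is a minimal sketch; the helper name new_flags is made up for illustration, and the usage comment assumes that opt prints the flag list to stderr (hence the 2>&1).

    ```shell
    # Print every flag that occurs in the second list but not in the first.
    # Both arguments are whitespace-separated flag lists, such as the ones
    # printed by -debug-pass=Arguments.
    new_flags() {
        # Normalize the first list to " flag flag flag " for whole-word matching.
        old=" $(printf '%s' "$1" | tr -s '[:space:]' ' ') "
        for flag in $2; do
            case "$old" in
                *" $flag "*) ;;                  # already present in the first list
                *) printf '%s\n' "$flag" ;;      # only in the second list
            esac
        done | sort -u
    }

    # Usage with the opt command from above:
    #   o1=$(llvm-as < /dev/null | opt -O1 -disable-output -debug-pass=Arguments 2>&1)
    #   o2=$(llvm-as < /dev/null | opt -O2 -disable-output -debug-pass=Arguments 2>&1)
    #   new_flags "$o1" "$o2"
    ```

    Words that appear in both outputs, including the label in front of the flag list, drop out of the comparison by themselves.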

    1. Although I can’t find -disable-output and -debug-pass in the list of options of opt’s help…

    2. For some of the options clang prints, the description is available through clang -cc1 --help, where cc1 is the frontend; find yours through clang -\#\#\# -c file.c.

  • CUDA Course 2016: CUDA Tools

    Last week we had a CUDA course for the students of our guest student program. I held the session on CUDA Tools; that is, NVIDIA tools for programming, debugging, and profiling of GPU applications.

    Here are the slides, which are closely based on my colleague Jiri Kraus’ slides of the open-to-public CUDA course earlier this year.

    Download the slides here, or see them embedded below.

  • Reduce Filesize of PDF-embedded Bitmap Images with Ghostscript

    Ghostscript is a powerful tool for manipulating PDF and PS files. But with great power comes great complexity. Here are examples of embedding fonts and reducing image size with it!

    Embedding Fonts

    Usually, your PDF typesetting program takes care of embedding fonts into a PDF document (PDFLaTeX does); but sometimes you have strange sources of PDFs: my ROOT-generated plots, for example, do not embed their fonts.¹

    In a blog post, Karl Rupp summarizes how to embed fonts into PDFs from different sources. To really embed ALL the fonts, including those usually ignored by Ghostscript, you have to dive in even deeper. Here is the command, which I found in a Stack Overflow reply:

    gs -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dCompressFonts=true -dSubsetFonts=true -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf -c ".setpdfwrite <</NeverEmbed [ ]>> setdistillerparams" -f input.pdf

    A quicker alternative to Ghostscript is the pdftocairo command of the poppler PDF library. The command enables conversion to different vector graphics formats.² But it can also convert from PDF to PDF, embedding the fonts in the process.

    pdftocairo input.pdf -pdf output.pdf
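
    To check whether a conversion really embedded everything, poppler's pdffonts lists the fonts of a PDF together with an emb column. The filter below is only a sketch: the name unembedded_fonts is made up, and it assumes the usual pdffonts layout in which the last columns are emb, sub, uni, and a two-number object ID.

    ```shell
    # Read pdffonts output on stdin and print the names of fonts that are
    # not embedded. The first two lines (header and dashes) are skipped.
    # Assumption: "emb" is the fifth field counted from the end of each row.
    unembedded_fonts() {
        awk 'NR > 2 && NF >= 5 && $(NF-4) == "no" { print $1 }'
    }

    # Usage:
    #   pdffonts output.pdf | unembedded_fonts
    ```

    An empty result means every listed font is embedded. Font names containing spaces would be cut off at the first word.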

    Changing Image Quality

    For printing a document, you probably want to have it available in the best quality possible. For uploading it somewhere to share with your friends, file size might be more important than quality. In a LaTeX-set document, where text and line art are stored as vectors, the bulk of the bits is usually taken up by bitmap images (or bitmap-like raster images like JPG, PNG, …). Ghostscript offers a batch way to reduce the size of all embedded bitmap-like images.

    Everything revolves around the -dPDFSETTINGS=/ setting. It takes values ranging from screen, used in the command above (equivalent to 72 dpi images), to prepress (300 dpi). A one-liner to get all images of a document down to 150 dpi would be

    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

    Since I’m lazy and don’t want to memorize this, I made a small, encapsulating shell script a while ago to reduce the PDF’s size by means of image compression: reducePdfSize.sh.
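
    A wrapper of that kind can be quite short. The following is a minimal sketch and not the actual reducePdfSize.sh; the function name reduce_pdf and the GS override variable are made up for illustration.

    ```shell
    # Re-write a PDF with one of Ghostscript's quality presets.
    # Usage: reduce_pdf PRESET input.pdf output.pdf
    # Setting GS (hypothetical override) swaps the program that gets called.
    reduce_pdf() {
        preset=$1
        input=$2
        output=$3
        case "$preset" in
            screen|ebook|printer|prepress|default) ;;
            *) echo "unknown preset: $preset" >&2; return 1 ;;
        esac
        "${GS:-gs}" -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
            "-dPDFSETTINGS=/$preset" -dNOPAUSE -dQUIET -dBATCH \
            "-sOutputFile=$output" "$input"
    }
    ```

    Calling it as GS=echo reduce_pdf ebook input.pdf output.pdf only prints the arguments, which is handy for checking the flags before letting Ghostscript loose on a file.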

    Using pdfimages -all on my thesis, which totals 41 MB, extracts about 21 MB of images: half of the data in the PDF of my thesis is bitmap images. Running the above Ghostscript command on thesis.pdf with the printer setting reduces the 41 MB to 15 MB.³
    Not bad, right?
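
    In numbers, 41 MB down to 15 MB is a reduction of roughly 63 %. A small helper to compute this for any pair of files might look like the sketch below (size_reduction is a hypothetical name; it uses integer arithmetic, so the result is rounded down, and it expects a non-empty first file).

    ```shell
    # Print by how many percent the second file is smaller than the first.
    # Usage: size_reduction before.pdf after.pdf
    size_reduction() {
        before=$(( $(wc -c < "$1") ))   # size in bytes of the original
        after=$(( $(wc -c < "$2") ))    # size in bytes of the reduced file
        echo "$(( (before - after) * 100 / before ))%"
    }
    ```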

    1. But then again, they use very basic fonts which should be available on any system.

    2. It’s probably also the easiest way to convert your PDF vector graphics to SVG.

    3. I don’t know what else happens to reduce the file size beyond lossy image compression. If you know, tell me!

  • Talks with LaTeX Beamer, written in Markdown

    There are different ways out there to create slides for talks. One used a lot in academia is LaTeX Beamer. For the uninitiated: Beamer lets you generate PDF slides by relying on the comprehensive typesetting greatness of LaTeX.

    Compared to WYSIWYG tools like Powerpoint and Keynote, LaTeX Beamer has a high entry threshold: you have to learn its keywords and peculiarities, as with all things LaTeX. This makes it unappealing for beginners, but also somewhat reduces productivity for experienced users.
    But fear not! There’s a well-working converter from the great markup language Markdown to LaTeX Beamer slides. And this is how it works.

    TL;DR: Pandoc can convert Markdown to PDF slides using LaTeX Beamer. It works out of the box, but can easily be extended. Besides LaTeX Beamer, HTML slideshows using reveal.js (and others) can also be generated from the same source file.
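
    As a rough sketch of the basic workflow: the snippet below writes a tiny Markdown deck and converts it with Pandoc's beamer and revealjs writers. The file name talk.md and its contents are made up for illustration, and the PDF route assumes a working LaTeX installation.

    ```shell
    # Write a tiny Markdown deck; here each level-1 heading starts a new slide.
    printf '%s\n' \
        '---' \
        'title: Example Talk' \
        'author: Jane Doe' \
        '---' \
        '' \
        '# First slide' \
        '' \
        '- one point' \
        '- another point' \
        '' \
        '# Second slide' \
        '' \
        'Some *emphasised* text.' > talk.md

    # Convert it, if Pandoc is installed.
    if command -v pandoc >/dev/null 2>&1; then
        # PDF slides via LaTeX Beamer
        pandoc -t beamer talk.md -o talk.pdf || echo "PDF conversion failed (no LaTeX?)" >&2
        # HTML slideshow via reveal.js, from the same source file
        pandoc -t revealjs -s talk.md -o talk.html || echo "HTML conversion failed" >&2
    fi
    ```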

  • CUDA with LLVM and gpucc, Google's CUDA Compiler

    The SC Conference in Austin made me read up on compiler developments concerning CUDA. Two related things have gained traction in the last couple of weeks. One is compiling CUDA code with LLVM while still using the NVIDIA CUDA driver and runtime as a backend; the other is a full Open-Source nvcc replacement.

    CUDA with LLVM

    For a few weeks now, you can use LLVM / Clang to compile CUDA code. How it’s done is described in a document in the LLVM code repository (fix link, introduced with this commit). I haven’t tried it yet, but it looks quite straightforward. More optimizations to better support CUDA are still under way in LLVM.


    Apparently the same people from Google sewing CUDA into LLVM are also developing gpucc, an Open-Source CUDA compiler.

    Naturally, the compiler is LLVM-based, and the only in-depth info on gpucc comes from the last LLVM developers’ meeting: a talk by Jingyue Wu (video, slides). I like the optimizations done by the compiler, which are also already included in the public LLVM part from above (the whitepapers for reference: »Straight-line Scalar Optimizations« and »Memory Space Inference for NVPTX Backend«, both by Wu)!

    It looks quite interesting. Their timeline foresees a publication next year (»Q1 2016«).

    (Sidenote: AMD is working on a tool converting CUDA to a C++ programming model, which can then be compiled either for CUDA or with AMD’s HCC compiler; it’s like CUDA support for AMD through a back door.)

  • No Firewall Warnings for OS X Apps with Self-Signed Certificates

    Ever got this annoying popup window from the OS X firewall asking you to allow incoming connections to some application?

    OS X Firewall Warning

    I’m currently fiddling around with MPI, where messages are constantly being sent, and OS X keeps prompting me to »allow« it.

    There’s a solution: Using Keychain Access.app, create a self-signed certificate for the app in question, trust it »always«, and then sign the application with your freshly made certificate.

    Read how it’s done in this Stackexchange post.

    As an alternative from the same thread, you can use ad-hoc signing, e.g.

    sudo codesign --force --deep --sign - /path/to/application.app
  • NVIDIA Nsight Eclipse Edition as an OS X App

    Edit 2016-02-19: By chance, I found out that what I wrote below is not true. NVIDIA supplies a bundled App for OS X for Nsight (and also for the Visual Profiler, nvvp). They are located at /Developer/NVIDIA/CUDA-7.5/libnsight/nsight.app and libnvvp/nvvp.app. Just create aliases from there to your /Applications/ folder and you’re done. Easy!
    I leave the rest below for completeness.

    NVIDIA bundles a custom Eclipse IDE version in their CUDA Toolkit, the Nsight Eclipse Edition. A handy tool for local and remote GPU development.

    Usually, the program is started via command line invocation (»nsight«, resolved to /Developer/NVIDIA/CUDA-7.5/bin/nsight or the likes via your $PATH).

    To start it as a more proper OS X App, AppleScript can be used. Open Script Editor.app (in /Applications/Utilities/) and paste the following

    run application "/Developer/NVIDIA/CUDA-7.5/bin/nsight"

    modifying the path to the executable accordingly.

    Save the file as a program in /Applications/ (or ~/Applications/) and, voilà, you can start it with Spotlight or Alfred.

    To change the icon of the app, select it in Finder, hit ⌘+i, select the icon on the upper left side and paste (⌘+v) an image from clipboard – e.g. a cutout from the logo on the official NVIDIA webpage for Nsight Eclipse edition.

  • GPU/CPU Comparison Schemes

    For my thesis I made drawings comparing conceptual differences of GPUs and CPUs. I didn’t like the ones which were floating through the interwebs, since most of them had bad quality or horrible colors.

    The chosen font is Myriad Pro, the color scheme is blue / purple / green.

    CPU Die Structure (Simplified)

    GPU Die Structure (Simplified)

    GPU die

    GPU Die Structure with Multiprocessors

    GPU structure


    A few schemes I’m not particularly proud of (and I did not use them anywhere). But, for completeness:

    Multi GPU Scheme

    Multi GPU

    Grid, Block, Thread

    Does not really work, since the virtual entities of Grid, Block, and Thread do not map 1:1 to physical entities on the GPU… It was a try…