Clang is a neat compiler. I like using it.
For some manual optimization of source code, I was interested in the exact differences between the automatic optimization levels up to -O3. What do they do, anyway?
It turns out, this info is not so easy to come by.
The official documentation of Clang specifies the different levels only quite coarsely:
-O2: Moderate level of optimization;
-O1: Somewhere between -O0 and -O2
Luckily, there’s Stack Overflow. This answer by Antoine gives the two lines needed to print the optimization passes:
llvm-as < /dev/null | opt -O1 -disable-output -debug-pass=Arguments
echo 'int;' | clang -xc -O1 - -o /dev/null -###
The first command prints the passes used by opt, LLVM’s optimizer; the second command prints the optimization passes which clang, the C/C++ driver of LLVM, puts on top of those.
This will not explain anything, but solely print the switches used. To understand what lies beneath each switch, LLVM has an explanatory website about the passes (opt --help will apparently also print them).
Luckily, Antoine has compiled the passes Clang uses in the above-mentioned posting, at least up to Clang 3.8.
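To see what one level adds over another, the two pass lists can also be diffed directly. A small sketch, assuming llvm-as and opt are on your PATH (the helper name is my own; the exact passes depend on your LLVM version):

```shell
# Print the pass list opt runs at a given optimization level.
# llvm-as feeds an empty module to opt; the pass list goes to stderr.
list_passes() {
    llvm-as < /dev/null | opt "-O$1" -disable-output -debug-pass=Arguments 2>&1
}

# Compare what -O2 enables over -O1 (run where the LLVM tools are installed):
# diff <(list_passes 1) <(list_passes 2)
```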
Last week we had a CUDA course for the students of our guest student program. I held the session on CUDA Tools; that is, NVIDIA tools for programming, debugging, and profiling of GPU applications.
Download the slides here, or see them embedded below.
Ghostscript is a powerful tool for manipulating PDF and PS files. But with great power comes great complexity. Here are examples on embedding fonts and reducing image size with it!
Usually, your PDF typesetting program takes care of embedding fonts into a PDF document (PDFLaTeX does); but sometimes you have strange sources of PDFs: my ROOT-generated plots, for example, do not embed their fonts.
In a blog post, Karl Rupp summarizes how to embed fonts into PDFs from different sources. To really embed ALL the fonts, including those usually ignored by Ghostscript, you have to dive in even deeper. Here is the command, which I found in a Stack Overflow reply:
gs -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dCompressFonts=true -dSubsetFonts=true -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf -c ".setpdfwrite <</NeverEmbed [ ]>> setdistillerparams" -f input.pdf
A quicker alternative to Ghostscript is the pdftocairo command of the poppler PDF library. The command enables conversion to different vector graphics formats. But it can also convert from PDF to PDF, embedding the fonts in the process.
pdftocairo input.pdf -pdf output.pdf
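To verify the result, poppler also ships pdffonts, which lists every font in a PDF together with an »emb« (embedded) column. A small sketch (the helper name is my own; the field arithmetic assumes pdffonts’ standard two-line header and two-field object ID):

```shell
# Succeeds if no font in the given PDF has "no" in the "emb" column.
# pdffonts columns: name type encoding emb sub uni object ID
# The object ID spans two fields, so "emb" is the fourth field from the end.
all_fonts_embedded() {
    pdffonts "$1" | awk 'NR > 2 && $(NF-4) == "no" { bad = 1 } END { exit bad }'
}

# all_fonts_embedded output.pdf && echo "all fonts embedded"
```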
Changing Image Quality
For printing a document, you probably want to have it available in the best quality possible. For uploading it somewhere for sharing with your friends, file size might be more important than quality. Usually, in best vector fashion, the bulk of bits of a LaTeX-set document are taken by bitmap images (or bitmap-like raster images like JPG, PNG, …). Ghostscript offers a batch way to reduce the size of all embedded bitmap-like images.
Everything revolves around the -dPDFSETTINGS switch. It can take different values, from screen in the command above (equivalent to 72 dpi images) to prepress (300 dpi). A one-liner to get all images of a document down to 150 dpi would be
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Since I’m lazy and don’t want to memorize this, a while ago I made a small shell script encapsulating the call to reduce a PDF’s size by means of image compression.
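A minimal sketch of what such a wrapper can look like (the function name and interface are my own; it assumes gs is on the PATH when actually invoked):

```shell
# pdfcompress: wrap the Ghostscript call with a selectable quality preset.
# Usage: pdfcompress input.pdf [screen|ebook|printer|prepress] [output.pdf]
pdfcompress() {
    local input=$1 quality=${2:-ebook}
    local output=${3:-${input%.pdf}-${quality}.pdf}
    case $quality in
        screen|ebook|printer|prepress) ;;
        *) echo "Unknown quality: $quality" >&2; return 1 ;;
    esac
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS="/$quality" \
       -dNOPAUSE -dQUIET -dBATCH \
       -sOutputFile="$output" "$input"
}

# pdfcompress thesis.pdf printer
```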
Running pdfimages -all on my thesis, which is 41 MB in total, extracts about 21 MB of images – half of the data in the PDF of my thesis is for bitmap images. Using the above Ghostscript command on thesis.pdf with the printer option reduces the 41 MB to 15 MB.
Not bad, right?
There are different ways out there to create slides for talks. One used a lot in academia is LaTeX Beamer. For the unknowing: in short, Beamer allows you to generate PDF slides by relying on the comprehensive typesetting greatness of LaTeX.
Compared to WYSIWYG tools like Powerpoint and Keynote, LaTeX Beamer has a high getting-started threshold of learning the keywords and peculiarities, which is inherent to all things LaTeX. This makes it unappealing for beginners, but also somewhat reduces productivity for experienced users.
But fear not! There’s a well-working converter from the great markup language Markdown to LaTeX Beamer slides. And this is how it works.
TL;DR: Pandoc can convert Markdown to PDF slides using LaTeX Beamer. It works out of the box, but can easily be extended. Apart from LaTeX Beamer, HTML slideshows using reveal.js (and others) are also possible from the same source file.
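A minimal sketch of such a source file (the slide contents are my own example). With only level-one headers carrying content, pandoc makes each # start a new slide; the conversion call is shown as a comment since it needs pandoc plus a LaTeX installation:

```shell
# Write a minimal Markdown slide deck (title block, then one slide per '#').
cat > slides.md <<'EOF'
% An Example Talk
% Jane Doe

# First Slide

- one point
- another point

# Second Slide

Some text.
EOF

# Convert it where pandoc and LaTeX are available:
# pandoc -t beamer slides.md -o slides.pdf
```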
The SC Conference in Austin made me read up on compiler developments concerning CUDA. Two related things gained traction in the last couple of weeks. One is CUDA code compilation using LLVM, still with NVIDIA’s CUDA driver and runtime as a backend; the other is a full open-source CUDA compiler, gpucc.
CUDA with LLVM
For a few weeks now, you can use LLVM / Clang to compile CUDA code. How it’s done is written in a document in the LLVM code repository (introduced with this commit). I haven’t tried it yet, but it looks quite straightforward. More optimizations are still going on in LLVM to better support CUDA.
Apparently the same people from Google sewing CUDA into LLVM are also developing
gpucc, an Open-Source CUDA compiler.
Naturally, the compiler is LLVM-based, and the only in-depth info on gpucc so far comes from the last LLVM developers’ meeting: a talk by Jingyue Wu (video, slides). I like the optimizations done by the compiler, which are also already included in the public LLVM part from above (the whitepapers for reference: »Straight-line Scalar Optimizations« and »Memory Space Inference for NVPTX Backend«, both by Wu)!
It looks quite interesting. Their timeline foresees a publication next year (»Q1 2016«).
(Sidenote: AMD is working on a tool converting CUDA to a C++ programming model, which can then be translated to CUDA or AMD’s HCC compiler; it’s like CUDA support for AMD through a back door.)
Ever got this annoying popup window from OS X’s firewall asking you to allow incoming connections to some application?
I’m currently fiddling around with MPI, where messages are constantly being sent, and OS X of course prompts me every time to »allow« them.
There’s a solution: Using your
Keychain Access.app, create a self-signed certificate for the certain app, trust it »always«, and then sign the application with your freshly made certificate.
Read how it’s done in this Stackexchange post.
As an alternative from the same thread, you can use
ad-hoc signing, e.g.
sudo codesign --force --deep --sign - /path/to/application.app
Edit 2016-02-19: By chance, I found out that what I wrote below is not true. NVIDIA supplies a bundled app for OS X for Nsight (and also for the Visual Profiler,
nvvp). They are located at
libnvvp/nvvp.app. Just create aliases from there to your
/Applications/ folder and you’re done. Easy!
I leave the rest below for completeness.
Usually, the program is started via command line invocation (»nsight«, resolved to /Developer/NVIDIA/CUDA-7.5/bin/nsight or the likes via your PATH).
To start it as a more proper OS X app, AppleScript can be used. Open the Script Editor (in /Applications/Utilities/) and paste the following
run application "/Developer/NVIDIA/CUDA-7.5/bin/nsight"
modifying the path to the executable accordingly.
Save the file as a program (in, say, ~/Applications/) and, voilà, you can start it with Spotlight or Alfred.
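The same wrapper app can also be built without the GUI, using osacompile, OS X’s command-line AppleScript compiler. A sketch (the helper name is my own; the .app extension tells osacompile to produce an applet bundle, macOS only):

```shell
# Compile a one-line "run application" AppleScript into an app bundle.
make_wrapper_app() {
    osacompile -e "run application \"$1\"" -o "$2"
}

# make_wrapper_app /Developer/NVIDIA/CUDA-7.5/bin/nsight ~/Applications/Nsight.app
```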
To change the icon of the app, select it in Finder, hit
⌘+i, select the icon on the upper left side and paste (
⌘+v) an image from clipboard – e.g. a cutout from the logo on the official NVIDIA webpage for Nsight Eclipse edition.
For my thesis I made drawings comparing conceptual differences of GPUs and CPUs. I didn’t like the ones which were floating through the interwebs, since most of them had bad quality or horrible colors.
The chosen font is Myriad Pro, the color scheme is blue / purple / green.
CPU Die Structure (Simplified)
GPU Die Structure (Simplified)
GPU Die Structure with Multiprocessors
A few schemes I’m not particularly proud of (and did not use anywhere). But, for completeness:
Multi GPU Scheme
Grid, Block, Thread
It does not really work, since the virtual entities of Grid, Block, and Thread do not map 1:1 to physical entities on the GPU… It was worth a try…