Measuring Effects of Compiler Flags on Different Processor Architectures

Modern compilers offer many options that influence machine code generation. These include optimization levels (-O0, -O1, -O2, -O3, -Os) and the use of processor-specific instruction set extensions (-march=native). The expectation is that these options improve performance. However, initial measurements suggest that this expectation does not hold in general:

march     x264     xz
generic   15.8 s   23.7 s
native    16.2 s   24.1 s

This thesis should examine this observation by measuring multiple programs on different processor architectures. An important part of the work is to find meaningful benchmark methods for different performance criteria (runtime, cache misses, branch mispredictions) on Intel and ARM processors. For example, Intel processors provide the TSC (Time-Stamp Counter), which can measure time with high accuracy and is often used for benchmarks, but it is no longer fully suitable now that CPUs change their clock frequency dynamically. Intel offers alternatives, but these have other disadvantages.
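As a starting point for the runtime criterion only, repeated wall-clock measurements could be collected along the following lines. This is a minimal sketch and not the method the thesis is expected to arrive at: the function names `time_command` and `summarize` are hypothetical, the repeat count is an arbitrary assumption, and Python's `time.perf_counter` measures elapsed time rather than CPU cycles, so it says nothing about cache misses or branch mispredictions.

```python
import statistics
import subprocess
import time


def time_command(cmd, repeats=5):
    """Run cmd `repeats` times and return the wall-clock durations in seconds.

    `cmd` is an argument list such as ["xz", "-k", "-f", "input.bin"].
    The repeat count of 5 is an assumption, not a recommendation.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's output so that terminal I/O does not
        # distort the measurement.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Condense repeated measurements into min, median and standard deviation."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }
```

Reporting the median together with the spread, instead of a single run, makes it easier to judge whether a difference of a few tenths of a second between two builds lies within measurement noise.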

The thesis should cover these aspects:

  1. Examine compiler options (at least -march)
  2. Find useful measurement methods for x86, amd64 and ARM
  3. Evaluate the resulting methods on production software
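The first aspect implies building the same program under several flag combinations. One possible way to enumerate such a build matrix is sketched below. The helper `build_commands`, the output naming scheme `bench-<opt>-<march>` and the convention that `None` stands for "no -march flag" (the generic build) are assumptions made for illustration only.

```python
from itertools import product


def build_commands(compiler, source, opt_levels, march_values):
    """Return one compiler invocation per (optimization level, -march) pair.

    A march value of None means that no -march flag is passed, i.e. the
    compiler's default target is used. This convention is an assumption
    of this sketch.
    """
    commands = []
    for opt, march in product(opt_levels, march_values):
        # Encode the configuration in the output name so that the
        # binaries can be told apart when they are measured later.
        label = f"{opt.lstrip('-')}-{march or 'default'}"
        cmd = [compiler, opt]
        if march is not None:
            cmd.append(f"-march={march}")
        cmd += [source, "-o", f"bench-{label}"]
        commands.append(cmd)
    return commands
```

For example, `build_commands("gcc", "bench.c", ["-O2", "-O3"], [None, "native"])` yields four argument lists that could be passed to `subprocess.run`; `bench.c` is a placeholder file name.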

Further Information