diff options
Diffstat (limited to 'Documentation/dev-tools/autofdo.rst')
-rw-r--r-- | Documentation/dev-tools/autofdo.rst | 168 |
1 files changed, 168 insertions, 0 deletions
diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst new file mode 100644 index 000000000000..1f0a451e9ccd --- /dev/null +++ b/Documentation/dev-tools/autofdo.rst @@ -0,0 +1,168 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Using AutoFDO with the Linux kernel +=================================== + +This enables AutoFDO build support for the kernel when using +the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization) +is a type of profile-guided optimization (PGO) used to enhance the +performance of binary executables. It gathers information about the +frequency of execution of various code paths within a binary using +hardware sampling. This data is then used to guide the compiler's +optimization decisions, resulting in a more efficient binary. AutoFDO +is a powerful optimization technique, and data indicates that it can +significantly improve kernel performance. It's especially beneficial +for workloads affected by front-end stalls. + +For AutoFDO builds, unlike non-FDO builds, the user must supply a +profile. Acquiring an AutoFDO profile can be done in several ways. +AutoFDO profiles are created by converting hardware sampling using +the "perf" tool. It is crucial that the workload used to create these +perf files is representative; they must exhibit runtime +characteristics similar to the workloads that are intended to be +optimized. Failure to do so will result in the compiler optimizing +for the wrong objective. + +The AutoFDO profile often encapsulates the program's behavior. If the +performance-critical codes are architecture-independent, the profile +can be applied across platforms to achieve performance gains. For +instance, using the profile generated on Intel architecture to build +a kernel for AMD architecture can also yield performance improvements. + +There are two methods for acquiring a representative profile: +(1) Sample real workloads using a production environment. +(2) Generate the profile using a representative load test. +When enabling the AutoFDO build configuration without providing an +AutoFDO profile, the compiler only modifies the dwarf information in +the kernel without impacting runtime performance. It's advisable to +use a kernel binary built with the same AutoFDO configuration to +collect the perf profile. While it's possible to use a kernel built +with different options, it may result in inferior performance. + +One can collect profiles using AutoFDO build for the previous kernel. +AutoFDO employs relative line numbers to match the profiles, offering +some tolerance for source changes. This mode is commonly used in a +production environment for profile collection. + +In a profile collection based on a load test, the AutoFDO collection +process consists of the following steps: + +#. Initial build: The kernel is built with AutoFDO options + without a profile. + +#. Profiling: The above kernel is then run with a representative + workload to gather execution frequency data. This data is + collected using hardware sampling, via perf. AutoFDO is most + effective on platforms supporting advanced PMU features like + LBR on Intel machines. + +#. AutoFDO profile generation: Perf output file is converted to + the AutoFDO profile via offline tools. + +The support requires a Clang compiler LLVM 17 or later. + +Preparation +=========== + +Configure the kernel with:: + + CONFIG_AUTOFDO_CLANG=y + +Customization +============= + +The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for +AutoFDO builds. One can, however, enable or disable AutoFDO build for +individual files and directories by adding a line similar to the following +to the respective kernel Makefile: + +- For enabling a single file (e.g. foo.o) :: + + AUTOFDO_PROFILE_foo.o := y + +- For enabling all files in one directory :: + + AUTOFDO_PROFILE := y + +- For disabling one file :: + + AUTOFDO_PROFILE_foo.o := n + +- For disabling all files in one directory :: + + AUTOFDO_PROFILE := n + +Workflow +======== + +Here is an example workflow for AutoFDO kernel: + +1) Build the kernel on the host machine with LLVM enabled, + for example, :: + + $ make menuconfig LLVM=1 + + Turn on AutoFDO build config:: + + CONFIG_AUTOFDO_CLANG=y + + With a configuration that with LLVM enabled, use the following command:: + + $ scripts/config -e AUTOFDO_CLANG + + After getting the config, build with :: + + $ make LLVM=1 + +2) Install the kernel on the test machine. + +3) Run the load tests. The '-c' option in perf specifies the sample + event period. We suggest using a suitable prime number, like 500009, + for this purpose. + + - For Intel platforms:: + + $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest> + + - For AMD platforms: + + The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check, + + For Zen3:: + + $ cat proc/cpuinfo | grep " brs" + + For Zen4:: + + $ cat proc/cpuinfo | grep amd_lbr_v2 + + The following command generated the perf data file:: + + $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest> + +4) (Optional) Download the raw perf file to the host machine. + +5) To generate an AutoFDO profile, two offline tools are available: + create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part + of the AutoFDO project and can be found on GitHub + (https://github.com/google/autofdo), version v0.30.1 or later. + The llvm_profgen tool is included in the LLVM compiler itself. It's + important to note that the version of llvm_profgen doesn't need to match + the version of Clang. It needs to be the LLVM 19 release of Clang + or later, or just from the LLVM trunk. :: + + $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file> + + or :: + + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file> + + Note that multiple AutoFDO profile files can be merged into one via:: + + $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n> + +6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1, + (Note CONFIG_AUTOFDO_CLANG needs to be enabled):: + + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> |