diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-22 11:07:29 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-22 11:07:29 -0800 |
commit | d1ac1a2b14264e98c24db6f8c2bd452e695c7238 (patch) | |
tree | 9f9b745e3ba5347448139dc9ab71fc2083a083e4 /tools/arch | |
parent | 9d2f6060fe4c3b49d0cdc1dce1c99296f33379c8 (diff) | |
parent | 09e6f9f98370be9a9f8978139e0eb1be87d1125f (diff) |
Merge tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
"perf tools fixes and improvements:
- Don't stop building perf if python setuptools isn't installed, just
disable the affected perf feature.
- Remove explicit reference to python 2.x devel files, that warning
is about python-devel, no matter what version, being unavailable
and thus disabling the linking with libpython.
- Don't use -Werror=switch-enum when building the python support that
handles libtraceevent enumerations, as there is no good way to test
if some specific enum entry is available with the libtraceevent
installed on the system.
- Introduce 'perf lock contention' --type-filter and --lock-filter,
to filter by lock type and lock name:
$ sudo ./perf lock record -a -- ./perf bench sched messaging
$ sudo ./perf lock contention -E 5 -Y spinlock
contended total wait max wait avg wait type caller
802 1.26 ms 11.73 us 1.58 us spinlock __wake_up_common_lock+0x62
13 787.16 us 105.44 us 60.55 us spinlock remove_wait_queue+0x14
12 612.96 us 78.70 us 51.08 us spinlock prepare_to_wait+0x27
114 340.68 us 12.61 us 2.99 us spinlock try_to_wake_up+0x1f5
83 226.38 us 9.15 us 2.73 us spinlock folio_lruvec_lock_irqsave+0x5e
$ sudo ./perf lock contention -l
contended total wait max wait avg wait address symbol
57 1.11 ms 42.83 us 19.54 us ffff9f4140059000
15 280.88 us 23.51 us 18.73 us ffffffff9d007a40 jiffies_lock
1 20.49 us 20.49 us 20.49 us ffffffff9d0d50c0 rcu_state
1 9.02 us 9.02 us 9.02 us ffff9f41759e9ba0
$ sudo ./perf lock contention -L jiffies_lock,rcu_state
contended total wait max wait avg wait type caller
15 280.88 us 23.51 us 18.73 us spinlock tick_sched_do_timer+0x93
1 20.49 us 20.49 us 20.49 us spinlock __softirqentry_text_start+0xeb
$ sudo ./perf lock contention -L ffff9f4140059000
contended total wait max wait avg wait type caller
38 779.40 us 42.83 us 20.51 us spinlock worker_thread+0x50
11 216.30 us 39.87 us 19.66 us spinlock queue_work_on+0x39
8 118.13 us 20.51 us 14.77 us spinlock kthread+0xe5
- Fix splitting CC into compiler and options when checking if a
option is present in clang to build the python binding, needed in
systems such as yocto that set CC to, e.g.: "gcc --sysroot=/a/b/c".
- Refresh metris and events for Intel systems: alderlake.
alderlake-n, bonnell, broadwell, broadwellde, broadwellx,
cascadelakex, elkhartlake, goldmont, goldmontplus, haswell,
haswellx, icelake, icelakex, ivybridge, ivytown, jaketown,
knightslanding, meteorlake, nehalemep, nehalemex, sandybridge,
sapphirerapids, silvermont, skylake, skylakex, snowridgex,
tigerlake, westmereep-dp, westmereep-sp, westmereex.
- Add vendor events files (JSON) for AMD Zen 4, from sections
2.1.15.4 "Core Performance Monitor Counters", 2.1.15.5 "L3 Cache
Performance Monitor Counter"s and Section 7.1 "Fabric Performance
Monitor Counter (PMC) Events" in the Processor Programming
Reference (PPR) for AMD Family 19h Model 11h Revision B1
processors.
This constitutes events which capture op dispatch, execution and
retirement, branch prediction, L1 and L2 cache activity, TLB
activity, L3 cache activity and data bandwidth for various links
and interfaces in the Data Fabric.
- Also, from the same PPR are metrics taken from Section 2.1.15.2
"Performance Measurement", including pipeline utilization, which
are new to Zen 4 processors and useful for finding performance
bottlenecks by analyzing activity at different stages of the
pipeline.
- Greatly improve the 'srcline', 'srcline_from', 'srcline_to' and
'srcfile' sort keys performance by postponing calling the external
addr2line utility to the collapse phase of histogram bucketing.
- Fix 'perf test' "all PMU test" to skip parametrized events, that
requires setting up and are not supported by this test.
- Update tools/ copies of kernel headers: features,
disabled-features, fscrypt.h, i915_drm.h, msr-index.h, power pc
syscall table and kvm.h.
- Add .DELETE_ON_ERROR special Makefile target to clean up partially
updated files on error.
- Simplify the mksyscalltbl script for arm64 by avoiding to run the
host compiler to create the syscall table, do it all just with the
shell script.
- Further fixes to honour quiet mode (-q)"
* tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (67 commits)
perf python: Fix splitting CC into compiler and options
perf scripting python: Don't be strict at handling libtraceevent enumerations
perf arm64: Simplify mksyscalltbl
perf build: Remove explicit reference to python 2.x devel files
perf vendor events amd: Add Zen 4 mapping
perf vendor events amd: Add Zen 4 metrics
perf vendor events amd: Add Zen 4 uncore events
perf vendor events amd: Add Zen 4 core events
perf vendor events intel: Refresh westmereex events
perf vendor events intel: Refresh westmereep-sp events
perf vendor events intel: Refresh westmereep-dp events
perf vendor events intel: Refresh tigerlake metrics and events
perf vendor events intel: Refresh snowridgex events
perf vendor events intel: Refresh skylakex metrics and events
perf vendor events intel: Refresh skylake metrics and events
perf vendor events intel: Refresh silvermont events
perf vendor events intel: Refresh sapphirerapids metrics and events
perf vendor events intel: Refresh sandybridge metrics and events
perf vendor events intel: Refresh nehalemex events
perf vendor events intel: Refresh nehalemep events
...
Diffstat (limited to 'tools/arch')
-rw-r--r-- | tools/arch/x86/include/asm/cpufeatures.h | 6 | ||||
-rw-r--r-- | tools/arch/x86/include/asm/disabled-features.h | 17 | ||||
-rw-r--r-- | tools/arch/x86/include/asm/msr-index.h | 24 |
3 files changed, 38 insertions, 9 deletions
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index b71f4f2ecdd5..61012476d66e 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -304,10 +304,16 @@ #define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */ #define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */ #define X86_FEATURE_RSB_VMEXIT_LITE (11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */ +#define X86_FEATURE_SGX_EDECCSSA (11*32+18) /* "" SGX EDECCSSA user leaf function */ +#define X86_FEATURE_CALL_DEPTH (11*32+19) /* "" Call depth tracking for RSB stuffing */ +#define X86_FEATURE_MSR_TSX_CTRL (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */ +#define X86_FEATURE_CMPCCXADD (12*32+ 7) /* "" CMPccXADD instructions */ +#define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */ +#define X86_FEATURE_AVX_IFMA (12*32+23) /* "" Support for VPMADD52[H,L]UQ */ /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */ #define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */ diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h index 33d2cd04d254..c44b56f7ffba 100644 --- a/tools/arch/x86/include/asm/disabled-features.h +++ b/tools/arch/x86/include/asm/disabled-features.h @@ -69,6 +69,12 @@ # define DISABLE_UNRET (1 << (X86_FEATURE_UNRET & 31)) #endif +#ifdef CONFIG_CALL_DEPTH_TRACKING +# define DISABLE_CALL_DEPTH_TRACKING 0 +#else +# define DISABLE_CALL_DEPTH_TRACKING (1 << (X86_FEATURE_CALL_DEPTH & 31)) +#endif + #ifdef CONFIG_INTEL_IOMMU_SVM # define DISABLE_ENQCMD 0 #else @@ -81,6 +87,12 @@ # define DISABLE_SGX (1 << (X86_FEATURE_SGX & 31)) #endif +#ifdef CONFIG_XEN_PV +# define DISABLE_XENPV 0 +#else +# define DISABLE_XENPV (1 << (X86_FEATURE_XENPV & 31)) +#endif + #ifdef CONFIG_INTEL_TDX_GUEST # define DISABLE_TDX_GUEST 0 #else @@ -98,10 +110,11 @@ #define DISABLED_MASK5 0 #define DISABLED_MASK6 0 #define DISABLED_MASK7 (DISABLE_PTI) -#define DISABLED_MASK8 (DISABLE_TDX_GUEST) +#define DISABLED_MASK8 (DISABLE_XENPV|DISABLE_TDX_GUEST) #define DISABLED_MASK9 (DISABLE_SGX) #define DISABLED_MASK10 0 -#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET) +#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \ + DISABLE_CALL_DEPTH_TRACKING) #define DISABLED_MASK12 0 #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h index f17ade084720..37ff47552bcb 100644 --- a/tools/arch/x86/include/asm/msr-index.h +++ b/tools/arch/x86/include/asm/msr-index.h @@ -4,12 +4,7 @@ #include <linux/bits.h> -/* - * CPU model specific register (MSR) numbers. - * - * Do not add new entries to this file unless the definitions are shared - * between multiple compilation units. - */ +/* CPU model specific register (MSR) numbers. */ /* x86-64 specific MSRs */ #define MSR_EFER 0xc0000080 /* extended feature register */ @@ -537,7 +532,7 @@ #define MSR_AMD64_DC_CFG 0xc0011022 #define MSR_AMD64_DE_CFG 0xc0011029 -#define MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT 1 +#define MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT 1 #define MSR_AMD64_DE_CFG_LFENCE_SERIALIZE BIT_ULL(MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT) #define MSR_AMD64_BU_CFG2 0xc001102a @@ -798,6 +793,7 @@ #define ENERGY_PERF_BIAS_PERFORMANCE 0 #define ENERGY_PERF_BIAS_BALANCE_PERFORMANCE 4 #define ENERGY_PERF_BIAS_NORMAL 6 +#define ENERGY_PERF_BIAS_NORMAL_POWERSAVE 7 #define ENERGY_PERF_BIAS_BALANCE_POWERSAVE 8 #define ENERGY_PERF_BIAS_POWERSAVE 15 @@ -1052,6 +1048,20 @@ #define VMX_BASIC_MEM_TYPE_WB 6LLU #define VMX_BASIC_INOUT 0x0040000000000000LLU +/* Resctrl MSRs: */ +/* - Intel: */ +#define MSR_IA32_L3_QOS_CFG 0xc81 +#define MSR_IA32_L2_QOS_CFG 0xc82 +#define MSR_IA32_QM_EVTSEL 0xc8d +#define MSR_IA32_QM_CTR 0xc8e +#define MSR_IA32_PQR_ASSOC 0xc8f +#define MSR_IA32_L3_CBM_BASE 0xc90 +#define MSR_IA32_L2_CBM_BASE 0xd10 +#define MSR_IA32_MBA_THRTL_BASE 0xd50 + +/* - AMD: */ +#define MSR_IA32_MBA_BW_BASE 0xc0000200 + /* MSR_IA32_VMX_MISC bits */ #define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14) #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29) |