MCE

Peter 2021-01-24 10:53:24 UTC

Rich and Joel,

I can’t say enough how much you made my day! The kernel parameter

amdgpu.ppfeaturemask=0xffffbffd

solved it after months of debugging.
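For readers wondering what that value actually changes: amdgpu.ppfeaturemask is a bitmask in which each bit switches one power-management feature of the driver on or off. The Python sketch below only lists which bit positions a given mask leaves cleared. The helper names (cleared_bits, clear_bit, as_kernel_param) are made up for illustration, and which feature each bit stands for has to be looked up in the source of the running kernel (to my knowledge the PP_FEATURE_MASK enum in amd_shared.h).

```python
def cleared_bits(mask, width=32):
    """Return the bit positions that are 0 in `mask`, i.e. switched off."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


def clear_bit(mask, bit):
    """Return `mask` with one more bit switched off (hypothetical helper)."""
    return mask & ~(1 << bit)


def as_kernel_param(mask):
    """Format a mask the way it is written on the kernel command line."""
    return f"amdgpu.ppfeaturemask=0x{mask:08x}"


if __name__ == "__main__":
    print(cleared_bits(0xffffbffd))
    print(as_kernel_param(clear_bit(0xffffffff, 1)))
```

For 0xffffbffd this reports bits 1 and 14 as cleared. On a running system the mask currently in effect can usually be read from /sys/module/amdgpu/parameters/ppfeaturemask, which makes it easier to try clearing one bit at a time.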

For those coming here without having gone through the entire thread:

If 'journalctl | grep -i "hardware err"' returns errors like bea0000000000108 with microcode 8701021 or 8701013, and the BIOS is updated to the latest version, the kernel is up to date, and several passes of memtest86+ have run without errors, then, if you have an AMD GPU, the problem might be related to that.
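The check described above can be sketched in Python, in case someone wants to script it. This is only a rough filter under my own assumptions: it applies the same case-insensitive "hardware err" match as the grep, collects any 16-digit hex values and any number following the word "microcode" on those lines, and compares them with the values quoted in this comment. The function names and the two constant sets are hypothetical, and the exact layout of the log lines differs between kernel versions.

```python
import re

# Values quoted in this comment; extend the sets for other systems.
KNOWN_STATUS = {"bea0000000000108"}
KNOWN_MICROCODE = {"8701021", "8701013"}

# A run of exactly 16 hex digits, with or without a leading "0x".
HEX16 = re.compile(r"(?<![0-9a-f])[0-9a-f]{16}(?![0-9a-f])")
MICROCODE = re.compile(r"microcode\s+([0-9a-f]+)")


def scan_journal(lines):
    """Return (status_codes, microcode_revisions) seen on 'hardware err' lines."""
    statuses, microcodes = set(), set()
    for line in lines:
        lowered = line.lower()
        if "hardware err" not in lowered:  # same filter as grep -i
            continue
        statuses.update(HEX16.findall(lowered))
        microcodes.update(MICROCODE.findall(lowered))
    return statuses, microcodes


def matches_this_report(lines):
    """True if both a known status code and a known microcode revision appear."""
    statuses, microcodes = scan_journal(lines)
    return bool(statuses & KNOWN_STATUS) and bool(microcodes & KNOWN_MICROCODE)
```

One way to feed it is to pass the lines of 'journalctl --no-pager' output, for example captured with subprocess.run(..., capture_output=True, text=True).stdout.splitlines(). A match only says the log looks like the case described here; it does not rule out RAM, BIOS or kernel causes.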

GPU

Many report success by booting their kernels with amdgpu.ppfeaturemask=0xffffbffd. If that does not help, try amdgpu.dpm=0. If that works, either keep it as is or remove it again and experiment with the other, less invasive ppfeaturemask settings discussed above. If none of this helps, the problem might be related to the CPU.

CPU

The first recommendation generally is to set “Cool’n Quiet” to Disabled in the BIOS. If that does not help, also set “Power Supply Idle Control” to “Typical Current Idle”. This should already disable the problematic C-states (C6 can also be checked, and even disabled, with https://github.com/r4m0n/ZenStates-Linux). If none of this helps, the recommendation is to also set “Global C State Control” to Disabled. The next step would be to also set “SMT” in the Overclocking settings to Disabled.
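After changing these BIOS settings it can help to look at which idle states the kernel still offers. The Python sketch below reads the standard Linux cpuidle sysfs tree for one CPU. It is an assumption on my part that this view is useful here: it shows the idle states as the kernel sees them (their names depend on the platform and idle driver), not the BIOS options themselves, and it is not a replacement for the ZenStates-Linux tool linked above. The function name list_idle_states is made up for illustration.

```python
from pathlib import Path


def list_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return [(directory, state_name, disabled)] from the cpuidle sysfs tree.

    Returns an empty list if the CPU exposes no cpuidle directory.
    """
    states = []
    for state in sorted(Path(cpu_dir, "cpuidle").glob("state*")):
        name = (state / "name").read_text().strip()
        # The "disable" file holds 0 when the state may be entered.
        disabled = (state / "disable").read_text().strip() != "0"
        states.append((state.name, name, disabled))
    return states


if __name__ == "__main__":
    for directory, name, disabled in list_idle_states():
        print(directory, name, "disabled" if disabled else "enabled")
```

Comparing the output before and after a BIOS change gives a quick hint whether the deeper idle states are still being advertised to the kernel.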

Peter

A

  • Power Supply Idle Control → Typical Current Idle
  • XMP 3000 MHz
  • PBO Disabled