[v3] tpm: disable hwrng for fTPM on some AMD designs

Message ID 20230228024439.27156-1-mario.limonciello@amd.com
State New
Series [v3] tpm: disable hwrng for fTPM on some AMD designs

Commit Message

Mario Limonciello Feb. 28, 2023, 2:44 a.m. UTC
  AMD has issued an advisory indicating that having fTPM enabled in
BIOS can cause "stuttering" in the OS.  This issue has been fixed
in newer versions of the fTPM firmware, but it's up to system
designers to decide whether to distribute it.

This issue has existed for a while, but is more prevalent starting
with kernel 6.1 because commit b006c439d58db ("hwrng: core - start
hwrng kthread also for untrusted sources") started to use the fTPM
for hwrng by default. However, all uses of /dev/hwrng result in
unacceptable stuttering.

So, simply disable registration of the defective hwrng when detecting
these faulty fTPM versions.  As this is caused by faulty firmware, it
is plausible that such a problem could also be reproduced by other TPM
interactions, but this hasn't been shown by any user's testing or reports.

It is hypothesized to be triggered more frequently by the use of the RNG
because userspace software will fetch random numbers regularly.

Intentionally continue to register other TPM functionality so that users
that rely upon PCR measurements or any storage of data will still have
access to it.  If it's found later that another TPM functionality is
exacerbating this problem, it can be turned off entirely, and a module
parameter can be introduced to allow users who rely upon fTPM
functionality to turn it on even though this problem is present.

Link: https://www.amd.com/en/support/kb/faq/pa-410
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989
Link: https://lore.kernel.org/all/20230209153120.261904-1-Jason@zx2c4.com/
Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources")
Cc: stable@vger.kernel.org
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Tested-by: reach622@mailcuk.com
Tested-by: Bell <1138267643@qq.com>
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v2->v3:
 * Revert extra curly braces back to behavior in v1
 * Remove needless goto
 * Pick up 2 tested tags
---
 drivers/char/tpm/tpm-chip.c | 60 +++++++++++++++++++++++++++++-
 drivers/char/tpm/tpm.h      | 73 +++++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+), 1 deletion(-)
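
For readers who want to see the version check in isolation, the following is
a minimal userspace sketch of the comparison the patch performs once both
firmware-version properties have been read. The helper names
(ftpm_pack_version, ftpm_version_is_defective, ftpm_check_examples) are
hypothetical and not part of the kernel; the thresholds are copied from the
patch, and the sketch assumes the caller has already established that the
TPM manufacturer is AMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FIRMWARE_VERSION_1 supplies the high 32 bits, FIRMWARE_VERSION_2 the low. */
static uint64_t ftpm_pack_version(uint32_t val1, uint32_t val2)
{
	return ((uint64_t)val1 << 32) | val2;
}

/*
 * Same thresholds as the patch. Assumption: the manufacturer check
 * (AMD) has already passed before this is called.
 */
static bool ftpm_version_is_defective(uint64_t version)
{
	switch (version >> 48) {
	case 6:
		return version < 0x0006000000180006ULL;
	case 3:
		return version < 0x0003005700000005ULL;
	default:
		return false;	/* unknown series: leave hwrng enabled */
	}
}

/* Boundary cases for the two known series (hypothetical helper). */
static void ftpm_check_examples(void)
{
	assert(!ftpm_version_is_defective(ftpm_pack_version(0x00060000, 0x00180006)));
	assert(ftpm_version_is_defective(ftpm_pack_version(0x00060000, 0x00180005)));
	assert(!ftpm_version_is_defective(ftpm_pack_version(0x00030057, 0x00000005)));
	assert(ftpm_version_is_defective(ftpm_pack_version(0x00030057, 0x00000004)));
}
```

Because the comparison is made on the packed 64-bit value, every field of
the version takes part in the ordering. In the 3.x series, for example, a
packed value of 0x0003005700010000 already compares above the threshold.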
  

Comments

Jarkko Sakkinen Feb. 28, 2023, 3:10 a.m. UTC | #1
On Mon, Feb 27, 2023 at 08:44:39PM -0600, Mario Limonciello wrote:
> AMD has issued an advisory indicating that having fTPM enabled in
> BIOS can cause "stuttering" in the OS.  This issue has been fixed
> in newer versions of the fTPM firmware, but it's up to system
> designers to decide whether to distribute it.
> 
> This issue has existed for a while, but is more prevalent starting
> with kernel 6.1 because commit b006c439d58db ("hwrng: core - start
> hwrng kthread also for untrusted sources") started to use the fTPM
> for hwrng by default. However, all uses of /dev/hwrng result in
> unacceptable stuttering.
> 
> So, simply disable registration of the defective hwrng when detecting
> these faulty fTPM versions.  As this is caused by faulty firmware, it
> is plausible that such a problem could also be reproduced by other TPM
> interactions, but this hasn't been shown by any user's testing or reports.
> 
> It is hypothesized to be triggered more frequently by the use of the RNG
> because userspace software will fetch random numbers regularly.
> 
> Intentionally continue to register other TPM functionality so that users
> that rely upon PCR measurements or any storage of data will still have
> access to it.  If it's found later that another TPM functionality is
> exacerbating this problem, it can be turned off entirely, and a module
> parameter can be introduced to allow users who rely upon fTPM
> functionality to turn it on even though this problem is present.
> 
> Link: https://www.amd.com/en/support/kb/faq/pa-410
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989
> Link: https://lore.kernel.org/all/20230209153120.261904-1-Jason@zx2c4.com/
> Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources")
> Cc: stable@vger.kernel.org
> Cc: Jarkko Sakkinen <jarkko@kernel.org>
> Cc: Thorsten Leemhuis <regressions@leemhuis.info>
> Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
> Tested-by: reach622@mailcuk.com
> Tested-by: Bell <1138267643@qq.com>
> Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v2->v3:
>  * Revert extra curly braces back to behavior in v1
>  * Remove needless goto
>  * Pick up 2 tested tags
> ---
>  drivers/char/tpm/tpm-chip.c | 60 +++++++++++++++++++++++++++++-
>  drivers/char/tpm/tpm.h      | 73 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 132 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index 741d8f3e8fb3..c467eeae9973 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -512,6 +512,63 @@ static int tpm_add_legacy_sysfs(struct tpm_chip *chip)
>  	return 0;
>  }
>  
> +/*
> + * Some AMD fTPM versions may cause stutter
> + * https://www.amd.com/en/support/kb/faq/pa-410
> + *
> + * Fixes are available in two series of fTPM firmware:
> + * 6.x.y.z series: 6.0.18.6 +
> + * 3.x.y.z series: 3.57.y.5 +
> + */
> +static bool tpm_amd_is_rng_defective(struct tpm_chip *chip)
> +{
> +	u32 val1, val2;
> +	u64 version;
> +	int ret;
> +
> +	if (!(chip->flags & TPM_CHIP_FLAG_TPM2))
> +		return false;
> +
> +	ret = tpm_request_locality(chip);
> +	if (ret)
> +		return false;
> +
> +	ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val1, NULL);
> +	if (ret)
> +		goto release;
> +	if (val1 != 0x414D4400U /* AMD */) {
> +		ret = -ENODEV;
> +		goto release;
> +	}
> +	ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_1, &val1, NULL);
> +	if (ret)
> +		goto release;
> +	ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_2, &val2, NULL);
> +
> +release:
> +	tpm_relinquish_locality(chip);
> +
> +	if (ret)
> +		return false;
> +
> +	version = ((u64)val1 << 32) | val2;
> +	if ((version >> 48) == 6) {
> +		if (version >= 0x0006000000180006ULL)
> +			return false;
> +	} else if ((version >> 48) == 3) {
> +		if (version >= 0x0003005700000005ULL)
> +			return false;
> +	} else {
> +		return false;
> +	}
> +
> +	dev_warn(&chip->dev,
> +		 "AMD fTPM version 0x%llx causes system stutter; hwrng disabled\n",
> +		 version);
> +
> +	return true;
> +}
> +
>  static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
>  {
>  	struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng);
> @@ -521,7 +578,8 @@ static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
>  
>  static int tpm_add_hwrng(struct tpm_chip *chip)
>  {
> -	if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip))
> +	if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip) ||
> +	    tpm_amd_is_rng_defective(chip))
>  		return 0;
>  
>  	snprintf(chip->hwrng_name, sizeof(chip->hwrng_name),
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 24ee4e1cc452..830014a26609 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -150,6 +150,79 @@ enum tpm_sub_capabilities {
>  	TPM_CAP_PROP_TIS_DURATION = 0x120,
>  };
>  
> +enum tpm2_pt_props {
> +	TPM2_PT_NONE = 0x00000000,
> +	TPM2_PT_GROUP = 0x00000100,
> +	TPM2_PT_FIXED = TPM2_PT_GROUP * 1,
> +	TPM2_PT_FAMILY_INDICATOR = TPM2_PT_FIXED + 0,
> +	TPM2_PT_LEVEL = TPM2_PT_FIXED + 1,
> +	TPM2_PT_REVISION = TPM2_PT_FIXED + 2,
> +	TPM2_PT_DAY_OF_YEAR = TPM2_PT_FIXED + 3,
> +	TPM2_PT_YEAR = TPM2_PT_FIXED + 4,
> +	TPM2_PT_MANUFACTURER = TPM2_PT_FIXED + 5,
> +	TPM2_PT_VENDOR_STRING_1 = TPM2_PT_FIXED + 6,
> +	TPM2_PT_VENDOR_STRING_2 = TPM2_PT_FIXED + 7,
> +	TPM2_PT_VENDOR_STRING_3 = TPM2_PT_FIXED + 8,
> +	TPM2_PT_VENDOR_STRING_4 = TPM2_PT_FIXED + 9,
> +	TPM2_PT_VENDOR_TPM_TYPE = TPM2_PT_FIXED + 10,
> +	TPM2_PT_FIRMWARE_VERSION_1 = TPM2_PT_FIXED + 11,
> +	TPM2_PT_FIRMWARE_VERSION_2 = TPM2_PT_FIXED + 12,
> +	TPM2_PT_INPUT_BUFFER = TPM2_PT_FIXED + 13,
> +	TPM2_PT_HR_TRANSIENT_MIN = TPM2_PT_FIXED + 14,
> +	TPM2_PT_HR_PERSISTENT_MIN = TPM2_PT_FIXED + 15,
> +	TPM2_PT_HR_LOADED_MIN = TPM2_PT_FIXED + 16,
> +	TPM2_PT_ACTIVE_SESSIONS_MAX = TPM2_PT_FIXED + 17,
> +	TPM2_PT_PCR_COUNT = TPM2_PT_FIXED + 18,
> +	TPM2_PT_PCR_SELECT_MIN = TPM2_PT_FIXED + 19,
> +	TPM2_PT_CONTEXT_GAP_MAX = TPM2_PT_FIXED + 20,
> +	TPM2_PT_NV_COUNTERS_MAX = TPM2_PT_FIXED + 22,
> +	TPM2_PT_NV_INDEX_MAX = TPM2_PT_FIXED + 23,
> +	TPM2_PT_MEMORY = TPM2_PT_FIXED + 24,
> +	TPM2_PT_CLOCK_UPDATE = TPM2_PT_FIXED + 25,
> +	TPM2_PT_CONTEXT_HASH = TPM2_PT_FIXED + 26,
> +	TPM2_PT_CONTEXT_SYM = TPM2_PT_FIXED + 27,
> +	TPM2_PT_CONTEXT_SYM_SIZE = TPM2_PT_FIXED + 28,
> +	TPM2_PT_ORDERLY_COUNT = TPM2_PT_FIXED + 29,
> +	TPM2_PT_MAX_COMMAND_SIZE = TPM2_PT_FIXED + 30,
> +	TPM2_PT_MAX_RESPONSE_SIZE = TPM2_PT_FIXED + 31,
> +	TPM2_PT_MAX_DIGEST = TPM2_PT_FIXED + 32,
> +	TPM2_PT_MAX_OBJECT_CONTEXT = TPM2_PT_FIXED + 33,
> +	TPM2_PT_MAX_SESSION_CONTEXT = TPM2_PT_FIXED + 34,
> +	TPM2_PT_PS_FAMILY_INDICATOR = TPM2_PT_FIXED + 35,
> +	TPM2_PT_PS_LEVEL = TPM2_PT_FIXED + 36,
> +	TPM2_PT_PS_REVISION = TPM2_PT_FIXED + 37,
> +	TPM2_PT_PS_DAY_OF_YEAR = TPM2_PT_FIXED + 38,
> +	TPM2_PT_PS_YEAR = TPM2_PT_FIXED + 39,
> +	TPM2_PT_SPLIT_MAX = TPM2_PT_FIXED + 40,
> +	TPM2_PT_TOTAL_COMMANDS = TPM2_PT_FIXED + 41,
> +	TPM2_PT_LIBRARY_COMMANDS = TPM2_PT_FIXED + 42,
> +	TPM2_PT_VENDOR_COMMANDS = TPM2_PT_FIXED + 43,
> +	TPM2_PT_NV_BUFFER_MAX = TPM2_PT_FIXED + 44,
> +	TPM2_PT_MODES = TPM2_PT_FIXED + 45,
> +	TPM2_PT_MAX_CAP_BUFFER = TPM2_PT_FIXED + 46,
> +	TPM2_PT_VAR = TPM2_PT_GROUP * 2,
> +	TPM2_PT_PERMANENT = TPM2_PT_VAR + 0,
> +	TPM2_PT_STARTUP_CLEAR = TPM2_PT_VAR + 1,
> +	TPM2_PT_HR_NV_INDEX = TPM2_PT_VAR + 2,
> +	TPM2_PT_HR_LOADED = TPM2_PT_VAR + 3,
> +	TPM2_PT_HR_LOADED_AVAIL = TPM2_PT_VAR + 4,
> +	TPM2_PT_HR_ACTIVE = TPM2_PT_VAR + 5,
> +	TPM2_PT_HR_ACTIVE_AVAIL = TPM2_PT_VAR + 6,
> +	TPM2_PT_HR_TRANSIENT_AVAIL = TPM2_PT_VAR + 7,
> +	TPM2_PT_HR_PERSISTENT = TPM2_PT_VAR + 8,
> +	TPM2_PT_HR_PERSISTENT_AVAIL = TPM2_PT_VAR + 9,
> +	TPM2_PT_NV_COUNTERS = TPM2_PT_VAR + 10,
> +	TPM2_PT_NV_COUNTERS_AVAIL = TPM2_PT_VAR + 11,
> +	TPM2_PT_ALGORITHM_SET = TPM2_PT_VAR + 12,
> +	TPM2_PT_LOADED_CURVES = TPM2_PT_VAR + 13,
> +	TPM2_PT_LOCKOUT_COUNTER = TPM2_PT_VAR + 14,
> +	TPM2_PT_MAX_AUTH_FAIL = TPM2_PT_VAR + 15,
> +	TPM2_PT_LOCKOUT_INTERVAL = TPM2_PT_VAR + 16,
> +	TPM2_PT_LOCKOUT_RECOVERY = TPM2_PT_VAR + 17,
> +	TPM2_PT_NV_WRITE_RECOVERY = TPM2_PT_VAR + 18,
> +	TPM2_PT_AUDIT_COUNTER_0 = TPM2_PT_VAR + 19,
> +	TPM2_PT_AUDIT_COUNTER_1 = TPM2_PT_VAR + 20,
> +};
>  
>  /* 128 bytes is an arbitrary cap. This could be as large as TPM_BUFSIZE - 18
>   * bytes, but 128 is still a relatively large number of random bytes and
> -- 
> 2.34.1
> 


Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko
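
As an illustration of the constants used in the quoted diff, here is a small
standalone sketch showing how the property tags are derived from their group
base and how the manufacturer value 0x414D4400 spells "AMD". The enum below
is a shortened copy of the one added to tpm.h with the TPM2_ prefix dropped,
and tpm_vendor_to_string is a hypothetical helper, not a kernel function. It
assumes the vendor ID is four ASCII bytes with the most significant byte
first, which matches the "AMD" comment in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of the enum added to tpm.h, using the same arithmetic. */
enum {
	PT_GROUP = 0x00000100,
	PT_FIXED = PT_GROUP * 1,
	PT_MANUFACTURER = PT_FIXED + 5,
	PT_FIRMWARE_VERSION_1 = PT_FIXED + 11,
	PT_FIRMWARE_VERSION_2 = PT_FIXED + 12,
	PT_VAR = PT_GROUP * 2,
};

/*
 * Unpack a manufacturer property into a NUL-terminated string.
 * Assumption: four ASCII bytes, most significant byte first.
 */
static void tpm_vendor_to_string(uint32_t id, char out[5])
{
	assert(out != NULL);
	for (int i = 0; i < 4; i++)
		out[i] = (char)((id >> (24 - 8 * i)) & 0xff);
	out[4] = '\0';
}
```

Passing 0x414D4400 fills the buffer with 'A', 'M', 'D' and a trailing NUL
byte, so the result can be printed directly as a C string.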
  
Linux regression tracking (Thorsten Leemhuis) March 8, 2023, 9:42 a.m. UTC | #2
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Jarkko, thx for reviewing and picking below fix up. Are you planning to
send this to Linus anytime soon, now that the patch was a few days in
next? It would be good to get this 6.1 regression finally fixed, it
already took way longer than the time frame
Documentation/process/handling-regressions.rst outlines for a case like
this. But well, that's how it is sometimes...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

  
Linux regression tracking (Thorsten Leemhuis) March 10, 2023, 5:43 p.m. UTC | #3
[adding Linus to the list of recipients]

On 08.03.23 10:42, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
> 
> Jarkko, thx for reviewing and picking below fix up. Are you planning to
> send this to Linus anytime soon, now that the patch was a few days in
> next? It would be good to get this 6.1 regression finally fixed, it
> already took way longer than the time frame
> Documentation/process/handling-regressions.rst outlines for a case like
> this. But well, that's how it is sometimes...

Linus, would you consider picking this fix up directly from here or from
linux-next (8699d5244e37)? It's been in the latter for 9 days now
afaics. And the issue seems to bug more than just one or two users, so
it IMHO would be good to get this finally resolved.

Jarkko didn't reply to my inquiry, guess something else keeps him busy.

Ciao, Thorsten

  
Jarkko Sakkinen March 12, 2023, 1:35 a.m. UTC | #4
On Fri, Mar 10, 2023 at 06:43:47PM +0100, Thorsten Leemhuis wrote:
> [adding Linus to the list of recipients]
> 
> Linus, would you consider picking this fix up directly from here or from
> linux-next (8699d5244e37)? It's been in the latter for 9 days now
> afaics. And the issue seems to bug more than just one or two users, so
> it IMHO would be good to get this finally resolved.
> 
> Jarkko didn't reply to my inquiry, guess something else keeps him busy.

That's a bit arrogant. You emailed only 4 days ago.

I'm open to doing a PR for rc3 with the fix, if it cannot wait for the v6.4 PR.

BR, Jarkko
  
Jarkko Sakkinen March 12, 2023, 1:42 a.m. UTC | #5
On Sun, Mar 12, 2023 at 03:35:08AM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 10, 2023 at 06:43:47PM +0100, Thorsten Leemhuis wrote:
> > [adding Linus to the list of recipients]
> > 
> > On 08.03.23 10:42, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > > for once, to make this easily accessible to everyone.
> > > 
> > > Jarkko, thx for reviewing and picking below fix up. Are you planning to
> > > send this to Linus anytime soon, now that the patch was a few days in
> > > next? It would be good to get this 6.1 regression finally fixed, it
> > > already took way longer than the time frame
> > > Documentation/process/handling-regressions.rst outlines for a case like
> > > this. But well, that's how it is sometimes...
> > 
> > Linus, would you consider picking this fix up directly from here or from
> > linux-next (8699d5244e37)? It's been in the latter for 9 days now
> > afaics. And the issue seems to bug more than just one or two users, so
> > it IMHO would be good to get this finally resolved.
> > 
> > Jarkko didn't reply to my inquiry, guess something else keeps him busy.
> 
> That's a bit arrogant. You emailed only 4 days ago.
> 
> I'm open to do PR for rc3 with the fix, if it cannot wait to v6.4 pr.

If this is about slow response with kernel bugzilla: it is not an *enforced*
part of the process. If it was, I would use it. Since it isn't, I don't
really want to add any extra weight to my workflow.

It's not only the extra time; it is also not documented how exactly and in
detail you would use it. For email we have all that documented. And when
you don't have guidelines, then it is too flaky to use properly.

BR, Jarkko
  
Jason A. Donenfeld March 12, 2023, 1:49 a.m. UTC | #6
On 3/12/23, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> On Sun, Mar 12, 2023 at 03:35:08AM +0200, Jarkko Sakkinen wrote:
>> On Fri, Mar 10, 2023 at 06:43:47PM +0100, Thorsten Leemhuis wrote:
>> > [adding Linus to the list of recipients]
>> >
>> > On 08.03.23 10:42, Linux regression tracking (Thorsten Leemhuis) wrote:
>> > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> > > for once, to make this easily accessible to everyone.
>> > >
>> > > Jarkko, thx for reviewing and picking below fix up. Are you planning
>> > > to
>> > > send this to Linus anytime soon, now that the patch was a few days in
>> > > next? It would be good to get this 6.1 regression finally fixed, it
>> > > already took way longer than the time frame
>> > > Documentation/process/handling-regressions.rst outlines for a case
>> > > like
>> > > this. But well, that's how it is sometimes...
>> >
>> > Linus, would you consider picking this fix up directly from here or
>> > from
>> > linux-next (8699d5244e37)? It's been in the latter for 9 days now
>> > afaics. And the issue seems to bug more than just one or two users, so
>> > it IMHO would be good to get this finally resolved.
>> >
>> > Jarkko didn't reply to my inquiry, guess something else keeps him busy.
>>
>> That's a bit arrogant. You emailed only 4 days ago.
>>
>> I'm open to do PR for rc3 with the fix, if it cannot wait to v6.4 pr.
>
> If this is about slow response with kernel bugzilla: it is not *enforced*
> part of the process. If it was, I would use it. Since it isn't, I don't
> really want to add any extra weight to my workflow.
>
> It's not only extra time but also it is not documented how exactly and in
> detail you would use it. For email we have all that documented. And when
> you don't have guidelines, then it is too flaky to use properly.

No interest in wading into a process argument. But if you're able to
send this for rc3, please please do so. Users keep getting hit by
this, some email me directly, and I keep replying saying the fix
should be released any day now. So let's make that happen.
  
Jarkko Sakkinen March 12, 2023, 1:55 a.m. UTC | #7
On Sun, Mar 12, 2023 at 02:49:17AM +0100, Jason A. Donenfeld wrote:
> On 3/12/23, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > On Sun, Mar 12, 2023 at 03:35:08AM +0200, Jarkko Sakkinen wrote:
> >> On Fri, Mar 10, 2023 at 06:43:47PM +0100, Thorsten Leemhuis wrote:
> >> > [adding Linus to the list of recipients]
> >> >
> >> > On 08.03.23 10:42, Linux regression tracking (Thorsten Leemhuis) wrote:
> >> > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> >> > > for once, to make this easily accessible to everyone.
> >> > >
> >> > > Jarkko, thx for reviewing and picking below fix up. Are you planning
> >> > > to
> >> > > send this to Linus anytime soon, now that the patch was a few days in
> >> > > next? It would be good to get this 6.1 regression finally fixed, it
> >> > > already took way longer than the time frame
> >> > > Documentation/process/handling-regressions.rst outlines for a case
> >> > > like
> >> > > this. But well, that's how it is sometimes...
> >> >
> >> > Linus, would you consider picking this fix up directly from here or
> >> > from
> >> > linux-next (8699d5244e37)? It's been in the latter for 9 days now
> >> > afaics. And the issue seems to bug more than just one or two users, so
> >> > it IMHO would be good to get this finally resolved.
> >> >
> >> > Jarkko didn't reply to my inquiry, guess something else keeps him busy.
> >>
> >> That's a bit arrogant. You emailed only 4 days ago.
> >>
> >> I'm open to do PR for rc3 with the fix, if it cannot wait to v6.4 pr.
> >
> > If this is about slow response with kernel bugzilla: it is not *enforced*
> > part of the process. If it was, I would use it. Since it isn't, I don't
> > really want to add any extra weight to my workflow.
> >
> > It's not only extra time but also it is not documented how exactly and in
> > detail you would use it. For email we have all that documented. And when
> > you don't have guidelines, then it is too flaky to use properly.
> 
> No interest in wading into a process argument. But if you're able to
> send this for rc3, please please do so. Users keep getting hit by
> this, some email me directly, and I keep replying saying the fix
> should be released any day now. So let's make that happen.

Sure, that shouldn't be a problem. I'll queue this for rc3.

BR, Jarkko
  
Jarkko Sakkinen March 12, 2023, 1:57 a.m. UTC | #8
On Sun, Mar 12, 2023 at 03:55:03AM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 12, 2023 at 02:49:17AM +0100, Jason A. Donenfeld wrote:
> > On 3/12/23, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > On Sun, Mar 12, 2023 at 03:35:08AM +0200, Jarkko Sakkinen wrote:
> > >> On Fri, Mar 10, 2023 at 06:43:47PM +0100, Thorsten Leemhuis wrote:
> > >> > [adding Linus to the list of recipients]
> > >> >
> > >> > On 08.03.23 10:42, Linux regression tracking (Thorsten Leemhuis) wrote:
> > >> > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > >> > > for once, to make this easily accessible to everyone.
> > >> > >
> > >> > > Jarkko, thx for reviewing and picking below fix up. Are you planning
> > >> > > to
> > >> > > send this to Linus anytime soon, now that the patch was a few days in
> > >> > > next? It would be good to get this 6.1 regression finally fixed, it
> > >> > > already took way longer than the time frame
> > >> > > Documentation/process/handling-regressions.rst outlines for a case
> > >> > > like
> > >> > > this. But well, that's how it is sometimes...
> > >> >
> > >> > Linus, would you consider picking this fix up directly from here or
> > >> > from
> > >> > linux-next (8699d5244e37)? It's been in the latter for 9 days now
> > >> > afaics. And the issue seems to bug more than just one or two users, so
> > >> > it IMHO would be good to get this finally resolved.
> > >> >
> > >> > Jarkko didn't reply to my inquiry, guess something else keeps him busy.
> > >>
> > >> That's a bit arrogant. You emailed only 4 days ago.
> > >>
> > >> I'm open to do PR for rc3 with the fix, if it cannot wait to v6.4 pr.
> > >
> > > If this is about slow response with kernel bugzilla: it is not *enforced*
> > > part of the process. If it was, I would use it. Since it isn't, I don't
> > > really want to add any extra weight to my workflow.
> > >
> > > It's not only extra time but also it is not documented how exactly and in
> > > detail you would use it. For email we have all that documented. And when
> > > you don't have guidelines, then it is too flaky to use properly.
> > 
> > No interest in wading into a process argument. But if you're able to
> > send this for rc3, please please do so. Users keep getting hit by
> > this, some email me directly, and I keep replying saying the fix
> > should be released any day now. So let's make that happen.
> 
> Sure, that shouldn't be a problem. I'll queue this for rc3.

Concerning "the process argument": I'm just saying that we have a
user-facing service that is not properly documented for maintainers,
that's all.

BR, Jarkko
  
Linux regression tracking (Thorsten Leemhuis) March 12, 2023, 10:35 a.m. UTC | #9
On 12.03.23 02:35, Jarkko Sakkinen wrote:
> On Fri, Mar 10, 2023 at 06:43:47PM +0100, Thorsten Leemhuis wrote:
>> [adding Linus to the list of recipients]
>>
>> On 08.03.23 10:42, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>>> for once, to make this easily accessible to everyone.
>>>
>>> Jarkko, thx for reviewing and picking below fix up. Are you planning to
>>> send this to Linus anytime soon, now that the patch was a few days in
>>> next? It would be good to get this 6.1 regression finally fixed, it
>>> already took way longer than the time frame
>>> Documentation/process/handling-regressions.rst outlines for a case like
>>> this. But well, that's how it is sometimes...
>>
>> Linus, would you consider picking this fix up directly from here or from
>> linux-next (8699d5244e37)? It's been in the latter for 9 days now
>> afaics. And the issue seems to bug more than just one or two users, so
>> it IMHO would be good to get this finally resolved.
>>
>> Jarkko didn't reply to my inquiry, guess something else keeps him busy.
> 
> That's a bit arrogant. You emailed only 4 days ago.

My deepest apologies if that "guess something else keeps him busy"
triggered your response, what I wanted to say is "I don't consider the
lack of a response a problem, that's how it is for all of us sometimes".
Sorry, that might not have been the best way to express that.

If my prodding itself was the cause: well, I think that's what I should
do in this case. That stance developed quickly when I started doing
regression tracking, as I noticed one thing:

Imagine a regression caused by a commit merged for 5.11-rc1 is reported a
day or two after 5.11-rc7 is released. Imagine further a fix is posted
for review two or three days after 5.11-rc8 is out. From what I noticed
quite a few of those fixes (not all of course) make it to mainline in
time for the release of 5.11. But the picture looked totally different
when the fix was posted for review shortly *after* 5.11 was out, as I
noticed quite a few of those were only mainlined 9 or 10 weeks later for
5.13-rc1 (and only then can be backported to 5.11.y and 5.12.y).

[above was just a hypothetical example with the worst timing to
illustrate the core problem, the timelines are different in case of the
fTPM issue]

From my understanding of things that's not how it should be (unless
there are strong reasons in the individual case). That's why I'm working
against that. Still working on optimizing when/how I ask, as I'm not yet
happy with how I do that.

Don't worry, I use my best judgment in that process; if the fix is
complex and the next merge window is near, I might let it slip – OTOH if
it's something that apparently bugs quite a few people, I prod
developers and maintainers more quickly & often, like I did in this case.

In the end, situations like the one outlined above led me to write the
section "Prioritize work on fixing regressions" in
Documentation/process/handling-regressions.rst (
https://docs.kernel.org/process/handling-regressions.html ). Greg acked
it; Linus never commented on it, not sure if he looked at it when he
merged that. But I have no idea how many developers have actually seen it
and/or take it seriously. But from what I saw it already helped somewhat.

> I'm open to do PR for rc3 with the fix, if it cannot wait to v6.4 pr.

From later in this thread I see that you plan to do that now, thus:

many thx!

Ciao, Thorsten
  
Linux regression tracking (Thorsten Leemhuis) March 12, 2023, 10:40 a.m. UTC | #10
On 12.03.23 02:42, Jarkko Sakkinen wrote:
> On Sun, Mar 12, 2023 at 03:35:08AM +0200, Jarkko Sakkinen wrote:
>> On Fri, Mar 10, 2023 at 06:43:47PM +0100, Thorsten Leemhuis wrote:
>>> [adding Linus to the list of recipients]
>>>
>>> On 08.03.23 10:42, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>>>> for once, to make this easily accessible to everyone.
>>>>
>>>> Jarkko, thx for reviewing and picking below fix up. Are you planning to
>>>> send this to Linus anytime soon, now that the patch was a few days in
>>>> next? It would be good to get this 6.1 regression finally fixed, it
>>>> already took way longer than the time frame
>>>> Documentation/process/handling-regressions.rst outlines for a case like
>>>> this. But well, that's how it is sometimes...
>>>
>>> Linus, would you consider picking this fix up directly from here or from
>>> linux-next (8699d5244e37)? It's been in the latter for 9 days now
>>> afaics. And the issue seems to bug more than just one or two users, so
>>> it IMHO would be good to get this finally resolved.
>>>
>>> Jarkko didn't reply to my inquiry, guess something else keeps him busy.
>>
>> That's a bit arrogant. You emailed only 4 days ago.
>>
>> I'm open to do PR for rc3 with the fix, if it cannot wait to v6.4 pr.
> 
> If this is about slow response with kernel bugzilla: [...]

Not at all, I don't care if developers use bugzilla or ignore it, as
long as the regression itself is dealt with.

Fun fact: I actually wanted to get rid of bugzilla for developers/
subsystems that didn't opt-in, but then another plan came up. See
https://lwn.net/Articles/910740/

Ciao, Thorsten
  

Patch
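
The version check in the patch below packs the two 32-bit firmware version
properties into one 64-bit value and compares it against a per-series
threshold. The following is a minimal userspace sketch of that comparison
only, not the kernel code itself: the helper names ftpm_pack_version() and
ftpm_version_is_defective() and the FTPM_FIXED_* macros are hypothetical,
while the two threshold constants are copied from the patch. It assumes, as
the patch's use of (version >> 48) suggests, that the top 16 bits of the
packed value carry the series number (6 or 3).

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold values copied from the patch; the macro names are hypothetical. */
#define FTPM_FIXED_6X 0x0006000000180006ULL
#define FTPM_FIXED_3X 0x0003005700000005ULL

/*
 * Hypothetical helper: combine TPM2_PT_FIRMWARE_VERSION_1 (upper half)
 * and TPM2_PT_FIRMWARE_VERSION_2 (lower half) into one 64-bit value,
 * the same way the patch builds "version".
 */
static uint64_t ftpm_pack_version(uint32_t val1, uint32_t val2)
{
	return ((uint64_t)val1 << 32) | val2;
}

/*
 * Hypothetical helper mirroring the comparison in the patch: only the
 * 6.x and 3.x series are considered, and a version below the series
 * threshold is treated as defective. Any other series is not flagged.
 */
static bool ftpm_version_is_defective(uint64_t version)
{
	switch (version >> 48) {
	case 6:
		return version < FTPM_FIXED_6X;
	case 3:
		return version < FTPM_FIXED_3X;
	default:
		return false;
	}
}
```

With these helpers, ftpm_pack_version(0x00060000, 0x00180005) is reported as
defective, while ftpm_pack_version(0x00060000, 0x00180006) is not. The sketch
leaves out the manufacturer check and the locality handling that the patch
performs before reading the version.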

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index 741d8f3e8fb3..c467eeae9973 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -512,6 +512,63 @@  static int tpm_add_legacy_sysfs(struct tpm_chip *chip)
 	return 0;
 }
 
+/*
+ * Some AMD fTPM versions may cause stutter
+ * https://www.amd.com/en/support/kb/faq/pa-410
+ *
+ * Fixes are available in two series of fTPM firmware:
+ * 6.x.y.z series: 6.0.18.6 +
+ * 3.x.y.z series: 3.57.y.5 +
+ */
+static bool tpm_amd_is_rng_defective(struct tpm_chip *chip)
+{
+	u32 val1, val2;
+	u64 version;
+	int ret;
+
+	if (!(chip->flags & TPM_CHIP_FLAG_TPM2))
+		return false;
+
+	ret = tpm_request_locality(chip);
+	if (ret)
+		return false;
+
+	ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val1, NULL);
+	if (ret)
+		goto release;
+	if (val1 != 0x414D4400U /* AMD */) {
+		ret = -ENODEV;
+		goto release;
+	}
+	ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_1, &val1, NULL);
+	if (ret)
+		goto release;
+	ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_2, &val2, NULL);
+
+release:
+	tpm_relinquish_locality(chip);
+
+	if (ret)
+		return false;
+
+	version = ((u64)val1 << 32) | val2;
+	if ((version >> 48) == 6) {
+		if (version >= 0x0006000000180006ULL)
+			return false;
+	} else if ((version >> 48) == 3) {
+		if (version >= 0x0003005700000005ULL)
+			return false;
+	} else {
+		return false;
+	}
+
+	dev_warn(&chip->dev,
+		 "AMD fTPM version 0x%llx causes system stutter; hwrng disabled\n",
+		 version);
+
+	return true;
+}
+
 static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
 {
 	struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng);
@@ -521,7 +578,8 @@  static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
 
 static int tpm_add_hwrng(struct tpm_chip *chip)
 {
-	if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip))
+	if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip) ||
+	    tpm_amd_is_rng_defective(chip))
 		return 0;
 
 	snprintf(chip->hwrng_name, sizeof(chip->hwrng_name),
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 24ee4e1cc452..830014a26609 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -150,6 +150,79 @@  enum tpm_sub_capabilities {
 	TPM_CAP_PROP_TIS_DURATION = 0x120,
 };
 
+enum tpm2_pt_props {
+	TPM2_PT_NONE = 0x00000000,
+	TPM2_PT_GROUP = 0x00000100,
+	TPM2_PT_FIXED = TPM2_PT_GROUP * 1,
+	TPM2_PT_FAMILY_INDICATOR = TPM2_PT_FIXED + 0,
+	TPM2_PT_LEVEL = TPM2_PT_FIXED + 1,
+	TPM2_PT_REVISION = TPM2_PT_FIXED + 2,
+	TPM2_PT_DAY_OF_YEAR = TPM2_PT_FIXED + 3,
+	TPM2_PT_YEAR = TPM2_PT_FIXED + 4,
+	TPM2_PT_MANUFACTURER = TPM2_PT_FIXED + 5,
+	TPM2_PT_VENDOR_STRING_1 = TPM2_PT_FIXED + 6,
+	TPM2_PT_VENDOR_STRING_2 = TPM2_PT_FIXED + 7,
+	TPM2_PT_VENDOR_STRING_3 = TPM2_PT_FIXED + 8,
+	TPM2_PT_VENDOR_STRING_4 = TPM2_PT_FIXED + 9,
+	TPM2_PT_VENDOR_TPM_TYPE = TPM2_PT_FIXED + 10,
+	TPM2_PT_FIRMWARE_VERSION_1 = TPM2_PT_FIXED + 11,
+	TPM2_PT_FIRMWARE_VERSION_2 = TPM2_PT_FIXED + 12,
+	TPM2_PT_INPUT_BUFFER = TPM2_PT_FIXED + 13,
+	TPM2_PT_HR_TRANSIENT_MIN = TPM2_PT_FIXED + 14,
+	TPM2_PT_HR_PERSISTENT_MIN = TPM2_PT_FIXED + 15,
+	TPM2_PT_HR_LOADED_MIN = TPM2_PT_FIXED + 16,
+	TPM2_PT_ACTIVE_SESSIONS_MAX = TPM2_PT_FIXED + 17,
+	TPM2_PT_PCR_COUNT = TPM2_PT_FIXED + 18,
+	TPM2_PT_PCR_SELECT_MIN = TPM2_PT_FIXED + 19,
+	TPM2_PT_CONTEXT_GAP_MAX = TPM2_PT_FIXED + 20,
+	TPM2_PT_NV_COUNTERS_MAX = TPM2_PT_FIXED + 22,
+	TPM2_PT_NV_INDEX_MAX = TPM2_PT_FIXED + 23,
+	TPM2_PT_MEMORY = TPM2_PT_FIXED + 24,
+	TPM2_PT_CLOCK_UPDATE = TPM2_PT_FIXED + 25,
+	TPM2_PT_CONTEXT_HASH = TPM2_PT_FIXED + 26,
+	TPM2_PT_CONTEXT_SYM = TPM2_PT_FIXED + 27,
+	TPM2_PT_CONTEXT_SYM_SIZE = TPM2_PT_FIXED + 28,
+	TPM2_PT_ORDERLY_COUNT = TPM2_PT_FIXED + 29,
+	TPM2_PT_MAX_COMMAND_SIZE = TPM2_PT_FIXED + 30,
+	TPM2_PT_MAX_RESPONSE_SIZE = TPM2_PT_FIXED + 31,
+	TPM2_PT_MAX_DIGEST = TPM2_PT_FIXED + 32,
+	TPM2_PT_MAX_OBJECT_CONTEXT = TPM2_PT_FIXED + 33,
+	TPM2_PT_MAX_SESSION_CONTEXT = TPM2_PT_FIXED + 34,
+	TPM2_PT_PS_FAMILY_INDICATOR = TPM2_PT_FIXED + 35,
+	TPM2_PT_PS_LEVEL = TPM2_PT_FIXED + 36,
+	TPM2_PT_PS_REVISION = TPM2_PT_FIXED + 37,
+	TPM2_PT_PS_DAY_OF_YEAR = TPM2_PT_FIXED + 38,
+	TPM2_PT_PS_YEAR = TPM2_PT_FIXED + 39,
+	TPM2_PT_SPLIT_MAX = TPM2_PT_FIXED + 40,
+	TPM2_PT_TOTAL_COMMANDS = TPM2_PT_FIXED + 41,
+	TPM2_PT_LIBRARY_COMMANDS = TPM2_PT_FIXED + 42,
+	TPM2_PT_VENDOR_COMMANDS = TPM2_PT_FIXED + 43,
+	TPM2_PT_NV_BUFFER_MAX = TPM2_PT_FIXED + 44,
+	TPM2_PT_MODES = TPM2_PT_FIXED + 45,
+	TPM2_PT_MAX_CAP_BUFFER = TPM2_PT_FIXED + 46,
+	TPM2_PT_VAR = TPM2_PT_GROUP * 2,
+	TPM2_PT_PERMANENT = TPM2_PT_VAR + 0,
+	TPM2_PT_STARTUP_CLEAR = TPM2_PT_VAR + 1,
+	TPM2_PT_HR_NV_INDEX = TPM2_PT_VAR + 2,
+	TPM2_PT_HR_LOADED = TPM2_PT_VAR + 3,
+	TPM2_PT_HR_LOADED_AVAIL = TPM2_PT_VAR + 4,
+	TPM2_PT_HR_ACTIVE = TPM2_PT_VAR + 5,
+	TPM2_PT_HR_ACTIVE_AVAIL = TPM2_PT_VAR + 6,
+	TPM2_PT_HR_TRANSIENT_AVAIL = TPM2_PT_VAR + 7,
+	TPM2_PT_HR_PERSISTENT = TPM2_PT_VAR + 8,
+	TPM2_PT_HR_PERSISTENT_AVAIL = TPM2_PT_VAR + 9,
+	TPM2_PT_NV_COUNTERS = TPM2_PT_VAR + 10,
+	TPM2_PT_NV_COUNTERS_AVAIL = TPM2_PT_VAR + 11,
+	TPM2_PT_ALGORITHM_SET = TPM2_PT_VAR + 12,
+	TPM2_PT_LOADED_CURVES = TPM2_PT_VAR + 13,
+	TPM2_PT_LOCKOUT_COUNTER = TPM2_PT_VAR + 14,
+	TPM2_PT_MAX_AUTH_FAIL = TPM2_PT_VAR + 15,
+	TPM2_PT_LOCKOUT_INTERVAL = TPM2_PT_VAR + 16,
+	TPM2_PT_LOCKOUT_RECOVERY = TPM2_PT_VAR + 17,
+	TPM2_PT_NV_WRITE_RECOVERY = TPM2_PT_VAR + 18,
+	TPM2_PT_AUDIT_COUNTER_0 = TPM2_PT_VAR + 19,
+	TPM2_PT_AUDIT_COUNTER_1 = TPM2_PT_VAR + 20,
+};
 
 /* 128 bytes is an arbitrary cap. This could be as large as TPM_BUFSIZE - 18
  * bytes, but 128 is still a relatively large number of random bytes and