firmware_loader: Use init_utsname()->release

Message ID 20240222145819.96646-1-john.g.garry@oracle.com
State New
Headers
Series firmware_loader: Use init_utsname()->release |

Commit Message

John Garry Feb. 22, 2024, 2:58 p.m. UTC
  Instead of using UTS_RELEASE, use init_utsname()->release, which means
that we don't need to rebuild the code just for the git head commit
changing.

Since UTS_RELEASE is used for fw_path[] and this points to const data,
append init_utsname()->release dynamically to an intermediate string.

The check for appending uts release is if the string in fw_path[] ends
in '/'. Since fw_path[] may include a path from the kernel command line
in fw_path_para, and it would be valid for this string to end in '/',
check for fw_path_para when appending uts release. 

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
Note: As mentioned by Masahiro in [0], when CONFIG_MODVERSIONS=y it
could be possible for a driver to be built as a module with a different
kernel baseline and so use a different UTS_RELEASE from the baseline. So
now using init_utsname()->release could lead to a change in behaviour
in this driver. However, considering the nature of this driver and how it
would not make sense to build as module against a different tree, this
change should be ok.

[0] https://lore.kernel.org/lkml/CAK7LNAQ_r5yUjNpOppLkDBQ12sDxBYQTvRZGn1ng8D1POfZr_A@mail.gmail.com/
  

Comments

Luis Chamberlain Feb. 22, 2024, 3:53 p.m. UTC | #1
On Thu, Feb 22, 2024 at 02:58:19PM +0000, John Garry wrote:
> Instead of using UTS_RELEASE, use init_utsname()->release, which means
> that we don't need to rebuild the code just for the git head commit
> changing.
> 
> Since UTS_RELEASE is used for fw_path[] and this points to const data,
> append init_utsname()->release dynamically to an intermediate string.
> 
> The check for appending uts release is if the string in fw_path[] ends
> in '/'. Since fw_path[] may include a path from the kernel command line
> in fw_path_para, and it would be valid for this string to end in '/',
> check for fw_path_para when appending uts release. 
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> Note: As mentioned by Masahiro in [0], when CONFIG_MODVERSIONS=y it
> could be possible for a driver to be built as a module with a different
> kernel baseline and so use a different UTS_RELEASE from the baseline. So
> now using init_utsname()->release could lead to a change in behaviour
> in this driver. However, considering the nature of this driver and how it
> would not make sense to build as module against a different tree, this

would not make sense to build it as an external module against against a
different tree, so this change should not have any effect to users.

> change should be ok.
> 
> [0] https://lore.kernel.org/lkml/CAK7LNAQ_r5yUjNpOppLkDBQ12sDxBYQTvRZGn1ng8D1POfZr_A@mail.gmail.com/

Thanks for doing this, could you send a v2 with the "Note:" removed but
instead include that blurb as part of the commit log as it is important
information to retain.

Could you also test the selftests to ensure nothing is broken ?

/tools/testing/selftests/firmware/fw_run_tests.sh

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

  Luis
  
John Garry Feb. 23, 2024, 2:29 p.m. UTC | #2
>> ---
>> Note: As mentioned by Masahiro in [0], when CONFIG_MODVERSIONS=y it
>> could be possible for a driver to be built as a module with a different
>> kernel baseline and so use a different UTS_RELEASE from the baseline. So
>> now using init_utsname()->release could lead to a change in behaviour
>> in this driver. However, considering the nature of this driver and how it
>> would not make sense to build as module against a different tree, this
> 
> would not make sense to build it as an external module against against a
> different tree, so this change should not have any effect to users.
> 

ok


>> change should be ok.
>>
>> [0] https://urldefense.com/v3/__https://lore.kernel.org/lkml/CAK7LNAQ_r5yUjNpOppLkDBQ12sDxBYQTvRZGn1ng8D1POfZr_A@mail.gmail.com/__;!!ACWV5N9M2RV99hQ!KxnFzsitgB8e0kDndt3nYwqw0FcAPKzJjuRl9BzwLQ9dbAtqK_SZOkhHw9ssT2PobYVkh8UU7WryWKwGXg$
> 
> Thanks for doing this, could you send a v2 with the "Note:" removed but
> instead include that blurb as part of the commit log as it is important
> information to retain.

ok

> 
> Could you also test the selftests to ensure nothing is broken ?
> 
> ./tools/testing/selftests/firmware/fw_run_tests.sh

I am running this now, but it does not seem stable on a v6.8-rc5 
baseline. I am seeing hangs. Are there any known issues?

This one passed, it seems to me:

https://pastebin.com/ZySPmH9h

But then on another run I see a hang at:

...
Testing with the file missing...
Batched request_firmware() nofile try #1: OK
Batched request_firmware() nofile try #2: OK
Batched request_firmware() nofile try #3: OK
Batched request_firmware() nofile try #4: OK
Batched request_firmware() nofile try #5: OK
Batched request_firmware_into_buf() nofile try #1: OK
Batched request_firmware_into_buf() nofile try #2: OK
Batched request_firmware_into_buf() nofile try #3: OK
Batched request_firmware_into_buf() nofile try #4: OK
Batched request_firmware_into_buf() nofile try #5: OK
Batched request_firmware_direct() nofile try #1: OK
Batched request_firmware_direct() nofile try #2: OK
Batched request_firmware_direct() nofile try #3: OK
Batched request_firmware_direct() nofile try #4: OK
Batched request_firmware_direct() nofile try #5: OK
Batched request_firmware_nowait(uevent=true) nofile try #1: OK
Batched request_firmware_nowait(uevent=true) nofile try #2: OK
Batched request_firmware_nowait(uevent=true) nofile try #3: OK
Batched request_firmware_nowait(uevent=true) nofile try #4: OK
Batched request_firmware_nowait(uevent=true) nofile try #5: OK
Batched request_firmware_nowait(uevent=false) nofile try #1: OK
Batched request_firmware_nowait(uevent=false) nofile try #2: OK
Batched request_firmware_nowait(uevent=false) nofile try #3: OK
Batched request_firmware_nowait(uevent=false) nofile try #4:


And kernel log:

[  385.105004] test_firmware: #3: failed to async load firmware
[  385.181221] test_firmware: #0: failed to async load firmware
[  385.252231] test_firmware: #1: failed to async load firmware
[  385.323307] test_firmware: #2: failed to async load firmware
[  385.397444] test_firmware: #3: failed to async load firmware
[  644.462125] INFO: task fw_filesystem.s:5466 blocked for more than 120 
seconds.
[  644.548803]       Not tainted 6.8.0-rc5-g7e58e46976f2 #32
[  644.613581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  765.294106] INFO: task fw_filesystem.s:5466 blocked for more than 241 
seconds.
[  765.380805]       Not tainted 6.8.0-rc5-g7e58e46976f2 #32
[  765.445610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  886.126154] INFO: task fw_filesystem.s:5466 blocked for more than 362 
seconds.
[  886.212883]       Not tainted 6.8.0-rc5-g7e58e46976f2 #32
[  886.277695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.

FWIW, with my patch we get same output for "passing" run upto another hang.

Thanks,
John

>
  
John Garry Feb. 23, 2024, 3:34 p.m. UTC | #3
> 
> FWIW, with my patch we get same output for "passing" run upto another hang.

I will note that it has "passed" with this patch, so I sent the updated 
v2. Please let me know any testing or further debug that is required for 
that intermittent hang.

Cheers,
John
  
Luis Chamberlain Feb. 26, 2024, 4:34 p.m. UTC | #4
On Fri, Feb 23, 2024 at 02:29:19PM +0000, John Garry wrote:
> > Could you also test the selftests to ensure nothing is broken ?
> > 
> > ./tools/testing/selftests/firmware/fw_run_tests.sh
> 
> I am running this now, but it does not seem stable on a v6.8-rc5 baseline. I
> am seeing hangs. Are there any known issues?

I tested next-20240108 and so no issues.

Lemme upgrade.

Note you also need /lib/udev/rules.d/50-firmware.rules removed if
you have it.

  Luis
  
Luis Chamberlain Feb. 26, 2024, 5:10 p.m. UTC | #5
On Mon, Feb 26, 2024 at 08:34:11AM -0800, Luis Chamberlain wrote:
> On Fri, Feb 23, 2024 at 02:29:19PM +0000, John Garry wrote:
> > > Could you also test the selftests to ensure nothing is broken ?
> > > 
> > > ./tools/testing/selftests/firmware/fw_run_tests.sh
> > 
> > I am running this now, but it does not seem stable on a v6.8-rc5 baseline. I
> > am seeing hangs. Are there any known issues?
> 
> I tested next-20240108 and so no issues.
> 
> Lemme upgrade.
> 
> Note you also need /lib/udev/rules.d/50-firmware.rules removed if
> you have it.

I don't see any hung tasks on next-20240226. Perhaps its with your
config that it is triggered.

  Luis
  
John Garry Feb. 26, 2024, 5:13 p.m. UTC | #6
On 26/02/2024 17:10, Luis Chamberlain wrote:
>>> I am running this now, but it does not seem stable on a v6.8-rc5 baseline. I
>>> am seeing hangs. Are there any known issues?
>> I tested next-20240108 and so no issues.
>>
>> Lemme upgrade.
>>
>> Note you also need /lib/udev/rules.d/50-firmware.rules removed if
>> you have it.

I have this:
ubuntu@jgarry-ubuntu-bm5-instance-20230215-1843:~/mnt/linux2$ more 
/lib/udev/rules.d/50-firmware.rules
# stub for immediately telling the kernel that userspace firmware loading
# failed; necessary to avoid long timeouts with 
CONFIG_FW_LOADER_USER_HELPER=y
SUBSYSTEM=="firmware", ACTION=="add", ATTR{loading}="-1"

So I can remove it.

> I don't see any hung tasks on next-20240226. Perhaps its with your
> config that it is triggered.
OK, I'll check again when I get a chance, but I am not so keen on 
checking linux-next in general.

Cheers,
John
  
John Garry Feb. 27, 2024, 1:02 p.m. UTC | #7
On 27/02/2024 12:59, Luis Chamberlain wrote:
> On Mon, Feb 26, 2024 at 05:13:44PM +0000, John Garry wrote:
>> On 26/02/2024 17:10, Luis Chamberlain wrote:
>>>>> I am running this now, but it does not seem stable on a v6.8-rc5 baseline. I
>>>>> am seeing hangs. Are there any known issues?
>>>> I tested next-20240108 and so no issues.
>>>>
>>>> Lemme upgrade.
>>>>
>>>> Note you also need /lib/udev/rules.d/50-firmware.rules removed if
>>>> you have it.
>> I have this:
>> ubuntu@jgarry-ubuntu-bm5-instance-20230215-1843:~/mnt/linux2$ more
>> /lib/udev/rules.d/50-firmware.rules
>> # stub for immediately telling the kernel that userspace firmware loading
>> # failed; necessary to avoid long timeouts with
>> CONFIG_FW_LOADER_USER_HELPER=y
>> SUBSYSTEM=="firmware", ACTION=="add", ATTR{loading}="-1"
>>
>> So I can remove it.
> You can keep it on a distro use case, but for testing it is important to
> remove it.

ok

> 
>>> I don't see any hung tasks on next-20240226. Perhaps its with your
>>> config that it is triggered.
>> OK, I'll check again when I get a chance, but I am not so keen on checking
>> linux-next in general.
> What kernel are you seeing this issue on?

I was testing v6.8-rc5 last week

Thanks,
John
  
John Garry Feb. 28, 2024, 12:31 p.m. UTC | #8
On 27/02/2024 13:50, Luis Chamberlain wrote:
> On Tue, Feb 27, 2024 at 01:02:26PM +0000, John Garry wrote:
>> On 27/02/2024 12:59, Luis Chamberlain wrote:
>>> What kernel are you seeing this issue on?
>> I was testing v6.8-rc5 last week
> I cannot reproduce a hung task there either on that kernel.
> 
> I see this:
> 
> sysfs: cannot create duplicate filename '/devices/virtual/misc/test_firmware/nope-test-firmware.bin'
> 
> But these are expected as the selftests tries silly things to ensure
> they are not allowed.
> 
> If you can reproduce it there, it would be appreciated if you look underneath
> the hood a bit, or share anything glaring and obvious which may help
> reproduce this.

Update: commenting-out /lib/udev/rules.d/50-firmware.rules seems to make 
the test reliably pass for v6.8-rc5, but not with my patch on top - I 
still get a hang there. I'll investigate that hang with my patch.

Thanks,
John
  
John Garry Feb. 28, 2024, 4:12 p.m. UTC | #9
On 28/02/2024 15:18, Luis Chamberlain wrote:
>>> But these are expected as the selftests tries silly things to ensure
>>> they are not allowed.
>>>
>>> If you can reproduce it there, it would be appreciated if you look underneath
>>> the hood a bit, or share anything glaring and obvious which may help
>>> reproduce this.
>> Update: commenting-out /lib/udev/rules.d/50-firmware.rules seems to make the
>> test reliably pass for v6.8-rc5,
> Great, note that if you had a hung task even with the udev rule that is
> not expected and is indicative of a bug I cannot reproduce.
> 
>> but not with my patch on top - I still get
>> a hang there. I'll investigate that hang with my patch.
> So your hang is with the udev rule on vanilla v6.8-rc5 ?
> Because for the
> life of me, I don't see it.

I think that I spoke too soon. After adding debug to see any difference 
between mainline and my patch, I get this:

[  806.830318] misc test_firmware: Direct firmware load for 
tmp.k5xh5F4xvG failed with error -2
[  806.830322] misc test_firmware: Falling back to sysfs fallback for: 
tmp.k5xh5F4xvG
[ 1006.958148] INFO: task fw_filesystem.s:5565 blocked for more than 120 
seconds.
[ 1007.045111]       Not tainted 6.8.0-rc5-g11c039aebb43 #40
[ 1007.110262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 1007.204462] task:fw_filesystem.s state:D stack:0     pid:5565 
tgid:5565  ppid:1      flags:0x00000002
[ 1007.204470] Call Trace:
[ 1007.204472]  <TASK>
[ 1007.204477]  __schedule+0x3d7/0x1720
[ 1007.204486]  ? ttwu_do_activate+0x7a/0x260
[ 1007.204493]  ? try_to_wake_up+0x81/0x6c0
[ 1007.204495]  schedule+0x39/0x100
[ 1007.204496]  schedule_timeout+0x14f/0x160
[ 1007.204502]  ? __queue_work+0x212/0x500
[ 1007.204507]  ? fw_devm_match+0x29/0x40
[ 1007.204514]  __wait_for_common+0x8f/0x190
[ 1007.204517]  ? __pfx_schedule_timeout+0x10/0x10
[ 1007.204520]  wait_for_completion+0x28/0x30
[ 1007.204522]  trigger_batched_requests_async_store+0x95/0x220
[ 1007.204532]  dev_attr_store+0x18/0x30
[ 1007.204539]  sysfs_kf_write+0x3f/0x50
[ 1007.204547]  kernfs_fop_write_iter+0x140/0x1d0
[ 1007.204550]  vfs_write+0x311/0x430
[ 1007.204557]  ksys_write+0x6b/0xf0
[ 1007.204560]  __x64_sys_write+0x1d/0x30
[ 1007.204562]  do_syscall_64+0x77/0x120
[ 1007.204568]  ? __count_memcg_events+0x6f/0x110
[ 1007.204576]  ? count_memcg_events.constprop.0+0x1e/0x40
[ 1007.204584]  ? handle_mm_fault+0x192/0x2f0
[ 1007.204587]  ? do_user_addr_fault+0x33f/0x6c0
[ 1007.204594]  ? irqentry_exit_to_user_mode+0x6b/0x180
[ 1007.204599]  ? irqentry_exit+0x3f/0x50
[ 1007.204601]  ? exc_page_fault+0x8e/0x190
[ 1007.204604]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 1007.204609] RIP: 0033:0x7f8172b14887

This is without the udev rule.

I'll try to now see where this hang really is coming from... but it 
might take a while. Maybe first I should enable some more debug config 
options.

Thanks,
John
  

Patch

diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index 3c67f24785fc..9a3671659134 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -38,7 +38,7 @@ 
 #include <linux/zstd.h>
 #include <linux/xz.h>
 
-#include <generated/utsrelease.h>
+#include <linux/utsname.h>
 
 #include "../base.h"
 #include "firmware.h"
@@ -471,9 +471,9 @@  static int fw_decompress_xz(struct device *dev, struct fw_priv *fw_priv,
 static char fw_path_para[256];
 static const char * const fw_path[] = {
 	fw_path_para,
-	"/lib/firmware/updates/" UTS_RELEASE,
+	"/lib/firmware/updates/", /* UTS_RELEASE is appended later */
 	"/lib/firmware/updates",
-	"/lib/firmware/" UTS_RELEASE,
+	"/lib/firmware/", /* UTS_RELEASE is appended later */
 	"/lib/firmware"
 };
 
@@ -496,7 +496,7 @@  fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
 	size_t size;
 	int i, len, maxlen = 0;
 	int rc = -ENOENT;
-	char *path, *nt = NULL;
+	char *path, *fw_path_string, *nt = NULL;
 	size_t msize = INT_MAX;
 	void *buffer = NULL;
 	dev_err(device, "%s suffix=%s\n", __func__, suffix);
@@ -511,6 +511,12 @@  fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
 	if (!path)
 		return -ENOMEM;
 
+	fw_path_string = __getname();
+	if (!fw_path_string) {
+		__putname(path);
+		return -ENOMEM;
+	}
+
 	wait_for_initramfs();
 	for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
 		size_t file_size = 0;
@@ -521,16 +527,32 @@  fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
 		if (!fw_path[i][0])
 			continue;
 
+		len = snprintf(fw_path_string, PATH_MAX, "%s", fw_path[i]);
+		if (len >= PATH_MAX) {
+			rc = -ENAMETOOLONG;
+			break;
+		}
+
+		/* Special handling to append UTS_RELEASE */
+		if ((fw_path[i] != fw_path_para) && (fw_path[i][len - 1] == '/')) {
+			len = snprintf(fw_path_string, PATH_MAX, "%s%s",
+					fw_path[i], init_utsname()->release);
+			if (len >= PATH_MAX) {
+				rc = -ENAMETOOLONG;
+				break;
+			}
+		}
+
 		/* strip off \n from customized path */
-		maxlen = strlen(fw_path[i]);
+		maxlen = strlen(fw_path_string);
 		if (i == 0) {
-			nt = strchr(fw_path[i], '\n');
+			nt = strchr(fw_path_string, '\n');
 			if (nt)
-				maxlen = nt - fw_path[i];
+				maxlen = nt - fw_path_string;
 		}
 
 		len = snprintf(path, PATH_MAX, "%.*s/%s%s",
-			       maxlen, fw_path[i],
+			       maxlen, fw_path_string,
 			       fw_priv->fw_name, suffix);
 		if (len >= PATH_MAX) {
 			rc = -ENAMETOOLONG;
@@ -588,6 +610,7 @@  fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
 		break;
 	}
 	__putname(path);
+	__putname(fw_path_string);
 
 	return rc;
 }