Message ID | 20240124004028.16826-2-zfigura@codeweavers.com |
---|---|
State | New |
Headers | From: Elizabeth Figura <zfigura@codeweavers.com> To: Arnd Bergmann <arnd@arndb.de>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org Cc: wine-devel@winehq.org, André Almeida <andrealmeid@igalia.com>, Wolfram Sang <wsa@kernel.org>, Arkadiusz Hiler <ahiler@codeweavers.com>, Peter Zijlstra <peterz@infradead.org>, Elizabeth Figura <zfigura@codeweavers.com> Subject: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device. Date: Tue, 23 Jan 2024 18:40:20 -0600 Message-ID: <20240124004028.16826-2-zfigura@codeweavers.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240124004028.16826-1-zfigura@codeweavers.com> References: <20240124004028.16826-1-zfigura@codeweavers.com> |
Series | NT synchronization primitive driver |
Commit Message
Elizabeth Figura
Jan. 24, 2024, 12:40 a.m. UTC
ntsync uses a misc device as the simplest and least intrusive uAPI interface.
Each file description on the device represents an isolated NT instance, intended
to correspond to a single NT virtual machine.
Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
drivers/misc/Kconfig | 9 ++++++++
drivers/misc/Makefile | 1 +
drivers/misc/ntsync.c | 53 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+)
create mode 100644 drivers/misc/ntsync.c
Comments
On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
> ntsync uses a misc device as the simplest and least intrusive uAPI interface.
>
> Each file description on the device represents an isolated NT instance, intended
> to correspond to a single NT virtual machine.
>
> Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>

I'm looking at the ioctl interface to ensure it's well-formed.

Your patches look ok from that perspective, but there are a
few minor things I would check for consistency here:

> +
> +static const struct file_operations ntsync_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= ntsync_char_open,
> +	.release	= ntsync_char_release,
> +	.unlocked_ioctl	= ntsync_char_ioctl,
> +	.compat_ioctl	= ntsync_char_ioctl,
> +	.llseek		= no_llseek,
> +};

The .compat_ioctl pointer should point to compat_ptr_ioctl()
since the actual ioctl commands all take pointers instead
of interpreting the argument as a number.

On x86 and arm64 this won't make a difference as compat_ptr()
is a nop.

     Arnd

On Wednesday, 24 January 2024 01:38:52 CST Arnd Bergmann wrote:
> [...]
> The .compat_ioctl pointer should point to compat_ptr_ioctl()
> since the actual ioctl commands all take pointers instead
> of interpreting the argument as a number.
>
> On x86 and arm64 this won't make a difference as compat_ptr()
> is a nop.

Thanks; will fix.

On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura <zfigura@codeweavers.com> wrote:
>
> ntsync uses a misc device as the simplest and least intrusive uAPI interface.
>
> Each file description on the device represents an isolated NT instance, intended
> to correspond to a single NT virtual machine.

If I understand this text right, and if I understood the code right,
you're saying that each open instance of the device represents an
entire universe of NT synchronization objects, and no security or
isolation is possible between those objects.  For single-process use,
this seems fine.  But fork() will be a bit odd (although NT doesn't
really believe in fork, so maybe this is fine).

Except that NT has *named* semaphores and such.  And I'm pretty sure
I've written GUI programs that use named synchronization objects (IIRC
they were events, and this was a *very* common pattern, regularly
discussed in MSDN, usenet, etc) to detect whether another instance of
the program is running.  And this all works on real Windows because
sessions have sufficiently separated namespaces, and the security all
works out about as any other security on Windows, etc.  But
implementing *that* on top of this
file-description-plus-integer-equals-object will be fundamentally
quite subject to one buggy program completely clobbering someone
else's state.

Would it make sense and scale appropriately for an NT synchronization
*object* to be a Linux open file description?  Then SCM_RIGHTS could
pass them around, an RPC server could manage *named* objects, and
they'd generally work just like other "Object Manager" objects like,
say, files.

--Andy

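For readers unfamiliar with the SCM_RIGHTS mechanism Andy mentions, the following is a minimal user-space sketch of passing an open file description between two socket endpoints. It is not ntsync code: an eventfd stands in for a hypothetical per-object file, and the function names (`send_fd`, `recv_fd`, `demo_pass_object`) are made up for illustration. Both ends run in one process here only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one descriptor over a Unix socket. Returns 0 on success. */
int send_fd(int sock, int fd)
{
    char dummy = 'x'; /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new fd, or -1 on failure. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Pass an eventfd (the stand-in "object") across a socketpair and check
 * that the received descriptor refers to the same underlying object. */
int demo_pass_object(void)
{
    int sv[2], obj, received;
    uint64_t val = 3, got = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    obj = eventfd(0, 0);
    if (obj < 0 || send_fd(sv[0], obj) < 0)
        return -1;
    received = recv_fd(sv[1]);
    if (received < 0)
        return -1;

    /* Write through the original fd, read through the received one. */
    if (write(obj, &val, sizeof(val)) != (ssize_t)sizeof(val))
        return -1;
    if (read(received, &got, sizeof(got)) != (ssize_t)sizeof(got))
        return -1;

    close(obj);
    close(received);
    close(sv[0]);
    close(sv[1]);
    return got == 3 ? 0 : -1;
}
```

Calling `demo_pass_object()` from a `main()` returns 0 when the value written through the original descriptor is read back through the received one. In the model Andy describes, a server process would hand out descriptors like this only for objects the requesting process is allowed to open.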
On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote:
> [...]
> But implementing *that* on top of this
> file-description-plus-integer-equals-object will be fundamentally
> quite subject to one buggy program completely clobbering someone
> else's state.
>
> Would it make sense and scale appropriately for an NT synchronization
> *object* to be a Linux open file description?  Then SCM_RIGHTS could
> pass them around, an RPC server could manage *named* objects, and
> they'd generally work just like other "Object Manager" objects like,
> say, files.

It's a sensible concern. I think when I discussed this with Alexandre
Julliard (the Wine maintainer, CC'd) the conclusion was this wasn't
something we were concerned about.

While the current model *does* allow for processes to arbitrarily mess
with each other, accidentally or not, I think we're less concerned with
the scope of that than we are about implementing a whole scheduler in
user space.

For one, you can't corrupt the wineserver state this way (wineserver
being sort of like a dedicated process that handles many of the things
that a kernel would, and so sometimes needs to set or reset events, or
perform NTSYNC_IOC_KILL_MUTEX, but never relies on ntsync object state).
Whereas trying to implement a scheduler in user space would involve the
wineserver taking locks, and hence other processes could deadlock.

For two, it's probably a lot harder to mess with that internal state
accidentally.

[There is also a potential problem where some broken applications
create a million (literally) sync objects. Making these into files runs
into NOFILE. We did specifically push distributions and systemd to
increase those limits because an older solution *did* use eventfds and
*did* run into those limits. Since that push was successful I don't
know if this is *actually* a concern anymore, but avoiding files is
probably not a bad thing either.]

--Zeb

On Wednesday, 24 January 2024 16:56:23 CST Elizabeth Figura wrote:
> [...]
> [There is also a potential problem where some broken applications
> create a million (literally) sync objects. Making these into files runs
> into NOFILE. We did specifically push distributions and systemd to
> increase those limits because an older solution *did* use eventfds and
> *did* run into those limits. Since that push was successful I don't
> know if this is *actually* a concern anymore, but avoiding files is
> probably not a bad thing either.]

Of course, looking at it from a kernel maintainer's perspective, it wouldn't
be insane to do this anyway. If we at some point do start to care about
cross-process isolation in this way, or if another NT emulator wants to use
this interface and does care about cross-process isolation, it'll be
necessary. At least it'd make sense to make them separate files even if we
don't implement granular permission handling just yet.

The main question is, is NOFILE a realistic concern, and what other problems
might there be, in terms of making these heavier objects? Besides memory usage
I can't think of any, but of course I don't have much knowledge of this area.

Alternatively, maybe there's another more lightweight way to store per-process
data?

Elizabeth Figura <zfigura@codeweavers.com> writes:

> [...]
> It's a sensible concern. I think when I discussed this with Alexandre
> Julliard (the Wine maintainer, CC'd) the conclusion was this wasn't
> something we were concerned about.
>
> While the current model *does* allow for processes to arbitrarily mess
> with each other, accidentally or not, I think we're not concerned with
> the scope of that than we are about implementing a whole scheduler in
> user space.

I may have misunderstood something in that discussion then, because it
would definitely be a concern. It's OK for a process to be able to mess
up the state of any object that it has an NT handle to, but it shouldn't
be possible to mess up the state of unrelated objects in other processes
simply by passing the wrong integer id.

The concern is not so much about a malicious process going out of its
way to corrupt others, because it could do that through the NT API just
as well. But if a wayward pointer corrupts the client-side handle cache,
that shouldn't take down the entire session.

On Thu, Jan 25, 2024, at 04:42, Elizabeth Figura wrote:
> On Wednesday, 24 January 2024 16:56:23 CST Elizabeth Figura wrote:
>> [There is also a potential problem where some broken applications
>> create a million (literally) sync objects. Making these into files runs
>> into NOFILE. We did specifically push distributions and systemd to
>> increase those limits because an older solution *did* use eventfds and
>> *did* run into those limits. Since that push was successful I don't
>> know if this is *actually* a concern anymore, but avoiding files is
>> probably not a bad thing either.]
>
> Of course, looking at it from a kernel maintainer's perspective, it wouldn't
> be insane to do this anyway. If we at some point do start to care about
> cross-process isolation in this way, or if another NT emulator wants to use
> this interface and does care about cross-process isolation, it'll be
> necessary. At least it'd make sense to make them separate files even if we
> don't implement granular permission handling just yet.

I can think of a few other possible benefits of going with
per-mutex file descriptors:

- being able to use poll() for waiting on them individually in
  combination with other file descriptor based events (socket,
  signalfd, pidfd, ...)

- replacing your logic around xarray with something a bit simpler.
  As far as I can tell, your code is all correct here, but it would
  be easier to understand if it looked more like other code I'm
  familiar with.

> The main question is, is NOFILE a realistic concern, and what other problems
> might there be, in terms of making these heavier objects? Besides memory usage
> I can't think of any, but of course I don't have much knowledge of this area.

I would think that RLIMIT_NOFILE is a sensible way of managing this. At
least this way it's possible to prevent exhausting memory with too many
mutexes, while still being able to raise the limit if you need more than
whatever default one might come up with.

     Arnd

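As a rough illustration of the RLIMIT_NOFILE handling Arnd refers to, the sketch below shows how an unprivileged process can raise its own soft descriptor limit up to the hard limit set by the administrator. It only uses the standard getrlimit()/setrlimit() calls; the function name `raise_nofile_limit` is hypothetical, and nothing here is taken from Wine or the ntsync patches.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE to the hard limit.
 * Returns 0 if the soft limit equals the hard limit afterwards. */
int raise_nofile_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to, but not
     * beyond, the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;

    /* Read the limits back to confirm the change took effect. */
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    return rl.rlim_cur == rl.rlim_max ? 0 : -1;
}
```

An application that creates very many objects would still hit the hard limit, which is the bound an administrator or distribution controls.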
On Thursday, 25 January 2024 10:47:49 CST Arnd Bergmann wrote:
> [...]
> I can think of a few other possible benefits of going with
> per-mutex file descriptors:
>
> - being able to use poll() for waiting on them individually in
>   combination with other file descriptor based events (socket,
>   signalfd, pidfd, ...)

I can say for sure this isn't going to be useful for Wine, at least not
with the current design. It also doesn't really mesh well with the NT
design in the first place. NTSYNC_IOC_WAIT_ANY differs from poll() in two
major ways: it consumes state of most object types, and (as coded here) it
needs the owner thread ID to be specifically passed for mutexes.

Anyway, as Alexandre has informed me I clearly have misunderstood our
requirements, so I'm going to try to put together something using files
instead.

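The first difference described above (a wait that consumes object state versus poll(), which only reports readiness) can be seen with an ordinary eventfd in semaphore mode. This is a sketch with a stand-in object, not the ntsync interface; `is_signaled`, `acquire_one`, and `demo_poll_vs_consume` are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error.
 * poll() only observes readiness; it does not change the counter. */
int is_signaled(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, 0); /* timeout 0: never block */

    if (n < 0)
        return -1;
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Take one count from the semaphore-mode eventfd.
 * Returns 0 on success, -1 if nothing was available. */
int acquire_one(int fd)
{
    uint64_t v;

    return read(fd, &v, sizeof(v)) == (ssize_t)sizeof(v) ? 0 : -1;
}

/* Returns 0 if poll() left the state alone and read() consumed it. */
int demo_poll_vs_consume(void)
{
    /* Semaphore-like object with an initial count of 1. */
    int fd = eventfd(1, EFD_SEMAPHORE | EFD_NONBLOCK);
    int ok;

    if (fd < 0)
        return -1;

    ok = is_signaled(fd) == 1    /* poll sees the object as signaled   */
      && is_signaled(fd) == 1    /* still signaled: nothing consumed   */
      && acquire_one(fd) == 0    /* read takes the single count        */
      && is_signaled(fd) == 0    /* now unsignaled                     */
      && acquire_one(fd) == -1;  /* a second acquire fails with EAGAIN */

    close(fd);
    return ok ? 0 : -1;
}
```

With poll(), observing and acquiring are two separate steps, so another waiter could take the count in between. A wait operation that consumes state, as described for NTSYNC_IOC_WAIT_ANY, combines the two.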
On Wed, Jan 24, 2024 at 7:42 PM Elizabeth Figura <zfigura@codeweavers.com> wrote:
> [...]
> Of course, looking at it from a kernel maintainer's perspective, it wouldn't
> be insane to do this anyway. If we at some point do start to care about
> cross-process isolation in this way, or if another NT emulator wants to use
> this interface and does care about cross-process isolation, it'll be
> necessary. At least it'd make sense to make them separate files even if we
> don't implement granular permission handling just yet.

I'm not convinced that any complexity at all beyond using individual
files is needed for granular permission handling.  Unless something
actually needs permission bits on different files pointing at the same
sync object (which I believe NT supports, but it's sort of an odd
concept and I'm not immediately convinced that anything uses it),
merely having individual files ought to do the trick.  Handling of who
has permission to open a given named object can live in a daemon, and
I'd guess that Wine even already implements this.

And keeping everything together gives me flashbacks of Windows 95 and
Mac OS pre-X.  Sure, in principle the software wasn't malicious, but
there was no shortage whatsoever of buggy crap out there, and systems
were quite unstable.  Even just:

CreateSemaphore();
fork();
sleep a few seconds;
exit();

seems like it could corrupt the shared namespace world.  (Obviously no
one would ever do that, right?)

Also, handle leaks:

while(true) {
  make a subprocess, which creates a semaphore and crashes;
}

> The main question is, is NOFILE a realistic concern, and what other problems
> might there be, in terms of making these heavier objects? Besides memory usage
> I can't think of any, but of course I don't have much knowledge of this area.

Years ago there was some discussion of making struct file
lighter-weight for light-weight things that aren't files.  And, in any
case, even the little integer indices in your code aren't free -- they
just aren't accounted as files.  And struct file isn't *that* bad.  I
bet it's not dramatically bigger, or even smaller, than whatever the
Windows kernel stores for a semaphore handle.

--Andy

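The handle-leak scenario above is one place where per-object file descriptors help without extra work: the kernel closes every descriptor a process holds when it dies, however it dies. The sketch below shows that general behaviour using a pipe as a stand-in for an object file; it does not use ntsync, and `demo_crash_cleanup` is a hypothetical name.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the kernel released the crashed child's descriptor. */
int demo_crash_cleanup(void)
{
    int p[2], status;
    pid_t pid;
    char c;
    ssize_t n;

    if (pipe(p) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: holds the write end and dies without closing it. */
        close(p[0]);
        raise(SIGKILL);
        _exit(1); /* not reached */
    }

    /* Parent drops its own copy, so the child holds the last reference
     * to the write end. */
    close(p[1]);

    /* read() returns 0 (end of file) only once every write-end
     * descriptor is gone, i.e. after the kernel has cleaned up. */
    n = read(p[0], &c, 1);
    close(p[0]);

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (n == 0 && WIFSIGNALED(status)) ? 0 : -1;
}
```

Run in a loop, this creates and destroys one object per crashed subprocess without accumulating anything, since the descriptor table is torn down with the process.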
On Thursday, 25 January 2024 12:55:04 CST Andy Lutomirski wrote: > On Wed, Jan 24, 2024 at 7:42 PM Elizabeth Figura > <zfigura@codeweavers.com> wrote: > > > > On Wednesday, 24 January 2024 16:56:23 CST Elizabeth Figura wrote: > > > On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote: > > > > > > > On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura > > > > <zfigura@codeweavers.com> wrote: > > > > > > > > > > > > > > > > > > > ntsync uses a misc device as the simplest and least intrusive uAPI > > > > > interface. > > > > > > > > > > > > > > > > > > > Each file description on the device represents an isolated NT instance, > > > > > intended to correspond to a single NT virtual machine. > > > > > > > > > > > > If I understand this text right, and if I understood the code right, > > > > you're saying that each open instance of the device represents an > > > > entire universe of NT synchronization objects, and no security or > > > > isolation is possible between those objects. For single-process use, > > > > this seems fine. But fork() will be a bit odd (although NT doesn't > > > > really believe in fork, so maybe this is fine). > > > > > > > > Except that NT has *named* semaphores and such. And I'm pretty sure > > > > I've written GUI programs that use named synchronization objects (IIRC > > > > they were events, and this was a *very* common pattern, regularly > > > > discussed in MSDN, usenet, etc) to detect whether another instance of > > > > the program is running. And this all works on real Windows because > > > > sessions have sufficiently separated namespaces, and the security all > > > > works out about as any other security on Windows, etc. But > > > > implementing *that* on top of this > > > > file-description-plus-integer-equals-object will be fundamentally > > > > quite subject to one buggy program completely clobbering someone > > > > else's state. 
> > > >
> > > > Would it make sense and scale appropriately for an NT synchronization
> > > > *object* to be a Linux open file description? Then SCM_RIGHTS could
> > > > pass them around, an RPC server could manage *named* objects, and
> > > > they'd generally work just like other "Object Manager" objects like,
> > > > say, files.
> > >
> > > It's a sensible concern. I think when I discussed this with Alexandre
> > > Julliard (the Wine maintainer, CC'd) the conclusion was this wasn't
> > > something we were concerned about.
> > >
> > > While the current model *does* allow for processes to arbitrarily mess
> > > with each other, accidentally or not, I think we're less concerned with
> > > the scope of that than we are about implementing a whole scheduler in
> > > user space.
> > >
> > > For one, you can't corrupt the wineserver state this way -- wineserver
> > > being sort of like a dedicated process that handles many of the things
> > > that a kernel would, and so sometimes needs to set or reset events, or
> > > perform NTSYNC_IOC_KILL_MUTEX, but never relies on ntsync object state.
> > > Whereas trying to implement a scheduler in user space would involve the
> > > wineserver taking locks, and hence other processes could deadlock.
> > >
> > > For two, it's probably a lot harder to mess with that internal state
> > > accidentally.
> > >
> > > [There is also a potential problem where some broken applications
> > > create a million (literally) sync objects. Making these into files runs
> > > into NOFILE. We did specifically push distributions and systemd to
> > > increase those limits because an older solution *did* use eventfds and
> > > *did* run into those limits. Since that push was successful I don't
> > > know if this is *actually* a concern anymore, but avoiding files is
> > > probably not a bad thing either.]
> >
> > Of course, looking at it from a kernel maintainer's perspective, it wouldn't
> > be insane to do this anyway.
> > If we at some point do start to care about cross-process isolation in this
> > way, or if another NT emulator wants to use this interface and does care
> > about cross-process isolation, it'll be necessary. At least it'd make sense
> > to make them separate files even if we don't implement granular permission
> > handling just yet.
>
> I'm not convinced that any complexity at all beyond using individual
> files is needed for granular permission handling. Unless something
> actually needs permission bits on different files pointing at the same
> sync object (which I believe NT supports, but it's sort of an odd
> concept and I'm not immediately convinced that anything uses it),
> merely having individual files ought to do the trick. Handling of who
> has permission to open a given named object can live in a daemon, and
> I'd guess that Wine even already implements this.

This is mostly correct. NT has file descriptors and descriptions (the former
is called a "handle"), though, unlike Unix, access bits are specific to the
*descriptor* (handle). I don't know if anything uses it, but we do currently
implement that basic functionality, so I can't say that nothing does either.
So inasmuch as access to someone else's object is a concern, access to your
object with bits you don't have permission for could be a concern along the
same lines. However, from conversation with Alexandre I believe it'd be fine
to just implement those checks in user space.

> And keeping everything together gives me flashbacks of Windows 95 and
> Mac OS pre-X. Sure, in principle the software wasn't malicious, but
> there was no shortage whatsoever of buggy crap out there, and systems
> were quite unstable. Even just:
>
> CreateSemaphore();
> fork();
> sleep a few seconds;
> exit();
>
> seems like it could corrupt the shared namespace world. (Obviously no
> one would ever do that, right?)
>
> Also, handle leaks:
>
> while(true) {
>   make a subprocess, which creates a semaphore and crashes;
> }

For whatever it's worth, this particular thing wouldn't be a concern; Wine's
"kernel" daemon already has to detect when a process dies and close all its
outstanding handles.
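The SCM_RIGHTS idea raised earlier in the thread can be illustrated with ordinary file descriptors. The following is a minimal sketch, assuming each sync object were its own open file description (which the posted patch does not do): a daemon holding such a descriptor could hand it to a client over an AF_UNIX socket. The helper names send_fd() and recv_fd() are hypothetical; only standard socket calls are used.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: send one descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of ordinary data must accompany it */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor; returns it, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor refers to the same open file description as the sender's, so under this assumption a named-object daemon could decide who gets a descriptor without any extra permission logic in the kernel.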
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 4fb291f0bf7c..bdd8a71bd853 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -504,6 +504,15 @@ config OPEN_DICE
 	  measured boot flow. Userspace can use CDIs for remote attestation
 	  and sealing.
 
+config NTSYNC
+	tristate "NT synchronization primitive emulation"
+	help
+	  This module provides kernel support for emulation of Windows NT
+	  synchronization primitives. It is not a hardware driver.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called ntsync.
+
 	  If unsure, say N.
 
 config VCPU_STALL_DETECTOR
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index ea6ea5bbbc9c..153a3f4837e8 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_PVPANIC)		+= pvpanic/
 obj-$(CONFIG_UACCE)		+= uacce/
 obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
 obj-$(CONFIG_HISI_HIKEY_USB)	+= hisi_hikey_usb.o
+obj-$(CONFIG_NTSYNC)		+= ntsync.o
 obj-$(CONFIG_HI6421V600_IRQ)	+= hi6421v600-irq.o
 obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
 obj-$(CONFIG_GP_PCI1XXXX)	+= mchp_pci1xxxx/
diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
new file mode 100644
index 000000000000..9424c6210e51
--- /dev/null
+++ b/drivers/misc/ntsync.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ntsync.c - Kernel driver for NT synchronization primitives
+ *
+ * Copyright (C) 2021-2022 Elizabeth Figura
+ */
+
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+
+#define NTSYNC_NAME	"ntsync"
+
+static int ntsync_char_open(struct inode *inode, struct file *file)
+{
+	return nonseekable_open(inode, file);
+}
+
+static int ntsync_char_release(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long parm)
+{
+	switch (cmd) {
+	default:
+		return -ENOIOCTLCMD;
+	}
+}
+
+static const struct file_operations ntsync_fops = {
+	.owner		= THIS_MODULE,
+	.open		= ntsync_char_open,
+	.release	= ntsync_char_release,
+	.unlocked_ioctl	= ntsync_char_ioctl,
+	.compat_ioctl	= ntsync_char_ioctl,
+	.llseek		= no_llseek,
+};
+
+static struct miscdevice ntsync_misc = {
+	.minor		= MISC_DYNAMIC_MINOR,
+	.name		= NTSYNC_NAME,
+	.fops		= &ntsync_fops,
+};
+
+module_misc_device(ntsync_misc);
+
+MODULE_AUTHOR("Elizabeth Figura");
+MODULE_DESCRIPTION("Kernel driver for NT synchronization primitives");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("devname:" NTSYNC_NAME);