Message ID | 20230119135721.83345-3-alexander.shishkin@linux.intel.com |
---|---|
State | New |
Headers | From: Alexander Shishkin <alexander.shishkin@linux.intel.com>; To: mst@redhat.com, jasowang@redhat.com; Cc: virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org, elena.reshetova@intel.com, kirill.shutemov@linux.intel.com, Andi Kleen <ak@linux.intel.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Amit Shah <amit@kernel.org>, Arnd Bergmann <arnd@arndb.de>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>; Subject: [PATCH v1 2/6] virtio console: Harden port adding; Date: Thu, 19 Jan 2023 15:57:17 +0200; Message-Id: <20230119135721.83345-3-alexander.shishkin@linux.intel.com>; In-Reply-To: <20230119135721.83345-1-alexander.shishkin@linux.intel.com> |
Series | Harden a few virtio bits |

Commit Message
Alexander Shishkin
Jan. 19, 2023, 1:57 p.m. UTC
From: Andi Kleen <ak@linux.intel.com>

The ADD_PORT operation reads and sanity checks the port id multiple
times from the untrusted host. This is not safe because a malicious
host could change it between reads.

Read the port id only once and cache it for subsequent uses.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/char/virtio_console.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
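The single-fetch idea in the commit message can be illustrated outside the kernel. The sketch below is a minimal userspace approximation under stated assumptions: `READ_ONCE_U32` is a hypothetical stand-in for the kernel's `READ_ONCE()`, `struct ctrl_hdr` is a simplified mirror of `struct virtio_console_control`, and the `virtio32_to_cpu()` byte-order conversion is left out (native byte order assumed). What it shows is that the bounds check and the later use both operate on one local copy of the id.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's READ_ONCE(): the volatile
 * access makes the compiler emit exactly one load of the field. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* Simplified, hypothetical mirror of struct virtio_console_control.
 * Byte-order conversion (virtio32_to_cpu) is omitted: native order assumed. */
struct ctrl_hdr {
	uint32_t id;
	uint16_t event;
	uint16_t value;
};
static_assert(sizeof(struct ctrl_hdr) == 8, "header is 8 bytes");

/* Fetch the port id from the shared header exactly once, validate the local
 * copy, and return it. Returns -1 when the id is out of range. The check and
 * every later use see the same value, whatever the host writes afterwards. */
static int64_t fetch_port_id(const struct ctrl_hdr *shared, uint32_t max_nr_ports)
{
	uint32_t id = READ_ONCE_U32(shared->id);	/* single fetch */

	if (id >= max_nr_ports)
		return -1;				/* reject out-of-bound id */
	return (int64_t)id;				/* caller uses the checked copy */
}
```

For example, with `max_nr_ports == 4`, `fetch_port_id()` returns 3 when `h.id == 3` and -1 once the host has written 4. Note that the v1 patch caches only the id; `cpkt->event` is still read from the buffer once for the validity check and again for the `switch`.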
Comments
On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
>
> > Then you need to copy it out once, and then only deal with the local
> > copy. Otherwise you have an incomplete snapshot.
>
> Ok, would you be partial to something like this:
>
> >From 1bc9bb84004154376c2a0cf643d53257da6d1cd7 Mon Sep 17 00:00:00 2001
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Date: Thu, 19 Jan 2023 21:59:02 +0200
> Subject: [PATCH] virtio console: Keep a local copy of the control structure
>
> When handling control messages, instead of peeking at the device memory
> to obtain bits of the control structure, take a snapshot of it once and
> use it instead, to prevent it from changing under us. This avoids races
> between port id validation and control event decoding, which can lead
> to, for example, a NULL dereference in port removal of a nonexistent
> port.
>
> The control structure is small enough (8 bytes) that it can be cached
> directly on the stack.
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Amit Shah <amit@kernel.org>
> ---
>  drivers/char/virtio_console.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)

Yes, this looks much better, thanks!

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Thu, Jan 19, 2023 at 03:57:17PM +0200, Alexander Shishkin wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> The ADD_PORT operation reads and sanity checks the port id multiple
> times from the untrusted host. This is not safe because a malicious
> host could change it between reads.
>
> Read the port id only once and cache it for subsequent uses.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Amit Shah <amit@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

I suspect anyone worried about this kind of thing already uses a bounce
buffer. No?

The patch itself makes the code more readable, except maybe for the
READ_ONCE thing.

> ---
>  drivers/char/virtio_console.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index f4fd5fe7cd3a..6599c2956ba4 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -1563,10 +1563,13 @@ static void handle_control_message(struct virtio_device *vdev,
>  	struct port *port;
>  	size_t name_size;
>  	int err;
> +	unsigned id;
>  
>  	cpkt = (struct virtio_console_control *)(buf->buf + buf->offset);
>  
> -	port = find_port_by_id(portdev, virtio32_to_cpu(vdev, cpkt->id));
> +	/* Make sure the host cannot change id under us */
> +	id = virtio32_to_cpu(vdev, READ_ONCE(cpkt->id));
> +	port = find_port_by_id(portdev, id);
>  	if (!port &&
>  	    cpkt->event != cpu_to_virtio16(vdev, VIRTIO_CONSOLE_PORT_ADD)) {
>  		/* No valid header at start of buffer. Drop it. */
> @@ -1583,15 +1586,14 @@ static void handle_control_message(struct virtio_device *vdev,
>  			send_control_msg(port, VIRTIO_CONSOLE_PORT_READY, 1);
>  			break;
>  		}
> -		if (virtio32_to_cpu(vdev, cpkt->id) >=
> -		    portdev->max_nr_ports) {
> +		if (id >= portdev->max_nr_ports) {
>  			dev_warn(&portdev->vdev->dev,
>  				"Request for adding port with "
>  				"out-of-bound id %u, max. supported id: %u\n",
>  				cpkt->id, portdev->max_nr_ports - 1);
>  			break;
>  		}
> -		add_port(portdev, virtio32_to_cpu(vdev, cpkt->id));
> +		add_port(portdev, id);
>  		break;
>  	case VIRTIO_CONSOLE_PORT_REMOVE:
>  		unplug_port(port);
> -- 
> 2.39.0
On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
>
> > Then you need to copy it out once, and then only deal with the local
> > copy. Otherwise you have an incomplete snapshot.
>
> Ok, would you be partial to something like this:
>
> >From 1bc9bb84004154376c2a0cf643d53257da6d1cd7 Mon Sep 17 00:00:00 2001
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Date: Thu, 19 Jan 2023 21:59:02 +0200
> Subject: [PATCH] virtio console: Keep a local copy of the control structure
>
> When handling control messages, instead of peeking at the device memory
> to obtain bits of the control structure,

Except the message makes it seem that we are getting data from
device memory, when we do nothing of the kind.

> take a snapshot of it once and
> use it instead, to prevent it from changing under us. This avoids races
> between port id validation and control event decoding, which can lead
> to, for example, a NULL dereference in port removal of a nonexistent
> port.
>
> The control structure is small enough (8 bytes) that it can be cached
> directly on the stack.

I still have no real idea why we want a copy here.
If device can poke anywhere at memory then it can crash kernel anyway.
If there's a bounce buffer or an iommu or some other protection
in place, then this memory can no longer change by the time
we look at it.

> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Amit Shah <amit@kernel.org>
> ---
>  drivers/char/virtio_console.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index 6a821118d553..42be0991a72f 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -1559,23 +1559,24 @@ static void handle_control_message(struct virtio_device *vdev,
>  				   struct ports_device *portdev,
>  				   struct port_buffer *buf)
>  {
> -	struct virtio_console_control *cpkt;
> +	struct virtio_console_control cpkt;
>  	struct port *port;
>  	size_t name_size;
>  	int err;
>  
> -	cpkt = (struct virtio_console_control *)(buf->buf + buf->offset);
> +	/* Keep a local copy of the control structure */
> +	memcpy(&cpkt, buf->buf + buf->offset, sizeof(cpkt));
>  
> -	port = find_port_by_id(portdev, virtio32_to_cpu(vdev, cpkt->id));
> +	port = find_port_by_id(portdev, virtio32_to_cpu(vdev, cpkt.id));
>  	if (!port &&
> -	    cpkt->event != cpu_to_virtio16(vdev, VIRTIO_CONSOLE_PORT_ADD)) {
> +	    cpkt.event != cpu_to_virtio16(vdev, VIRTIO_CONSOLE_PORT_ADD)) {
>  		/* No valid header at start of buffer. Drop it. */
>  		dev_dbg(&portdev->vdev->dev,
> -			"Invalid index %u in control packet\n", cpkt->id);
> +			"Invalid index %u in control packet\n", cpkt.id);
>  		return;
>  	}
>  
> -	switch (virtio16_to_cpu(vdev, cpkt->event)) {
> +	switch (virtio16_to_cpu(vdev, cpkt.event)) {
>  	case VIRTIO_CONSOLE_PORT_ADD:
>  		if (port) {
>  			dev_dbg(&portdev->vdev->dev,
> @@ -1583,21 +1584,21 @@ static void handle_control_message(struct virtio_device *vdev,
>  			send_control_msg(port, VIRTIO_CONSOLE_PORT_READY, 1);
>  			break;
>  		}
> -		if (virtio32_to_cpu(vdev, cpkt->id) >=
> +		if (virtio32_to_cpu(vdev, cpkt.id) >=
>  		    portdev->max_nr_ports) {
>  			dev_warn(&portdev->vdev->dev,
>  				"Request for adding port with "
>  				"out-of-bound id %u, max. supported id: %u\n",
> -				cpkt->id, portdev->max_nr_ports - 1);
> +				cpkt.id, portdev->max_nr_ports - 1);
>  			break;
>  		}
> -		add_port(portdev, virtio32_to_cpu(vdev, cpkt->id));
> +		add_port(portdev, virtio32_to_cpu(vdev, cpkt.id));
>  		break;
>  	case VIRTIO_CONSOLE_PORT_REMOVE:
>  		unplug_port(port);
>  		break;
>  	case VIRTIO_CONSOLE_CONSOLE_PORT:
> -		if (!cpkt->value)
> +		if (!cpkt.value)
>  			break;
>  		if (is_console_port(port))
>  			break;
> @@ -1618,7 +1619,7 @@ static void handle_control_message(struct virtio_device *vdev,
>  		if (!is_console_port(port))
>  			break;
>  
> -		memcpy(&size, buf->buf + buf->offset + sizeof(*cpkt),
> +		memcpy(&size, buf->buf + buf->offset + sizeof(cpkt),
>  		       sizeof(size));
>  		set_console_size(port, size.rows, size.cols);
>  
> @@ -1627,7 +1628,7 @@ static void handle_control_message(struct virtio_device *vdev,
>  		break;
>  	}
>  	case VIRTIO_CONSOLE_PORT_OPEN:
> -		port->host_connected = virtio16_to_cpu(vdev, cpkt->value);
> +		port->host_connected = virtio16_to_cpu(vdev, cpkt.value);
>  		wake_up_interruptible(&port->waitqueue);
>  		/*
>  		 * If the host port got closed and the host had any
> @@ -1658,7 +1659,7 @@ static void handle_control_message(struct virtio_device *vdev,
>  		 * Skip the size of the header and the cpkt to get the size
>  		 * of the name that was sent
>  		 */
> -		name_size = buf->len - buf->offset - sizeof(*cpkt) + 1;
> +		name_size = buf->len - buf->offset - sizeof(cpkt) + 1;
>  
>  		port->name = kmalloc(name_size, GFP_KERNEL);
>  		if (!port->name) {
> @@ -1666,7 +1667,7 @@ static void handle_control_message(struct virtio_device *vdev,
>  				"Not enough space to store port name\n");
>  			break;
>  		}
> -		strncpy(port->name, buf->buf + buf->offset + sizeof(*cpkt),
> +		strncpy(port->name, buf->buf + buf->offset + sizeof(cpkt),
>  			name_size - 1);
>  		port->name[name_size - 1] = 0;
>  
> -- 
> 2.39.0
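The snapshot approach in the v2 patch (copy the 8-byte header once with `memcpy()` and decode only the copy) can be sketched in userspace as follows. This is a hedged illustration, not the driver code: the event codes, the `enum action` result, `port_exists()` and the `present_mask` port table are hypothetical simplifications, and the `virtio32_to_cpu()`/`virtio16_to_cpu()` byte-order conversions are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical mirror of struct virtio_console_control; native
 * byte order is assumed, so virtio32_to_cpu()/virtio16_to_cpu() are omitted. */
struct ctrl_hdr {
	uint32_t id;
	uint16_t event;
	uint16_t value;
};
static_assert(sizeof(struct ctrl_hdr) == 8, "header is 8 bytes, cheap to copy");

/* Hypothetical event codes standing in for VIRTIO_CONSOLE_PORT_ADD/_REMOVE. */
enum { EV_PORT_ADD = 1, EV_PORT_REMOVE = 2 };
enum action { ACT_DROP, ACT_ADD, ACT_REMOVE };

/* Bit i of present_mask set means port i exists (hypothetical port table). */
static int port_exists(uint32_t present_mask, uint32_t id)
{
	return id < 32 && ((present_mask >> id) & 1u);
}

/* Snapshot the whole header once, then validate and dispatch on the copy only. */
static enum action decode_ctrl(const void *shared, uint32_t max_nr_ports,
			       uint32_t present_mask)
{
	struct ctrl_hdr c;

	memcpy(&c, shared, sizeof(c));	/* the only read of shared memory */

	switch (c.event) {
	case EV_PORT_ADD:
		if (c.id >= max_nr_ports || port_exists(present_mask, c.id))
			return ACT_DROP;
		return ACT_ADD;
	case EV_PORT_REMOVE:
		return port_exists(present_mask, c.id) ? ACT_REMOVE : ACT_DROP;
	default:
		return ACT_DROP;
	}
}
```

One design note: `memcpy()` from memory that the host may be writing concurrently is not atomic, so the copy can mix old and new bytes. That is acceptable for this purpose, because the property sought is only that validation and use see the same bytes; whatever ends up in the copy is checked before it is acted on.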
"Michael S. Tsirkin" <mst@redhat.com> writes: > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote: >> When handling control messages, instead of peeking at the device memory >> to obtain bits of the control structure, > > Except the message makes it seem that we are getting data from > device memory, when we do nothing of the kind. We can be, see below. >> take a snapshot of it once and >> use it instead, to prevent it from changing under us. This avoids races >> between port id validation and control event decoding, which can lead >> to, for example, a NULL dereference in port removal of a nonexistent >> port. >> >> The control structure is small enough (8 bytes) that it can be cached >> directly on the stack. > > I still have no real idea why we want a copy here. > If device can poke anywhere at memory then it can crash kernel anyway. > If there's a bounce buffer or an iommu or some other protection > in place, then this memory can no longer change by the time > we look at it. We can have shared pages between the host and guest without bounce buffers in between, so they can be both looking directly at the same page. Regards, -- Alex
On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> >> When handling control messages, instead of peeking at the device memory
> >> to obtain bits of the control structure,
> >
> > Except the message makes it seem that we are getting data from
> > device memory, when we do nothing of the kind.
>
> We can be, see below.
>
> >> take a snapshot of it once and
> >> use it instead, to prevent it from changing under us. This avoids races
> >> between port id validation and control event decoding, which can lead
> >> to, for example, a NULL dereference in port removal of a nonexistent
> >> port.
> >>
> >> The control structure is small enough (8 bytes) that it can be cached
> >> directly on the stack.
> >
> > I still have no real idea why we want a copy here.
> > If device can poke anywhere at memory then it can crash kernel anyway.
> > If there's a bounce buffer or an iommu or some other protection
> > in place, then this memory can no longer change by the time
> > we look at it.
>
> We can have shared pages between the host and guest without bounce
> buffers in between, so they can be both looking directly at the same
> page.
>
> Regards,

How does this configuration work? What else is in this page?

> --
> Alex
"Michael S. Tsirkin" <mst@redhat.com> writes: > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >> >> > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote: >> >> When handling control messages, instead of peeking at the device memory >> >> to obtain bits of the control structure, >> > >> > Except the message makes it seem that we are getting data from >> > device memory, when we do nothing of the kind. >> >> We can be, see below. >> >> >> take a snapshot of it once and >> >> use it instead, to prevent it from changing under us. This avoids races >> >> between port id validation and control event decoding, which can lead >> >> to, for example, a NULL dereference in port removal of a nonexistent >> >> port. >> >> >> >> The control structure is small enough (8 bytes) that it can be cached >> >> directly on the stack. >> > >> > I still have no real idea why we want a copy here. >> > If device can poke anywhere at memory then it can crash kernel anyway. >> > If there's a bounce buffer or an iommu or some other protection >> > in place, then this memory can no longer change by the time >> > we look at it. >> >> We can have shared pages between the host and guest without bounce >> buffers in between, so they can be both looking directly at the same >> page. >> >> Regards, > > How does this configuration work? What else is in this page? So, for example in TDX, you have certain pages as "shared", as in between guest and hypervisor. You can have virtio ring(s) in such pages. It's likely that there'd be a swiotlb buffer there instead, but sharing pages between host virtio and guest virtio drivers is possible. Apologies if the language is confusing, I hope I'm answering the question. Regards, -- Alex
On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>
> >> > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> >> >> When handling control messages, instead of peeking at the device memory
> >> >> to obtain bits of the control structure,
> >> >
> >> > Except the message makes it seem that we are getting data from
> >> > device memory, when we do nothing of the kind.
> >>
> >> We can be, see below.
> >>
> >> >> take a snapshot of it once and
> >> >> use it instead, to prevent it from changing under us. This avoids races
> >> >> between port id validation and control event decoding, which can lead
> >> >> to, for example, a NULL dereference in port removal of a nonexistent
> >> >> port.
> >> >>
> >> >> The control structure is small enough (8 bytes) that it can be cached
> >> >> directly on the stack.
> >> >
> >> > I still have no real idea why we want a copy here.
> >> > If device can poke anywhere at memory then it can crash kernel anyway.
> >> > If there's a bounce buffer or an iommu or some other protection
> >> > in place, then this memory can no longer change by the time
> >> > we look at it.
> >>
> >> We can have shared pages between the host and guest without bounce
> >> buffers in between, so they can be both looking directly at the same
> >> page.
> >>
> >> Regards,
> >
> > How does this configuration work? What else is in this page?
>
> So, for example in TDX, you have certain pages as "shared", as in
> between guest and hypervisor. You can have virtio ring(s) in such
> pages. It's likely that there'd be a swiotlb buffer there instead, but
> sharing pages between host virtio and guest virtio drivers is possible.

If it is shared, then what does this mean? Do we then need to copy
everything out of that buffer first before doing anything with it
because the data could change later on? Or do we not trust anything in
it at all and we throw it away? Or something else (trust for a short
while and then we don't?)

Please be specific as to what you want to see happen here, and why.

thanks,

greg k-h
On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>
> >> > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> >> >> When handling control messages, instead of peeking at the device memory
> >> >> to obtain bits of the control structure,
> >> >
> >> > Except the message makes it seem that we are getting data from
> >> > device memory, when we do nothing of the kind.
> >>
> >> We can be, see below.
> >>
> >> >> take a snapshot of it once and
> >> >> use it instead, to prevent it from changing under us. This avoids races
> >> >> between port id validation and control event decoding, which can lead
> >> >> to, for example, a NULL dereference in port removal of a nonexistent
> >> >> port.
> >> >>
> >> >> The control structure is small enough (8 bytes) that it can be cached
> >> >> directly on the stack.
> >> >
> >> > I still have no real idea why we want a copy here.
> >> > If device can poke anywhere at memory then it can crash kernel anyway.
> >> > If there's a bounce buffer or an iommu or some other protection
> >> > in place, then this memory can no longer change by the time
> >> > we look at it.
> >>
> >> We can have shared pages between the host and guest without bounce
> >> buffers in between, so they can be both looking directly at the same
> >> page.
> >>
> >> Regards,
> >
> > How does this configuration work? What else is in this page?
>
> So, for example in TDX, you have certain pages as "shared", as in
> between guest and hypervisor. You can have virtio ring(s) in such
> pages.

That one's marked as dma coherent.

> It's likely that there'd be a swiotlb buffer there instead, but
> sharing pages between host virtio and guest virtio drivers is possible.

It's not something console does though, does it?

> Apologies if the language is confusing, I hope I'm answering the
> question.
>
> Regards,
> --
> Alex

I'd like an answer to when does the console driver share the buffer in
question, not when generally some pages shared.
Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:

> On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>
>> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
>> >> We can have shared pages between the host and guest without bounce
>> >> buffers in between, so they can be both looking directly at the same
>> >> page.
>> >>
>> >> Regards,
>> >
>> > How does this configuration work? What else is in this page?
>>
>> So, for example in TDX, you have certain pages as "shared", as in
>> between guest and hypervisor. You can have virtio ring(s) in such
>> pages. It's likely that there'd be a swiotlb buffer there instead, but
>> sharing pages between host virtio and guest virtio drivers is possible.
>
> If it is shared, then what does this mean? Do we then need to copy
> everything out of that buffer first before doing anything with it
> because the data could change later on? Or do we not trust anything in
> it at all and we throw it away? Or something else (trust for a short
> while and then we don't?)

The first one, we need a consistent view of the metadata (the ckpt in
this case), so we take a snapshot of it. Then, we validate it (because
we don't trust it) to be correct. If it is not, we discard it, otherwise
we act on it. Since this is a ring, we just move on to the next record
if there is one.

Meanwhile, in the shared page, it can change from correct to incorrect,
but it won't affect us because we have this consistent view at the
moment the snapshot was taken.

> Please be specific as to what you want to see happen here, and why.

For example, if we get a control message to add a port and
cpkt->event==PORT_ADD, we skip validation of cpkt->id (port id), because
we're intending to add a new one. At this point, the device can change
cpkt->event to PORT_REMOVE, which does require a valid cpkt->id and the
subsequent code runs into a NULL dereference on the port value, which
should have been looked up from cpkt->id.

Now, if we take a snapshot of cpkt, we naturally don't have this
problem, because we're looking at a consistent state of cpkt: it's
either PORT_ADD or PORT_REMOVE all the way. Which is what this patch
does.

Does this answer your question?

Thanks,
--
Alex
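The race described above (the event field flips from PORT_ADD to PORT_REMOVE between the validation check and the `switch`) can be reproduced deterministically without threads. In this sketch, `struct racy_u16` and `racy_read()` are hypothetical helpers that model a host rewriting a shared field after the guest's first read; `double_fetch_handler()` mirrors the pre-v2 shape of reading `event` twice, while `snapshot_handler()` reads it once. The event codes and the stub `struct port` are also hypothetical. A return value of 1 means the handler would reach the equivalent of `unplug_port(NULL)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event codes standing in for VIRTIO_CONSOLE_PORT_ADD/_REMOVE. */
enum { EV_PORT_ADD = 1, EV_PORT_REMOVE = 2 };

/* Models a shared field that the host rewrites after the guest's first read. */
struct racy_u16 {
	uint16_t first;	/* value seen on the 1st read */
	uint16_t later;	/* value seen on every later read */
	int reads;	/* number of reads so far */
};

static uint16_t racy_read(struct racy_u16 *f)
{
	return f->reads++ == 0 ? f->first : f->later;
}

struct port { int dummy; };	/* stub; only NULL vs non-NULL matters */

/* Pre-snapshot shape: event is read once for the check, again for the switch.
 * Returns 1 if the handler would dereference a NULL port. */
static int double_fetch_handler(struct racy_u16 *event, struct port *port)
{
	assert(event != NULL);
	if (!port && racy_read(event) != EV_PORT_ADD)	/* check: 1st read */
		return 0;				/* dropped */
	switch (racy_read(event)) {			/* use: 2nd read */
	case EV_PORT_ADD:
		return 0;				/* would add a port */
	case EV_PORT_REMOVE:
		return port == NULL;			/* unplug_port(NULL) */
	}
	return 0;
}

/* Snapshot shape: event is read once; check and use agree. */
static int snapshot_handler(struct racy_u16 *event, struct port *port)
{
	uint16_t ev;

	assert(event != NULL);
	ev = racy_read(event);				/* single fetch */
	if (!port && ev != EV_PORT_ADD)
		return 0;
	switch (ev) {
	case EV_PORT_ADD:
		return 0;
	case EV_PORT_REMOVE:
		return port == NULL;			/* unreachable with NULL port */
	}
	return 0;
}
```

With `{ EV_PORT_ADD, EV_PORT_REMOVE, 0 }` and a NULL port, the double-fetch handler returns 1 after two reads, and the snapshot handler returns 0 after exactly one read. This models the guest-side logic only; whether the host can actually write that memory depends on how the buffer is shared, which is the point disputed in the thread.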
On Fri, Jan 27, 2023 at 04:17:46PM +0200, Alexander Shishkin wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
>
> > On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>
> >> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> >> >> We can have shared pages between the host and guest without bounce
> >> >> buffers in between, so they can be both looking directly at the same
> >> >> page.
> >> >>
> >> >> Regards,
> >> >
> >> > How does this configuration work? What else is in this page?
> >>
> >> So, for example in TDX, you have certain pages as "shared", as in
> >> between guest and hypervisor. You can have virtio ring(s) in such
> >> pages. It's likely that there'd be a swiotlb buffer there instead, but
> >> sharing pages between host virtio and guest virtio drivers is possible.
> >
> > If it is shared, then what does this mean? Do we then need to copy
> > everything out of that buffer first before doing anything with it
> > because the data could change later on? Or do we not trust anything in
> > it at all and we throw it away? Or something else (trust for a short
> > while and then we don't?)
>
> The first one, we need a consistent view of the metadata (the ckpt in
> this case), so we take a snapshot of it. Then, we validate it (because
> we don't trust it) to be correct. If it is not, we discard it, otherwise
> we act on it. Since this is a ring, we just move on to the next record
> if there is one.

So you do an additional extra copy of everything, making the bounce
buffer useless? :)

> Meanwhile, in the shared page, it can change from correct to incorrect,
> but it won't affect us because we have this consistent view at the
> moment the snapshot was taken.

Wonderful, copy everything out then, the whole page, don't do it
piecemeal field by field. And then justify it to everyone whose
throughput you just tanked...

good luck!

greg k-h
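The "copy everything out once, not field by field" alternative raised above might look roughly like the sketch below. It is an illustration under assumptions, not a proposal from the thread: `CTRL_NAME_MAX`, `struct ctrl_msg` and `snapshot_msg()` are hypothetical; the record is assumed to be an 8-byte header followed by a name payload (as in the port-name handling in the v2 diff); and `len` is treated as untrusted and bounds-checked before the copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical mirror of struct virtio_console_control. */
struct ctrl_hdr {
	uint32_t id;
	uint16_t event;
	uint16_t value;
};
static_assert(sizeof(struct ctrl_hdr) == 8, "no padding expected");

#define CTRL_NAME_MAX 256	/* hypothetical cap on the name payload */

struct ctrl_msg {
	struct ctrl_hdr hdr;
	char name[CTRL_NAME_MAX];	/* always NUL-terminated */
};

/* Copy header and payload out of the shared buffer in one pass, then parse
 * only the private copy. len is the record length (itself host-supplied, so
 * it is bounds-checked first). Returns 0 on success, -1 on a bad length. */
static int snapshot_msg(const uint8_t *shared, size_t len, struct ctrl_msg *out)
{
	uint8_t priv[sizeof(struct ctrl_hdr) + CTRL_NAME_MAX];
	size_t payload;

	if (len < sizeof(struct ctrl_hdr) || len > sizeof(priv))
		return -1;
	memcpy(priv, shared, len);		/* the only read of shared memory */

	memcpy(&out->hdr, priv, sizeof(out->hdr));
	payload = len - sizeof(out->hdr);
	if (payload > sizeof(out->name) - 1)
		payload = sizeof(out->name) - 1;	/* leave room for NUL */
	memcpy(out->name, priv + sizeof(out->hdr), payload);
	out->name[payload] = '\0';
	return 0;
}
```

The cost of this design is one extra copy of every control message, which is exactly the throughput concern in the reply above; how much that matters for infrequent control traffic is not measured here.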
On Fri, Jan 27, 2023 at 04:17:46PM +0200, Alexander Shishkin wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
>
> > On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>
> >> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> >> >> We can have shared pages between the host and guest without bounce
> >> >> buffers in between, so they can be both looking directly at the same
> >> >> page.
> >> >>
> >> >> Regards,
> >> >
> >> > How does this configuration work? What else is in this page?
> >>
> >> So, for example in TDX, you have certain pages as "shared", as in
> >> between guest and hypervisor. You can have virtio ring(s) in such
> >> pages. It's likely that there'd be a swiotlb buffer there instead, but
> >> sharing pages between host virtio and guest virtio drivers is possible.
> >
> > If it is shared, then what does this mean? Do we then need to copy
> > everything out of that buffer first before doing anything with it
> > because the data could change later on? Or do we not trust anything in
> > it at all and we throw it away? Or something else (trust for a short
> > while and then we don't?)
>
> The first one, we need a consistent view of the metadata (the ckpt in
> this case), so we take a snapshot of it. Then, we validate it (because
> we don't trust it) to be correct. If it is not, we discard it, otherwise
> we act on it. Since this is a ring, we just move on to the next record
> if there is one.
>
> Meanwhile, in the shared page, it can change from correct to incorrect,
> but it won't affect us because we have this consistent view at the
> moment the snapshot was taken.
>
> > Please be specific as to what you want to see happen here, and why.
>
> For example, if we get a control message to add a port and
> cpkt->event==PORT_ADD, we skip validation of cpkt->id (port id), because
> we're intending to add a new one. At this point, the device can change
> cpkt->event to PORT_REMOVE, which does require a valid cpkt->id and the
> subsequent code runs into a NULL dereference on the port value, which
> should have been looked up from cpkt->id.
>
> Now, if we take a snapshot of cpkt, we naturally don't have this
> problem, because we're looking at a consistent state of cpkt: it's
> either PORT_ADD or PORT_REMOVE all the way. Which is what this patch
> does.
>
> Does this answer your question?
>
> Thanks,
> --
> Alex

Not sure about Greg but it doesn't answer my question because either the
bad device has access to all memory at which point it's not clear why
is it changing cpkt->event and not e.g. stack. Or it's restricted to
only access memory when mapped through the DMA API. Which is not the
case here.
> On Fri, Jan 27, 2023 at 04:17:46PM +0200, Alexander Shishkin wrote:
> > Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> >
> > > On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> > >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > >>
> > >> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> > >> >> We can have shared pages between the host and guest without bounce
> > >> >> buffers in between, so they can be both looking directly at the same
> > >> >> page.
> > >> >>
> > >> >> Regards,
> > >> >
> > >> > How does this configuration work? What else is in this page?
> > >>
> > >> So, for example in TDX, you have certain pages as "shared", as in
> > >> between guest and hypervisor. You can have virtio ring(s) in such
> > >> pages. It's likely that there'd be a swiotlb buffer there instead, but
> > >> sharing pages between host virtio and guest virtio drivers is possible.
> > >
> > > If it is shared, then what does this mean? Do we then need to copy
> > > everything out of that buffer first before doing anything with it
> > > because the data could change later on? Or do we not trust anything in
> > > it at all and we throw it away? Or something else (trust for a short
> > > while and then we don't?)
> >
> > The first one, we need a consistent view of the metadata (the ckpt in
> > this case), so we take a snapshot of it. Then, we validate it (because
> > we don't trust it) to be correct. If it is not, we discard it, otherwise
> > we act on it. Since this is a ring, we just move on to the next record
> > if there is one.
> >
> > Meanwhile, in the shared page, it can change from correct to incorrect,
> > but it won't affect us because we have this consistent view at the
> > moment the snapshot was taken.
> >
> > > Please be specific as to what you want to see happen here, and why.
> >
> > For example, if we get a control message to add a port and
> > cpkt->event==PORT_ADD, we skip validation of cpkt->id (port id), because
> > we're intending to add a new one. At this point, the device can change
> > cpkt->event to PORT_REMOVE, which does require a valid cpkt->id and the
> > subsequent code runs into a NULL dereference on the port value, which
> > should have been looked up from cpkt->id.
> >
> > Now, if we take a snapshot of cpkt, we naturally don't have this
> > problem, because we're looking at a consistent state of cpkt: it's
> > either PORT_ADD or PORT_REMOVE all the way. Which is what this patch
> > does.
> >
> > Does this answer your question?
> >
> > Thanks,
> > --
> > Alex
>
>
> Not sure about Greg but it doesn't answer my question because either the
> bad device has access to all memory at which point it's not clear why
> is it changing cpkt->event and not e.g. stack. Or it's restricted to
> only access memory when mapped through the DMA API. Which is not the
> case here.

We do enforce virtio usage via DMA API only for TDX guest. Alex has a
patch queued for that also. But not sure if this addresses your concern
here.

Best Regards,
Elena.
Patch

```diff
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index f4fd5fe7cd3a..6599c2956ba4 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1563,10 +1563,13 @@ static void handle_control_message(struct virtio_device *vdev,
 	struct port *port;
 	size_t name_size;
 	int err;
+	unsigned id;
 
 	cpkt = (struct virtio_console_control *)(buf->buf + buf->offset);
 
-	port = find_port_by_id(portdev, virtio32_to_cpu(vdev, cpkt->id));
+	/* Make sure the host cannot change id under us */
+	id = virtio32_to_cpu(vdev, READ_ONCE(cpkt->id));
+	port = find_port_by_id(portdev, id);
 	if (!port &&
 	    cpkt->event != cpu_to_virtio16(vdev, VIRTIO_CONSOLE_PORT_ADD)) {
 		/* No valid header at start of buffer. Drop it. */
@@ -1583,15 +1586,14 @@ static void handle_control_message(struct virtio_device *vdev,
 			send_control_msg(port, VIRTIO_CONSOLE_PORT_READY, 1);
 			break;
 		}
-		if (virtio32_to_cpu(vdev, cpkt->id) >=
-		    portdev->max_nr_ports) {
+		if (id >= portdev->max_nr_ports) {
 			dev_warn(&portdev->vdev->dev,
 				"Request for adding port with "
 				"out-of-bound id %u, max. supported id: %u\n",
 				cpkt->id, portdev->max_nr_ports - 1);
 			break;
 		}
-		add_port(portdev, virtio32_to_cpu(vdev, cpkt->id));
+		add_port(portdev, id);
 		break;
 	case VIRTIO_CONSOLE_PORT_REMOVE:
 		unplug_port(port);
```