[v1,2/6] virtio console: Harden port adding

Message ID 20230119135721.83345-3-alexander.shishkin@linux.intel.com
State New
Series Harden a few virtio bits

Commit Message

Alexander Shishkin Jan. 19, 2023, 1:57 p.m. UTC
  From: Andi Kleen <ak@linux.intel.com>

The ADD_PORT operation reads and sanity checks the port id multiple
times from the untrusted host. This is not safe because a malicious
host could change it between reads.

Read the port id only once and cache it for subsequent uses.
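
As an illustration of the single-read pattern, here is a minimal
userspace sketch, not the driver code. It assumes a hypothetical
MAX_NR_PORTS limit in place of portdev->max_nr_ports, uses a volatile
load as a stand-in for the kernel's READ_ONCE(), and leaves out the
virtio32_to_cpu() byte-order conversion. read_once_u32() and
checked_port_slot() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit; stands in for portdev->max_nr_ports. */
#define MAX_NR_PORTS 8u

/* Userspace stand-in for READ_ONCE(): the volatile access forces one load. */
static inline uint32_t read_once_u32(const volatile uint32_t *p)
{
	return *p;
}

/*
 * Validate a port id that lives in memory another party may rewrite.
 * Returns the slot index to use, or -1 if the id is out of range.
 */
static int checked_port_slot(const volatile uint32_t *shared_id)
{
	uint32_t id;

	assert(shared_id != NULL);
	id = read_once_u32(shared_id);	/* the only fetch from shared memory */

	if (id >= MAX_NR_PORTS)
		return -1;

	/* Use the value that was checked, never a second read of *shared_id. */
	return (int)id;
}
```

The point of the local variable is that the bounds check and the later
use are guaranteed to see the same value, whatever happens to the
shared location in between.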

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/char/virtio_console.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
  

Comments

Greg KH Jan. 20, 2023, 7:15 a.m. UTC | #1
On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> 
> > Then you need to copy it out once, and then only deal with the local
> > copy.  Otherwise you have an incomplete snapshot.
> 
> Ok, would you be partial to something like this:
> 
> From 1bc9bb84004154376c2a0cf643d53257da6d1cd7 Mon Sep 17 00:00:00 2001
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Date: Thu, 19 Jan 2023 21:59:02 +0200
> Subject: [PATCH] virtio console: Keep a local copy of the control structure
> 
> When handling control messages, instead of peeking at the device memory
> to obtain bits of the control structure, take a snapshot of it once and
> use it instead, to prevent it from changing under us. This avoids races
> between port id validation and control event decoding, which can lead
> to, for example, a NULL dereference in port removal of a nonexistent
> port.
> 
> The control structure is small enough (8 bytes) that it can be cached
> directly on the stack.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Amit Shah <amit@kernel.org>
> ---
>  drivers/char/virtio_console.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)

Yes, this looks much better, thanks!

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  
Michael S. Tsirkin Jan. 20, 2023, 12:59 p.m. UTC | #2
On Thu, Jan 19, 2023 at 03:57:17PM +0200, Alexander Shishkin wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> The ADD_PORT operation reads and sanity checks the port id multiple
> times from the untrusted host. This is not safe because a malicious
> host could change it between reads.
> 
> Read the port id only once and cache it for subsequent uses.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Amit Shah <amit@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


I suspect anyone worried about this kind of thing already uses a bounce
buffer. No?  The patch itself makes the code more readable, except maybe
for the READ_ONCE thing.


> ---
>  drivers/char/virtio_console.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index f4fd5fe7cd3a..6599c2956ba4 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -1563,10 +1563,13 @@ static void handle_control_message(struct virtio_device *vdev,
>  	struct port *port;
>  	size_t name_size;
>  	int err;
> +	unsigned id;
>  
>  	cpkt = (struct virtio_console_control *)(buf->buf + buf->offset);
>  
> -	port = find_port_by_id(portdev, virtio32_to_cpu(vdev, cpkt->id));
> +	/* Make sure the host cannot change id under us */
> +	id = virtio32_to_cpu(vdev, READ_ONCE(cpkt->id));
> +	port = find_port_by_id(portdev, id);
>  	if (!port &&
>  	    cpkt->event != cpu_to_virtio16(vdev, VIRTIO_CONSOLE_PORT_ADD)) {
>  		/* No valid header at start of buffer.  Drop it. */
> @@ -1583,15 +1586,14 @@ static void handle_control_message(struct virtio_device *vdev,
>  			send_control_msg(port, VIRTIO_CONSOLE_PORT_READY, 1);
>  			break;
>  		}
> -		if (virtio32_to_cpu(vdev, cpkt->id) >=
> -		    portdev->max_nr_ports) {
> +		if (id >= portdev->max_nr_ports) {
>  			dev_warn(&portdev->vdev->dev,
>  				"Request for adding port with "
>  				"out-of-bound id %u, max. supported id: %u\n",
>  				cpkt->id, portdev->max_nr_ports - 1);
>  			break;
>  		}
> -		add_port(portdev, virtio32_to_cpu(vdev, cpkt->id));
> +		add_port(portdev, id);
>  		break;
>  	case VIRTIO_CONSOLE_PORT_REMOVE:
>  		unplug_port(port);
> -- 
> 2.39.0
  
Michael S. Tsirkin Jan. 27, 2023, 11:02 a.m. UTC | #3
On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> 
> > Then you need to copy it out once, and then only deal with the local
> > copy.  Otherwise you have an incomplete snapshot.
> 
> Ok, would you be partial to something like this:
> 
> From 1bc9bb84004154376c2a0cf643d53257da6d1cd7 Mon Sep 17 00:00:00 2001
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Date: Thu, 19 Jan 2023 21:59:02 +0200
> Subject: [PATCH] virtio console: Keep a local copy of the control structure
> 
> When handling control messages, instead of peeking at the device memory
> to obtain bits of the control structure,

Except the message makes it seem that we are getting data from
device memory, when we do nothing of the kind.

> take a snapshot of it once and
> use it instead, to prevent it from changing under us. This avoids races
> between port id validation and control event decoding, which can lead
> to, for example, a NULL dereference in port removal of a nonexistent
> port.
> 
> The control structure is small enough (8 bytes) that it can be cached
> directly on the stack.

I still have no real idea why we want a copy here.
If device can poke anywhere at memory then it can crash kernel anyway.
If there's a bounce buffer or an iommu or some other protection
in place, then this memory can no longer change by the time
we look at it.

> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Amit Shah <amit@kernel.org>
> ---
>  drivers/char/virtio_console.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index 6a821118d553..42be0991a72f 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -1559,23 +1559,24 @@ static void handle_control_message(struct virtio_device *vdev,
>  				   struct ports_device *portdev,
>  				   struct port_buffer *buf)
>  {
> -	struct virtio_console_control *cpkt;
> +	struct virtio_console_control cpkt;
>  	struct port *port;
>  	size_t name_size;
>  	int err;
>  
> -	cpkt = (struct virtio_console_control *)(buf->buf + buf->offset);
> +	/* Keep a local copy of the control structure */
> +	memcpy(&cpkt, buf->buf + buf->offset, sizeof(cpkt));
>  
> -	port = find_port_by_id(portdev, virtio32_to_cpu(vdev, cpkt->id));
> +	port = find_port_by_id(portdev, virtio32_to_cpu(vdev, cpkt.id));
>  	if (!port &&
> -	    cpkt->event != cpu_to_virtio16(vdev, VIRTIO_CONSOLE_PORT_ADD)) {
> +	    cpkt.event != cpu_to_virtio16(vdev, VIRTIO_CONSOLE_PORT_ADD)) {
>  		/* No valid header at start of buffer.  Drop it. */
>  		dev_dbg(&portdev->vdev->dev,
> -			"Invalid index %u in control packet\n", cpkt->id);
> +			"Invalid index %u in control packet\n", cpkt.id);
>  		return;
>  	}
>  
> -	switch (virtio16_to_cpu(vdev, cpkt->event)) {
> +	switch (virtio16_to_cpu(vdev, cpkt.event)) {
>  	case VIRTIO_CONSOLE_PORT_ADD:
>  		if (port) {
>  			dev_dbg(&portdev->vdev->dev,
> @@ -1583,21 +1584,21 @@ static void handle_control_message(struct virtio_device *vdev,
>  			send_control_msg(port, VIRTIO_CONSOLE_PORT_READY, 1);
>  			break;
>  		}
> -		if (virtio32_to_cpu(vdev, cpkt->id) >=
> +		if (virtio32_to_cpu(vdev, cpkt.id) >=
>  		    portdev->max_nr_ports) {
>  			dev_warn(&portdev->vdev->dev,
>  				"Request for adding port with "
>  				"out-of-bound id %u, max. supported id: %u\n",
> -				cpkt->id, portdev->max_nr_ports - 1);
> +				cpkt.id, portdev->max_nr_ports - 1);
>  			break;
>  		}
> -		add_port(portdev, virtio32_to_cpu(vdev, cpkt->id));
> +		add_port(portdev, virtio32_to_cpu(vdev, cpkt.id));
>  		break;
>  	case VIRTIO_CONSOLE_PORT_REMOVE:
>  		unplug_port(port);
>  		break;
>  	case VIRTIO_CONSOLE_CONSOLE_PORT:
> -		if (!cpkt->value)
> +		if (!cpkt.value)
>  			break;
>  		if (is_console_port(port))
>  			break;
> @@ -1618,7 +1619,7 @@ static void handle_control_message(struct virtio_device *vdev,
>  		if (!is_console_port(port))
>  			break;
>  
> -		memcpy(&size, buf->buf + buf->offset + sizeof(*cpkt),
> +		memcpy(&size, buf->buf + buf->offset + sizeof(cpkt),
>  		       sizeof(size));
>  		set_console_size(port, size.rows, size.cols);
>  
> @@ -1627,7 +1628,7 @@ static void handle_control_message(struct virtio_device *vdev,
>  		break;
>  	}
>  	case VIRTIO_CONSOLE_PORT_OPEN:
> -		port->host_connected = virtio16_to_cpu(vdev, cpkt->value);
> +		port->host_connected = virtio16_to_cpu(vdev, cpkt.value);
>  		wake_up_interruptible(&port->waitqueue);
>  		/*
>  		 * If the host port got closed and the host had any
> @@ -1658,7 +1659,7 @@ static void handle_control_message(struct virtio_device *vdev,
>  		 * Skip the size of the header and the cpkt to get the size
>  		 * of the name that was sent
>  		 */
> -		name_size = buf->len - buf->offset - sizeof(*cpkt) + 1;
> +		name_size = buf->len - buf->offset - sizeof(cpkt) + 1;
>  
>  		port->name = kmalloc(name_size, GFP_KERNEL);
>  		if (!port->name) {
> @@ -1666,7 +1667,7 @@ static void handle_control_message(struct virtio_device *vdev,
>  				"Not enough space to store port name\n");
>  			break;
>  		}
> -		strncpy(port->name, buf->buf + buf->offset + sizeof(*cpkt),
> +		strncpy(port->name, buf->buf + buf->offset + sizeof(cpkt),
>  			name_size - 1);
>  		port->name[name_size - 1] = 0;
>  
> -- 
> 2.39.0
  
Alexander Shishkin Jan. 27, 2023, 11:55 a.m. UTC | #4
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
>> When handling control messages, instead of peeking at the device memory
>> to obtain bits of the control structure,
>
> Except the message makes it seem that we are getting data from
> device memory, when we do nothing of the kind.

We can be, see below.

>> take a snapshot of it once and
>> use it instead, to prevent it from changing under us. This avoids races
>> between port id validation and control event decoding, which can lead
>> to, for example, a NULL dereference in port removal of a nonexistent
>> port.
>> 
>> The control structure is small enough (8 bytes) that it can be cached
>> directly on the stack.
>
> I still have no real idea why we want a copy here.
> If device can poke anywhere at memory then it can crash kernel anyway.
> If there's a bounce buffer or an iommu or some other protection
> in place, then this memory can no longer change by the time
> we look at it.

We can have shared pages between the host and guest without bounce
buffers in between, so they can be both looking directly at the same
page.

Regards,
--
Alex
  
Michael S. Tsirkin Jan. 27, 2023, 12:12 p.m. UTC | #5
On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> >> When handling control messages, instead of peeking at the device memory
> >> to obtain bits of the control structure,
> >
> > Except the message makes it seem that we are getting data from
> > device memory, when we do nothing of the kind.
> 
> We can be, see below.
> 
> >> take a snapshot of it once and
> >> use it instead, to prevent it from changing under us. This avoids races
> >> between port id validation and control event decoding, which can lead
> >> to, for example, a NULL dereference in port removal of a nonexistent
> >> port.
> >> 
> >> The control structure is small enough (8 bytes) that it can be cached
> >> directly on the stack.
> >
> > I still have no real idea why we want a copy here.
> > If device can poke anywhere at memory then it can crash kernel anyway.
> > If there's a bounce buffer or an iommu or some other protection
> > in place, then this memory can no longer change by the time
> > we look at it.
> 
> We can have shared pages between the host and guest without bounce
> buffers in between, so they can be both looking directly at the same
> page.
> 
> Regards,

How does this configuration work? What else is in this page?

> --
> Alex
  
Alexander Shishkin Jan. 27, 2023, 12:47 p.m. UTC | #6
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
>> >> When handling control messages, instead of peeking at the device memory
>> >> to obtain bits of the control structure,
>> >
>> > Except the message makes it seem that we are getting data from
>> > device memory, when we do nothing of the kind.
>> 
>> We can be, see below.
>> 
>> >> take a snapshot of it once and
>> >> use it instead, to prevent it from changing under us. This avoids races
>> >> between port id validation and control event decoding, which can lead
>> >> to, for example, a NULL dereference in port removal of a nonexistent
>> >> port.
>> >> 
>> >> The control structure is small enough (8 bytes) that it can be cached
>> >> directly on the stack.
>> >
>> > I still have no real idea why we want a copy here.
>> > If device can poke anywhere at memory then it can crash kernel anyway.
>> > If there's a bounce buffer or an iommu or some other protection
>> > in place, then this memory can no longer change by the time
>> > we look at it.
>> 
>> We can have shared pages between the host and guest without bounce
>> buffers in between, so they can be both looking directly at the same
>> page.
>> 
>> Regards,
>
> How does this configuration work? What else is in this page?

So, for example in TDX, you have certain pages as "shared", as in
between guest and hypervisor. You can have virtio ring(s) in such
pages. It's likely that there'd be a swiotlb buffer there instead, but
sharing pages between host virtio and guest virtio drivers is possible.

Apologies if the language is confusing, I hope I'm answering the
question.

Regards,
--
Alex
  
Greg KH Jan. 27, 2023, 1:31 p.m. UTC | #7
On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> >> >> When handling control messages, instead of peeking at the device memory
> >> >> to obtain bits of the control structure,
> >> >
> >> > Except the message makes it seem that we are getting data from
> >> > device memory, when we do nothing of the kind.
> >> 
> >> We can be, see below.
> >> 
> >> >> take a snapshot of it once and
> >> >> use it instead, to prevent it from changing under us. This avoids races
> >> >> between port id validation and control event decoding, which can lead
> >> >> to, for example, a NULL dereference in port removal of a nonexistent
> >> >> port.
> >> >> 
> >> >> The control structure is small enough (8 bytes) that it can be cached
> >> >> directly on the stack.
> >> >
> >> > I still have no real idea why we want a copy here.
> >> > If device can poke anywhere at memory then it can crash kernel anyway.
> >> > If there's a bounce buffer or an iommu or some other protection
> >> > in place, then this memory can no longer change by the time
> >> > we look at it.
> >> 
> >> We can have shared pages between the host and guest without bounce
> >> buffers in between, so they can be both looking directly at the same
> >> page.
> >> 
> >> Regards,
> >
> > How does this configuration work? What else is in this page?
> 
> So, for example in TDX, you have certain pages as "shared", as in
> between guest and hypervisor. You can have virtio ring(s) in such
> pages. It's likely that there'd be a swiotlb buffer there instead, but
> sharing pages between host virtio and guest virtio drivers is possible.

If it is shared, then what does this mean?  Do we then need to copy
everything out of that buffer first before doing anything with it
because the data could change later on?  Or do we not trust anything in
it at all and we throw it away?  Or something else (trust for a short
while and then we don't?)

Please be specific as to what you want to see happen here, and why.

thanks,

greg k-h
  
Michael S. Tsirkin Jan. 27, 2023, 1:52 p.m. UTC | #8
On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Thu, Jan 19, 2023 at 10:13:18PM +0200, Alexander Shishkin wrote:
> >> >> When handling control messages, instead of peeking at the device memory
> >> >> to obtain bits of the control structure,
> >> >
> >> > Except the message makes it seem that we are getting data from
> >> > device memory, when we do nothing of the kind.
> >> 
> >> We can be, see below.
> >> 
> >> >> take a snapshot of it once and
> >> >> use it instead, to prevent it from changing under us. This avoids races
> >> >> between port id validation and control event decoding, which can lead
> >> >> to, for example, a NULL dereference in port removal of a nonexistent
> >> >> port.
> >> >> 
> >> >> The control structure is small enough (8 bytes) that it can be cached
> >> >> directly on the stack.
> >> >
> >> > I still have no real idea why we want a copy here.
> >> > If device can poke anywhere at memory then it can crash kernel anyway.
> >> > If there's a bounce buffer or an iommu or some other protection
> >> > in place, then this memory can no longer change by the time
> >> > we look at it.
> >> 
> >> We can have shared pages between the host and guest without bounce
> >> buffers in between, so they can be both looking directly at the same
> >> page.
> >> 
> >> Regards,
> >
> > How does this configuration work? What else is in this page?
> 
> So, for example in TDX, you have certain pages as "shared", as in
> between guest and hypervisor. You can have virtio ring(s) in such
> pages.

That one's marked as dma coherent.

> It's likely that there'd be a swiotlb buffer there instead, but
> sharing pages between host virtio and guest virtio drivers is possible.

It's not something console does though, does it?

> Apologies if the language is confusing, I hope I'm answering the
> question.
> 
> Regards,
> --
> Alex

I'd like an answer to when the console driver shares the buffer in
question, not when some pages are shared in general.
  
Alexander Shishkin Jan. 27, 2023, 2:17 p.m. UTC | #9
Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:

> On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
>> >> We can have shared pages between the host and guest without bounce
>> >> buffers in between, so they can be both looking directly at the same
>> >> page.
>> >> 
>> >> Regards,
>> >
>> > How does this configuration work? What else is in this page?
>> 
>> So, for example in TDX, you have certain pages as "shared", as in
>> between guest and hypervisor. You can have virtio ring(s) in such
>> pages. It's likely that there'd be a swiotlb buffer there instead, but
>> sharing pages between host virtio and guest virtio drivers is possible.
>
> If it is shared, then what does this mean?  Do we then need to copy
> everything out of that buffer first before doing anything with it
> because the data could change later on?  Or do we not trust anything in
> it at all and we throw it away?  Or something else (trust for a short
> while and then we don't?)

The first one, we need a consistent view of the metadata (the cpkt in
this case), so we take a snapshot of it. Then, we validate it (because
we don't trust it) to be correct. If it is not, we discard it, otherwise
we act on it. Since this is a ring, we just move on to the next record
if there is one.

Meanwhile, in the shared page, it can change from correct to incorrect,
but it won't affect us because we have this consistent view at the
moment the snapshot was taken.

> Please be specific as to what you want to see happen here, and why.

For example, if we get a control message to add a port and
cpkt->event==PORT_ADD, we skip validation of cpkt->id (port id), because
we're intending to add a new one. At this point, the device can change
cpkt->event to PORT_REMOVE, which does require a valid cpkt->id and the
subsequent code runs into a NULL dereference on the port value, which
should have been looked up from cpkt->id.

Now, if we take a snapshot of cpkt, we naturally don't have this
problem, because we're looking at a consistent state of cpkt: it's
either PORT_ADD or PORT_REMOVE all the way. Which is what this patch
does.
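
The add/remove race described above can be sketched in plain C as
follows. This is a simplified userspace model under stated assumptions,
not the patch itself: struct ctrl_msg, the EV_* codes, MAX_NR_PORTS,
the port_exists table and handle_ctrl() are all hypothetical,
byte-order conversion is omitted, and an ADD for an already existing
port is simply dropped here rather than acknowledged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the VIRTIO_CONSOLE_* event codes. */
enum { EV_PORT_ADD = 1, EV_PORT_REMOVE = 2 };

/* Hypothetical size of the port table. */
#define MAX_NR_PORTS 4u

/* Same 8-byte shape as the control structure discussed in the thread. */
struct ctrl_msg {
	uint32_t id;
	uint16_t event;
	uint16_t value;
};

enum verdict { DROP, DO_ADD, DO_REMOVE };

/*
 * Copy the header out of the shared buffer once, then validate and
 * decode only the private copy.
 */
static enum verdict handle_ctrl(const void *shared,
				const uint8_t port_exists[MAX_NR_PORTS])
{
	struct ctrl_msg m;
	int known;

	assert(shared != NULL);
	memcpy(&m, shared, sizeof(m));	/* the only access to shared memory */

	known = m.id < MAX_NR_PORTS && port_exists[m.id];

	/* An unknown port is acceptable only for an ADD request. */
	if (!known && m.event != EV_PORT_ADD)
		return DROP;

	switch (m.event) {
	case EV_PORT_ADD:
		/* Reject duplicates and out-of-range ids. */
		return (known || m.id >= MAX_NR_PORTS) ? DROP : DO_ADD;
	case EV_PORT_REMOVE:
		/* Reaching this case implies 'known' is true. */
		return DO_REMOVE;
	default:
		return DROP;
	}
}
```

Because every decision is made on the private copy m, a later write to
the shared buffer cannot turn an accepted PORT_ADD into a PORT_REMOVE
for a port that was never looked up.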

Does this answer your question?

Thanks,
--
Alex
  
Greg KH Jan. 27, 2023, 2:37 p.m. UTC | #10
On Fri, Jan 27, 2023 at 04:17:46PM +0200, Alexander Shishkin wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> 
> > On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> >> >> We can have shared pages between the host and guest without bounce
> >> >> buffers in between, so they can be both looking directly at the same
> >> >> page.
> >> >> 
> >> >> Regards,
> >> >
> >> > How does this configuration work? What else is in this page?
> >> 
> >> So, for example in TDX, you have certain pages as "shared", as in
> >> between guest and hypervisor. You can have virtio ring(s) in such
> >> pages. It's likely that there'd be a swiotlb buffer there instead, but
> >> sharing pages between host virtio and guest virtio drivers is possible.
> >
> > If it is shared, then what does this mean?  Do we then need to copy
> > everything out of that buffer first before doing anything with it
> > because the data could change later on?  Or do we not trust anything in
> > it at all and we throw it away?  Or something else (trust for a short
> > while and then we don't?)
> 
> The first one, we need a consistent view of the metadata (the cpkt in
> this case), so we take a snapshot of it. Then, we validate it (because
> we don't trust it) to be correct. If it is not, we discard it, otherwise
> we act on it. Since this is a ring, we just move on to the next record
> if there is one.

So you do an additional extra copy of everything, making the bounce
buffer useless?  :)

> Meanwhile, in the shared page, it can change from correct to incorrect,
> but it won't affect us because we have this consistent view at the
> moment the snapshot was taken.

Wonderful, copy everything out then, the whole page, don't do it
piecemeal field by field.  And then justify it to everyone whose
throughput you just tanked...

good luck!

greg k-h
  
Michael S. Tsirkin Jan. 27, 2023, 2:46 p.m. UTC | #11
On Fri, Jan 27, 2023 at 04:17:46PM +0200, Alexander Shishkin wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> 
> > On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> >> >> We can have shared pages between the host and guest without bounce
> >> >> buffers in between, so they can be both looking directly at the same
> >> >> page.
> >> >> 
> >> >> Regards,
> >> >
> >> > How does this configuration work? What else is in this page?
> >> 
> >> So, for example in TDX, you have certain pages as "shared", as in
> >> between guest and hypervisor. You can have virtio ring(s) in such
> >> pages. It's likely that there'd be a swiotlb buffer there instead, but
> >> sharing pages between host virtio and guest virtio drivers is possible.
> >
> > If it is shared, then what does this mean?  Do we then need to copy
> > everything out of that buffer first before doing anything with it
> > because the data could change later on?  Or do we not trust anything in
> > it at all and we throw it away?  Or something else (trust for a short
> > while and then we don't?)
> 
> The first one, we need a consistent view of the metadata (the cpkt in
> this case), so we take a snapshot of it. Then, we validate it (because
> we don't trust it) to be correct. If it is not, we discard it, otherwise
> we act on it. Since this is a ring, we just move on to the next record
> if there is one.
> 
> Meanwhile, in the shared page, it can change from correct to incorrect,
> but it won't affect us because we have this consistent view at the
> moment the snapshot was taken.
> 
> > Please be specific as to what you want to see happen here, and why.
> 
> For example, if we get a control message to add a port and
> cpkt->event==PORT_ADD, we skip validation of cpkt->id (port id), because
> we're intending to add a new one. At this point, the device can change
> cpkt->event to PORT_REMOVE, which does require a valid cpkt->id and the
> subsequent code runs into a NULL dereference on the port value, which
> should have been looked up from cpkt->id.
> 
> Now, if we take a snapshot of cpkt, we naturally don't have this
> problem, because we're looking at a consistent state of cpkt: it's
> either PORT_ADD or PORT_REMOVE all the way. Which is what this patch
> does.
> 
> Does this answer your question?
> 
> Thanks,
> --
> Alex


Not sure about Greg, but it doesn't answer my question. Either the bad
device has access to all memory, at which point it's not clear why it is
changing cpkt->event and not, e.g., the stack; or it's restricted to
accessing memory only when mapped through the DMA API, which is not the
case here.
  
Reshetova, Elena Feb. 2, 2023, 12:02 p.m. UTC | #12
> On Fri, Jan 27, 2023 at 04:17:46PM +0200, Alexander Shishkin wrote:
> > Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> >
> > > On Fri, Jan 27, 2023 at 02:47:55PM +0200, Alexander Shishkin wrote:
> > >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > >>
> > >> > On Fri, Jan 27, 2023 at 01:55:43PM +0200, Alexander Shishkin wrote:
> > >> >> We can have shared pages between the host and guest without bounce
> > >> >> buffers in between, so they can be both looking directly at the same
> > >> >> page.
> > >> >>
> > >> >> Regards,
> > >> >
> > >> > How does this configuration work? What else is in this page?
> > >>
> > >> So, for example in TDX, you have certain pages as "shared", as in
> > >> between guest and hypervisor. You can have virtio ring(s) in such
> > >> pages. It's likely that there'd be a swiotlb buffer there instead, but
> > >> sharing pages between host virtio and guest virtio drivers is possible.
> > >
> > > If it is shared, then what does this mean?  Do we then need to copy
> > > everything out of that buffer first before doing anything with it
> > > because the data could change later on?  Or do we not trust anything in
> > > it at all and we throw it away?  Or something else (trust for a short
> > > while and then we don't?)
> >
> > The first one, we need a consistent view of the metadata (the cpkt in
> > this case), so we take a snapshot of it. Then, we validate it (because
> > we don't trust it) to be correct. If it is not, we discard it, otherwise
> > we act on it. Since this is a ring, we just move on to the next record
> > if there is one.
> >
> > Meanwhile, in the shared page, it can change from correct to incorrect,
> > but it won't affect us because we have this consistent view at the
> > moment the snapshot was taken.
> >
> > > Please be specific as to what you want to see happen here, and why.
> >
> > For example, if we get a control message to add a port and
> > cpkt->event==PORT_ADD, we skip validation of cpkt->id (port id), because
> > we're intending to add a new one. At this point, the device can change
> > cpkt->event to PORT_REMOVE, which does require a valid cpkt->id and the
> > subsequent code runs into a NULL dereference on the port value, which
> > should have been looked up from cpkt->id.
> >
> > Now, if we take a snapshot of cpkt, we naturally don't have this
> > problem, because we're looking at a consistent state of cpkt: it's
> > either PORT_ADD or PORT_REMOVE all the way. Which is what this patch
> > does.
> >
> > Does this answer your question?
> >
> > Thanks,
> > --
> > Alex
> 
> 
> Not sure about Greg, but it doesn't answer my question. Either the bad
> device has access to all memory, at which point it's not clear why it is
> changing cpkt->event and not, e.g., the stack; or it's restricted to
> accessing memory only when mapped through the DMA API, which is not the
> case here.

For TDX guests we do enforce that virtio goes through the DMA API only.
Alex also has a patch queued for that.
I'm not sure whether this addresses your concern here, though.

Best Regards,
Elena.
  

Patch

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index f4fd5fe7cd3a..6599c2956ba4 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1563,10 +1563,13 @@  static void handle_control_message(struct virtio_device *vdev,
 	struct port *port;
 	size_t name_size;
 	int err;
+	unsigned id;
 
 	cpkt = (struct virtio_console_control *)(buf->buf + buf->offset);
 
-	port = find_port_by_id(portdev, virtio32_to_cpu(vdev, cpkt->id));
+	/* Make sure the host cannot change id under us */
+	id = virtio32_to_cpu(vdev, READ_ONCE(cpkt->id));
+	port = find_port_by_id(portdev, id);
 	if (!port &&
 	    cpkt->event != cpu_to_virtio16(vdev, VIRTIO_CONSOLE_PORT_ADD)) {
 		/* No valid header at start of buffer.  Drop it. */
@@ -1583,15 +1586,14 @@  static void handle_control_message(struct virtio_device *vdev,
 			send_control_msg(port, VIRTIO_CONSOLE_PORT_READY, 1);
 			break;
 		}
-		if (virtio32_to_cpu(vdev, cpkt->id) >=
-		    portdev->max_nr_ports) {
+		if (id >= portdev->max_nr_ports) {
 			dev_warn(&portdev->vdev->dev,
 				"Request for adding port with "
 				"out-of-bound id %u, max. supported id: %u\n",
 				cpkt->id, portdev->max_nr_ports - 1);
 			break;
 		}
-		add_port(portdev, virtio32_to_cpu(vdev, cpkt->id));
+		add_port(portdev, id);
 		break;
 	case VIRTIO_CONSOLE_PORT_REMOVE:
 		unplug_port(port);