diff mbox series

[1/2] PCI/MSI: Cache the MSIX table size

Message ID	20230119170633.40944-2-alexander.shishkin@linux.intel.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: Alexander Shishkin <alexander.shishkin@linux.intel.com> To: Bjorn Helgaas <bhelgaas@google.com>, Thomas Gleixner <tglx@linutronix.de>, linux-pci@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Marc Zyngier <maz@kernel.org>, darwi@linutronix.de, elena.reshetova@intel.com, kirill.shutemov@linux.intel.com, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Mika Westerberg <mika.westerberg@linux.intel.com>, stable@vger.kernel.org Subject: [PATCH 1/2] PCI/MSI: Cache the MSIX table size Date: Thu, 19 Jan 2023 19:06:32 +0200 Message-Id: <20230119170633.40944-2-alexander.shishkin@linux.intel.com> In-Reply-To: <20230119170633.40944-1-alexander.shishkin@linux.intel.com> References: <20230119170633.40944-1-alexander.shishkin@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	PCI/MSI: Hardening fixes \| [0/2] PCI/MSI: Hardening fixes [1/2] PCI/MSI: Cache the MSIX table size [2/2] PCI/MSI: Validate device supplied MSI table offset and size

Commit Message

Alexander Shishkin Jan. 19, 2023, 5:06 p.m. UTC

  A malicious device can change its MSIX table size between the table
ioremap() and subsequent accesses, resulting in a kernel page fault in
pci_write_msg_msix().

To avoid this, cache the table size observed at the moment of table
ioremap() and use the cached value. This, however, does not help drivers
that peek at the PCIE_MSIX_FLAGS register directly.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/pci/msi/api.c | 7 ++++++-
 drivers/pci/msi/msi.c | 2 +-
 include/linux/pci.h   | 1 +
 3 files changed, 8 insertions(+), 2 deletions(-)

Comments

Leon Romanovsky Jan. 22, 2023, 9 a.m. UTC | #1

On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
> A malicious device can change its MSIX table size between the table
> ioremap() and subsequent accesses, resulting in a kernel page fault in
> pci_write_msg_msix().
> 
> To avoid this, cache the table size observed at the moment of table
> ioremap() and use the cached value. This, however, does not help drivers
> that peek at the PCIE_MSIX_FLAGS register directly.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/pci/msi/api.c | 7 ++++++-
>  drivers/pci/msi/msi.c | 2 +-
>  include/linux/pci.h   | 1 +
>  3 files changed, 8 insertions(+), 2 deletions(-)

I'm not security expert here, but not sure that this protects from anything.
1. Kernel relies on working and not-malicious HW. There are gazillion ways
to cause crashes other than changing MSI-X.
2. Device can report large table size, kernel will cache it and
malicious device will reduce it back. It is not handled and will cause
to kernel crash too.

Thanks

> 
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index b8009aa11f3c..617ea1256487 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -75,8 +75,13 @@ int pci_msix_vec_count(struct pci_dev *dev)
>  	if (!dev->msix_cap)
>  		return -EINVAL;
>  
> +	if (dev->flags_qsize)
> +		return dev->flags_qsize;
> +
>  	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
> -	return msix_table_size(control);
> +	dev->flags_qsize = msix_table_size(control);
> +
> +	return dev->flags_qsize;
>  }
>  EXPORT_SYMBOL(pci_msix_vec_count);
>  
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 1f716624ca56..d50cd45119f1 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -715,7 +715,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>  
>  	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
>  	/* Request & Map MSI-X table region */
> -	tsize = msix_table_size(control);
> +	tsize = pci_msix_vec_count(dev);
>  	dev->msix_base = msix_map_region(dev, tsize);
>  	if (!dev->msix_base) {
>  		ret = -ENOMEM;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index adffd65e84b4..2e1a72a2139d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -352,6 +352,7 @@ struct pci_dev {
>  	u8		rom_base_reg;	/* Config register controlling ROM */
>  	u8		pin;		/* Interrupt pin this device uses */
>  	u16		pcie_flags_reg;	/* Cached PCIe Capabilities Register */
> +	u16		flags_qsize;	/* Cached MSIX table size */
>  	unsigned long	*dma_alias_mask;/* Mask of enabled devfn aliases */
>  
>  	struct pci_driver *driver;	/* Driver bound to this device */
> -- 
> 2.39.0
>

Marc Zyngier Jan. 22, 2023, 10:57 a.m. UTC | #2

On Sun, 22 Jan 2023 09:00:04 +0000,
Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
> > A malicious device can change its MSIX table size between the table
> > ioremap() and subsequent accesses, resulting in a kernel page fault in
> > pci_write_msg_msix().
> > 
> > To avoid this, cache the table size observed at the moment of table
> > ioremap() and use the cached value. This, however, does not help drivers
> > that peek at the PCIE_MSIX_FLAGS register directly.
> > 
> > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/pci/msi/api.c | 7 ++++++-
> >  drivers/pci/msi/msi.c | 2 +-
> >  include/linux/pci.h   | 1 +
> >  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> I'm not security expert here, but not sure that this protects from anything.
> 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> to cause crashes other than changing MSI-X.
> 2. Device can report large table size, kernel will cache it and
> malicious device will reduce it back. It is not handled and will cause
> to kernel crash too.
> 

Indeed, this was my exact reaction reading this patch. This only makes
sure the same (potentially wrong) value is used at all times. So while
this results in a consistent use, this doesn't give much guarantee.

The only way to deal with this is to actually handle the resulting
fault, similar to what the kernel does when accessing userspace. Not
sure how possible this is with something like PCIe.

	M.

Greg KH Jan. 22, 2023, 10:57 a.m. UTC | #3

On Sun, Jan 22, 2023 at 11:00:04AM +0200, Leon Romanovsky wrote:
> On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
> > A malicious device can change its MSIX table size between the table
> > ioremap() and subsequent accesses, resulting in a kernel page fault in
> > pci_write_msg_msix().
> > 
> > To avoid this, cache the table size observed at the moment of table
> > ioremap() and use the cached value. This, however, does not help drivers
> > that peek at the PCIE_MSIX_FLAGS register directly.
> > 
> > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/pci/msi/api.c | 7 ++++++-
> >  drivers/pci/msi/msi.c | 2 +-
> >  include/linux/pci.h   | 1 +
> >  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> I'm not security expert here, but not sure that this protects from anything.
> 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> to cause crashes other than changing MSI-X.

Linux does NOT protect from malicious PCIe devices at this point in
time, you are correct.  If we wish to change that model, then we can
work on that with the explict understanding that most all drivers will
need to change as will the bus logic for the busses involved.

To do piece-meal patches like this for no good reason is not a good idea
as it achieves nothing in the end :(

thanks,

greg k-h

David Laight Jan. 22, 2023, 3:34 p.m. UTC | #4

From: Marc Zyngier
> Sent: 22 January 2023 10:57
> 
> On Sun, 22 Jan 2023 09:00:04 +0000,
> Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
> > > A malicious device can change its MSIX table size between the table
> > > ioremap() and subsequent accesses, resulting in a kernel page fault in
> > > pci_write_msg_msix().
> > >
> > > To avoid this, cache the table size observed at the moment of table
> > > ioremap() and use the cached value. This, however, does not help drivers
> > > that peek at the PCIE_MSIX_FLAGS register directly.
> > >
> > > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  drivers/pci/msi/api.c | 7 ++++++-
> > >  drivers/pci/msi/msi.c | 2 +-
> > >  include/linux/pci.h   | 1 +
> > >  3 files changed, 8 insertions(+), 2 deletions(-)
> >
> > I'm not security expert here, but not sure that this protects from anything.
> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> > to cause crashes other than changing MSI-X.
> > 2. Device can report large table size, kernel will cache it and
> > malicious device will reduce it back. It is not handled and will cause
> > to kernel crash too.
> >
> 
> Indeed, this was my exact reaction reading this patch. This only makes
> sure the same (potentially wrong) value is used at all times. So while
> this results in a consistent use, this doesn't give much guarantee.

Yep, the device can 'choose' to error any PCIe read.

> The only way to deal with this is to actually handle the resulting
> fault, similar to what the kernel does when accessing userspace. Not
> sure how possible this is with something like PCIe.

I don't think you get a fault, the PCIe read completes with value ~0.
You might get an AER indication, that may not be helpful at all.
We've some x86 systems where that ends up as an NMI and panic!

A more valid reason for caching the MSIX table size is to avoid
doing a slow PCIe read.
I'm not sure how fast they are on 'normal' hardware, but the Altera
fpga we use is particularly pedestrian.
I just measured back-to-back reads at 126 clocks of the internal
125MHz clock so almost exactly 1us - or 3000 clocks of a 3GHz cpu.
(I added PCIe trace to the fpga so we can see what goes on.)

There is actually a much more 'interesting' issue with MSIX.
There are 16 bytes of data for each interrupt.

The kernel doesn't even try to ensure they are written as
a single PCIe TLP, and even if it did there is no real
guarantee the writes aren't split before the logic that
raises interrupts reads the values.

It is also quite likely that the interrupt raising logic
doesn't to an atomic read of all 16 bytes, so a cpu write
could split the reads.

This doesn't normally matter - the interrupt is enabled long
after the address/data fields are initialised.
But if the interrupt affinity is changed both the address and
data fields are likely to be changed on an interrupt that is
(nominally) enabled.

It is pretty easy to imagine the new address being used with
the old data (or v.v.) or even a 'torn read' of the 64bit
address field.

I don't remember seeing anything in the MSIX spec about
requirements on the hardware - which puts the onus on
the software to ensure the MSIX data is always valid.
This means that changing the vector needs to:
	Disable the interrupt.
	Delay (a read from the MSIX block is probably enough).
	Update the address and data.
	Delay.
	Enable the interrupt.
But I don't remember seeing the kernel to any of that.

	David.

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Jonathan Cameron Jan. 23, 2023, 10:22 a.m. UTC | #5

On Sun, 22 Jan 2023 11:57:58 +0100
Greg KH <gregkh@linuxfoundation.org> wrote:

> On Sun, Jan 22, 2023 at 11:00:04AM +0200, Leon Romanovsky wrote:
> > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:  
> > > A malicious device can change its MSIX table size between the table
> > > ioremap() and subsequent accesses, resulting in a kernel page fault in
> > > pci_write_msg_msix().
> > > 
> > > To avoid this, cache the table size observed at the moment of table
> > > ioremap() and use the cached value. This, however, does not help drivers
> > > that peek at the PCIE_MSIX_FLAGS register directly.
> > > 
> > > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  drivers/pci/msi/api.c | 7 ++++++-
> > >  drivers/pci/msi/msi.c | 2 +-
> > >  include/linux/pci.h   | 1 +
> > >  3 files changed, 8 insertions(+), 2 deletions(-)  
> > 
> > I'm not security expert here, but not sure that this protects from anything.
> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> > to cause crashes other than changing MSI-X.  
> 
> Linux does NOT protect from malicious PCIe devices at this point in
> time, you are correct.  If we wish to change that model, then we can
> work on that with the explict understanding that most all drivers will
> need to change as will the bus logic for the busses involved.
> 
> To do piece-meal patches like this for no good reason is not a good idea
> as it achieves nothing in the end :(
> 
> thanks,
> 
> greg k-h

If you care enough about potential malicious PCIe devices, do device
attestation and reject any devices that don't support it (which means
rejecting pretty much everything today ;).
Or potentially limit what non attested devices are allowed to do.

+CC Lukas who is working on this.

Jonathan

Alexander Shishkin Jan. 24, 2023, 11:52 a.m. UTC | #6

Leon Romanovsky <leon@kernel.org> writes:

> On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
>> A malicious device can change its MSIX table size between the table
>> ioremap() and subsequent accesses, resulting in a kernel page fault in
>> pci_write_msg_msix().
>> 
>> To avoid this, cache the table size observed at the moment of table
>> ioremap() and use the cached value. This, however, does not help drivers
>> that peek at the PCIE_MSIX_FLAGS register directly.
>> 
>> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>> Cc: stable@vger.kernel.org
>> ---
>>  drivers/pci/msi/api.c | 7 ++++++-
>>  drivers/pci/msi/msi.c | 2 +-
>>  include/linux/pci.h   | 1 +
>>  3 files changed, 8 insertions(+), 2 deletions(-)
>
> I'm not security expert here, but not sure that this protects from anything.
> 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> to cause crashes other than changing MSI-X.

This particular bug was preventing our fuzzing from going deeper into
the code and reaching some more of the aforementioned gazillion bugs.

> 2. Device can report large table size, kernel will cache it and
> malicious device will reduce it back. It is not handled and will cause
> to kernel crash too.

How would that happen? If the device decides to have fewer vectors,
they'll all still fit in the ioremapped MSIX table. The worst thing that
can happen is 0xffffffff reads from the mmio space, which a device can
do anyway. But that shouldn't trigger a page fault or otherwise
crash. Or am I missing something?

Thanks,
--
Alex

Alexander Shishkin Jan. 24, 2023, 11:59 a.m. UTC | #7

Marc Zyngier <maz@kernel.org> writes:

> On Sun, 22 Jan 2023 09:00:04 +0000,
> Leon Romanovsky <leon@kernel.org> wrote:
>> 
>> On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
>> > A malicious device can change its MSIX table size between the table
>> > ioremap() and subsequent accesses, resulting in a kernel page fault in
>> > pci_write_msg_msix().
>> > 
>> > To avoid this, cache the table size observed at the moment of table
>> > ioremap() and use the cached value. This, however, does not help drivers
>> > that peek at the PCIE_MSIX_FLAGS register directly.
>> > 
>> > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>> > Cc: stable@vger.kernel.org
>> > ---
>> >  drivers/pci/msi/api.c | 7 ++++++-
>> >  drivers/pci/msi/msi.c | 2 +-
>> >  include/linux/pci.h   | 1 +
>> >  3 files changed, 8 insertions(+), 2 deletions(-)
>> 
>> I'm not security expert here, but not sure that this protects from anything.
>> 1. Kernel relies on working and not-malicious HW. There are gazillion ways
>> to cause crashes other than changing MSI-X.
>> 2. Device can report large table size, kernel will cache it and
>> malicious device will reduce it back. It is not handled and will cause
>> to kernel crash too.
>> 
>
> Indeed, this was my exact reaction reading this patch. This only makes
> sure the same (potentially wrong) value is used at all times. So while
> this results in a consistent use, this doesn't give much guarantee.

It guarantees that the MSIX table is big enough to fit all the vectors,
so it should prevent the page faults from out-of-bounds accesses.

> The only way to deal with this is to actually handle the resulting
> fault, similar to what the kernel does when accessing userspace. Not
> sure how possible this is with something like PCIe.

Do you mean replacing MMIO accesses with exception handling accessors?
That seems like a monumental effort. And then we'd have to figure out
how to handle errors in the __pci_write_msi_msg() path.

Preventing page faults from happening in the first place seems like a
more reasonable solution, or what do you think?

Thanks,
--
Alex

Leon Romanovsky Jan. 24, 2023, 12:10 p.m. UTC | #8

On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote:
> Leon Romanovsky <leon@kernel.org> writes:
> 
> > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
> >> A malicious device can change its MSIX table size between the table
> >> ioremap() and subsequent accesses, resulting in a kernel page fault in
> >> pci_write_msg_msix().
> >> 
> >> To avoid this, cache the table size observed at the moment of table
> >> ioremap() and use the cached value. This, however, does not help drivers
> >> that peek at the PCIE_MSIX_FLAGS register directly.
> >> 
> >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >> Cc: stable@vger.kernel.org
> >> ---
> >>  drivers/pci/msi/api.c | 7 ++++++-
> >>  drivers/pci/msi/msi.c | 2 +-
> >>  include/linux/pci.h   | 1 +
> >>  3 files changed, 8 insertions(+), 2 deletions(-)
> >
> > I'm not security expert here, but not sure that this protects from anything.
> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> > to cause crashes other than changing MSI-X.
> 
> This particular bug was preventing our fuzzing from going deeper into
> the code and reaching some more of the aforementioned gazillion bugs.

Your commit message says nothing about fuzzing, but talks about
malicious device. 
Do you see "gazillion bugs" for devices which don't change their MSI-X
table size under the hood, which is main kernel assumption?

If yes, you should fix these bugs.

> 
> > 2. Device can report large table size, kernel will cache it and
> > malicious device will reduce it back. It is not handled and will cause
> > to kernel crash too.
> 
> How would that happen? If the device decides to have fewer vectors,
> they'll all still fit in the ioremapped MSIX table. The worst thing that
> can happen is 0xffffffff reads from the mmio space, which a device can
> do anyway. But that shouldn't trigger a page fault or otherwise
> crash. Or am I missing something?

Like I said, I'm no expert. You should tell me if it safe for all
callers of pci_msix_vec_count().

Thanks

> 
> Thanks,
> --
> Alex

Alexander Shishkin Jan. 24, 2023, 12:42 p.m. UTC | #9

Leon Romanovsky <leon@kernel.org> writes:

> On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote:
>> Leon Romanovsky <leon@kernel.org> writes:
>> 
>> > I'm not security expert here, but not sure that this protects from anything.
>> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways
>> > to cause crashes other than changing MSI-X.
>> 
>> This particular bug was preventing our fuzzing from going deeper into
>> the code and reaching some more of the aforementioned gazillion bugs.
>
> Your commit message says nothing about fuzzing, but talks about
> malicious device. 

A malicious device is what the fuzzing is aiming to simulate. The fact
of fuzzing process itself didn't seem relevant to the patch, so I didn't
include it, going instead for the problem statement and proposed
solution. Will the commit message benefit from mentioning fuzzing?

> Do you see "gazillion bugs" for devices which don't change their MSI-X
> table size under the hood, which is main kernel assumption?

Not so far.

> If yes, you should fix these bugs.

That's absolutely the intention.

>> > 2. Device can report large table size, kernel will cache it and
>> > malicious device will reduce it back. It is not handled and will cause
>> > to kernel crash too.
>> 
>> How would that happen? If the device decides to have fewer vectors,
>> they'll all still fit in the ioremapped MSIX table. The worst thing that
>> can happen is 0xffffffff reads from the mmio space, which a device can
>> do anyway. But that shouldn't trigger a page fault or otherwise
>> crash. Or am I missing something?
>
> Like I said, I'm no expert. You should tell me if it safe for all
> callers of pci_msix_vec_count().

Well, since you stated that the reverse will cause a kernel crash, I had
to ask how. I'll include some version of the above paragraph in the
commit message to indicate that we reverse situation has been considered.

Regards,
--
Alex

Leon Romanovsky Jan. 24, 2023, 12:59 p.m. UTC | #10

On Tue, Jan 24, 2023 at 02:42:11PM +0200, Alexander Shishkin wrote:
> Leon Romanovsky <leon@kernel.org> writes:
> 
> > On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote:
> >> Leon Romanovsky <leon@kernel.org> writes:
> >> 
> >> > I'm not security expert here, but not sure that this protects from anything.
> >> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> >> > to cause crashes other than changing MSI-X.
> >> 
> >> This particular bug was preventing our fuzzing from going deeper into
> >> the code and reaching some more of the aforementioned gazillion bugs.
> >
> > Your commit message says nothing about fuzzing, but talks about
> > malicious device. 
> 
> A malicious device is what the fuzzing is aiming to simulate. The fact
> of fuzzing process itself didn't seem relevant to the patch, so I didn't
> include it, going instead for the problem statement and proposed
> solution. Will the commit message benefit from mentioning fuzzing?

No, for most if not all kernel developers, the fuzzing means some sort of
random user-space input. PCI devices are trusted in the kernel.

> 
> > Do you see "gazillion bugs" for devices which don't change their MSI-X
> > table size under the hood, which is main kernel assumption?
> 
> Not so far.

So please share them with us.

> 
> > If yes, you should fix these bugs.
> 
> That's absolutely the intention.

So let's fix the bugs and not hide them.

> 
> >> > 2. Device can report large table size, kernel will cache it and
> >> > malicious device will reduce it back. It is not handled and will cause
> >> > to kernel crash too.
> >> 
> >> How would that happen? If the device decides to have fewer vectors,
> >> they'll all still fit in the ioremapped MSIX table. The worst thing that
> >> can happen is 0xffffffff reads from the mmio space, which a device can
> >> do anyway. But that shouldn't trigger a page fault or otherwise
> >> crash. Or am I missing something?
> >
> > Like I said, I'm no expert. You should tell me if it safe for all
> > callers of pci_msix_vec_count().
> 
> Well, since you stated that the reverse will cause a kernel crash, I had
> to ask how. I'll include some version of the above paragraph in the
> commit message to indicate that we reverse situation has been considered.

Not really. I didn't see any explanation how will it work if number
of vectors (which MSI-X table represents) is completely different from
seeing by PCI core.

Thanks

> 
> Regards,
> --
> Alex

Alexander Shishkin Jan. 24, 2023, 3:28 p.m. UTC | #11

Leon Romanovsky <leon@kernel.org> writes:

> On Tue, Jan 24, 2023 at 02:42:11PM +0200, Alexander Shishkin wrote:
>> Leon Romanovsky <leon@kernel.org> writes:
>> 
>> A malicious device is what the fuzzing is aiming to simulate. The fact
>> of fuzzing process itself didn't seem relevant to the patch, so I didn't
>> include it, going instead for the problem statement and proposed
>> solution. Will the commit message benefit from mentioning fuzzing?
>
> No, for most if not all kernel developers, the fuzzing means some sort of
> random user-space input. PCI devices are trusted in the kernel.

Right, it's a different kind of fuzzing. Apologies, I should have made
it clear.

>> > Do you see "gazillion bugs" for devices which don't change their MSI-X
>> > table size under the hood, which is main kernel assumption?
>> 
>> Not so far.
>
> So please share them with us.

We do, as soon as we find them. This patch is one such instance.

>> > If yes, you should fix these bugs.
>> 
>> That's absolutely the intention.
>
> So let's fix the bugs and not hide them.

Yes, that's what this patch aims to achieve.

Thanks,
--
Alex

Greg KH Jan. 24, 2023, 3:32 p.m. UTC | #12

On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote:
> Leon Romanovsky <leon@kernel.org> writes:
> 
> > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
> >> A malicious device can change its MSIX table size between the table
> >> ioremap() and subsequent accesses, resulting in a kernel page fault in
> >> pci_write_msg_msix().
> >> 
> >> To avoid this, cache the table size observed at the moment of table
> >> ioremap() and use the cached value. This, however, does not help drivers
> >> that peek at the PCIE_MSIX_FLAGS register directly.
> >> 
> >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >> Cc: stable@vger.kernel.org
> >> ---
> >>  drivers/pci/msi/api.c | 7 ++++++-
> >>  drivers/pci/msi/msi.c | 2 +-
> >>  include/linux/pci.h   | 1 +
> >>  3 files changed, 8 insertions(+), 2 deletions(-)
> >
> > I'm not security expert here, but not sure that this protects from anything.
> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> > to cause crashes other than changing MSI-X.
> 
> This particular bug was preventing our fuzzing from going deeper into
> the code and reaching some more of the aforementioned gazillion bugs.

As per our documentation, if you are "fixing" things based on a tool you
have, you HAVE TO document that in the changelog.  Otherwise we just get
to reject the patch flat out (hint, this has caused more than one group
of developers to just be flat out banned for ignoring...)

And what kind of tool is now fuzzing PCI config accesses with
"malicious" devices?  Again, this is NOT something that Linux currently
even attempts to want to protect itself against.  If you are wanting to
change that model, wonderful, let's discuss that and work on defining
the scope of your new security threat model and go from there.  Don't
throw random patches at us and expect us to think they are even valid.

Again, Linux trusts PCI devices.  If you don't want to trust PCI
devices, we already have a framework in place to prevent that which is
independant of this area of the PCI code.  Use that, or let's discuss
why that is insufficient.

Note, this is NOT the first time I have told developers from Intel about
this.  Why you all keep ignoring this is beyond me, I think you keep
thinking that if you don't send patches through me, you can just ignore
my statements about this.  Odd...

greg k-h

Reshetova, Elena Jan. 25, 2023, 12:33 p.m. UTC | #13

> On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote:
> > Leon Romanovsky <leon@kernel.org> writes:
> >
> > > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
> > >> A malicious device can change its MSIX table size between the table
> > >> ioremap() and subsequent accesses, resulting in a kernel page fault in
> > >> pci_write_msg_msix().
> > >>
> > >> To avoid this, cache the table size observed at the moment of table
> > >> ioremap() and use the cached value. This, however, does not help drivers
> > >> that peek at the PCIE_MSIX_FLAGS register directly.
> > >>
> > >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > >> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > >> Cc: stable@vger.kernel.org
> > >> ---
> > >>  drivers/pci/msi/api.c | 7 ++++++-
> > >>  drivers/pci/msi/msi.c | 2 +-
> > >>  include/linux/pci.h   | 1 +
> > >>  3 files changed, 8 insertions(+), 2 deletions(-)
> > >
> > > I'm not security expert here, but not sure that this protects from anything.
> > > 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> > > to cause crashes other than changing MSI-X.
> >
> > This particular bug was preventing our fuzzing from going deeper into
> > the code and reaching some more of the aforementioned gazillion bugs.
> 
> As per our documentation, if you are "fixing" things based on a tool you
> have, you HAVE TO document that in the changelog.  Otherwise we just get
> to reject the patch flat out (hint, this has caused more than one group
> of developers to just be flat out banned for ignoring...)
> 
> And what kind of tool is now fuzzing PCI config accesses with
> "malicious" devices?  Again, this is NOT something that Linux currently
> even attempts to want to protect itself against.  If you are wanting to
> change that model, wonderful, let's discuss that and work on defining
> the scope of your new security threat model and go from there.  Don't
> throw random patches at us and expect us to think they are even valid.
> 
> Again, Linux trusts PCI devices.  If you don't want to trust PCI
> devices, we already have a framework in place to prevent that which is
> independant of this area of the PCI code.  Use that, or let's discuss
> why that is insufficient.

Sure, I have started a new thread on this in 
https://lore.kernel.org/all/DM8PR11MB57505481B2FE79C3D56C9201E7CE9@DM8PR11MB5750.namprd11.prod.outlook.com/

I think it is much bigger topic to discuss, so better be handled separately. 

Best Regards,
Elena.

diff mbox series

Patch

diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index b8009aa11f3c..617ea1256487 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -75,8 +75,13 @@  int pci_msix_vec_count(struct pci_dev *dev)
 	if (!dev->msix_cap)
 		return -EINVAL;
 
+	if (dev->flags_qsize)
+		return dev->flags_qsize;
+
 	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
-	return msix_table_size(control);
+	dev->flags_qsize = msix_table_size(control);
+
+	return dev->flags_qsize;
 }
 EXPORT_SYMBOL(pci_msix_vec_count);
 
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 1f716624ca56..d50cd45119f1 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -715,7 +715,7 @@  static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 
 	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
 	/* Request & Map MSI-X table region */
-	tsize = msix_table_size(control);
+	tsize = pci_msix_vec_count(dev);
 	dev->msix_base = msix_map_region(dev, tsize);
 	if (!dev->msix_base) {
 		ret = -ENOMEM;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index adffd65e84b4..2e1a72a2139d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -352,6 +352,7 @@  struct pci_dev {
 	u8		rom_base_reg;	/* Config register controlling ROM */
 	u8		pin;		/* Interrupt pin this device uses */
 	u16		pcie_flags_reg;	/* Cached PCIe Capabilities Register */
+	u16		flags_qsize;	/* Cached MSIX table size */
 	unsigned long	*dma_alias_mask;/* Mask of enabled devfn aliases */
 
 	struct pci_driver *driver;	/* Driver bound to this device */