Message ID | 20230119170633.40944-2-alexander.shishkin@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp446823wrn; Thu, 19 Jan 2023 09:16:21 -0800 (PST) X-Google-Smtp-Source: AMrXdXsgig5YIS/n49O7ZH6BmJ8CyLgJR2ZwP1PYrZhlmTHC2zIvjhtFn1bMQOqgVh+Dj14p9BEP X-Received: by 2002:a17:907:c606:b0:870:d9a:9ebb with SMTP id ud6-20020a170907c60600b008700d9a9ebbmr8306348ejc.38.1674148581510; Thu, 19 Jan 2023 09:16:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674148581; cv=none; d=google.com; s=arc-20160816; b=a/D20SEELpOFB8iEPaDPocxT3PBxQntW9TSU0zD8tcQ3jrcgOcJ9+HRbO2P7iyzNxw +7+TwIQvwrxTBFZlymTvOKsorl2a1fB88eaokfoSKVxKjtgeqgWf63SfozXoyNKHbUGr Lkrvf+Ia3ZZDWOHcMr9UJvpIsDslHOWURp1Uv4ige4/2cJ6+ofVZVO84kwsL1ch+J0i/ dMjluAauK0C5U3qcal8TzNBJ45BWm/+BICVEV4JPO80EyWmt+Whf4IvkLkHUgWXpmKQK 39QkkfAiXCyRtJl/R7PszXcuIsKZKhS5PWg6YoQu9O0ek9qhe902NViqtYKkFRYPUk+h SH4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=bQipJj8eF9do85wBXhdSWSe/f855wHzm81YZ6Tb7gzw=; b=TKjWnDcXkVYnZCce8AEWnNwd4lo+wn60ACux8VqEk/puNctyBSnM6r50X/DPLFnOR3 SlZVolsQmOthWcackNx62wLfwhvpolVjoDhJJ31B+36aA5G6cngwi0iwgA+PdLk5GKFj W5oHaFjmUAPH9apiexS4Br3+zX+NE7X8brZiMOCWiXdV/x0LbK9WUsHxd3BCgHNhPHpq 4UyePd9jXsS1TmIBJzWHpcTPrEq4p1K/Gvd4s/wxfM8Av8WZtBhQx4KvGG/ustPyderE cm4gkuc6kfkGx+3dYjVBfxTzEho0NyWdAL4kujPhZpvuxqjlYllhXjeSsBqhlFjU/HS7 6IFg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=cJuffKRZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d14-20020aa7d5ce000000b0049b63678b52si16507844eds.516.2023.01.19.09.15.57; Thu, 19 Jan 2023 09:16:21 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=cJuffKRZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230073AbjASRKx (ORCPT <rfc822;pfffrao@gmail.com> + 99 others); Thu, 19 Jan 2023 12:10:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54240 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229915AbjASRKu (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 19 Jan 2023 12:10:50 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C736E5B91; Thu, 19 Jan 2023 09:10:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1674148247; x=1705684247; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WZXENZe/rH7FLxenBxCcqTEtaplqgNUH7FpDX/thmSk=; b=cJuffKRZyiNr4wZK0xBT/+6tqIBnxjiVOF/2B7Tu105zHva4SDcq5Xqp I/1rkZ7+6StkYjcpQlkfKVUseuFpjGsdyTGXKfG1QLTnmIGKXIPTyoE25 t2nAN0QrNPy09svIf0jbR43uMQ9eAcEJZ0GryjTvXctFYYbh+kuHQ+ilT dmR6MIb5vph8mkGAE/VebqNKiHmZe8rlesSMqUZhYVZZHZSzoupbTzwiq u5vnXRsnEAum6XwMLv99xAi2ZCuJn2lZ70D9h9mAoU1gBXBtCh2YNcQaz PFWV94qvUrCkTUlfgNUM73alvYUXqy3Cl/cmNnG+UjlqyH1EELT7eGh2w Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10595"; a="305714674" X-IronPort-AV: E=Sophos;i="5.97,229,1669104000"; d="scan'208";a="305714674" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jan 2023 09:06:20 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10595"; a="784124771" X-IronPort-AV: E=Sophos;i="5.97,229,1669104000"; d="scan'208";a="784124771" Received: from black.fi.intel.com (HELO black.fi.intel.com.) ([10.237.72.28]) by orsmga004.jf.intel.com with ESMTP; 19 Jan 2023 09:06:16 -0800 From: Alexander Shishkin <alexander.shishkin@linux.intel.com> To: Bjorn Helgaas <bhelgaas@google.com>, Thomas Gleixner <tglx@linutronix.de>, linux-pci@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Marc Zyngier <maz@kernel.org>, darwi@linutronix.de, elena.reshetova@intel.com, kirill.shutemov@linux.intel.com, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Mika Westerberg <mika.westerberg@linux.intel.com>, stable@vger.kernel.org Subject: [PATCH 1/2] PCI/MSI: Cache the MSIX table size Date: Thu, 19 Jan 2023 19:06:32 +0200 Message-Id: <20230119170633.40944-2-alexander.shishkin@linux.intel.com> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230119170633.40944-1-alexander.shishkin@linux.intel.com> References: <20230119170633.40944-1-alexander.shishkin@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755472022906670901?= X-GMAIL-MSGID: =?utf-8?q?1755472022906670901?= |
Series |
PCI/MSI: Hardening fixes
|
|
Commit Message
Alexander Shishkin
Jan. 19, 2023, 5:06 p.m. UTC
A malicious device can change its MSIX table size between the table ioremap() and subsequent accesses, resulting in a kernel page fault in pci_write_msg_msix(). To avoid this, cache the table size observed at the moment of table ioremap() and use the cached value. This, however, does not help drivers that peek at the PCIE_MSIX_FLAGS register directly. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org --- drivers/pci/msi/api.c | 7 ++++++- drivers/pci/msi/msi.c | 2 +- include/linux/pci.h | 1 + 3 files changed, 8 insertions(+), 2 deletions(-)
Comments
On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: > A malicious device can change its MSIX table size between the table > ioremap() and subsequent accesses, resulting in a kernel page fault in > pci_write_msg_msix(). > > To avoid this, cache the table size observed at the moment of table > ioremap() and use the cached value. This, however, does not help drivers > that peek at the PCIE_MSIX_FLAGS register directly. > > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Cc: stable@vger.kernel.org > --- > drivers/pci/msi/api.c | 7 ++++++- > drivers/pci/msi/msi.c | 2 +- > include/linux/pci.h | 1 + > 3 files changed, 8 insertions(+), 2 deletions(-) I'm not security expert here, but not sure that this protects from anything. 1. Kernel relies on working and not-malicious HW. There are gazillion ways to cause crashes other than changing MSI-X. 2. Device can report large table size, kernel will cache it and malicious device will reduce it back. It is not handled and will cause to kernel crash too. Thanks > > diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c > index b8009aa11f3c..617ea1256487 100644 > --- a/drivers/pci/msi/api.c > +++ b/drivers/pci/msi/api.c > @@ -75,8 +75,13 @@ int pci_msix_vec_count(struct pci_dev *dev) > if (!dev->msix_cap) > return -EINVAL; > > + if (dev->flags_qsize) > + return dev->flags_qsize; > + > pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); > - return msix_table_size(control); > + dev->flags_qsize = msix_table_size(control); > + > + return dev->flags_qsize; > } > EXPORT_SYMBOL(pci_msix_vec_count); > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c > index 1f716624ca56..d50cd45119f1 100644 > --- a/drivers/pci/msi/msi.c > +++ b/drivers/pci/msi/msi.c > @@ -715,7 +715,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, > > pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); > /* Request & Map MSI-X table region */ > - tsize = msix_table_size(control); > + tsize = pci_msix_vec_count(dev); > dev->msix_base = msix_map_region(dev, tsize); > if (!dev->msix_base) { > ret = -ENOMEM; > diff --git a/include/linux/pci.h b/include/linux/pci.h > index adffd65e84b4..2e1a72a2139d 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -352,6 +352,7 @@ struct pci_dev { > u8 rom_base_reg; /* Config register controlling ROM */ > u8 pin; /* Interrupt pin this device uses */ > u16 pcie_flags_reg; /* Cached PCIe Capabilities Register */ > + u16 flags_qsize; /* Cached MSIX table size */ > unsigned long *dma_alias_mask;/* Mask of enabled devfn aliases */ > > struct pci_driver *driver; /* Driver bound to this device */ > -- > 2.39.0 >
On Sun, 22 Jan 2023 09:00:04 +0000, Leon Romanovsky <leon@kernel.org> wrote: > > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: > > A malicious device can change its MSIX table size between the table > > ioremap() and subsequent accesses, resulting in a kernel page fault in > > pci_write_msg_msix(). > > > > To avoid this, cache the table size observed at the moment of table > > ioremap() and use the cached value. This, however, does not help drivers > > that peek at the PCIE_MSIX_FLAGS register directly. > > > > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > > Cc: stable@vger.kernel.org > > --- > > drivers/pci/msi/api.c | 7 ++++++- > > drivers/pci/msi/msi.c | 2 +- > > include/linux/pci.h | 1 + > > 3 files changed, 8 insertions(+), 2 deletions(-) > > I'm not security expert here, but not sure that this protects from anything. > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > to cause crashes other than changing MSI-X. > 2. Device can report large table size, kernel will cache it and > malicious device will reduce it back. It is not handled and will cause > to kernel crash too. > Indeed, this was my exact reaction reading this patch. This only makes sure the same (potentially wrong) value is used at all times. So while this results in a consistent use, this doesn't give much guarantee. The only way to deal with this is to actually handle the resulting fault, similar to what the kernel does when accessing userspace. Not sure how possible this is with something like PCIe. M.
On Sun, Jan 22, 2023 at 11:00:04AM +0200, Leon Romanovsky wrote: > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: > > A malicious device can change its MSIX table size between the table > > ioremap() and subsequent accesses, resulting in a kernel page fault in > > pci_write_msg_msix(). > > > > To avoid this, cache the table size observed at the moment of table > > ioremap() and use the cached value. This, however, does not help drivers > > that peek at the PCIE_MSIX_FLAGS register directly. > > > > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > > Cc: stable@vger.kernel.org > > --- > > drivers/pci/msi/api.c | 7 ++++++- > > drivers/pci/msi/msi.c | 2 +- > > include/linux/pci.h | 1 + > > 3 files changed, 8 insertions(+), 2 deletions(-) > > I'm not security expert here, but not sure that this protects from anything. > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > to cause crashes other than changing MSI-X. Linux does NOT protect from malicious PCIe devices at this point in time, you are correct. If we wish to change that model, then we can work on that with the explict understanding that most all drivers will need to change as will the bus logic for the busses involved. To do piece-meal patches like this for no good reason is not a good idea as it achieves nothing in the end :( thanks, greg k-h
From: Marc Zyngier > Sent: 22 January 2023 10:57 > > On Sun, 22 Jan 2023 09:00:04 +0000, > Leon Romanovsky <leon@kernel.org> wrote: > > > > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: > > > A malicious device can change its MSIX table size between the table > > > ioremap() and subsequent accesses, resulting in a kernel page fault in > > > pci_write_msg_msix(). > > > > > > To avoid this, cache the table size observed at the moment of table > > > ioremap() and use the cached value. This, however, does not help drivers > > > that peek at the PCIE_MSIX_FLAGS register directly. > > > > > > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > > > Cc: stable@vger.kernel.org > > > --- > > > drivers/pci/msi/api.c | 7 ++++++- > > > drivers/pci/msi/msi.c | 2 +- > > > include/linux/pci.h | 1 + > > > 3 files changed, 8 insertions(+), 2 deletions(-) > > > > I'm not security expert here, but not sure that this protects from anything. > > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > > to cause crashes other than changing MSI-X. > > 2. Device can report large table size, kernel will cache it and > > malicious device will reduce it back. It is not handled and will cause > > to kernel crash too. > > > > Indeed, this was my exact reaction reading this patch. This only makes > sure the same (potentially wrong) value is used at all times. So while > this results in a consistent use, this doesn't give much guarantee. Yep, the device can 'choose' to error any PCIe read. > The only way to deal with this is to actually handle the resulting > fault, similar to what the kernel does when accessing userspace. Not > sure how possible this is with something like PCIe. I don't think you get a fault, the PCIe read completes with value ~0. You might get an AER indication, that may not be helpful at all. We've some x86 systems where that ends up as an NMI and panic! A more valid reason for caching the MSIX table size is to avoid doing a slow PCIe read. I'm not sure how fast they are on 'normal' hardware, but the Altera fpga we use is particularly pedestrian. I just measured back-to-back reads at 126 clocks of the internal 125MHz clock so almost exactly 1us - or 3000 clocks of a 3GHz cpu. (I added PCIe trace to the fpga so we can see what goes on.) There is actually a much more 'interesting' issue with MSIX. There are 16 bytes of data for each interrupt. The kernel doesn't even try to ensure they are written as a single PCIe TLP, and even if it did there is no real guarantee the writes aren't split before the logic that raises interrupts reads the values. It is also quite likely that the interrupt raising logic doesn't to an atomic read of all 16 bytes, so a cpu write could split the reads. This doesn't normally matter - the interrupt is enabled long after the address/data fields are initialised. But if the interrupt affinity is changed both the address and data fields are likely to be changed on an interrupt that is (nominally) enabled. It is pretty easy to imagine the new address being used with the old data (or v.v.) or even a 'torn read' of the 64bit address field. I don't remember seeing anything in the MSIX spec about requirements on the hardware - which puts the onus on the software to ensure the MSIX data is always valid. This means that changing the vector needs to: Disable the interrupt. Delay (a read from the MSIX block is probably enough). Update the address and data. Delay. Enable the interrupt. But I don't remember seeing the kernel to any of that. David. - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Sun, 22 Jan 2023 11:57:58 +0100 Greg KH <gregkh@linuxfoundation.org> wrote: > On Sun, Jan 22, 2023 at 11:00:04AM +0200, Leon Romanovsky wrote: > > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: > > > A malicious device can change its MSIX table size between the table > > > ioremap() and subsequent accesses, resulting in a kernel page fault in > > > pci_write_msg_msix(). > > > > > > To avoid this, cache the table size observed at the moment of table > > > ioremap() and use the cached value. This, however, does not help drivers > > > that peek at the PCIE_MSIX_FLAGS register directly. > > > > > > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > > > Cc: stable@vger.kernel.org > > > --- > > > drivers/pci/msi/api.c | 7 ++++++- > > > drivers/pci/msi/msi.c | 2 +- > > > include/linux/pci.h | 1 + > > > 3 files changed, 8 insertions(+), 2 deletions(-) > > > > I'm not security expert here, but not sure that this protects from anything. > > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > > to cause crashes other than changing MSI-X. > > Linux does NOT protect from malicious PCIe devices at this point in > time, you are correct. If we wish to change that model, then we can > work on that with the explict understanding that most all drivers will > need to change as will the bus logic for the busses involved. > > To do piece-meal patches like this for no good reason is not a good idea > as it achieves nothing in the end :( > > thanks, > > greg k-h If you care enough about potential malicious PCIe devices, do device attestation and reject any devices that don't support it (which means rejecting pretty much everything today ;). Or potentially limit what non attested devices are allowed to do. +CC Lukas who is working on this. Jonathan
Leon Romanovsky <leon@kernel.org> writes: > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: >> A malicious device can change its MSIX table size between the table >> ioremap() and subsequent accesses, resulting in a kernel page fault in >> pci_write_msg_msix(). >> >> To avoid this, cache the table size observed at the moment of table >> ioremap() and use the cached value. This, however, does not help drivers >> that peek at the PCIE_MSIX_FLAGS register directly. >> >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> >> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> >> Cc: stable@vger.kernel.org >> --- >> drivers/pci/msi/api.c | 7 ++++++- >> drivers/pci/msi/msi.c | 2 +- >> include/linux/pci.h | 1 + >> 3 files changed, 8 insertions(+), 2 deletions(-) > > I'm not security expert here, but not sure that this protects from anything. > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > to cause crashes other than changing MSI-X. This particular bug was preventing our fuzzing from going deeper into the code and reaching some more of the aforementioned gazillion bugs. > 2. Device can report large table size, kernel will cache it and > malicious device will reduce it back. It is not handled and will cause > to kernel crash too. How would that happen? If the device decides to have fewer vectors, they'll all still fit in the ioremapped MSIX table. The worst thing that can happen is 0xffffffff reads from the mmio space, which a device can do anyway. But that shouldn't trigger a page fault or otherwise crash. Or am I missing something? Thanks, -- Alex
Marc Zyngier <maz@kernel.org> writes: > On Sun, 22 Jan 2023 09:00:04 +0000, > Leon Romanovsky <leon@kernel.org> wrote: >> >> On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: >> > A malicious device can change its MSIX table size between the table >> > ioremap() and subsequent accesses, resulting in a kernel page fault in >> > pci_write_msg_msix(). >> > >> > To avoid this, cache the table size observed at the moment of table >> > ioremap() and use the cached value. This, however, does not help drivers >> > that peek at the PCIE_MSIX_FLAGS register directly. >> > >> > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> >> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> >> > Cc: stable@vger.kernel.org >> > --- >> > drivers/pci/msi/api.c | 7 ++++++- >> > drivers/pci/msi/msi.c | 2 +- >> > include/linux/pci.h | 1 + >> > 3 files changed, 8 insertions(+), 2 deletions(-) >> >> I'm not security expert here, but not sure that this protects from anything. >> 1. Kernel relies on working and not-malicious HW. There are gazillion ways >> to cause crashes other than changing MSI-X. >> 2. Device can report large table size, kernel will cache it and >> malicious device will reduce it back. It is not handled and will cause >> to kernel crash too. >> > > Indeed, this was my exact reaction reading this patch. This only makes > sure the same (potentially wrong) value is used at all times. So while > this results in a consistent use, this doesn't give much guarantee. It guarantees that the MSIX table is big enough to fit all the vectors, so it should prevent the page faults from out-of-bounds accesses. > The only way to deal with this is to actually handle the resulting > fault, similar to what the kernel does when accessing userspace. Not > sure how possible this is with something like PCIe. Do you mean replacing MMIO accesses with exception handling accessors? That seems like a monumental effort. And then we'd have to figure out how to handle errors in the __pci_write_msi_msg() path. Preventing page faults from happening in the first place seems like a more reasonable solution, or what do you think? Thanks, -- Alex
On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote: > Leon Romanovsky <leon@kernel.org> writes: > > > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: > >> A malicious device can change its MSIX table size between the table > >> ioremap() and subsequent accesses, resulting in a kernel page fault in > >> pci_write_msg_msix(). > >> > >> To avoid this, cache the table size observed at the moment of table > >> ioremap() and use the cached value. This, however, does not help drivers > >> that peek at the PCIE_MSIX_FLAGS register directly. > >> > >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> > >> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > >> Cc: stable@vger.kernel.org > >> --- > >> drivers/pci/msi/api.c | 7 ++++++- > >> drivers/pci/msi/msi.c | 2 +- > >> include/linux/pci.h | 1 + > >> 3 files changed, 8 insertions(+), 2 deletions(-) > > > > I'm not security expert here, but not sure that this protects from anything. > > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > > to cause crashes other than changing MSI-X. > > This particular bug was preventing our fuzzing from going deeper into > the code and reaching some more of the aforementioned gazillion bugs. Your commit message says nothing about fuzzing, but talks about malicious device. Do you see "gazillion bugs" for devices which don't change their MSI-X table size under the hood, which is main kernel assumption? If yes, you should fix these bugs. > > > 2. Device can report large table size, kernel will cache it and > > malicious device will reduce it back. It is not handled and will cause > > to kernel crash too. > > How would that happen? If the device decides to have fewer vectors, > they'll all still fit in the ioremapped MSIX table. The worst thing that > can happen is 0xffffffff reads from the mmio space, which a device can > do anyway. But that shouldn't trigger a page fault or otherwise > crash. Or am I missing something? Like I said, I'm no expert. You should tell me if it safe for all callers of pci_msix_vec_count(). Thanks > > Thanks, > -- > Alex
Leon Romanovsky <leon@kernel.org> writes: > On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote: >> Leon Romanovsky <leon@kernel.org> writes: >> >> > I'm not security expert here, but not sure that this protects from anything. >> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways >> > to cause crashes other than changing MSI-X. >> >> This particular bug was preventing our fuzzing from going deeper into >> the code and reaching some more of the aforementioned gazillion bugs. > > Your commit message says nothing about fuzzing, but talks about > malicious device. A malicious device is what the fuzzing is aiming to simulate. The fact of fuzzing process itself didn't seem relevant to the patch, so I didn't include it, going instead for the problem statement and proposed solution. Will the commit message benefit from mentioning fuzzing? > Do you see "gazillion bugs" for devices which don't change their MSI-X > table size under the hood, which is main kernel assumption? Not so far. > If yes, you should fix these bugs. That's absolutely the intention. >> > 2. Device can report large table size, kernel will cache it and >> > malicious device will reduce it back. It is not handled and will cause >> > to kernel crash too. >> >> How would that happen? If the device decides to have fewer vectors, >> they'll all still fit in the ioremapped MSIX table. The worst thing that >> can happen is 0xffffffff reads from the mmio space, which a device can >> do anyway. But that shouldn't trigger a page fault or otherwise >> crash. Or am I missing something? > > Like I said, I'm no expert. You should tell me if it safe for all > callers of pci_msix_vec_count(). Well, since you stated that the reverse will cause a kernel crash, I had to ask how. I'll include some version of the above paragraph in the commit message to indicate that we reverse situation has been considered. Regards, -- Alex
On Tue, Jan 24, 2023 at 02:42:11PM +0200, Alexander Shishkin wrote: > Leon Romanovsky <leon@kernel.org> writes: > > > On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote: > >> Leon Romanovsky <leon@kernel.org> writes: > >> > >> > I'm not security expert here, but not sure that this protects from anything. > >> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > >> > to cause crashes other than changing MSI-X. > >> > >> This particular bug was preventing our fuzzing from going deeper into > >> the code and reaching some more of the aforementioned gazillion bugs. > > > > Your commit message says nothing about fuzzing, but talks about > > malicious device. > > A malicious device is what the fuzzing is aiming to simulate. The fact > of fuzzing process itself didn't seem relevant to the patch, so I didn't > include it, going instead for the problem statement and proposed > solution. Will the commit message benefit from mentioning fuzzing? No, for most if not all kernel developers, the fuzzing means some sort of random user-space input. PCI devices are trusted in the kernel. > > > Do you see "gazillion bugs" for devices which don't change their MSI-X > > table size under the hood, which is main kernel assumption? > > Not so far. So please share them with us. > > > If yes, you should fix these bugs. > > That's absolutely the intention. So let's fix the bugs and not hide them. > > >> > 2. Device can report large table size, kernel will cache it and > >> > malicious device will reduce it back. It is not handled and will cause > >> > to kernel crash too. > >> > >> How would that happen? If the device decides to have fewer vectors, > >> they'll all still fit in the ioremapped MSIX table. The worst thing that > >> can happen is 0xffffffff reads from the mmio space, which a device can > >> do anyway. But that shouldn't trigger a page fault or otherwise > >> crash. Or am I missing something? > > > > Like I said, I'm no expert. You should tell me if it safe for all > > callers of pci_msix_vec_count(). > > Well, since you stated that the reverse will cause a kernel crash, I had > to ask how. I'll include some version of the above paragraph in the > commit message to indicate that we reverse situation has been considered. Not really. I didn't see any explanation how will it work if number of vectors (which MSI-X table represents) is completely different from seeing by PCI core. Thanks > > Regards, > -- > Alex
Leon Romanovsky <leon@kernel.org> writes: > On Tue, Jan 24, 2023 at 02:42:11PM +0200, Alexander Shishkin wrote: >> Leon Romanovsky <leon@kernel.org> writes: >> >> A malicious device is what the fuzzing is aiming to simulate. The fact >> of fuzzing process itself didn't seem relevant to the patch, so I didn't >> include it, going instead for the problem statement and proposed >> solution. Will the commit message benefit from mentioning fuzzing? > > No, for most if not all kernel developers, the fuzzing means some sort of > random user-space input. PCI devices are trusted in the kernel. Right, it's a different kind of fuzzing. Apologies, I should have made it clear. >> > Do you see "gazillion bugs" for devices which don't change their MSI-X >> > table size under the hood, which is main kernel assumption? >> >> Not so far. > > So please share them with us. We do, as soon as we find them. This patch is one such instance. >> > If yes, you should fix these bugs. >> >> That's absolutely the intention. > > So let's fix the bugs and not hide them. Yes, that's what this patch aims to achieve. Thanks, -- Alex
On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote: > Leon Romanovsky <leon@kernel.org> writes: > > > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: > >> A malicious device can change its MSIX table size between the table > >> ioremap() and subsequent accesses, resulting in a kernel page fault in > >> pci_write_msg_msix(). > >> > >> To avoid this, cache the table size observed at the moment of table > >> ioremap() and use the cached value. This, however, does not help drivers > >> that peek at the PCIE_MSIX_FLAGS register directly. > >> > >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> > >> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > >> Cc: stable@vger.kernel.org > >> --- > >> drivers/pci/msi/api.c | 7 ++++++- > >> drivers/pci/msi/msi.c | 2 +- > >> include/linux/pci.h | 1 + > >> 3 files changed, 8 insertions(+), 2 deletions(-) > > > > I'm not security expert here, but not sure that this protects from anything. > > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > > to cause crashes other than changing MSI-X. > > This particular bug was preventing our fuzzing from going deeper into > the code and reaching some more of the aforementioned gazillion bugs. As per our documentation, if you are "fixing" things based on a tool you have, you HAVE TO document that in the changelog. Otherwise we just get to reject the patch flat out (hint, this has caused more than one group of developers to just be flat out banned for ignoring...) And what kind of tool is now fuzzing PCI config accesses with "malicious" devices? Again, this is NOT something that Linux currently even attempts to want to protect itself against. If you are wanting to change that model, wonderful, let's discuss that and work on defining the scope of your new security threat model and go from there. Don't throw random patches at us and expect us to think they are even valid. Again, Linux trusts PCI devices. If you don't want to trust PCI devices, we already have a framework in place to prevent that which is independant of this area of the PCI code. Use that, or let's discuss why that is insufficient. Note, this is NOT the first time I have told developers from Intel about this. Why you all keep ignoring this is beyond me, I think you keep thinking that if you don't send patches through me, you can just ignore my statements about this. Odd... greg k-h
> On Tue, Jan 24, 2023 at 01:52:37PM +0200, Alexander Shishkin wrote: > > Leon Romanovsky <leon@kernel.org> writes: > > > > > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote: > > >> A malicious device can change its MSIX table size between the table > > >> ioremap() and subsequent accesses, resulting in a kernel page fault in > > >> pci_write_msg_msix(). > > >> > > >> To avoid this, cache the table size observed at the moment of table > > >> ioremap() and use the cached value. This, however, does not help drivers > > >> that peek at the PCIE_MSIX_FLAGS register directly. > > >> > > >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> > > >> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > > >> Cc: stable@vger.kernel.org > > >> --- > > >> drivers/pci/msi/api.c | 7 ++++++- > > >> drivers/pci/msi/msi.c | 2 +- > > >> include/linux/pci.h | 1 + > > >> 3 files changed, 8 insertions(+), 2 deletions(-) > > > > > > I'm not security expert here, but not sure that this protects from anything. > > > 1. Kernel relies on working and not-malicious HW. There are gazillion ways > > > to cause crashes other than changing MSI-X. > > > > This particular bug was preventing our fuzzing from going deeper into > > the code and reaching some more of the aforementioned gazillion bugs. > > As per our documentation, if you are "fixing" things based on a tool you > have, you HAVE TO document that in the changelog. Otherwise we just get > to reject the patch flat out (hint, this has caused more than one group > of developers to just be flat out banned for ignoring...) > > And what kind of tool is now fuzzing PCI config accesses with > "malicious" devices? Again, this is NOT something that Linux currently > even attempts to want to protect itself against. If you are wanting to > change that model, wonderful, let's discuss that and work on defining > the scope of your new security threat model and go from there. Don't > throw random patches at us and expect us to think they are even valid. > > Again, Linux trusts PCI devices. If you don't want to trust PCI > devices, we already have a framework in place to prevent that which is > independant of this area of the PCI code. Use that, or let's discuss > why that is insufficient. Sure, I have started a new thread on this in https://lore.kernel.org/all/DM8PR11MB57505481B2FE79C3D56C9201E7CE9@DM8PR11MB5750.namprd11.prod.outlook.com/ I think it is much bigger topic to discuss, so better be handled separately. Best Regards, Elena.
diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c index b8009aa11f3c..617ea1256487 100644 --- a/drivers/pci/msi/api.c +++ b/drivers/pci/msi/api.c @@ -75,8 +75,13 @@ int pci_msix_vec_count(struct pci_dev *dev) if (!dev->msix_cap) return -EINVAL; + if (dev->flags_qsize) + return dev->flags_qsize; + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); - return msix_table_size(control); + dev->flags_qsize = msix_table_size(control); + + return dev->flags_qsize; } EXPORT_SYMBOL(pci_msix_vec_count); diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c index 1f716624ca56..d50cd45119f1 100644 --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -715,7 +715,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); /* Request & Map MSI-X table region */ - tsize = msix_table_size(control); + tsize = pci_msix_vec_count(dev); dev->msix_base = msix_map_region(dev, tsize); if (!dev->msix_base) { ret = -ENOMEM; diff --git a/include/linux/pci.h b/include/linux/pci.h index adffd65e84b4..2e1a72a2139d 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -352,6 +352,7 @@ struct pci_dev { u8 rom_base_reg; /* Config register controlling ROM */ u8 pin; /* Interrupt pin this device uses */ u16 pcie_flags_reg; /* Cached PCIe Capabilities Register */ + u16 flags_qsize; /* Cached MSIX table size */ unsigned long *dma_alias_mask;/* Mask of enabled devfn aliases */ struct pci_driver *driver; /* Driver bound to this device */