[v3] xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled

Message ID 20221118154931.1928298-1-marmarek@invisiblethingslab.com
State New
Headers
Series [v3] xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled |

Commit Message

Marek Marczykowski-Górecki Nov. 18, 2022, 3:49 p.m. UTC
  Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
the table is filled. Then it disables INTx just before clearing MASKALL
bit. Currently this approach is rejected by xen-pciback.
According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).

Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
applies to three places:
 - checking currently enabled interrupts type,
 - transition to MSI/MSI-X - where INTx would be implicitly disabled,
 - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
   enabled, as device should consider INTx disabled anyway in that case

Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
Changes in v3:
 - allow clearing INTx regardless of MSI/MSI-X state, to be consistent
   with enabling MSI/MSI-X
Changes in v2:
 - restructure the patch to consider not only MASKALL bit, but enabling
   MSI/MSI-X generally, without explicitly disabling INTx first
---
 drivers/xen/xen-pciback/conf_space.c          | 19 +++++++++++------
 .../xen/xen-pciback/conf_space_capability.c   |  3 ++-
 drivers/xen/xen-pciback/conf_space_header.c   | 21 +++----------------
 3 files changed, 18 insertions(+), 25 deletions(-)
  

Comments

Jason Andryuk Nov. 18, 2022, 8:46 p.m. UTC | #1
On Fri, Nov 18, 2022 at 10:50 AM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
> the table is filled. Then it disables INTx just before clearing MASKALL
> bit. Currently this approach is rejected by xen-pciback.
> According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
> enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
>
> Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
> applies to three places:
>  - checking currently enabled interrupts type,
>  - transition to MSI/MSI-X - where INTx would be implicitly disabled,
>  - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
>    enabled, as device should consider INTx disabled anyway in that case
>
> Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> Changes in v3:
>  - allow clearing INTx regardless of MSI/MSI-X state, to be consistent
>    with enabling MSI/MSI-X
> Changes in v2:
>  - restructure the patch to consider not only MASKALL bit, but enabling
>    MSI/MSI-X generally, without explicitly disabling INTx first
> ---

I was trying to test your xen-pciback v3 patch, and I am having
assignment fail consistently now.  It is actually failing to
quarantine to domIO in the first place, which matches the failure from
the other day (when I more carefully read through the logs).  It now
consistently fails to quarantine on every boot unlike the other day
where it happened once.

I added some printks and it 's getting -EBUSY from pdev_msix_assign()
which means pci_reset_msix_state() is failing:
    if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
         PCI_MSIX_FLAGS_MASKALL )
        return -EBUSY;

# lspci -vv -s 14.3
...
    Capabilities: [80] MSI-X: Enable- Count=16 Masked+
        Vector table: BAR=0 offset=00002000
        PBA: BAR=0 offset=00003000

So it looks like MASKALL is set and prevents assignment.

setpci -s 00:14.3 82.W=f
cleared that out for me and I could assign the device.

My dom0 boots, it runs flask-label-pci for a set of PCI devices
(including iwlwifi), then xl pci-assignable-add for all PCI devices
which will be passed through, then a little later it boots the
associated domains.  Dom0 does not have a driver for iwlwifi.

I'll have to investigate more to see how MASKALL is getting set.  This
had not been an issue before your recent patches.

Regards,
Jason
  
Marek Marczykowski-Górecki Nov. 18, 2022, 9:46 p.m. UTC | #2
On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> On Fri, Nov 18, 2022 at 10:50 AM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
> >
> > Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
> > the table is filled. Then it disables INTx just before clearing MASKALL
> > bit. Currently this approach is rejected by xen-pciback.
> > According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
> > enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
> >
> > Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
> > applies to three places:
> >  - checking currently enabled interrupts type,
> >  - transition to MSI/MSI-X - where INTx would be implicitly disabled,
> >  - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
> >    enabled, as device should consider INTx disabled anyway in that case
> >
> > Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > Changes in v3:
> >  - allow clearing INTx regardless of MSI/MSI-X state, to be consistent
> >    with enabling MSI/MSI-X
> > Changes in v2:
> >  - restructure the patch to consider not only MASKALL bit, but enabling
> >    MSI/MSI-X generally, without explicitly disabling INTx first
> > ---
> 
> I was trying to test your xen-pciback v3 patch, and I am having
> assignment fail consistently now.  It is actually failing to
> quarantine to domIO in the first place, which matches the failure from
> the other day (when I more carefully read through the logs).  It now
> consistently fails to quarantine on every boot unlike the other day
> where it happened once.

Does this include the very first assignment too, or only after domain
reboot? If the latter, maybe some cleanup missed clearing MASKALL?

FWIW, the patch applied to Qubes
(https://github.com/QubesOS/qubes-linux-kernel/pull/680) seems to work
fine (the full test run is still in progress, but I see some green marks
already).

> I added some printks and it 's getting -EBUSY from pdev_msix_assign()
> which means pci_reset_msix_state() is failing:
>     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
>          PCI_MSIX_FLAGS_MASKALL )
>         return -EBUSY;
> 
> # lspci -vv -s 14.3
> ...
>     Capabilities: [80] MSI-X: Enable- Count=16 Masked+
>         Vector table: BAR=0 offset=00002000
>         PBA: BAR=0 offset=00003000
> 
> So it looks like MASKALL is set and prevents assignment.
> 
> setpci -s 00:14.3 82.W=f
> cleared that out for me and I could assign the device.
> 
> My dom0 boots, it runs flask-label-pci for a set of PCI devices
> (including iwlwifi), then xl pci-assignable-add for all PCI devices
> which will be passed through, then a little later it boots the
> associated domains.  Dom0 does not have a driver for iwlwifi.
> 
> I'll have to investigate more to see how MASKALL is getting set.  This
> had not been an issue before your recent patches.

I guess before the patches nothing set anything in MSI-X capability,
because it was hidden...

Anyway, to support my cleanup hypothesis, I tried to destroy a
PCI-having domain, and it left MSI-X enabled (at least according to the
config space). MASKALL was _not_ set, but I haven't checked masking of
individual vectors. TBH, I'm not sure what should be responsible for the
MSI-X cleanup after guest destroy. Should it be Xen? Qemu? Pciback?
Pciback calls PHYSDEVOP_{prepare,release}_msix only when
binding/unbinding from the device (so - xl pci-assignable-{add,remove}),
so this isn't the right place.
Should that be in Xen, in deassign_device() (part of
DOMCTL_deassign_device)?
  
Jason Andryuk Nov. 19, 2022, 2:36 p.m. UTC | #3
Hi, Marek,

On Fri, Nov 18, 2022 at 4:46 PM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> > On Fri, Nov 18, 2022 at 10:50 AM Marek Marczykowski-Górecki
> > <marmarek@invisiblethingslab.com> wrote:
> > >
> > > Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
> > > the table is filled. Then it disables INTx just before clearing MASKALL
> > > bit. Currently this approach is rejected by xen-pciback.
> > > According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
> > > enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
> > >
> > > Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
> > > applies to three places:
> > >  - checking currently enabled interrupts type,
> > >  - transition to MSI/MSI-X - where INTx would be implicitly disabled,
> > >  - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
> > >    enabled, as device should consider INTx disabled anyway in that case
> > >
> > > Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > ---
> > > Changes in v3:
> > >  - allow clearing INTx regardless of MSI/MSI-X state, to be consistent
> > >    with enabling MSI/MSI-X
> > > Changes in v2:
> > >  - restructure the patch to consider not only MASKALL bit, but enabling
> > >    MSI/MSI-X generally, without explicitly disabling INTx first
> > > ---
> >
> > I was trying to test your xen-pciback v3 patch, and I am having
> > assignment fail consistently now.  It is actually failing to
> > quarantine to domIO in the first place, which matches the failure from
> > the other day (when I more carefully read through the logs).  It now
> > consistently fails to quarantine on every boot unlike the other day
> > where it happened once.
>
> Does this include the very first assignment too, or only after domain
> reboot? If the latter, maybe some cleanup missed clearing MASKALL?

It's the quarantine during dom0 boot that fails.  Later assignment
during VM boot fails.  I tried warm reboots and cold boots and it
happened both times.

I also modified my initrd to halt in there and checked the config
space.  MASKALL wasn't set at that time.  I need to double check -
MASKALL may have been unset after dom0 booted in that case.

I'll test more to figure when and how MASKALL is getting set.

> FWIW, the patch applied to Qubes
> (https://github.com/QubesOS/qubes-linux-kernel/pull/680) seems to work
> fine (the full test run is still in progress, but I see some green marks
> already).

Does Qubes quarantine devices explicitly, or are they in dom0 and
libvirt/libxl just assigns them when a domain boots?

> > I added some printks and it 's getting -EBUSY from pdev_msix_assign()
> > which means pci_reset_msix_state() is failing:
> >     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
> >          PCI_MSIX_FLAGS_MASKALL )
> >         return -EBUSY;
> >
> > # lspci -vv -s 14.3
> > ...
> >     Capabilities: [80] MSI-X: Enable- Count=16 Masked+
> >         Vector table: BAR=0 offset=00002000
> >         PBA: BAR=0 offset=00003000
> >
> > So it looks like MASKALL is set and prevents assignment.
> >
> > setpci -s 00:14.3 82.W=f
> > cleared that out for me and I could assign the device.
> >
> > My dom0 boots, it runs flask-label-pci for a set of PCI devices
> > (including iwlwifi), then xl pci-assignable-add for all PCI devices
> > which will be passed through, then a little later it boots the
> > associated domains.  Dom0 does not have a driver for iwlwifi.
> >
> > I'll have to investigate more to see how MASKALL is getting set.  This
> > had not been an issue before your recent patches.
>
> I guess before the patches nothing set anything in MSI-X capability,
> because it was hidden...

Well, stubdom hasn't even booted when, so it would be the Xen or
pciback change to modify MASKALL?

> Anyway, to support my cleanup hypothesis, I tried to destroy a
> PCI-having domain, and it left MSI-X enabled (at least according to the
> config space). MASKALL was _not_ set, but I haven't checked masking of
> individual vectors. TBH, I'm not sure what should be responsible for the
> MSI-X cleanup after guest destroy. Should it be Xen? Qemu? Pciback?
> Pciback calls PHYSDEVOP_{prepare,release}_msix only when
> binding/unbinding from the device (so - xl pci-assignable-{add,remove}),
> so this isn't the right place.

I need to review all this code to give a meaningful response.  Would
xen-pciback set MASKALL when it binds a device?  That happens before
xl pci-assignable-add tries to quarantine (assign to to domIO).

> Should that be in Xen, in deassign_device() (part of
> DOMCTL_deassign_device)?

It seems to me that Xen needs to ultimately disable the device.

Thanks,
Jason
  
Marek Marczykowski-Górecki Nov. 19, 2022, 4:33 p.m. UTC | #4
On Sat, Nov 19, 2022 at 09:36:54AM -0500, Jason Andryuk wrote:
> Hi, Marek,
> 
> On Fri, Nov 18, 2022 at 4:46 PM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
> >
> > On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> > > I was trying to test your xen-pciback v3 patch, and I am having
> > > assignment fail consistently now.  It is actually failing to
> > > quarantine to domIO in the first place, which matches the failure from
> > > the other day (when I more carefully read through the logs).  It now
> > > consistently fails to quarantine on every boot unlike the other day
> > > where it happened once.
> >
> > Does this include the very first assignment too, or only after domain
> > reboot? If the latter, maybe some cleanup missed clearing MASKALL?
> 
> It's the quarantine during dom0 boot that fails.  Later assignment
> during VM boot fails.  I tried warm reboots and cold boots and it
> happened both times.
> 
> I also modified my initrd to halt in there and checked the config
> space.  MASKALL wasn't set at that time.  I need to double check -
> MASKALL may have been unset after dom0 booted in that case.
> 
> I'll test more to figure when and how MASKALL is getting set.
> 
> > FWIW, the patch applied to Qubes
> > (https://github.com/QubesOS/qubes-linux-kernel/pull/680) seems to work
> > fine (the full test run is still in progress, but I see some green marks
> > already).
> 
> Does Qubes quarantine devices explicitly, or are they in dom0 and
> libvirt/libxl just assigns them when a domain boots?

We do quarantine them explicitly, still in initramfs.

> > > I added some printks and it 's getting -EBUSY from pdev_msix_assign()
> > > which means pci_reset_msix_state() is failing:
> > >     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
> > >          PCI_MSIX_FLAGS_MASKALL )
> > >         return -EBUSY;
> > >
> > > # lspci -vv -s 14.3
> > > ...
> > >     Capabilities: [80] MSI-X: Enable- Count=16 Masked+
> > >         Vector table: BAR=0 offset=00002000
> > >         PBA: BAR=0 offset=00003000
> > >
> > > So it looks like MASKALL is set and prevents assignment.
> > >
> > > setpci -s 00:14.3 82.W=f
> > > cleared that out for me and I could assign the device.
> > >
> > > My dom0 boots, it runs flask-label-pci for a set of PCI devices
> > > (including iwlwifi), then xl pci-assignable-add for all PCI devices
> > > which will be passed through, then a little later it boots the
> > > associated domains.  Dom0 does not have a driver for iwlwifi.
> > >
> > > I'll have to investigate more to see how MASKALL is getting set.  This
> > > had not been an issue before your recent patches.
> >
> > I guess before the patches nothing set anything in MSI-X capability,
> > because it was hidden...
> 
> Well, stubdom hasn't even booted when, so it would be the Xen or
> pciback change to modify MASKALL?

Weird...

> > Anyway, to support my cleanup hypothesis, I tried to destroy a
> > PCI-having domain, and it left MSI-X enabled (at least according to the
> > config space). MASKALL was _not_ set, but I haven't checked masking of
> > individual vectors. TBH, I'm not sure what should be responsible for the
> > MSI-X cleanup after guest destroy. Should it be Xen? Qemu? Pciback?
> > Pciback calls PHYSDEVOP_{prepare,release}_msix only when
> > binding/unbinding from the device (so - xl pci-assignable-{add,remove}),
> > so this isn't the right place.
> 
> I need to review all this code to give a meaningful response.  Would
> xen-pciback set MASKALL when it binds a device?  That happens before
> xl pci-assignable-add tries to quarantine (assign to to domIO).

I don't see pciback doing that. And also, my patches shouldn't change
behaviour of pciback when binding to a device (so, if it would be doing
that, it would happen before my patches too).

Maybe that's an interaction with some other patches?

> > Should that be in Xen, in deassign_device() (part of
> > DOMCTL_deassign_device)?
> 
> It seems to me that Xen needs to ultimately disable the device.

That's my intuition too.
  
Jason Andryuk Nov. 21, 2022, 3:41 p.m. UTC | #5
On Sat, Nov 19, 2022 at 11:33 AM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Sat, Nov 19, 2022 at 09:36:54AM -0500, Jason Andryuk wrote:
> > Hi, Marek,
> >
> > On Fri, Nov 18, 2022 at 4:46 PM Marek Marczykowski-Górecki
> > <marmarek@invisiblethingslab.com> wrote:
> > >
> > > On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> > > > I was trying to test your xen-pciback v3 patch, and I am having
> > > > assignment fail consistently now.  It is actually failing to
> > > > quarantine to domIO in the first place, which matches the failure from
> > > > the other day (when I more carefully read through the logs).  It now
> > > > consistently fails to quarantine on every boot unlike the other day
> > > > where it happened once.
> > >
> > > Does this include the very first assignment too, or only after domain
> > > reboot? If the latter, maybe some cleanup missed clearing MASKALL?
> >
> > It's the quarantine during dom0 boot that fails.  Later assignment
> > during VM boot fails.  I tried warm reboots and cold boots and it
> > happened both times.
> >
> > I also modified my initrd to halt in there and checked the config
> > space.  MASKALL wasn't set at that time.  I need to double check -
> > MASKALL may have been unset after dom0 booted in that case.
> >
> > I'll test more to figure when and how MASKALL is getting set.

I'm testing with a laptop without a battery.  It seems MASKALL remains
set when rebooting or when left plugged in.

From unplugged, a cold boot doesn't have MASKALL set and the network vm boots.

After that, rebooting the laptop leaves MASKALL set on the NIC when
the laptop reboots.   NIC assignment fails.

Shutdown and later boot while left plugged in keeps MASKALL set.  NIC
assignment fails.  I have only tested this scenario for short periods
of time, so I don't know if it would eventually clear after a longer
time.

Regards,
Jason
  
Marek Marczykowski-Górecki Nov. 21, 2022, 4:16 p.m. UTC | #6
On Mon, Nov 21, 2022 at 10:41:34AM -0500, Jason Andryuk wrote:
> On Sat, Nov 19, 2022 at 11:33 AM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
> >
> > On Sat, Nov 19, 2022 at 09:36:54AM -0500, Jason Andryuk wrote:
> > > Hi, Marek,
> > >
> > > On Fri, Nov 18, 2022 at 4:46 PM Marek Marczykowski-Górecki
> > > <marmarek@invisiblethingslab.com> wrote:
> > > >
> > > > On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> > > > > I was trying to test your xen-pciback v3 patch, and I am having
> > > > > assignment fail consistently now.  It is actually failing to
> > > > > quarantine to domIO in the first place, which matches the failure from
> > > > > the other day (when I more carefully read through the logs).  It now
> > > > > consistently fails to quarantine on every boot unlike the other day
> > > > > where it happened once.
> > > >
> > > > Does this include the very first assignment too, or only after domain
> > > > reboot? If the latter, maybe some cleanup missed clearing MASKALL?
> > >
> > > It's the quarantine during dom0 boot that fails.  Later assignment
> > > during VM boot fails.  I tried warm reboots and cold boots and it
> > > happened both times.
> > >
> > > I also modified my initrd to halt in there and checked the config
> > > space.  MASKALL wasn't set at that time.  I need to double check -
> > > MASKALL may have been unset after dom0 booted in that case.
> > >
> > > I'll test more to figure when and how MASKALL is getting set.
> 
> I'm testing with a laptop without a battery.  It seems MASKALL remains
> set when rebooting or when left plugged in.
> 
> From unplugged, a cold boot doesn't have MASKALL set and the network vm boots.
> 
> After that, rebooting the laptop leaves MASKALL set on the NIC when
> the laptop reboots.   NIC assignment fails.
> 
> Shutdown and later boot while left plugged in keeps MASKALL set.  NIC
> assignment fails.  I have only tested this scenario for short periods
> of time, so I don't know if it would eventually clear after a longer
> time.

That's interesting, seems like firmware is not resetting the device
properly. Maybe related to enabled wake on lan?

Anyway, resetting the device at domain create/destroy is AFAIR normally
done by pciback (at the instruction by the toolstack). Should it maybe
be done when assigning to pciback initially too? Or maybe in this
specific case, device reset doesn't properly clear MASKALL, so pciback
should clear it explicitly (after ensuring the MSI-X enable is cleared
too)?
  
Marek Marczykowski-Górecki Nov. 28, 2022, 1:44 p.m. UTC | #7
On Mon, Nov 21, 2022 at 05:16:37PM +0100, Marek Marczykowski-Górecki wrote:
> On Mon, Nov 21, 2022 at 10:41:34AM -0500, Jason Andryuk wrote:
> > On Sat, Nov 19, 2022 at 11:33 AM Marek Marczykowski-Górecki
> > <marmarek@invisiblethingslab.com> wrote:
> > >
> > > On Sat, Nov 19, 2022 at 09:36:54AM -0500, Jason Andryuk wrote:
> > > > Hi, Marek,
> > > >
> > > > On Fri, Nov 18, 2022 at 4:46 PM Marek Marczykowski-Górecki
> > > > <marmarek@invisiblethingslab.com> wrote:
> > > > >
> > > > > On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> > > > > > I was trying to test your xen-pciback v3 patch, and I am having
> > > > > > assignment fail consistently now.  It is actually failing to
> > > > > > quarantine to domIO in the first place, which matches the failure from
> > > > > > the other day (when I more carefully read through the logs).  It now
> > > > > > consistently fails to quarantine on every boot unlike the other day
> > > > > > where it happened once.
> > > > >
> > > > > Does this include the very first assignment too, or only after domain
> > > > > reboot? If the latter, maybe some cleanup missed clearing MASKALL?
> > > >
> > > > It's the quarantine during dom0 boot that fails.  Later assignment
> > > > during VM boot fails.  I tried warm reboots and cold boots and it
> > > > happened both times.
> > > >
> > > > I also modified my initrd to halt in there and checked the config
> > > > space.  MASKALL wasn't set at that time.  I need to double check -
> > > > MASKALL may have been unset after dom0 booted in that case.
> > > >
> > > > I'll test more to figure when and how MASKALL is getting set.
> > 
> > I'm testing with a laptop without a battery.  It seems MASKALL remains
> > set when rebooting or when left plugged in.
> > 
> > From unplugged, a cold boot doesn't have MASKALL set and the network vm boots.
> > 
> > After that, rebooting the laptop leaves MASKALL set on the NIC when
> > the laptop reboots.   NIC assignment fails.
> > 
> > Shutdown and later boot while left plugged in keeps MASKALL set.  NIC
> > assignment fails.  I have only tested this scenario for short periods
> > of time, so I don't know if it would eventually clear after a longer
> > time.
> 
> That's interesting, seems like firmware is not resetting the device
> properly. Maybe related to enabled wake on lan?
> 
> Anyway, resetting the device at domain create/destroy is AFAIR normally
> done by pciback (at the instruction by the toolstack). Should it maybe
> be done when assigning to pciback initially too? Or maybe in this
> specific case, device reset doesn't properly clear MASKALL, so pciback
> should clear it explicitly (after ensuring the MSI-X enable is cleared
> too)?

Can you check if `echo 1 > /sys/bus/pci/devices/$SBDF/reset` clears
MASKALL on this device?

I'm tempted to add libxl__device_pci_reset() as part of
libxl__device_pci_assignable_add().
  
Jason Andryuk Nov. 30, 2022, 7:40 p.m. UTC | #8
On Mon, Nov 28, 2022 at 8:44 AM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Mon, Nov 21, 2022 at 05:16:37PM +0100, Marek Marczykowski-Górecki wrote:
> > On Mon, Nov 21, 2022 at 10:41:34AM -0500, Jason Andryuk wrote:
> > > On Sat, Nov 19, 2022 at 11:33 AM Marek Marczykowski-Górecki
> > > <marmarek@invisiblethingslab.com> wrote:
> > > >
> > > > On Sat, Nov 19, 2022 at 09:36:54AM -0500, Jason Andryuk wrote:
> > > > > Hi, Marek,
> > > > >
> > > > > On Fri, Nov 18, 2022 at 4:46 PM Marek Marczykowski-Górecki
> > > > > <marmarek@invisiblethingslab.com> wrote:
> > > > > >
> > > > > > On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> > > > > > > I was trying to test your xen-pciback v3 patch, and I am having
> > > > > > > assignment fail consistently now.  It is actually failing to
> > > > > > > quarantine to domIO in the first place, which matches the failure from
> > > > > > > the other day (when I more carefully read through the logs).  It now
> > > > > > > consistently fails to quarantine on every boot unlike the other day
> > > > > > > where it happened once.
> > > > > >
> > > > > > Does this include the very first assignment too, or only after domain
> > > > > > reboot? If the latter, maybe some cleanup missed clearing MASKALL?
> > > > >
> > > > > It's the quarantine during dom0 boot that fails.  Later assignment
> > > > > during VM boot fails.  I tried warm reboots and cold boots and it
> > > > > happened both times.
> > > > >
> > > > > I also modified my initrd to halt in there and checked the config
> > > > > space.  MASKALL wasn't set at that time.  I need to double check -
> > > > > MASKALL may have been unset after dom0 booted in that case.
> > > > >
> > > > > I'll test more to figure when and how MASKALL is getting set.
> > >
> > > I'm testing with a laptop without a battery.  It seems MASKALL remains
> > > set when rebooting or when left plugged in.
> > >
> > > From unplugged, a cold boot doesn't have MASKALL set and the network vm boots.
> > >
> > > After that, rebooting the laptop leaves MASKALL set on the NIC when
> > > the laptop reboots.   NIC assignment fails.
> > >
> > > Shutdown and later boot while left plugged in keeps MASKALL set.  NIC
> > > assignment fails.  I have only tested this scenario for short periods
> > > of time, so I don't know if it would eventually clear after a longer
> > > time.
> >
> > That's interesting, seems like firmware is not resetting the device
> > properly. Maybe related to enabled wake on lan?
> >
> > Anyway, resetting the device at domain create/destroy is AFAIR normally
> > done by pciback (at the instruction by the toolstack). Should it maybe
> > be done when assigning to pciback initially too? Or maybe in this
> > specific case, device reset doesn't properly clear MASKALL, so pciback
> > should clear it explicitly (after ensuring the MSI-X enable is cleared
> > too)?
>
> Can you check if `echo 1 > /sys/bus/pci/devices/$SBDF/reset` clears
> MASKALL on this device?

`echo 1 > ..../reset` did not clear MASKALL.

After shutting down the domain with the iwlwifi card, lspci from dom0 shows:
MSI-X: Enable+ Count=16 Masked+

Hmm, Xen logged:
(XEN) cannot disable IRQ 137: masking MSI-X on 0000:00:14.3

Oh, looking back, I see that was logged during my earlier testing of
this patch set, but I missed it.

It seems like Xen set Enable and Masked itself in __pci_disable_msix()
since memory decoding is not enabled.

I'm still investigating, but I wanted to give an update.  It seems
like Xen should clear MASKALL when booting.  Something like clearing
MASKALL in pdev_msi_init() when !ENABLE & MASKALL.  However, I have
seen the system boot with both Enable and Maskall set on the iwlwifi
nic.  Is it risky to just unilaterally clear both of those when
enumerating PCI devices?  It doesn't seem appropriate to leave them
set without a driver controlling them.

-Jason
  
Marek Marczykowski-Górecki Sept. 27, 2023, 10:33 a.m. UTC | #9
On Fri, Nov 18, 2022 at 04:49:23PM +0100, Marek Marczykowski-Górecki wrote:
> Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
> the table is filled. Then it disables INTx just before clearing MASKALL
> bit. Currently this approach is rejected by xen-pciback.
> According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
> enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
> 
> Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
> applies to three places:
>  - checking currently enabled interrupts type,
>  - transition to MSI/MSI-X - where INTx would be implicitly disabled,
>  - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
>    enabled, as device should consider INTx disabled anyway in that case
> 
> Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Ping?

The issue pointed out by Jason was fixed in Xen:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=913751d7af6e78d65c1e2adf4887193c827f0c5e

> ---
> Changes in v3:
>  - allow clearing INTx regardless of MSI/MSI-X state, to be consistent
>    with enabling MSI/MSI-X
> Changes in v2:
>  - restructure the patch to consider not only MASKALL bit, but enabling
>    MSI/MSI-X generally, without explicitly disabling INTx first
> ---
>  drivers/xen/xen-pciback/conf_space.c          | 19 +++++++++++------
>  .../xen/xen-pciback/conf_space_capability.c   |  3 ++-
>  drivers/xen/xen-pciback/conf_space_header.c   | 21 +++----------------
>  3 files changed, 18 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/xen/xen-pciback/conf_space.c b/drivers/xen/xen-pciback/conf_space.c
> index 059de92aea7d..d47eee6c5143 100644
> --- a/drivers/xen/xen-pciback/conf_space.c
> +++ b/drivers/xen/xen-pciback/conf_space.c
> @@ -288,12 +288,6 @@ int xen_pcibk_get_interrupt_type(struct pci_dev *dev)
>  	u16 val;
>  	int ret = 0;
>  
> -	err = pci_read_config_word(dev, PCI_COMMAND, &val);
> -	if (err)
> -		return err;
> -	if (!(val & PCI_COMMAND_INTX_DISABLE))
> -		ret |= INTERRUPT_TYPE_INTX;
> -
>  	/*
>  	 * Do not trust dev->msi(x)_enabled here, as enabling could be done
>  	 * bypassing the pci_*msi* functions, by the qemu.
> @@ -316,6 +310,19 @@ int xen_pcibk_get_interrupt_type(struct pci_dev *dev)
>  		if (val & PCI_MSIX_FLAGS_ENABLE)
>  			ret |= INTERRUPT_TYPE_MSIX;
>  	}
> +
> +	/*
> +	 * PCIe spec says device cannot use INTx if MSI/MSI-X is enabled,
> +	 * so check for INTx only when both are disabled.
> +	 */
> +	if (!ret) {
> +		err = pci_read_config_word(dev, PCI_COMMAND, &val);
> +		if (err)
> +			return err;
> +		if (!(val & PCI_COMMAND_INTX_DISABLE))
> +			ret |= INTERRUPT_TYPE_INTX;
> +	}
> +
>  	return ret ?: INTERRUPT_TYPE_NONE;
>  }
>  
> diff --git a/drivers/xen/xen-pciback/conf_space_capability.c b/drivers/xen/xen-pciback/conf_space_capability.c
> index 097316a74126..eb4c1af44f5c 100644
> --- a/drivers/xen/xen-pciback/conf_space_capability.c
> +++ b/drivers/xen/xen-pciback/conf_space_capability.c
> @@ -236,10 +236,11 @@ static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 new_value,
>  		return PCIBIOS_SET_FAILED;
>  
>  	if (new_value & field_config->enable_bit) {
> -		/* don't allow enabling together with other interrupt types */
> +		/* don't allow enabling together with other interrupt type */
>  		int int_type = xen_pcibk_get_interrupt_type(dev);
>  
>  		if (int_type == INTERRUPT_TYPE_NONE ||
> +		    int_type == INTERRUPT_TYPE_INTX ||
>  		    int_type == field_config->int_type)
>  			goto write;
>  		return PCIBIOS_SET_FAILED;
> diff --git a/drivers/xen/xen-pciback/conf_space_header.c b/drivers/xen/xen-pciback/conf_space_header.c
> index 981435103af1..fc0332645966 100644
> --- a/drivers/xen/xen-pciback/conf_space_header.c
> +++ b/drivers/xen/xen-pciback/conf_space_header.c
> @@ -104,24 +104,9 @@ static int command_write(struct pci_dev *dev, int offset, u16 value, void *data)
>  		pci_clear_mwi(dev);
>  	}
>  
> -	if (dev_data && dev_data->allow_interrupt_control) {
> -		if ((cmd->val ^ value) & PCI_COMMAND_INTX_DISABLE) {
> -			if (value & PCI_COMMAND_INTX_DISABLE) {
> -				pci_intx(dev, 0);
> -			} else {
> -				/* Do not allow enabling INTx together with MSI or MSI-X. */
> -				switch (xen_pcibk_get_interrupt_type(dev)) {
> -				case INTERRUPT_TYPE_NONE:
> -					pci_intx(dev, 1);
> -					break;
> -				case INTERRUPT_TYPE_INTX:
> -					break;
> -				default:
> -					return PCIBIOS_SET_FAILED;
> -				}
> -			}
> -		}
> -	}
> +	if (dev_data && dev_data->allow_interrupt_control &&
> +	    ((cmd->val ^ value) & PCI_COMMAND_INTX_DISABLE))
> +		pci_intx(dev, !(value & PCI_COMMAND_INTX_DISABLE));
>  
>  	cmd->val = value;
>  
> -- 
> 2.37.3
>
  
Juergen Gross Oct. 16, 2023, 6:16 a.m. UTC | #10
On 18.11.22 16:49, Marek Marczykowski-Górecki wrote:
> Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
> the table is filled. Then it disables INTx just before clearing MASKALL
> bit. Currently this approach is rejected by xen-pciback.
> According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
> enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
> 
> Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
> applies to three places:
>   - checking currently enabled interrupts type,
>   - transition to MSI/MSI-X - where INTx would be implicitly disabled,
>   - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
>     enabled, as device should consider INTx disabled anyway in that case
> 
> Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Acked-by: Juergen Gross <jgross@suse.com>


Juergen
  
Marek Marczykowski-Górecki Oct. 16, 2023, 1:04 p.m. UTC | #11
On Mon, Oct 16, 2023 at 11:05:10AM +0200, Roger Pau Monné wrote:
> On Fri, Nov 18, 2022 at 04:49:23PM +0100, Marek Marczykowski-Górecki wrote:
> > Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
> > the table is filled. Then it disables INTx just before clearing MASKALL
> > bit. Currently this approach is rejected by xen-pciback.
> > According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
> > enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
> > 
> > Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
> > applies to three places:
> >  - checking currently enabled interrupts type,
> >  - transition to MSI/MSI-X - where INTx would be implicitly disabled,
> >  - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
> >    enabled, as device should consider INTx disabled anyway in that case
> 
> Is this last point strictly needed?  From the description above it
> seems Linux only cares about enabling MSI(-X) without the disable INTx
> bit set.

I'm not sure, but it seems logical to have it symmetric.

> 
> > 
> > Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > Changes in v3:
> >  - allow clearing INTx regardless of MSI/MSI-X state, to be consistent
> >    with enabling MSI/MSI-X
> > Changes in v2:
> >  - restructure the patch to consider not only MASKALL bit, but enabling
> >    MSI/MSI-X generally, without explicitly disabling INTx first
> > ---
> >  drivers/xen/xen-pciback/conf_space.c          | 19 +++++++++++------
> >  .../xen/xen-pciback/conf_space_capability.c   |  3 ++-
> >  drivers/xen/xen-pciback/conf_space_header.c   | 21 +++----------------
> >  3 files changed, 18 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/xen/xen-pciback/conf_space.c b/drivers/xen/xen-pciback/conf_space.c
> > index 059de92aea7d..d47eee6c5143 100644
> > --- a/drivers/xen/xen-pciback/conf_space.c
> > +++ b/drivers/xen/xen-pciback/conf_space.c
> > @@ -288,12 +288,6 @@ int xen_pcibk_get_interrupt_type(struct pci_dev *dev)
> >  	u16 val;
> >  	int ret = 0;
> >  
> > -	err = pci_read_config_word(dev, PCI_COMMAND, &val);
> > -	if (err)
> > -		return err;
> > -	if (!(val & PCI_COMMAND_INTX_DISABLE))
> > -		ret |= INTERRUPT_TYPE_INTX;
> > -
> >  	/*
> >  	 * Do not trust dev->msi(x)_enabled here, as enabling could be done
> >  	 * bypassing the pci_*msi* functions, by the qemu.
> > @@ -316,6 +310,19 @@ int xen_pcibk_get_interrupt_type(struct pci_dev *dev)
> >  		if (val & PCI_MSIX_FLAGS_ENABLE)
> >  			ret |= INTERRUPT_TYPE_MSIX;
> >  	}
> 
> Since we are explicitly hiding INTx now, should we also do something
> about MSI(X) being both enabled at the same time?  The spec states:
> 
> "System configuration software sets one of these bits to enable either
> MSI or MSI-X, but never both simultaneously. Behavior is undefined if
> both MSI and MSI-X are enabled simultaneously."
> 
> So finding both MSI and MSI-X enabled likely means something has gone
> very wrong?  Likely to be done in a separate change, just realized
> while looking at the patch context.

Pciback try to prevent such situation (that's exactly the point of
checking the current interrupt type). But if you get into such situation
somehow anyway (likely bypassing pciback), then pciback will still allow
to disable one of them, so you can fix the situation (the enforcement of
"only one type at the time" is done setting the enable bit, but you can still
clear it).

If both MSI and MSI-X are enabled xen_pcibk_get_interrupt_type() will
return both bits set.

> > +
> > +	/*
> > +	 * PCIe spec says device cannot use INTx if MSI/MSI-X is enabled,
> > +	 * so check for INTx only when both are disabled.
> > +	 */
> > +	if (!ret) {
> > +		err = pci_read_config_word(dev, PCI_COMMAND, &val);
> > +		if (err)
> > +			return err;
> > +		if (!(val & PCI_COMMAND_INTX_DISABLE))
> > +			ret |= INTERRUPT_TYPE_INTX;
> > +	}
> > +
> >  	return ret ?: INTERRUPT_TYPE_NONE;
> >  }
> >  
> > diff --git a/drivers/xen/xen-pciback/conf_space_capability.c b/drivers/xen/xen-pciback/conf_space_capability.c
> > index 097316a74126..eb4c1af44f5c 100644
> > --- a/drivers/xen/xen-pciback/conf_space_capability.c
> > +++ b/drivers/xen/xen-pciback/conf_space_capability.c
> > @@ -236,10 +236,11 @@ static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 new_value,
> >  		return PCIBIOS_SET_FAILED;
> >  
> >  	if (new_value & field_config->enable_bit) {
> > -		/* don't allow enabling together with other interrupt types */
> > +		/* don't allow enabling together with other interrupt type */
> 
> This comment needs to be adjusted to note that we allow enabling while
> INTx is not disabled in the command register, in order to please
> Linuxes MSI(-X) startup sequence.

Ok.

> FWIW, another option would be to simply disable INTX here once MSI(-X)
> is attempted to be enabled, won't that avoid having to modify
> xen_pcibk_get_interrupt_type()?

I would rather avoid implicit changes to other bits, it may lead to hard
to debug corner cases (in this case, for example, if domU decides to
disable MSI-X later on, it would be left with INTx disabled too, so no
interrupts at all).
  

Patch

diff --git a/drivers/xen/xen-pciback/conf_space.c b/drivers/xen/xen-pciback/conf_space.c
index 059de92aea7d..d47eee6c5143 100644
--- a/drivers/xen/xen-pciback/conf_space.c
+++ b/drivers/xen/xen-pciback/conf_space.c
@@ -288,12 +288,6 @@  int xen_pcibk_get_interrupt_type(struct pci_dev *dev)
 	u16 val;
 	int ret = 0;
 
-	err = pci_read_config_word(dev, PCI_COMMAND, &val);
-	if (err)
-		return err;
-	if (!(val & PCI_COMMAND_INTX_DISABLE))
-		ret |= INTERRUPT_TYPE_INTX;
-
 	/*
 	 * Do not trust dev->msi(x)_enabled here, as enabling could be done
 	 * bypassing the pci_*msi* functions, by the qemu.
@@ -316,6 +310,19 @@  int xen_pcibk_get_interrupt_type(struct pci_dev *dev)
 		if (val & PCI_MSIX_FLAGS_ENABLE)
 			ret |= INTERRUPT_TYPE_MSIX;
 	}
+
+	/*
+	 * PCIe spec says device cannot use INTx if MSI/MSI-X is enabled,
+	 * so check for INTx only when both are disabled.
+	 */
+	if (!ret) {
+		err = pci_read_config_word(dev, PCI_COMMAND, &val);
+		if (err)
+			return err;
+		if (!(val & PCI_COMMAND_INTX_DISABLE))
+			ret |= INTERRUPT_TYPE_INTX;
+	}
+
 	return ret ?: INTERRUPT_TYPE_NONE;
 }
 
diff --git a/drivers/xen/xen-pciback/conf_space_capability.c b/drivers/xen/xen-pciback/conf_space_capability.c
index 097316a74126..eb4c1af44f5c 100644
--- a/drivers/xen/xen-pciback/conf_space_capability.c
+++ b/drivers/xen/xen-pciback/conf_space_capability.c
@@ -236,10 +236,11 @@  static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 new_value,
 		return PCIBIOS_SET_FAILED;
 
 	if (new_value & field_config->enable_bit) {
-		/* don't allow enabling together with other interrupt types */
+		/* don't allow enabling together with other interrupt type */
 		int int_type = xen_pcibk_get_interrupt_type(dev);
 
 		if (int_type == INTERRUPT_TYPE_NONE ||
+		    int_type == INTERRUPT_TYPE_INTX ||
 		    int_type == field_config->int_type)
 			goto write;
 		return PCIBIOS_SET_FAILED;
diff --git a/drivers/xen/xen-pciback/conf_space_header.c b/drivers/xen/xen-pciback/conf_space_header.c
index 981435103af1..fc0332645966 100644
--- a/drivers/xen/xen-pciback/conf_space_header.c
+++ b/drivers/xen/xen-pciback/conf_space_header.c
@@ -104,24 +104,9 @@  static int command_write(struct pci_dev *dev, int offset, u16 value, void *data)
 		pci_clear_mwi(dev);
 	}
 
-	if (dev_data && dev_data->allow_interrupt_control) {
-		if ((cmd->val ^ value) & PCI_COMMAND_INTX_DISABLE) {
-			if (value & PCI_COMMAND_INTX_DISABLE) {
-				pci_intx(dev, 0);
-			} else {
-				/* Do not allow enabling INTx together with MSI or MSI-X. */
-				switch (xen_pcibk_get_interrupt_type(dev)) {
-				case INTERRUPT_TYPE_NONE:
-					pci_intx(dev, 1);
-					break;
-				case INTERRUPT_TYPE_INTX:
-					break;
-				default:
-					return PCIBIOS_SET_FAILED;
-				}
-			}
-		}
-	}
+	if (dev_data && dev_data->allow_interrupt_control &&
+	    ((cmd->val ^ value) & PCI_COMMAND_INTX_DISABLE))
+		pci_intx(dev, !(value & PCI_COMMAND_INTX_DISABLE));
 
 	cmd->val = value;