[v2] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4

Message ID 20230314135128.2930580-1-sdonthineni@nvidia.com
State New
Headers
Series [v2] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4 |

Commit Message

Shanker Donthineni March 14, 2023, 1:51 p.m. UTC
  The T241 platform suffers from the T241-FABRIC-4 erratum which causes
unexpected behavior in the GIC when multiple transactions are received
simultaneously from different sources. This hardware issue impacts
NVIDIA server platforms that use more than two T241 chips
interconnected. Each chip has support for 320 {E}SPIs.

This issue occurs when multiple packets from different GICs are
incorrectly interleaved at the target chip. The erratum text below
specifies exactly what can cause multiple transfer packets susceptible
to interleaving and GIC state corruption. GIC state corruption can
lead to a range of problems, including kernel panics, and unexpected
behavior.

From the erratum text:
  "In some cases, inter-socket AXI4 Stream packets with multiple
  transfers, may be interleaved by the fabric when presented to ARM
  Generic Interrupt Controller. GIC expects all transfers of a packet
  to be delivered without any interleaving.

  The following GICv3 commands may result in multiple transfer packets
  over inter-socket AXI4 Stream interface:
   - Register reads from GICD_I* and GICD_N*
   - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
   - ITS command MOVALL

  Multiple commands in GICv4+ utilize multiple transfer packets,
  including VMOVP, VMOVI and VMAPP.

  This issue impacts system configurations with more than 2 sockets,
  that require multi-transfer packets to be sent over inter-socket
  AXI4 Stream interface between GIC instances on different sockets.
  GICv4 cannot be supported. GICv3 SW model can only be supported
  with the workaround. Single and Dual socket configurations are not
  impacted by this issue and support GICv3 and GICv4."

Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf

Writing to the chip alias region of the GICD_In{E} registers except
GICD_ICENABLERn has an equivalent effect as writing to the global
distributor. The SPI interrupt deactivate path is not impacted by
the erratum.

To fix this problem, implement a workaround that ensures read accesses
to the GICD_In{E} registers are directed to the chip that owns the
SPI, and disables GICv4.x features for KVM. To simplify code changes,
the gic_configure_irq() function uses the same alias region for both
read and write operations to GICD_ICFGR.

Co-developed-by: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
---
Changes since v1:
 - Use SMCCC SOC-ID API for detecting the T241 chip
 - Implement Marc's suggestions
 - Edit commit text

 Documentation/arm64/silicon-errata.rst |   2 +
 drivers/firmware/smccc/smccc.c         |  13 +++
 drivers/irqchip/irq-gic-v3.c           | 116 +++++++++++++++++++++++--
 include/linux/arm-smccc.h              |   1 +
 4 files changed, 123 insertions(+), 9 deletions(-)
  

Comments

Shanker Donthineni March 15, 2023, 12:27 p.m. UTC | #1
Hi Marc,

On 3/15/23 03:34, Marc Zyngier wrote:
> Please don't duplicate existing code. There is already the required
> infrastructure in drivers/firmware/smccc/soc_id.c. All you need to do
> is:
> 
> - disassociate the SMCCC probing from the device registration
> 
> - probe the SOC_ID early
> 
> - add accessors for the relevant data
> 
> - select ARM_SMCCC_SOD_ID/ARM_SMCCC_DISCOVERY from the GICv3 Kconfig


I have not modified soc_id.c as it expects to be loaded as a module with
the use of module_init() and module_exit() functions. The exported symbols
in soc_id driver cannot be accessed from the built-in code.

Agree, the SOD-ID discovery code was duplicated.

Please guide me if the below approach is okay?

1) Probe the SOC-ID in arm_smccc_version_init() and export two functions
arm_smccc_get_soc_id_version() and arm_smccc_get_soc_id_revision().

--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -17,9 +17,13 @@ static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;

  bool __ro_after_init smccc_trng_available = false;
  u64 __ro_after_init smccc_has_sve_hint = false;
+s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
+s32 __ro_after_init smccc_soc_id_revision = SMCCC_RET_NOT_SUPPORTED;

  void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
  {
+       struct arm_smccc_res res;
+
         smccc_version = version;
         smccc_conduit = conduit;

@@ -27,6 +31,18 @@ void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
         if (IS_ENABLED(CONFIG_ARM64_SVE) &&
             smccc_version >= ARM_SMCCC_VERSION_1_3)
                 smccc_has_sve_hint = true;
+
+       if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
+           (smccc_conduit != SMCCC_CONDUIT_NONE)) {
+               arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+                                    ARM_SMCCC_ARCH_SOC_ID, &res);
+               if ((s32)res.a0 >= 0) {
+                       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, &res);
+                       smccc_soc_id_version = (s32)res.a0;
+                       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 1, &res);
+                       smccc_soc_id_revision = (s32)res.a0;
+               }
+       }
  }


+s32 arm_smccc_get_soc_id_version(void)
+{
+       return smccc_soc_id_version;
+}
+EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_version);
+
+s32 arm_smccc_get_soc_id_revision(void)
+{
+       return smccc_soc_id_revision;
+}
+EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_revision);


--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
+/**
+ * arm_smccc_get_soc_id_version()
+ *
+ * Returns the SOC ID version.
+ *
+ * When ARM_SMCCC_ARCH_SOC_ID is not present, returns SMCCC_RET_NOT_SUPPORTED.
+ */
+s32 arm_smccc_get_soc_id_version(void);

+/**
+ * arm_smccc_get_soc_id_revision()
+ *
+ * Returns the SOC ID revision.
+ *
+ * When ARM_SMCCC_ARCH_SOC_ID is not present, returns SMCCC_RET_NOT_SUPPORTED.
+ */
+s32 arm_smccc_get_soc_id_revision(void);


2) Use the exported functions in soc_id module.

--- a/drivers/firmware/smccc/soc_id.c
+++ b/drivers/firmware/smccc/soc_id.c
@@ -42,41 +42,23 @@ static int __init smccc_soc_init(void)
         if (arm_smccc_get_version() < ARM_SMCCC_VERSION_1_2)
                 return 0;

-       if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE) {
-               pr_err("%s: invalid SMCCC conduit\n", __func__);
-               return -EOPNOTSUPP;
-       }
-
-       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-                            ARM_SMCCC_ARCH_SOC_ID, &res);
-
-       if ((int)res.a0 == SMCCC_RET_NOT_SUPPORTED) {
+       soc_id_version = arm_smccc_get_soc_id_version();
+       if (soc_id_version == SMCCC_RET_NOT_SUPPORTED) {
                 pr_info("ARCH_SOC_ID not implemented, skipping ....\n");
                 return 0;
         }

-       if ((int)res.a0 < 0) {
-               pr_info("ARCH_FEATURES(ARCH_SOC_ID) returned error: %lx\n",
-                       res.a0);
-               return -EINVAL;
-       }
-
-       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, &res);
-       if ((int)res.a0 < 0) {
+       if (soc_id_version < 0) {
                 pr_err("ARCH_SOC_ID(0) returned error: %lx\n", res.a0);
                 return -EINVAL;
         }

-       soc_id_version = res.a0;
-
-       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 1, &res);
-       if ((int)res.a0 < 0) {
+       soc_id_rev = arm_smccc_get_soc_id_revision();
+       if (soc_id_rev < 0) {
                 pr_err("ARCH_SOC_ID(1) returned error: %lx\n", res.a0);
                 return -EINVAL;
         }

-       soc_id_rev = res.a0;

Kconfig:
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -35,6 +35,7 @@ config ARM_GIC_V3
         select IRQ_DOMAIN_HIERARCHY
         select PARTITION_PERCPU
         select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
+       select HAVE_ARM_SMCCC_DISCOVERY
  
kernel test robot March 15, 2023, 10:07 p.m. UTC | #2
Hi Shanker,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on tip/irq/core soc/for-next linus/master v6.3-rc2 next-20230315]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shanker-Donthineni/irqchip-gicv3-Workaround-for-NVIDIA-erratum-T241-FABRIC-4/20230314-215648
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link:    https://lore.kernel.org/r/20230314135128.2930580-1-sdonthineni%40nvidia.com
patch subject: [PATCH v2] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
config: arm-buildonly-randconfig-r003-20230312 (https://download.01.org/0day-ci/archive/20230316/202303160555.3zGOmtIL-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 67409911353323ca5edf2049ef0df54132fa1ca7)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/intel-lab-lkp/linux/commit/f796361134151057b68a259013204e8fa5516aee
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Shanker-Donthineni/irqchip-gicv3-Workaround-for-NVIDIA-erratum-T241-FABRIC-4/20230314-215648
        git checkout f796361134151057b68a259013204e8fa5516aee
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303160555.3zGOmtIL-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/irqchip/irq-gic-v3.c:1775:21: error: call to undeclared function 'FIELD_GET'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                   chip_bmask |= BIT(FIELD_GET(T241_CHIPN_MASK,
                                     ^
>> drivers/irqchip/irq-gic-v3.c:1786:12: error: call to undeclared function 'FIELD_PREP'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                           phys |= FIELD_PREP(T241_CHIPN_MASK, i);
                                   ^
   2 errors generated.


vim +/FIELD_GET +1775 drivers/irqchip/irq-gic-v3.c

  1760	
  1761	static bool gic_enable_quirk_nvidia_t241(void *data)
  1762	{
  1763		unsigned long chip_bmask = 0;
  1764		phys_addr_t phys;
  1765		u32 i;
  1766	
  1767		/* Check JEP106 code for NVIDIA T241 chip (036b:0241) */
  1768		if ((smccc_soc_id_version < 0) ||
  1769		    ((smccc_soc_id_version & SMCCC_SOC_ID_MASK) != SMCCC_SOC_ID_T241)) {
  1770			return false;
  1771		}
  1772	
  1773		/* Find the chips based on GICR regions PHYS addr */
  1774		for (i = 0; i < gic_data.nr_redist_regions; i++) {
> 1775			chip_bmask |= BIT(FIELD_GET(T241_CHIPN_MASK,
  1776					  gic_data.redist_regions[i].phys_base));
  1777		}
  1778	
  1779		if (hweight32(chip_bmask) < 3)
  1780			return false;
  1781	
  1782		/* Setup GICD alias regions */
  1783		for (i = 0; i < ARRAY_SIZE(t241_dist_base_alias); i++) {
  1784			if (chip_bmask & BIT(i)) {
  1785				phys = gic_data.dist_phys_base + T241_CHIP_GICDA_OFFSET;
> 1786				phys |= FIELD_PREP(T241_CHIPN_MASK, i);
  1787				t241_dist_base_alias[i] = ioremap(phys, SZ_64K);
  1788				WARN_ON_ONCE(!t241_dist_base_alias[i]);
  1789			}
  1790		}
  1791		static_branch_enable(&gic_nvidia_t241_erratum);
  1792		return true;
  1793	}
  1794
  
kernel test robot March 15, 2023, 10:48 p.m. UTC | #3
Hi Shanker,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on tip/irq/core soc/for-next linus/master v6.3-rc2 next-20230315]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shanker-Donthineni/irqchip-gicv3-Workaround-for-NVIDIA-erratum-T241-FABRIC-4/20230314-215648
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link:    https://lore.kernel.org/r/20230314135128.2930580-1-sdonthineni%40nvidia.com
patch subject: [PATCH v2] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
config: arm-randconfig-r046-20230315 (https://download.01.org/0day-ci/archive/20230316/202303160647.R16cnZkc-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/f796361134151057b68a259013204e8fa5516aee
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Shanker-Donthineni/irqchip-gicv3-Workaround-for-NVIDIA-erratum-T241-FABRIC-4/20230314-215648
        git checkout f796361134151057b68a259013204e8fa5516aee
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/irqchip/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303160647.R16cnZkc-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/bits.h:6,
                    from include/linux/ioport.h:13,
                    from include/linux/acpi.h:12,
                    from drivers/irqchip/irq-gic-v3.c:9:
   drivers/irqchip/irq-gic-v3.c: In function 'gic_enable_quirk_nvidia_t241':
>> drivers/irqchip/irq-gic-v3.c:1775:35: error: implicit declaration of function 'FIELD_GET'; did you mean 'FOLL_GET'? [-Werror=implicit-function-declaration]
    1775 |                 chip_bmask |= BIT(FIELD_GET(T241_CHIPN_MASK,
         |                                   ^~~~~~~~~
   include/vdso/bits.h:7:44: note: in definition of macro 'BIT'
       7 | #define BIT(nr)                 (UL(1) << (nr))
         |                                            ^~
>> drivers/irqchip/irq-gic-v3.c:1786:33: error: implicit declaration of function 'FIELD_PREP' [-Werror=implicit-function-declaration]
    1786 |                         phys |= FIELD_PREP(T241_CHIPN_MASK, i);
         |                                 ^~~~~~~~~~
   cc1: some warnings being treated as errors


vim +1775 drivers/irqchip/irq-gic-v3.c

  1760	
  1761	static bool gic_enable_quirk_nvidia_t241(void *data)
  1762	{
  1763		unsigned long chip_bmask = 0;
  1764		phys_addr_t phys;
  1765		u32 i;
  1766	
  1767		/* Check JEP106 code for NVIDIA T241 chip (036b:0241) */
  1768		if ((smccc_soc_id_version < 0) ||
  1769		    ((smccc_soc_id_version & SMCCC_SOC_ID_MASK) != SMCCC_SOC_ID_T241)) {
  1770			return false;
  1771		}
  1772	
  1773		/* Find the chips based on GICR regions PHYS addr */
  1774		for (i = 0; i < gic_data.nr_redist_regions; i++) {
> 1775			chip_bmask |= BIT(FIELD_GET(T241_CHIPN_MASK,
  1776					  gic_data.redist_regions[i].phys_base));
  1777		}
  1778	
  1779		if (hweight32(chip_bmask) < 3)
  1780			return false;
  1781	
  1782		/* Setup GICD alias regions */
  1783		for (i = 0; i < ARRAY_SIZE(t241_dist_base_alias); i++) {
  1784			if (chip_bmask & BIT(i)) {
  1785				phys = gic_data.dist_phys_base + T241_CHIP_GICDA_OFFSET;
> 1786				phys |= FIELD_PREP(T241_CHIPN_MASK, i);
  1787				t241_dist_base_alias[i] = ioremap(phys, SZ_64K);
  1788				WARN_ON_ONCE(!t241_dist_base_alias[i]);
  1789			}
  1790		}
  1791		static_branch_enable(&gic_nvidia_t241_erratum);
  1792		return true;
  1793	}
  1794
  
Sudeep Holla March 16, 2023, 3:10 p.m. UTC | #4
On Wed, Mar 15, 2023 at 07:27:14AM -0500, Shanker Donthineni wrote:
> Hi Marc,
> 
> On 3/15/23 03:34, Marc Zyngier wrote:
> > Please don't duplicate existing code. There is already the required
> > infrastructure in drivers/firmware/smccc/soc_id.c. All you need to do
> > is:
> > 
> > - disassociate the SMCCC probing from the device registration
> > 
> > - probe the SOC_ID early
> > 
> > - add accessors for the relevant data
> > 
> > - select ARM_SMCCC_SOD_ID/ARM_SMCCC_DISCOVERY from the GICv3 Kconfig
> 
> 
> I have not modified soc_id.c as it expects to be loaded as a module with
> the use of module_init() and module_exit() functions. The exported symbols
> in soc_id driver cannot be accessed from the built-in code.
> 
> Agree, the SOD-ID discovery code was duplicated.
> 
> Please guide me if the below approach is okay?
> 
> 1) Probe the SOC-ID in arm_smccc_version_init() and export two functions
> arm_smccc_get_soc_id_version() and arm_smccc_get_soc_id_revision().
> 
> --- a/drivers/firmware/smccc/smccc.c
> +++ b/drivers/firmware/smccc/smccc.c
> @@ -17,9 +17,13 @@ static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
> 
>  bool __ro_after_init smccc_trng_available = false;
>  u64 __ro_after_init smccc_has_sve_hint = false;
> +s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
> +s32 __ro_after_init smccc_soc_id_revision = SMCCC_RET_NOT_SUPPORTED;
> 
>  void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
>  {
> +       struct arm_smccc_res res;
> +
>         smccc_version = version;
>         smccc_conduit = conduit;
> 
> @@ -27,6 +31,18 @@ void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
>         if (IS_ENABLED(CONFIG_ARM64_SVE) &&
>             smccc_version >= ARM_SMCCC_VERSION_1_3)
>                 smccc_has_sve_hint = true;
> +
> +       if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
> +           (smccc_conduit != SMCCC_CONDUIT_NONE)) {
> +               arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> +                                    ARM_SMCCC_ARCH_SOC_ID, &res);
> +               if ((s32)res.a0 >= 0) {
> +                       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, &res);
> +                       smccc_soc_id_version = (s32)res.a0;
> +                       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 1, &res);
> +                       smccc_soc_id_revision = (s32)res.a0;
> +               }
> +       }
>  }
> 
> 
> +s32 arm_smccc_get_soc_id_version(void)
> +{
> +       return smccc_soc_id_version;
> +}
> +EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_version);
> +
> +s32 arm_smccc_get_soc_id_revision(void)
> +{
> +       return smccc_soc_id_revision;
> +}
> +EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_revision);
> 
>

Overall, it looks OK to me. However I see neither the gic nor the soc_id
can be build as module atm. So do we really need the export symbols if no
other modules are using it ?
  
Marc Zyngier March 16, 2023, 4 p.m. UTC | #5
On 2023-03-16 15:10, Sudeep Holla wrote:
> On Wed, Mar 15, 2023 at 07:27:14AM -0500, Shanker Donthineni wrote:
>> Hi Marc,
>> 
>> On 3/15/23 03:34, Marc Zyngier wrote:
>> > Please don't duplicate existing code. There is already the required
>> > infrastructure in drivers/firmware/smccc/soc_id.c. All you need to do
>> > is:
>> >
>> > - disassociate the SMCCC probing from the device registration
>> >
>> > - probe the SOC_ID early
>> >
>> > - add accessors for the relevant data
>> >
>> > - select ARM_SMCCC_SOD_ID/ARM_SMCCC_DISCOVERY from the GICv3 Kconfig
>> 
>> 
>> I have not modified soc_id.c as it expects to be loaded as a module 
>> with
>> the use of module_init() and module_exit() functions. The exported 
>> symbols
>> in soc_id driver cannot be accessed from the built-in code.
>> 
>> Agree, the SOD-ID discovery code was duplicated.
>> 
>> Please guide me if the below approach is okay?
>> 
>> 1) Probe the SOC-ID in arm_smccc_version_init() and export two 
>> functions
>> arm_smccc_get_soc_id_version() and arm_smccc_get_soc_id_revision().
>> 
>> --- a/drivers/firmware/smccc/smccc.c
>> +++ b/drivers/firmware/smccc/smccc.c
>> @@ -17,9 +17,13 @@ static enum arm_smccc_conduit smccc_conduit = 
>> SMCCC_CONDUIT_NONE;
>> 
>>  bool __ro_after_init smccc_trng_available = false;
>>  u64 __ro_after_init smccc_has_sve_hint = false;
>> +s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
>> +s32 __ro_after_init smccc_soc_id_revision = SMCCC_RET_NOT_SUPPORTED;
>> 
>>  void __init arm_smccc_version_init(u32 version, enum 
>> arm_smccc_conduit conduit)
>>  {
>> +       struct arm_smccc_res res;
>> +
>>         smccc_version = version;
>>         smccc_conduit = conduit;
>> 
>> @@ -27,6 +31,18 @@ void __init arm_smccc_version_init(u32 version, 
>> enum arm_smccc_conduit conduit)
>>         if (IS_ENABLED(CONFIG_ARM64_SVE) &&
>>             smccc_version >= ARM_SMCCC_VERSION_1_3)
>>                 smccc_has_sve_hint = true;
>> +
>> +       if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
>> +           (smccc_conduit != SMCCC_CONDUIT_NONE)) {
>> +               arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
>> +                                    ARM_SMCCC_ARCH_SOC_ID, &res);
>> +               if ((s32)res.a0 >= 0) {
>> +                       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, 
>> &res);
>> +                       smccc_soc_id_version = (s32)res.a0;
>> +                       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 1, 
>> &res);
>> +                       smccc_soc_id_revision = (s32)res.a0;
>> +               }
>> +       }
>>  }
>> 
>> 
>> +s32 arm_smccc_get_soc_id_version(void)
>> +{
>> +       return smccc_soc_id_version;
>> +}
>> +EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_version);
>> +
>> +s32 arm_smccc_get_soc_id_revision(void)
>> +{
>> +       return smccc_soc_id_revision;
>> +}
>> +EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_revision);
>> 
>> 
> 
> Overall, it looks OK to me. However I see neither the gic nor the 
> soc_id
> can be build as module atm. So do we really need the export symbols if 
> no
> other modules are using it ?

It really shouldn't be exported. Having accessors should be enough.

Thanks,

         M.
  
Shanker Donthineni March 16, 2023, 10:41 p.m. UTC | #6
On 3/16/23 11:00, Marc Zyngier wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2023-03-16 15:10, Sudeep Holla wrote:
>> On Wed, Mar 15, 2023 at 07:27:14AM -0500, Shanker Donthineni wrote:
>>> Hi Marc,
>>>
>>> On 3/15/23 03:34, Marc Zyngier wrote:
>>> > Please don't duplicate existing code. There is already the required
>>> > infrastructure in drivers/firmware/smccc/soc_id.c. All you need to do
>>> > is:
>>> >
>>> > - disassociate the SMCCC probing from the device registration
>>> >
>>> > - probe the SOC_ID early
>>> >
>>> > - add accessors for the relevant data
>>> >
>>> > - select ARM_SMCCC_SOD_ID/ARM_SMCCC_DISCOVERY from the GICv3 Kconfig
>>>
>>>
>>> I have not modified soc_id.c as it expects to be loaded as a module
>>> with
>>> the use of module_init() and module_exit() functions. The exported
>>> symbols
>>> in soc_id driver cannot be accessed from the built-in code.
>>>
>>> Agree, the SOD-ID discovery code was duplicated.
>>>
>>> Please guide me if the below approach is okay?
>>>
>>> 1) Probe the SOC-ID in arm_smccc_version_init() and export two
>>> functions
>>> arm_smccc_get_soc_id_version() and arm_smccc_get_soc_id_revision().
>>>
>>> --- a/drivers/firmware/smccc/smccc.c
>>> +++ b/drivers/firmware/smccc/smccc.c
>>> @@ -17,9 +17,13 @@ static enum arm_smccc_conduit smccc_conduit =
>>> SMCCC_CONDUIT_NONE;
>>>
>>>  bool __ro_after_init smccc_trng_available = false;
>>>  u64 __ro_after_init smccc_has_sve_hint = false;
>>> +s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
>>> +s32 __ro_after_init smccc_soc_id_revision = SMCCC_RET_NOT_SUPPORTED;
>>>
>>>  void __init arm_smccc_version_init(u32 version, enum
>>> arm_smccc_conduit conduit)
>>>  {
>>> +       struct arm_smccc_res res;
>>> +
>>>         smccc_version = version;
>>>         smccc_conduit = conduit;
>>>
>>> @@ -27,6 +31,18 @@ void __init arm_smccc_version_init(u32 version,
>>> enum arm_smccc_conduit conduit)
>>>         if (IS_ENABLED(CONFIG_ARM64_SVE) &&
>>>             smccc_version >= ARM_SMCCC_VERSION_1_3)
>>>                 smccc_has_sve_hint = true;
>>> +
>>> +       if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
>>> +           (smccc_conduit != SMCCC_CONDUIT_NONE)) {
>>> +               arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
>>> +                                    ARM_SMCCC_ARCH_SOC_ID, &res);
>>> +               if ((s32)res.a0 >= 0) {
>>> +                       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0,
>>> &res);
>>> +                       smccc_soc_id_version = (s32)res.a0;
>>> +                       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 1,
>>> &res);
>>> +                       smccc_soc_id_revision = (s32)res.a0;
>>> +               }
>>> +       }
>>>  }
>>>
>>>
>>> +s32 arm_smccc_get_soc_id_version(void)
>>> +{
>>> +       return smccc_soc_id_version;
>>> +}
>>> +EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_version);
>>> +
>>> +s32 arm_smccc_get_soc_id_revision(void)
>>> +{
>>> +       return smccc_soc_id_revision;
>>> +}
>>> +EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_revision);
>>>
>>>
>>
>> Overall, it looks OK to me. However I see neither the gic nor the
>> soc_id
>> can be build as module atm. So do we really need the export symbols if
>> no
>> other modules are using it ?
> 
> It really shouldn't be exported. Having accessors should be enough.
> 

Thanks, I'll remove in v3 patch.


-Shanker
  

Patch

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index ec5f889d76819..e31f6c0687041 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -172,6 +172,8 @@  stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | NVIDIA         | Carmel Core     | N/A             | NVIDIA_CARMEL_CNP_ERRATUM   |
 +----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA         | T241 GICv3/4.x  | T241-FABRIC-4   | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
index 60ccf3e90d7de..cb688121270b8 100644
--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -17,9 +17,12 @@  static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
 
 bool __ro_after_init smccc_trng_available = false;
 u64 __ro_after_init smccc_has_sve_hint = false;
+s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
 
 void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
 {
+	struct arm_smccc_res res;
+
 	smccc_version = version;
 	smccc_conduit = conduit;
 
@@ -27,6 +30,16 @@  void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
 	    smccc_version >= ARM_SMCCC_VERSION_1_3)
 		smccc_has_sve_hint = true;
+
+	if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
+	    (smccc_conduit != SMCCC_CONDUIT_NONE)) {
+		arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+				     ARM_SMCCC_ARCH_SOC_ID, &res);
+		if ((s32)res.a0 >= 0) {
+			arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, &res);
+			smccc_soc_id_version = (s32)res.a0;
+		}
+	}
 }
 
 enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 997104d4338e7..02c699ce92b12 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -24,6 +24,7 @@ 
 #include <linux/irqchip/arm-gic-common.h>
 #include <linux/irqchip/arm-gic-v3.h>
 #include <linux/irqchip/irq-partition-percpu.h>
+#include <linux/arm-smccc.h>
 
 #include <asm/cputype.h>
 #include <asm/exception.h>
@@ -57,8 +58,14 @@  struct gic_chip_data {
 	bool			has_rss;
 	unsigned int		ppi_nr;
 	struct partition_desc	**ppi_descs;
+	phys_addr_t		dist_phys_base;
 };
 
+
+#define T241_CHIPS_MAX		4
+static void __iomem *t241_dist_base_alias[T241_CHIPS_MAX];
+static DEFINE_STATIC_KEY_FALSE(gic_nvidia_t241_erratum);
+
 static struct gic_chip_data gic_data __read_mostly;
 static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
 
@@ -188,6 +195,34 @@  static inline bool gic_irq_in_rdist(struct irq_data *d)
 	}
 }
 
+static inline void __iomem *gic_dist_base_alias(irq_hw_number_t intid)
+{
+	u32 chip;
+
+	if (static_branch_unlikely(&gic_nvidia_t241_erratum)) {
+		/**
+		 *  {E}SPI mappings for all 4 chips
+		 *    Chip0 = 32-351
+		 *    Chip1 = 352-671
+		 *    Chip2 = 672-991
+		 *    Chip3 = 4096-4415
+		 */
+		switch (__get_intid_range(intid)) {
+		case SPI_RANGE:
+			chip = (intid - 32) / 320;
+			break;
+		case ESPI_RANGE:
+			chip = 3;
+			break;
+		default:
+			unreachable();
+		}
+		return t241_dist_base_alias[chip];
+	}
+
+	return gic_data.dist_base;
+}
+
 static inline void __iomem *gic_dist_base(struct irq_data *d)
 {
 	switch (get_intid_range(d)) {
@@ -343,10 +378,14 @@  static int gic_peek_irq(struct irq_data *d, u32 offset)
 	offset = convert_offset_index(d, offset, &index);
 	mask = 1 << (index % 32);
 
+	/**
+	 * For the erratum T241-FABRIC-4, read accesses to the GICD_In{E}
+	 * registers are directed to the chip that owns the SPI.
+	 */
 	if (gic_irq_in_rdist(d))
 		base = gic_data_rdist_sgi_base();
 	else
-		base = gic_data.dist_base;
+		base = gic_dist_base_alias(irqd_to_hwirq(d));
 
 	return !!(readl_relaxed(base + offset + (index / 32) * 4) & mask);
 }
@@ -594,13 +633,17 @@  static int gic_set_type(struct irq_data *d, unsigned int type)
 	    type != IRQ_TYPE_LEVEL_HIGH && type != IRQ_TYPE_EDGE_RISING)
 		return -EINVAL;
 
+	/**
+	 * For the erratum T241-FABRIC-4, read accesses to the GICD_In{E}
+	 * registers are directed to the chip that owns the SPI. Use the
+	 * same alias region for GICD_ICFGR writes to simplify code.
+	 */
 	if (gic_irq_in_rdist(d))
 		base = gic_data_rdist_sgi_base();
 	else
-		base = gic_data.dist_base;
+		base = gic_dist_base_alias(irqd_to_hwirq(d));
 
 	offset = convert_offset_index(d, GICD_ICFGR, &index);
-
 	ret = gic_configure_irq(index, type, base + offset, NULL);
 	if (ret && (range == PPI_RANGE || range == EPPI_RANGE)) {
 		/* Misconfigured PPIs are usually not fatal */
@@ -1719,6 +1762,45 @@  static bool gic_enable_quirk_hip06_07(void *data)
 	return false;
 }
 
+#define T241_CHIPN_MASK		GENMASK_ULL(45, 44)
+#define T241_CHIP_GICDA_OFFSET	0x1580000
+#define SMCCC_SOC_ID_MASK	0x7f7fffff
+#define SMCCC_SOC_ID_T241	0x036b0241
+
+static bool gic_enable_quirk_nvidia_t241(void *data)
+{
+	unsigned long chip_bmask = 0;
+	phys_addr_t phys;
+	u32 i;
+
+	/* Check JEP106 code for NVIDIA T241 chip (036b:0241) */
+	if ((smccc_soc_id_version < 0) ||
+	    ((smccc_soc_id_version & SMCCC_SOC_ID_MASK) != SMCCC_SOC_ID_T241)) {
+		return false;
+	}
+
+	/* Find the chips based on GICR regions PHYS addr */
+	for (i = 0; i < gic_data.nr_redist_regions; i++) {
+		chip_bmask |= BIT(FIELD_GET(T241_CHIPN_MASK,
+				  gic_data.redist_regions[i].phys_base));
+	}
+
+	if (hweight32(chip_bmask) < 3)
+		return false;
+
+	/* Setup GICD alias regions */
+	for (i = 0; i < ARRAY_SIZE(t241_dist_base_alias); i++) {
+		if (chip_bmask & BIT(i)) {
+			phys = gic_data.dist_phys_base + T241_CHIP_GICDA_OFFSET;
+			phys |= FIELD_PREP(T241_CHIPN_MASK, i);
+			t241_dist_base_alias[i] = ioremap(phys, SZ_64K);
+			WARN_ON_ONCE(!t241_dist_base_alias[i]);
+		}
+	}
+	static_branch_enable(&gic_nvidia_t241_erratum);
+	return true;
+}
+
 static const struct gic_quirk gic_quirks[] = {
 	{
 		.desc	= "GICv3: Qualcomm MSM8996 broken firmware",
@@ -1750,6 +1832,12 @@  static const struct gic_quirk gic_quirks[] = {
 		.mask	= 0xe8f00fff,
 		.init	= gic_enable_quirk_cavium_38539,
 	},
+	{
+		.desc	= "GICv3: NVIDIA erratum T241-FABRIC-4",
+		.iidr	= 0x0402043b,
+		.mask	= 0xffffffff,
+		.init	= gic_enable_quirk_nvidia_t241,
+	},
 	{
 	}
 };
@@ -1821,7 +1909,8 @@  static int __init gic_init_bases(void __iomem *dist_base,
 				 struct redist_region *rdist_regs,
 				 u32 nr_redist_regions,
 				 u64 redist_stride,
-				 struct fwnode_handle *handle)
+				 struct fwnode_handle *handle,
+				 phys_addr_t dist_phys_base)
 {
 	u32 typer;
 	int err;
@@ -1834,6 +1923,7 @@  static int __init gic_init_bases(void __iomem *dist_base,
 
 	gic_data.fwnode = handle;
 	gic_data.dist_base = dist_base;
+	gic_data.dist_phys_base = dist_phys_base;
 	gic_data.redist_regions = rdist_regs;
 	gic_data.nr_redist_regions = nr_redist_regions;
 	gic_data.redist_stride = redist_stride;
@@ -2072,11 +2162,12 @@  static int __init gic_of_init(struct device_node *node, struct device_node *pare
 	void __iomem *dist_base;
 	struct redist_region *rdist_regs;
 	struct resource res;
+	struct resource dist_res;
 	u64 redist_stride;
 	u32 nr_redist_regions;
 	int err, i;
 
-	dist_base = gic_of_iomap(node, 0, "GICD", &res);
+	dist_base = gic_of_iomap(node, 0, "GICD", &dist_res);
 	if (IS_ERR(dist_base)) {
 		pr_err("%pOF: unable to map gic dist registers\n", node);
 		return PTR_ERR(dist_base);
@@ -2114,7 +2205,7 @@  static int __init gic_of_init(struct device_node *node, struct device_node *pare
 	gic_enable_of_quirks(node, gic_quirks, &gic_data);
 
 	err = gic_init_bases(dist_base, rdist_regs, nr_redist_regions,
-			     redist_stride, &node->fwnode);
+			     redist_stride, &node->fwnode, dist_res.start);
 	if (err)
 		goto out_unmap_rdist;
 
@@ -2377,8 +2468,14 @@  static void __init gic_acpi_setup_kvm_info(void)
 		vcpu->end = vcpu->start + ACPI_GICV2_VCPU_MEM_SIZE - 1;
 	}
 
-	gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
-	gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
+	/* Disable GICv4.x features for the erratum T241-FABRIC-4 */
+	if (static_branch_unlikely(&gic_nvidia_t241_erratum)) {
+		gic_v3_kvm_info.has_v4 = false;
+		gic_v3_kvm_info.has_v4_1 = false;
+	} else {
+		gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
+		gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
+	}
 	vgic_set_kvm_info(&gic_v3_kvm_info);
 }
 
@@ -2431,7 +2528,8 @@  gic_acpi_init(union acpi_subtable_headers *header, const unsigned long end)
 	}
 
 	err = gic_init_bases(acpi_data.dist_base, acpi_data.redist_regs,
-			     acpi_data.nr_redist_regions, 0, gsi_domain_handle);
+			     acpi_data.nr_redist_regions, 0, gsi_domain_handle,
+			     dist->base_address);
 	if (err)
 		goto out_fwhandle_free;
 
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 220c8c60e021a..f74e13938693e 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -225,6 +225,7 @@  u32 arm_smccc_get_version(void);
 void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit);
 
 extern u64 smccc_has_sve_hint;
+extern s32 smccc_soc_id_version;
 
 /**
  * struct arm_smccc_res - Result from SMC/HVC call