Message ID | 20240201184238.2542695-1-b-brnich@ti.com |
---|---|
State | New |
Series | [v2] dt-bindings: media: Add sram-size Property for Wave5 |
Commit Message
Brandon Brnich
Feb. 1, 2024, 6:42 p.m. UTC
Wave521c has capability to use SRAM carveout to store reference data with
purpose of reducing memory bandwidth. To properly use this pool, the driver
expects to have an sram and sram-size node. Without sram-size node, driver
will default value to zero, making sram node irrelevant.
Signed-off-by: Brandon Brnich <b-brnich@ti.com>
---
Documentation/devicetree/bindings/media/cnm,wave521c.yaml | 7 +++++++
1 file changed, 7 insertions(+)
Comments
On 01/02/2024 19:42, Brandon Brnich wrote:
> Wave521c has capability to use SRAM carveout to store reference data with
> purpose of reducing memory bandwidth. To properly use this pool, the driver
> expects to have an sram and sram-size node. Without sram-size node, driver
> will default value to zero, making sram node irrelevant.

I am sorry, but what driver expects should not be rationale for new
property. This justification suggests clearly it is not a property for DT.

>
> Signed-off-by: Brandon Brnich <b-brnich@ti.com>
> ---

This is v2, so where is the changelog?

> Documentation/devicetree/bindings/media/cnm,wave521c.yaml | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/media/cnm,wave521c.yaml b/Documentation/devicetree/bindings/media/cnm,wave521c.yaml
> index 6a11c1d11fb5..ea5469eb38f9 100644
> --- a/Documentation/devicetree/bindings/media/cnm,wave521c.yaml
> +++ b/Documentation/devicetree/bindings/media/cnm,wave521c.yaml
> @@ -43,6 +43,12 @@ properties:
>        storing it on DMA memory. It is mainly used for the purpose of reducing
>        bandwidth.
>
> +  sram-size:

Does not look like standard property. You would need vendor prefix or is it
documented anywhere already?

> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description:
> +      SRAM size reserved for VPU operations. If not specified, size will default
> +      to zero.

Lack of sram property means 0, doesn't it?

> +
>  required:
>    - compatible
>    - reg
> @@ -58,4 +64,5 @@ examples:
>          clocks = <&clks 42>;
>          interrupts = <42>;
>          sram = <&sram>;
> +        sram-size = <0x1234>;

Was this patch tested? Since nothing changed from v1, I assume it also fails.

>  };

Best regards,
Krzysztof
On 11:47-20240202, Krzysztof Kozlowski wrote:
> On 01/02/2024 19:42, Brandon Brnich wrote:
> > Wave521c has capability to use SRAM carveout to store reference data with
> > purpose of reducing memory bandwidth. To properly use this pool, the driver
> > expects to have an sram and sram-size node. Without sram-size node, driver
> > will default value to zero, making sram node irrelevant.
>
> I am sorry, but what driver expects should not be rationale for new
> property. This justification suggests clearly it is not a property for DT.
>

Yup, the argumentation in the commit message is from the wrong
perspective. bindings are OS agnostic hardware description, and what
driver does with the description is driver's problem.

I will at least paraphrase my understanding:
In this case, however, the hardware block will limp along with
the usage of DDR (as is the current description), due to the
latencies involved for DDR accesses. However, the hardware block
has capability to use a substantially lower latency SRAM to provide
proper performance and hence for example, deal with higher resolution
data streams. This SRAM is instantiated at SoC level rather than
embedded within the hardware block itself.
On 02/02/2024 13:52, Nishanth Menon wrote: > On 11:47-20240202, Krzysztof Kozlowski wrote: >> On 01/02/2024 19:42, Brandon Brnich wrote: >>> Wave521c has capability to use SRAM carveout to store reference data with >>> purpose of reducing memory bandwidth. To properly use this pool, the driver >>> expects to have an sram and sram-size node. Without sram-size node, driver >>> will default value to zero, making sram node irrelevant. >> >> I am sorry, but what driver expects should not be rationale for new >> property. This justification suggests clearly it is not a property for DT. >> > > Yup, the argumentation in the commit message is from the wrong > perspective. bindings are OS agnostic hardware description, and what > driver does with the description is driver's problem. > > I will at least paraphrase my understanding: > In this case, however, the hardware block will limp along with > the usage of DDR (as is the current description), due to the > latencies involved for DDR accesses. However, the hardware block > has capability to use a substantially lower latency SRAM to provide > proper performance and hence for example, deal with higher resolution > data streams. This SRAM is instantiated at SoC level rather than > embedded within the hardware block itself. That sounds like OS policy. Why would different boards with the same component have this set differently? Based on amount of available memory? This, I believe, is runtime configuration because it might depend on user-space you run. Based on purpose (e.g. optimize for decoding or general usage)? Again, run-time because same hardware board can be used for different purposes. Best regards, Krzysztof
On 14:56-20240202, Krzysztof Kozlowski wrote: > On 02/02/2024 13:52, Nishanth Menon wrote: > > On 11:47-20240202, Krzysztof Kozlowski wrote: > >> On 01/02/2024 19:42, Brandon Brnich wrote: > >>> Wave521c has capability to use SRAM carveout to store reference data with > >>> purpose of reducing memory bandwidth. To properly use this pool, the driver > >>> expects to have an sram and sram-size node. Without sram-size node, driver > >>> will default value to zero, making sram node irrelevant. > >> > >> I am sorry, but what driver expects should not be rationale for new > >> property. This justification suggests clearly it is not a property for DT. > >> > > > > Yup, the argumentation in the commit message is from the wrong > > perspective. bindings are OS agnostic hardware description, and what > > driver does with the description is driver's problem. > > > > I will at least paraphrase my understanding: > > In this case, however, the hardware block will limp along with > > the usage of DDR (as is the current description), due to the > > latencies involved for DDR accesses. However, the hardware block > > has capability to use a substantially lower latency SRAM to provide > > proper performance and hence for example, deal with higher resolution > > data streams. This SRAM is instantiated at SoC level rather than > > embedded within the hardware block itself. > > That sounds like OS policy. Why would different boards with the same > component have this set differently? Based on amount of available > memory? This, I believe, is runtime configuration because it might > depend on user-space you run. Based on purpose (e.g. optimize for > decoding or general usage)? Again, run-time because same hardware board > can be used for different purposes. > Why is this OS policy? It is a hardware capability. Traditionally many similar hardware blocks would have allocated local SRAM for worst case inside the hardware block itself and don't need to use DDR in the first place. However, for this hardware block, it has capability to use some part of one of the many SRAM blocks in the SoC, not be shared for some part of the system - so from a hardware description perspective, we will need to call that out as to which SRAM is available for the hardware block. Why would different boards need this differently? simply because different cameras have different resolution and framerates - and you dont want to pay the worst case sram penalty for all product configuration. Further, Linux is not the only thing that runs on these SoCs.. these are mixed systems with autonomous operations of uC cores who may or maynot (typically not) even need to communicate with MPU to state which part of resource they are hogging (hence the board level definition).
On 02/02/2024 16:58, Nishanth Menon wrote: > On 14:56-20240202, Krzysztof Kozlowski wrote: >> On 02/02/2024 13:52, Nishanth Menon wrote: >>> On 11:47-20240202, Krzysztof Kozlowski wrote: >>>> On 01/02/2024 19:42, Brandon Brnich wrote: >>>>> Wave521c has capability to use SRAM carveout to store reference data with >>>>> purpose of reducing memory bandwidth. To properly use this pool, the driver >>>>> expects to have an sram and sram-size node. Without sram-size node, driver >>>>> will default value to zero, making sram node irrelevant. >>>> >>>> I am sorry, but what driver expects should not be rationale for new >>>> property. This justification suggests clearly it is not a property for DT. >>>> >>> >>> Yup, the argumentation in the commit message is from the wrong >>> perspective. bindings are OS agnostic hardware description, and what >>> driver does with the description is driver's problem. >>> >>> I will at least paraphrase my understanding: >>> In this case, however, the hardware block will limp along with >>> the usage of DDR (as is the current description), due to the >>> latencies involved for DDR accesses. However, the hardware block >>> has capability to use a substantially lower latency SRAM to provide >>> proper performance and hence for example, deal with higher resolution >>> data streams. This SRAM is instantiated at SoC level rather than >>> embedded within the hardware block itself. >> >> That sounds like OS policy. Why would different boards with the same >> component have this set differently? Based on amount of available >> memory? This, I believe, is runtime configuration because it might >> depend on user-space you run. Based on purpose (e.g. optimize for >> decoding or general usage)? Again, run-time because same hardware board >> can be used for different purposes. >> > > Why is this OS policy? It is a hardware capability. How amount of SRAM size is hardware capability? Each hardware can work probably with 1, 2 or 100 pages. > Traditionally > many similar hardware blocks would have allocated local SRAM for > worst case inside the hardware block itself and don't need to use > DDR in the first place. However, for this hardware block, it has > capability to use some part of one of the many SRAM blocks in the SoC, > not be shared for some part of the system - so from a hardware > description perspective, we will need to call that out as to which > SRAM is available for the hardware block. Just because more than one device wants some memory, does not mean this is hardware property. Drivers can ask how much memory available there is. OS knows how many users of memory there is, so knows how much to allocate for each device. > > Why would different boards need this differently? simply because > different cameras have different resolution and framerates - and you > dont want to pay the worst case sram penalty for all product > configuration. Choice of resolution and framerate is runtime choice or use-case dependent, not board level configuration, therefore amount of SRAM size to use is as well. > > Further, Linux is not the only thing that runs on these SoCs.. these are > mixed systems with autonomous operations of uC cores who may or maynot > (typically not) even need to communicate with MPU to state which part of > resource they are hogging (hence the board level definition). OK that could be the case but I could also say choice of RTOS or any other is also board-independent. Best regards, Krzysztof
On Thu, Feb 01, 2024 at 12:42:38PM -0600, Brandon Brnich wrote:
> Wave521c has capability to use SRAM carveout to store reference data with
> purpose of reducing memory bandwidth. To properly use this pool, the driver
> expects to have an sram and sram-size node. Without sram-size node, driver
> will default value to zero, making sram node irrelevant.

The mmio-sram binding already defines how to carve up shared SRAM. Also,
sram-size here is property, not a node.

Rob
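For reference, a minimal consumer-side sketch (assumed code, not taken from the
actual wave5 driver) of what Rob's pointer implies: the shared on-chip memory is
described by an mmio-sram node, and the consumer resolves the existing "sram"
phandle into a gen_pool at probe time, so no extra size property is needed in
this binding. The function name below is hypothetical.

#include <linux/genalloc.h>
#include <linux/of.h>

/* Hypothetical helper: resolve the existing "sram" phandle property to the
 * gen_pool registered by the mmio-sram provider. */
static struct gen_pool *example_vpu_get_sram_pool(struct device_node *np)
{
	return of_gen_pool_get(np, "sram", 0);
}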
On 17:08-20240202, Krzysztof Kozlowski wrote: > On 02/02/2024 16:58, Nishanth Menon wrote: > > On 14:56-20240202, Krzysztof Kozlowski wrote: > >> On 02/02/2024 13:52, Nishanth Menon wrote: > >>> On 11:47-20240202, Krzysztof Kozlowski wrote: > >>>> On 01/02/2024 19:42, Brandon Brnich wrote: > >>>>> Wave521c has capability to use SRAM carveout to store reference data with > >>>>> purpose of reducing memory bandwidth. To properly use this pool, the driver > >>>>> expects to have an sram and sram-size node. Without sram-size node, driver > >>>>> will default value to zero, making sram node irrelevant. > >>>> > >>>> I am sorry, but what driver expects should not be rationale for new > >>>> property. This justification suggests clearly it is not a property for DT. > >>>> > >>> > >>> Yup, the argumentation in the commit message is from the wrong > >>> perspective. bindings are OS agnostic hardware description, and what > >>> driver does with the description is driver's problem. > >>> > >>> I will at least paraphrase my understanding: > >>> In this case, however, the hardware block will limp along with > >>> the usage of DDR (as is the current description), due to the > >>> latencies involved for DDR accesses. However, the hardware block > >>> has capability to use a substantially lower latency SRAM to provide > >>> proper performance and hence for example, deal with higher resolution > >>> data streams. This SRAM is instantiated at SoC level rather than > >>> embedded within the hardware block itself. > >> > >> That sounds like OS policy. Why would different boards with the same > >> component have this set differently? Based on amount of available > >> memory? This, I believe, is runtime configuration because it might > >> depend on user-space you run. Based on purpose (e.g. optimize for > >> decoding or general usage)? Again, run-time because same hardware board > >> can be used for different purposes. > >> > > > > Why is this OS policy? It is a hardware capability. > > How amount of SRAM size is hardware capability? Each hardware can work > probably with 1, 2 or 100 pages. > > > Traditionally > > many similar hardware blocks would have allocated local SRAM for > > worst case inside the hardware block itself and don't need to use > > DDR in the first place. However, for this hardware block, it has > > capability to use some part of one of the many SRAM blocks in the SoC, > > not be shared for some part of the system - so from a hardware > > description perspective, we will need to call that out as to which > > SRAM is available for the hardware block. > > Just because more than one device wants some memory, does not mean this > is hardware property. Drivers can ask how much memory available there > is. OS knows how many users of memory there is, so knows how much to > allocate for each device. > > > > > Why would different boards need this differently? simply because > > different cameras have different resolution and framerates - and you > > dont want to pay the worst case sram penalty for all product > > configuration. > > Choice of resolution and framerate is runtime choice or use-case > dependent, not board level configuration, therefore amount of SRAM size > to use is as well. I am arguing this is similar to what we have for remote-procs. Yes, usecases usage come to a conclusion that sram size X is needed. Sure. Lets even argue that sram = <&sram> has X+100 bytes available, so as far as allocator is concerned, it can allocate. 
But how does the driver know that 1k of that sram is already used by a remote core or some other function? This is no different from a remote proc usecase, following which, I wonder if "memory-region" is the right way to describe this usage? That would be a very specific bucket that is made available to the driver. And I'd say sram and memory-region are two mutually exclusive option? > > > > > Further, Linux is not the only thing that runs on these SoCs.. these are > > mixed systems with autonomous operations of uC cores who may or maynot > > (typically not) even need to communicate with MPU to state which part of > > resource they are hogging (hence the board level definition). > > OK that could be the case but I could also say choice of RTOS or any > other is also board-independent.
On 2/5/24 8:12 AM, Nishanth Menon wrote: > On 17:08-20240202, Krzysztof Kozlowski wrote: >> On 02/02/2024 16:58, Nishanth Menon wrote: >>> On 14:56-20240202, Krzysztof Kozlowski wrote: >>>> On 02/02/2024 13:52, Nishanth Menon wrote: >>>>> On 11:47-20240202, Krzysztof Kozlowski wrote: >>>>>> On 01/02/2024 19:42, Brandon Brnich wrote: >>>>>>> Wave521c has capability to use SRAM carveout to store reference data with >>>>>>> purpose of reducing memory bandwidth. To properly use this pool, the driver >>>>>>> expects to have an sram and sram-size node. Without sram-size node, driver >>>>>>> will default value to zero, making sram node irrelevant. >>>>>> >>>>>> I am sorry, but what driver expects should not be rationale for new >>>>>> property. This justification suggests clearly it is not a property for DT. >>>>>> >>>>> >>>>> Yup, the argumentation in the commit message is from the wrong >>>>> perspective. bindings are OS agnostic hardware description, and what >>>>> driver does with the description is driver's problem. >>>>> >>>>> I will at least paraphrase my understanding: >>>>> In this case, however, the hardware block will limp along with >>>>> the usage of DDR (as is the current description), due to the >>>>> latencies involved for DDR accesses. However, the hardware block >>>>> has capability to use a substantially lower latency SRAM to provide >>>>> proper performance and hence for example, deal with higher resolution >>>>> data streams. This SRAM is instantiated at SoC level rather than >>>>> embedded within the hardware block itself. >>>> >>>> That sounds like OS policy. Why would different boards with the same >>>> component have this set differently? Based on amount of available >>>> memory? This, I believe, is runtime configuration because it might >>>> depend on user-space you run. Based on purpose (e.g. optimize for >>>> decoding or general usage)? Again, run-time because same hardware board >>>> can be used for different purposes. >>>> >>> >>> Why is this OS policy? It is a hardware capability. >> >> How amount of SRAM size is hardware capability? Each hardware can work >> probably with 1, 2 or 100 pages. >> >>> Traditionally >>> many similar hardware blocks would have allocated local SRAM for >>> worst case inside the hardware block itself and don't need to use >>> DDR in the first place. However, for this hardware block, it has >>> capability to use some part of one of the many SRAM blocks in the SoC, >>> not be shared for some part of the system - so from a hardware >>> description perspective, we will need to call that out as to which >>> SRAM is available for the hardware block. >> >> Just because more than one device wants some memory, does not mean this >> is hardware property. Drivers can ask how much memory available there >> is. OS knows how many users of memory there is, so knows how much to >> allocate for each device. >> >>> >>> Why would different boards need this differently? simply because >>> different cameras have different resolution and framerates - and you >>> dont want to pay the worst case sram penalty for all product >>> configuration. >> >> Choice of resolution and framerate is runtime choice or use-case >> dependent, not board level configuration, therefore amount of SRAM size >> to use is as well. > > I am arguing this is similar to what we have for remote-procs. Yes, > usecases usage come to a conclusion that sram size X is needed. Sure. 
> Lets even argue that sram = <&sram> has X+100 bytes available, so as > far as allocator is concerned, it can allocate. But how does the driver > know that 1k of that sram is already used by a remote core or some other > function? The driver could ask the SRAM provider all that info (gen_pool_avail()). The driver should at runtime determine the amount of SRAM it needs and only then attempt to allocate the amount it needed. (and free it after it is done too) There is no need to hardcode the amount we think our usecase will need into DT. Sure it may fail to get as much as it wants if we don't carveout some out up front, but that is true for any memory allocation. Let's not turn DT into a static memory allocator tool.. > > This is no different from a remote proc usecase, following which, I > wonder if "memory-region" is the right way to describe this usage? That > would be a very specific bucket that is made available to the driver. > And I'd say sram and memory-region are two mutually exclusive option? > Our use of "memory-region" for remote proc memory was also a questionable thing to do IMHO. Something we will have to cleanup at some point. Might be good to not call too much attention to that :) Andrew >> >>> >>> Further, Linux is not the only thing that runs on these SoCs.. these are >>> mixed systems with autonomous operations of uC cores who may or maynot >>> (typically not) even need to communicate with MPU to state which part of >>> resource they are hogging (hence the board level definition). >> >> OK that could be the case but I could also say choice of RTOS or any >> other is also board-independent.
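A minimal sketch of the runtime approach Andrew describes above, assuming the
gen_pool was already looked up from the "sram" phandle; the function names and
the "cap to what is currently free" policy are illustrative, not the driver's
actual code.

#include <linux/genalloc.h>
#include <linux/minmax.h>

/* Ask the provider what is currently free, take no more than the use case
 * needs, and give it back when the last instance closes. */
static unsigned long example_sram_acquire(struct gen_pool *pool,
					  size_t wanted, size_t *granted)
{
	size_t avail = gen_pool_avail(pool);	/* free bytes right now */
	size_t size = min(wanted, avail);
	unsigned long vaddr = size ? gen_pool_alloc(pool, size) : 0;

	*granted = vaddr ? size : 0;
	return vaddr;				/* 0 means: run from DDR only */
}

static void example_sram_release(struct gen_pool *pool,
				 unsigned long vaddr, size_t size)
{
	if (vaddr)
		gen_pool_free(pool, vaddr, size);
}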
On 08:12-20240205, Nishanth Menon wrote: > On 17:08-20240202, Krzysztof Kozlowski wrote: > > On 02/02/2024 16:58, Nishanth Menon wrote: > > > On 14:56-20240202, Krzysztof Kozlowski wrote: > > >> On 02/02/2024 13:52, Nishanth Menon wrote: > > >>> On 11:47-20240202, Krzysztof Kozlowski wrote: > > >>>> On 01/02/2024 19:42, Brandon Brnich wrote: > > >>>>> Wave521c has capability to use SRAM carveout to store reference data with > > >>>>> purpose of reducing memory bandwidth. To properly use this pool, the driver > > >>>>> expects to have an sram and sram-size node. Without sram-size node, driver > > >>>>> will default value to zero, making sram node irrelevant. > > >>>> > > >>>> I am sorry, but what driver expects should not be rationale for new > > >>>> property. This justification suggests clearly it is not a property for DT. > > >>>> > > >>> > > >>> Yup, the argumentation in the commit message is from the wrong > > >>> perspective. bindings are OS agnostic hardware description, and what > > >>> driver does with the description is driver's problem. > > >>> > > >>> I will at least paraphrase my understanding: > > >>> In this case, however, the hardware block will limp along with > > >>> the usage of DDR (as is the current description), due to the > > >>> latencies involved for DDR accesses. However, the hardware block > > >>> has capability to use a substantially lower latency SRAM to provide > > >>> proper performance and hence for example, deal with higher resolution > > >>> data streams. This SRAM is instantiated at SoC level rather than > > >>> embedded within the hardware block itself. > > >> > > >> That sounds like OS policy. Why would different boards with the same > > >> component have this set differently? Based on amount of available > > >> memory? This, I believe, is runtime configuration because it might > > >> depend on user-space you run. Based on purpose (e.g. optimize for > > >> decoding or general usage)? Again, run-time because same hardware board > > >> can be used for different purposes. > > >> > > > > > > Why is this OS policy? It is a hardware capability. > > > > How amount of SRAM size is hardware capability? Each hardware can work > > probably with 1, 2 or 100 pages. > > > > > Traditionally > > > many similar hardware blocks would have allocated local SRAM for > > > worst case inside the hardware block itself and don't need to use > > > DDR in the first place. However, for this hardware block, it has > > > capability to use some part of one of the many SRAM blocks in the SoC, > > > not be shared for some part of the system - so from a hardware > > > description perspective, we will need to call that out as to which > > > SRAM is available for the hardware block. > > > > Just because more than one device wants some memory, does not mean this > > is hardware property. Drivers can ask how much memory available there > > is. OS knows how many users of memory there is, so knows how much to > > allocate for each device. > > > > > > > > Why would different boards need this differently? simply because > > > different cameras have different resolution and framerates - and you > > > dont want to pay the worst case sram penalty for all product > > > configuration. > > > > Choice of resolution and framerate is runtime choice or use-case > > dependent, not board level configuration, therefore amount of SRAM size > > to use is as well. > > I am arguing this is similar to what we have for remote-procs. Yes, > usecases usage come to a conclusion that sram size X is needed. Sure. 
> Lets even argue that sram = <&sram> has X+100 bytes available, so as > far as allocator is concerned, it can allocate. But how does the driver > know that 1k of that sram is already used by a remote core or some other > function? > > This is no different from a remote proc usecase, following which, I > wonder if "memory-region" is the right way to describe this usage? That > would be a very specific bucket that is made available to the driver. > And I'd say sram and memory-region are two mutually exclusive option? Wouldn't this just be a static allocation of the SRAM then? I believe the correct way to do this is highlighted in Rob's[0] response. This is also something we have done in the past[1], but I thought dynamic allocation was preferred method so that the VPU didn't hog a portion of SRAM. Seems wasteful in cases where the VPU is not being used. The device itself has the capability of doing runtime allocation before any decoder/encoder stream instances are opened. All of these opened streams will share this one allocated pool, meaning first open stream allocates and the rest share. Because of this, the goal is to allocate enough to meet maximum use case of VPU (4K60) or max case supported by the SoC itself if the SoC is unable to handle 4K60. Is there preferred method for dynamic SRAM allocation? I understand point that framerate and resolution are runtime choice, but these properties are not guaranteed to be known when streams are being opened. If static SRAM allocation is the correct method to go, then this series can be ignored and I will add section in device tree and remove check for parameter in driver as that will now be a bug. [0]: https://patchwork.kernel.org/project/linux-media/patch/20240201184238.2542695-1-b-brnich@ti.com/#25696671 [1]: https://patchwork.kernel.org/project/linux-media/patch/20231108-wave5-v14-rebased-v14-8-0b4af1258656@collabora.com/ > > > > > > > > Further, Linux is not the only thing that runs on these SoCs.. these are > > > mixed systems with autonomous operations of uC cores who may or maynot > > > (typically not) even need to communicate with MPU to state which part of > > > resource they are hogging (hence the board level definition). > > > > OK that could be the case but I could also say choice of RTOS or any > > other is also board-independent. > -- > Regards, > Nishanth Menon > Key (0xDDB5849D1736249D) / Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D Thanks, Brandon
Hi Brandon, et-al, On 06/02/24 00:50, Brandon Brnich wrote: > On 08:12-20240205, Nishanth Menon wrote: >> On 17:08-20240202, Krzysztof Kozlowski wrote: >>> On 02/02/2024 16:58, Nishanth Menon wrote: >>>> On 14:56-20240202, Krzysztof Kozlowski wrote: >>>>> On 02/02/2024 13:52, Nishanth Menon wrote: >>>>>> On 11:47-20240202, Krzysztof Kozlowski wrote: >>>>>>> On 01/02/2024 19:42, Brandon Brnich wrote: >>>>>>>> Wave521c has capability to use SRAM carveout to store reference data with >>>>>>>> purpose of reducing memory bandwidth. To properly use this pool, the driver >>>>>>>> expects to have an sram and sram-size node. Without sram-size node, driver >>>>>>>> will default value to zero, making sram node irrelevant. >>>>>>> >>>>>>> I am sorry, but what driver expects should not be rationale for new >>>>>>> property. This justification suggests clearly it is not a property for DT. >>>>>>> >>>>>> >>>>>> Yup, the argumentation in the commit message is from the wrong >>>>>> perspective. bindings are OS agnostic hardware description, and what >>>>>> driver does with the description is driver's problem. >>>>>> >>>>>> I will at least paraphrase my understanding: >>>>>> In this case, however, the hardware block will limp along with >>>>>> the usage of DDR (as is the current description), due to the >>>>>> latencies involved for DDR accesses. However, the hardware block >>>>>> has capability to use a substantially lower latency SRAM to provide >>>>>> proper performance and hence for example, deal with higher resolution >>>>>> data streams. This SRAM is instantiated at SoC level rather than >>>>>> embedded within the hardware block itself. >>>>> >>>>> That sounds like OS policy. Why would different boards with the same >>>>> component have this set differently? Based on amount of available >>>>> memory? This, I believe, is runtime configuration because it might >>>>> depend on user-space you run. Based on purpose (e.g. optimize for >>>>> decoding or general usage)? Again, run-time because same hardware board >>>>> can be used for different purposes. >>>>> >>>> >>>> Why is this OS policy? It is a hardware capability. >>> >>> How amount of SRAM size is hardware capability? Each hardware can work >>> probably with 1, 2 or 100 pages. >>> >>>> Traditionally >>>> many similar hardware blocks would have allocated local SRAM for >>>> worst case inside the hardware block itself and don't need to use >>>> DDR in the first place. However, for this hardware block, it has >>>> capability to use some part of one of the many SRAM blocks in the SoC, >>>> not be shared for some part of the system - so from a hardware >>>> description perspective, we will need to call that out as to which >>>> SRAM is available for the hardware block. >>> >>> Just because more than one device wants some memory, does not mean this >>> is hardware property. Drivers can ask how much memory available there >>> is. OS knows how many users of memory there is, so knows how much to >>> allocate for each device. >>> >>>> >>>> Why would different boards need this differently? simply because >>>> different cameras have different resolution and framerates - and you >>>> dont want to pay the worst case sram penalty for all product >>>> configuration. >>> >>> Choice of resolution and framerate is runtime choice or use-case >>> dependent, not board level configuration, therefore amount of SRAM size >>> to use is as well. >> >> I am arguing this is similar to what we have for remote-procs. 
Yes, >> usecases usage come to a conclusion that sram size X is needed. Sure. >> Lets even argue that sram = <&sram> has X+100 bytes available, so as >> far as allocator is concerned, it can allocate. But how does the driver >> know that 1k of that sram is already used by a remote core or some other >> function? >> >> This is no different from a remote proc usecase, following which, I >> wonder if "memory-region" is the right way to describe this usage? That >> would be a very specific bucket that is made available to the driver. >> And I'd say sram and memory-region are two mutually exclusive option? > > Wouldn't this just be a static allocation of the SRAM then? I believe > the correct way to do this is highlighted in Rob's[0] response. This is > also something we have done in the past[1], but I thought dynamic > allocation was preferred method so that the VPU didn't hog a portion of > SRAM. Seems wasteful in cases where the VPU is not being used. > I think even with the approach selected in [1] i.e. referring the mmio-sram node using DT property, you can still use dynamic SRAM allocation. The driver can still allocate from global sram pool dynamically using of_gen_pool API as being explained here [3] i.e allocate when first instance is opened and free up later when no instances are running. But I agree with Nishanth's point too that we may not want to give all of SRAM to VPU. For e.g. on AM62A we have 64KiB SRAM and a 1080p use-case requires 48KiB and even higher for 4K so if there is another peripheral who is referring this sram node, then it may not get enough as VPU will hog the major chunk (or all) of it while it is running and this is where an optional property like sram-size will help to cap the max sram usage for VPU and so this helps especially on platforms with limited SRAM availability. As I understand, the sram size allocation is dependent on resolution and once programmed can't be changed until all instances of VPU are done, and we can't predict how many instances user will launch and with what resolutions. So here's the flow we had thought of some time back : 1) Define worst case sram size (per 4K use-case as I believe that's the max for CnM wave521c) as a macro in driver Then the condition for determining sram size to be allocated should be as below : 2) When first instance of VPU is opened, allocate as per sram-size if sram-size property is specified. 3) If sram-size is not specified then : -> Allocate as per worst case size macro defined in driver from sram pool, -> If worst case size of SRAM > max SRAM size, then allocate max SRAM size 4). When all of the instances of VPU are closed, then free up all allocated SRAM. [3] : https://wiki.analog.com/resources/tools-software/linuxdsp/docs/linux-kernel-and-drivers/sram Regards Devarsh > The device itself has the capability of doing runtime allocation before > any decoder/encoder stream instances are opened. All of these opened > streams will share this one allocated pool, meaning first open stream > allocates and the rest share. Because of this, the goal is to allocate > enough to meet maximum use case of VPU (4K60) or max case supported by > the SoC itself if the SoC is unable to handle 4K60. > > Is there preferred method for dynamic SRAM allocation? I understand > point that framerate and resolution are runtime choice, but these > properties are not guaranteed to be known when streams are being opened. 
> > If static SRAM allocation is the correct method to go, then this series > can be ignored and I will add section in device tree and remove check > for parameter in driver as that will now be a bug. > > [0]: https://patchwork.kernel.org/project/linux-media/patch/20240201184238.2542695-1-b-brnich@ti.com/#25696671 > [1]: https://patchwork.kernel.org/project/linux-media/patch/20231108-wave5-v14-rebased-v14-8-0b4af1258656@collabora.com/ > >>> >>>> >>>> Further, Linux is not the only thing that runs on these SoCs.. these are >>>> mixed systems with autonomous operations of uC cores who may or maynot >>>> (typically not) even need to communicate with MPU to state which part of >>>> resource they are hogging (hence the board level definition). >>> >>> OK that could be the case but I could also say choice of RTOS or any >>> other is also board-independent. >> -- >> Regards, >> Nishanth Menon >> Key (0xDDB5849D1736249D) / Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D > > Thanks, > Brandon >
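A sketch of the flow Devarsh outlines above (steps 1-4), assuming the optional
sram-size property from this patch were accepted; the worst-case macro value,
helper name, and capping policy are hypothetical illustrations only.

#include <linux/genalloc.h>
#include <linux/minmax.h>
#include <linux/of.h>
#include <linux/sizes.h>

/* Step 1: hypothetical worst-case figure; the real number would come from
 * the codec documentation, not from this sketch. */
#define EXAMPLE_VPU_SRAM_WORST_CASE	SZ_64K

/* Steps 2-3: prefer an explicit board-level sram-size if present, otherwise
 * fall back to the worst-case macro, capped by what the pool has free.
 * Called when the first instance opens; step 4 pairs this with
 * gen_pool_free() when the last instance closes. */
static unsigned long example_vpu_sram_on_first_open(struct device_node *np,
						    struct gen_pool *pool,
						    size_t *out_size)
{
	u32 requested = EXAMPLE_VPU_SRAM_WORST_CASE;
	size_t size;

	of_property_read_u32(np, "sram-size", &requested);	/* optional */
	size = min_t(size_t, requested, gen_pool_avail(pool));

	*out_size = size;
	return size ? gen_pool_alloc(pool, size) : 0;
}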
Le vendredi 02 février 2024 à 06:52 -0600, Nishanth Menon a écrit :
> On 11:47-20240202, Krzysztof Kozlowski wrote:
> > On 01/02/2024 19:42, Brandon Brnich wrote:
> > > Wave521c has capability to use SRAM carveout to store reference data with
> > > purpose of reducing memory bandwidth. To properly use this pool, the driver
> > > expects to have an sram and sram-size node. Without sram-size node, driver
> > > will default value to zero, making sram node irrelevant.
> >
> > I am sorry, but what driver expects should not be rationale for new
> > property. This justification suggests clearly it is not a property for DT.
> >
>
> Yup, the argumentation in the commit message is from the wrong
> perspective. bindings are OS agnostic hardware description, and what
> driver does with the description is driver's problem.
>
> I will at least paraphrase my understanding:
> In this case, however, the hardware block will limp along with
> the usage of DDR (as is the current description), due to the
> latencies involved for DDR accesses. However, the hardware block
> has capability to use a substantially lower latency SRAM to provide
> proper performance and hence for example, deal with higher resolution

nit: I'd suggest to refer to "higher bandwidth" to generalize the combination
formed by resolution and frame rate. The resolution is always fixed with this
IP, regardless if its fast enough of not. It may not match the stream creator
intended frame rate though, which such optimization may fix.

> data streams. This SRAM is instantiated at SoC level rather than
> embedded within the hardware block itself.
>
>
Le lundi 05 février 2024 à 13:20 -0600, Brandon Brnich a écrit : > On 08:12-20240205, Nishanth Menon wrote: > > On 17:08-20240202, Krzysztof Kozlowski wrote: > > > On 02/02/2024 16:58, Nishanth Menon wrote: > > > > On 14:56-20240202, Krzysztof Kozlowski wrote: > > > > > On 02/02/2024 13:52, Nishanth Menon wrote: > > > > > > On 11:47-20240202, Krzysztof Kozlowski wrote: > > > > > > > On 01/02/2024 19:42, Brandon Brnich wrote: > > > > > > > > Wave521c has capability to use SRAM carveout to store reference data with > > > > > > > > purpose of reducing memory bandwidth. To properly use this pool, the driver > > > > > > > > expects to have an sram and sram-size node. Without sram-size node, driver > > > > > > > > will default value to zero, making sram node irrelevant. > > > > > > > > > > > > > > I am sorry, but what driver expects should not be rationale for new > > > > > > > property. This justification suggests clearly it is not a property for DT. > > > > > > > > > > > > > > > > > > > Yup, the argumentation in the commit message is from the wrong > > > > > > perspective. bindings are OS agnostic hardware description, and what > > > > > > driver does with the description is driver's problem. > > > > > > > > > > > > I will at least paraphrase my understanding: > > > > > > In this case, however, the hardware block will limp along with > > > > > > the usage of DDR (as is the current description), due to the > > > > > > latencies involved for DDR accesses. However, the hardware block > > > > > > has capability to use a substantially lower latency SRAM to provide > > > > > > proper performance and hence for example, deal with higher resolution > > > > > > data streams. This SRAM is instantiated at SoC level rather than > > > > > > embedded within the hardware block itself. > > > > > > > > > > That sounds like OS policy. Why would different boards with the same > > > > > component have this set differently? Based on amount of available > > > > > memory? This, I believe, is runtime configuration because it might > > > > > depend on user-space you run. Based on purpose (e.g. optimize for > > > > > decoding or general usage)? Again, run-time because same hardware board > > > > > can be used for different purposes. > > > > > > > > > > > > > Why is this OS policy? It is a hardware capability. > > > > > > How amount of SRAM size is hardware capability? Each hardware can work > > > probably with 1, 2 or 100 pages. > > > > > > > Traditionally > > > > many similar hardware blocks would have allocated local SRAM for > > > > worst case inside the hardware block itself and don't need to use > > > > DDR in the first place. However, for this hardware block, it has > > > > capability to use some part of one of the many SRAM blocks in the SoC, > > > > not be shared for some part of the system - so from a hardware > > > > description perspective, we will need to call that out as to which > > > > SRAM is available for the hardware block. > > > > > > Just because more than one device wants some memory, does not mean this > > > is hardware property. Drivers can ask how much memory available there > > > is. OS knows how many users of memory there is, so knows how much to > > > allocate for each device. > > > > > > > > > > > Why would different boards need this differently? simply because > > > > different cameras have different resolution and framerates - and you > > > > dont want to pay the worst case sram penalty for all product > > > > configuration. 
> > > > > > Choice of resolution and framerate is runtime choice or use-case > > > dependent, not board level configuration, therefore amount of SRAM size > > > to use is as well. > > > > I am arguing this is similar to what we have for remote-procs. Yes, > > usecases usage come to a conclusion that sram size X is needed. Sure. > > Lets even argue that sram = <&sram> has X+100 bytes available, so as > > far as allocator is concerned, it can allocate. But how does the driver > > know that 1k of that sram is already used by a remote core or some other > > function? > > > > This is no different from a remote proc usecase, following which, I > > wonder if "memory-region" is the right way to describe this usage? That > > would be a very specific bucket that is made available to the driver. > > And I'd say sram and memory-region are two mutually exclusive option? > > Wouldn't this just be a static allocation of the SRAM then? I believe > the correct way to do this is highlighted in Rob's[0] response. This is > also something we have done in the past[1], but I thought dynamic > allocation was preferred method so that the VPU didn't hog a portion of > SRAM. Seems wasteful in cases where the VPU is not being used. > > The device itself has the capability of doing runtime allocation before > any decoder/encoder stream instances are opened. All of these opened > streams will share this one allocated pool, meaning first open stream > allocates and the rest share. Because of this, the goal is to allocate > enough to meet maximum use case of VPU (4K60) or max case supported by > the SoC itself if the SoC is unable to handle 4K60. > > Is there preferred method for dynamic SRAM allocation? I understand > point that framerate and resolution are runtime choice, but these > properties are not guaranteed to be known when streams are being opened. Indeed, the driver won't know until the decoding process, the processing of the stream headers, will have started. At this point, it is too late to introduce usage of SRAM. A "static" probe time allocation base on the selected board capabilities seems like a fair direction, you may combine with driver parameter perhaps to allow per use case customization. At least, you don't wastefully allocate sram even when there is no driver on your OS or if the driver is never loaded. > > If static SRAM allocation is the correct method to go, then this series > can be ignored and I will add section in device tree and remove check > for parameter in driver as that will now be a bug. > > [0]: https://patchwork.kernel.org/project/linux-media/patch/20240201184238.2542695-1-b-brnich@ti.com/#25696671 > [1]: https://patchwork.kernel.org/project/linux-media/patch/20231108-wave5-v14-rebased-v14-8-0b4af1258656@collabora.com/ Ack. > > > > > > > > > > > > Further, Linux is not the only thing that runs on these SoCs.. these are > > > > mixed systems with autonomous operations of uC cores who may or maynot > > > > (typically not) even need to communicate with MPU to state which part of > > > > resource they are hogging (hence the board level definition). > > > > > > OK that could be the case but I could also say choice of RTOS or any > > > other is also board-independent. > > -- > > Regards, > > Nishanth Menon > > Key (0xDDB5849D1736249D) / Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D > > Thanks, > Brandon >
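A sketch of the direction Nicolas suggests in the message above: a probe-time
allocation sized by an OS-level knob such as a module parameter, so no DT
property is needed. The parameter name, default, and helper are hypothetical.

#include <linux/genalloc.h>
#include <linux/module.h>
#include <linux/sizes.h>

/* Hypothetical module parameter: lets the board/user cap or disable the
 * SRAM use without touching DT. */
static unsigned int example_sram_size = 48 * SZ_1K;	/* 0 disables SRAM use */
module_param_named(sram_size, example_sram_size, uint, 0444);
MODULE_PARM_DESC(sram_size, "Bytes of on-chip SRAM to reserve at probe (0 = none)");

static unsigned long example_probe_time_alloc(struct gen_pool *pool)
{
	if (!example_sram_size)
		return 0;
	/* Allocated once at probe; freed again in the driver's remove path. */
	return gen_pool_alloc(pool, example_sram_size);
}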
Le jeudi 08 février 2024 à 11:52 +0530, Devarsh Thakkar a écrit :
> I think even with the approach selected in [1] i.e. referring the
> mmio-sram node using DT property, you can still use dynamic SRAM
> allocation.
> The driver can still allocate from global sram pool dynamically using
> of_gen_pool API as being explained here [3] i.e allocate when first
> instance is opened and free up later when no instances are running.
>
> But I agree with Nishanth's point too that we may not want to give all
> of SRAM to VPU. For e.g. on AM62A we have 64KiB SRAM and a 1080p
> use-case requires 48KiB and even higher for 4K so if there is another
> peripheral who is referring this sram node, then it may not get enough
> as VPU will hog the major chunk (or all) of it while it is running and
> this is where an optional property like sram-size will help to cap the
> max sram usage for VPU and so this helps especially on platforms with
> limited SRAM availability.
>
> As I understand, the sram size allocation is dependent on resolution and
> once programmed can't be changed until all instances of VPU are done,
> and we can't predict how many instances user will launch and with what
> resolutions.
>
> So here's the flow we had thought of some time back :
> 1) Define worst case sram size (per 4K use-case as I believe that's the
> max for CnM wave521c) as a macro in driver
>
> Then the condition for determining sram size to be allocated should be
> as below :
>
> 2) When first instance of VPU is opened, allocate as per sram-size if
> sram-size property is specified.
>
> 3) If sram-size is not specified then :
> -> Allocate as per worst case size macro defined in driver from sram
> pool,
> -> If worst case size of SRAM > max SRAM size, then allocate
> max SRAM size
>
> 4). When all of the instances of VPU are closed, then free up all
> allocated SRAM.
>
> [3] :
> https://wiki.analog.com/resources/tools-software/linuxdsp/docs/linux-kernel-and-drivers/sram

Only issue here is that DT is not a use case configuration file. That DT
parameter is meant for HW that simply cannot be operated without it. This is
also edgy, because it also means that it should only be used if that information
is not static and vary unpredictably per SoC, which seems generally unlikely.

The Wave5 IP *can* work without it, so it should resort to something more
dynamic. User configuration should be sorted out at the OS level.

Nicolas
Hi Nicolas,

On 09/02/24 23:12, Nicolas Dufresne wrote:
> Le jeudi 08 février 2024 à 11:52 +0530, Devarsh Thakkar a écrit :
>> I think even with the approach selected in [1] i.e. referring the
>> mmio-sram node using DT property, you can still use dynamic SRAM
>> allocation.
>> The driver can still allocate from global sram pool dynamically using
>> of_gen_pool API as being explained here [3] i.e allocate when first
>> instance is opened and free up later when no instances are running.
>>
>> But I agree with Nishanth's point too that we may not want to give all
>> of SRAM to VPU. For e.g. on AM62A we have 64KiB SRAM and a 1080p
>> use-case requires 48KiB and even higher for 4K so if there is another
>> peripheral who is referring this sram node, then it may not get enough
>> as VPU will hog the major chunk (or all) of it while it is running and
>> this is where an optional property like sram-size will help to cap the
>> max sram usage for VPU and so this helps especially on platforms with
>> limited SRAM availability.
>>
>> As I understand, the sram size allocation is dependent on resolution and
>> once programmed can't be changed until all instances of VPU are done,
>> and we can't predict how many instances user will launch and with what
>> resolutions.
>>
>> So here's the flow we had thought of some time back :
>> 1) Define worst case sram size (per 4K use-case as I believe that's the
>> max for CnM wave521c) as a macro in driver
>>
>> Then the condition for determining sram size to be allocated should be
>> as below :
>>
>> 2) When first instance of VPU is opened, allocate as per sram-size if
>> sram-size property is specified.
>>
>> 3) If sram-size is not specified then :
>> -> Allocate as per worst case size macro defined in driver from sram
>> pool,
>> -> If worst case size of SRAM > max SRAM size, then allocate
>> max SRAM size
>>
>> 4). When all of the instances of VPU are closed, then free up all
>> allocated SRAM.
>>
>> [3] :
>> https://wiki.analog.com/resources/tools-software/linuxdsp/docs/linux-kernel-and-drivers/sram
>
> Only issue here is that DT is not a use case configuration file. That DT
> parameter is meant for HW that simply cannot be operated without it. This is
> also edgy, because it also means that it should only be used if that information
> is not static and vary unpredictably per SoC, which seems generally unlikely.
>
> The Wave5 IP *can* work without it, so it should resort to something more
> dynamic. User configuration should be sorted out at the OS level.
>
> Nicolas

Thanks, yeah that makes sense.

The sram allocation size is dependent on resolution so I think we should
employ a fallback model for sram allocation as described below :

1) Driver code should enumerate required sram sizes for each of the
standard resolutions (for e.g. #define 1080P_SRAM_SIZE 48128, #define
4K_SRAM_SIZE 66368 and so on...) and try to allocate per highest
enumerated resolution first) from sram pool using gen_pool_alloc when
first instance of VPU is opened.

2) If the allocation fails, then driver should try to fallback to lower
resolution sram allocation size (viz. 1080P_SRAM_SIZE) for sram
allocation.

3) When all of the VPU instances get closed, then free up all allocated
SRAM back to the pool so that other peripherals can use it.

Regards
Devarsh
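A sketch of the resolution-tier fallback Devarsh proposes above; the byte counts
are the figures quoted in this thread (48128 for 1080p, 66368 for 4K), while the
macro names (the identifiers in the mail, such as 4K_SRAM_SIZE, are not valid C
names), the tier array, and the helper are illustrative only.

#include <linux/genalloc.h>
#include <linux/kernel.h>

#define EXAMPLE_SRAM_SIZE_4K	66368	/* highest tier, tried first */
#define EXAMPLE_SRAM_SIZE_1080P	48128	/* fallback tier */

static const size_t example_sram_tiers[] = {
	EXAMPLE_SRAM_SIZE_4K,
	EXAMPLE_SRAM_SIZE_1080P,
};

/* Steps 1-2: walk the tiers from largest to smallest and keep the first
 * allocation that succeeds. Step 3 pairs this with gen_pool_free() once
 * all VPU instances are closed. */
static unsigned long example_sram_alloc_fallback(struct gen_pool *pool,
						 size_t *granted)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(example_sram_tiers); i++) {
		unsigned long vaddr = gen_pool_alloc(pool,
						     example_sram_tiers[i]);
		if (vaddr) {
			*granted = example_sram_tiers[i];
			return vaddr;
		}
	}
	*granted = 0;
	return 0;	/* no SRAM available: run from DDR only */
}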
diff --git a/Documentation/devicetree/bindings/media/cnm,wave521c.yaml b/Documentation/devicetree/bindings/media/cnm,wave521c.yaml
index 6a11c1d11fb5..ea5469eb38f9 100644
--- a/Documentation/devicetree/bindings/media/cnm,wave521c.yaml
+++ b/Documentation/devicetree/bindings/media/cnm,wave521c.yaml
@@ -43,6 +43,12 @@ properties:
       storing it on DMA memory. It is mainly used for the purpose of reducing
       bandwidth.
 
+  sram-size:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description:
+      SRAM size reserved for VPU operations. If not specified, size will default
+      to zero.
+
 required:
   - compatible
   - reg
@@ -58,4 +64,5 @@ examples:
         clocks = <&clks 42>;
         interrupts = <42>;
         sram = <&sram>;
+        sram-size = <0x1234>;
     };