From patchwork Wed Oct 12 19:23:46 2022
X-Patchwork-Submitter: Alexey Lapshin
X-Patchwork-Id: 1967
To: "gcc-patches@gcc.gnu.org"
Subject: [PATCH] xtensa: Add workaround for pSRAM cache issue in ESP32
Date: Wed, 12 Oct 2022 19:23:46 +0000
Message-ID: <1d246717a8e33db0760aaa4d5ce614489b4dab80.camel@espressif.com>
From: Alexey Lapshin
Reply-To: Alexey Lapshin
Cc: Ivan Grokhotkov, Alexey Gerenkov, Anton Maklakov

From a2b425031f5b06dd51cd3ca34fe4f3620b93a944 Mon Sep 17 00:00:00 2001
From: Jeroen Domburg
Date: Sat, 12 Aug 2017 23:10:12 +0800
Subject: [PATCH] xtensa: Add workaround for pSRAM cache issue in ESP32

Xtensa does a load/store inversion when a load and a store to the same
address are found in the 5 affected stages of the pipeline: with a load
done _after_ the store in code, the Xtensa will move it _before_ the
store in execution. Unfortunately, the ESP32 pSRAM cache mishandles
these inverted accesses when an interrupt occurs while they are in
progress. This reorg step inserts NOPs between loads and stores so this
never occurs.

Workarounds:

ESP32_PSRAM_FIX_NOPS:
  The handling issue also shows up when doing a store to an 8- or 16-bit
  memory location followed by a larger (16- or 32-bit) load from that
  location within the time it takes to fetch a cache line from external
  RAM (which is at least 80 cycles). The cache confuses the load and the
  store, so the bytes not written by the store are read back as garbage.
  To fix this, this strategy also inserts a memory barrier after each
  8/16-bit store that isn't followed by another store.

ESP32_PSRAM_FIX_MEMW (default):
  Explicitly insert a memory barrier instead of NOPs. Slower than NOPs,
  but faster than just adding memws everywhere.

ESP32_PSRAM_FIX_DUPLDST:
  Explicitly insert a load after every store:
  - Instruction is s32i: insert an l32i from that address into the
    source register immediately after, plus a duplicated s32i after that.
  - Instruction is s8i/s16i: note it and insert a memw before the next
    load (the same as ESP32_PSRAM_FIX_MEMW).
  - If any of the operands are volatile, leave the instruction untouched:
    the memw resulting from the volatile access fixes everything.
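The NOPS and MEMW rules described above can be illustrated outside GCC with a small toy model. This is a sketch under stated assumptions, not the GCC pass itself: instructions are reduced to a hypothetical `enum kind` tag, `fix_nops`, `fix_memw` and `insert_at` are illustrative names, every jump is assumed to be unconditional, and volatile accesses, calls and delay-slot sequences are ignored. Only the `LOAD_STORE_OFF` value (4) is taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy instruction kinds (NOT GCC's attr_type values).
   NARROW_STORE stands for s8i/s16i; JUMP is assumed unconditional.  */
enum kind { OTHER, STORE32, NARROW_STORE, LOAD, JUMP, NOP, MEMW };

#define LOAD_STORE_OFF 4 /* same value as in the patch */
#define MAX_OUT 64       /* capacity the caller must give OUT */

/* Insert K at index POS of OUT (current length LEN); return new length.  */
static int
insert_at (enum kind *out, int len, int pos, enum kind k)
{
  assert (len < MAX_OUT);
  memmove (&out[pos + 1], &out[pos], (size_t) (len - pos) * sizeof *out);
  out[pos] = k;
  return len + 1;
}

/* NOPS-style model: pad a load with NOPs so it issues at least
   LOAD_STORE_OFF slots after the last store, and put a MEMW directly
   after a pending narrow store once a load or jump follows it.  */
int
fix_nops (const enum kind *in, int n, enum kind *out)
{
  int len = 0, since_store = 0, have_store = 0;
  int hiqi = -1; /* index in OUT of a pending narrow store, or -1 */

  for (int i = 0; i < n; i++)
    {
      enum kind k = in[i];
      if (k == STORE32 || k == NARROW_STORE)
        {
          since_store = 0;
          have_store = 1;
          /* A 32-bit store forces earlier narrow stores to finish.  */
          hiqi = (k == NARROW_STORE) ? len : -1;
        }
      else if (k == LOAD)
        {
          while (have_store && since_store++ < LOAD_STORE_OFF)
            {
              assert (len < MAX_OUT);
              out[len++] = NOP;
            }
        }
      else if (k == JUMP)
        have_store = 0; /* pipeline is flushed */
      else
        since_store++;

      if ((k == LOAD || k == JUMP) && hiqi >= 0)
        {
          len = insert_at (out, len, hiqi + 1, MEMW);
          hiqi = -1;
        }
      assert (len < MAX_OUT);
      out[len++] = k;
    }
  return len;
}

/* MEMW-style model: a barrier before the first load or jump that follows
   a store, plus a barrier directly after a pending narrow store.  */
int
fix_memw (const enum kind *in, int n, enum kind *out)
{
  int len = 0, have_store = 0, hiqi = -1;

  for (int i = 0; i < n; i++)
    {
      enum kind k = in[i];
      if (k == STORE32 || k == NARROW_STORE)
        {
          have_store = 1;
          if (k == NARROW_STORE)
            hiqi = len;
        }
      else if ((k == LOAD || k == JUMP) && have_store)
        {
          assert (len < MAX_OUT);
          out[len++] = MEMW; /* barrier before the load or jump */
          have_store = 0;
        }

      if ((k == LOAD || k == JUMP) && hiqi >= 0)
        {
          len = insert_at (out, len, hiqi + 1, MEMW);
          hiqi = -1;
        }
      assert (len < MAX_OUT);
      out[len++] = k;
    }
  return len;
}
```

For example, `fix_nops` turns `{STORE32, LOAD}` into the store, four NOPs and the load, while `fix_memw` on `{STORE32, OTHER, LOAD}` places a single MEMW just before the load. The model does not merge barriers, so under `fix_memw` a narrow store immediately followed by a load gets two adjacent MEMWs.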
--- gcc/config.gcc | 5 + gcc/config/xtensa/t-esp32-psram-fix | 22 ++ gcc/config/xtensa/xtensa-opts.h | 34 +++ gcc/config/xtensa/xtensa.cc | 444 ++++++++++++++++++++++++++++ gcc/config/xtensa/xtensa.h | 1 + gcc/config/xtensa/xtensa.md | 46 ++- gcc/config/xtensa/xtensa.opt | 31 ++ 7 files changed, 580 insertions(+), 3 deletions(-) create mode 100644 gcc/config/xtensa/t-esp32-psram-fix create mode 100644 gcc/config/xtensa/xtensa-opts.h places. Default workaround. + +EnumValue +Enum(esp32_psram_fix_type) String(nops) Value(ESP32_PSRAM_FIX_NOPS) +Fix esp32 psram cache issue by inserting NOPs in critical places. -- 2.34.1 diff --git a/gcc/config.gcc b/gcc/config.gcc index e73cb848c2d..a407e8407f0 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -3457,6 +3457,11 @@ xstormy16-*-elf) extra_options=stormy16/stormy16.opt tmake_file="stormy16/t-stormy16" ;; +xtensa*-esp32-elf*) + tm_file="${tm_file} elfos.h newlib-stdint.h xtensa/elf.h" + tmake_file="${tmake_file} xtensa/t-esp32-psram-fix" + extra_options="${extra_options} xtensa/elf.opt" + ;; xtensa*-*-elf*) tm_file="${tm_file} elfos.h newlib-stdint.h xtensa/elf.h" extra_options="${extra_options} xtensa/elf.opt" diff --git a/gcc/config/xtensa/t-esp32-psram-fix b/gcc/config/xtensa/t-esp32-psram-fix new file mode 100644 index 00000000000..78fe54d4852 --- /dev/null +++ b/gcc/config/xtensa/t-esp32-psram-fix @@ -0,0 +1,22 @@ +# Copyright (C) 2022 Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details.
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +$(out_object_file): gt-xtensa.h + +MULTILIB_OPTIONS = mfix-esp32-psram-cache-issue +MULTILIB_DIRNAMES = esp32-psram diff --git a/gcc/config/xtensa/xtensa-opts.h b/gcc/config/xtensa/xtensa-opts.h new file mode 100644 index 00000000000..73c2015a016 --- /dev/null +++ b/gcc/config/xtensa/xtensa-opts.h @@ -0,0 +1,34 @@ +/* Definitions of option handling for Tensilica's Xtensa target machine. + Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Espressif + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + + + +#ifndef XTENSA_OPTS_H +#define XTENSA_OPTS_H + +enum esp32_psram_fix_type +{ + ESP32_PSRAM_FIX_DUPLDST, + ESP32_PSRAM_FIX_MEMW, + ESP32_PSRAM_FIX_NOPS +}; + + +#endif /* XTENSA_OPTS_H */ diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc index 828c7642b7c..61ef14b1c57 100644 --- a/gcc/config/xtensa/xtensa.cc +++ b/gcc/config/xtensa/xtensa.cc @@ -55,6 +55,8 @@ along with GCC; see the file COPYING3. If not see #include "dumpfile.h" #include "hw-doloop.h" #include "rtl-iter.h" +#include "tree-pass.h" +#include "context.h" #include "insn-attr.h" /* This file should be included last. 
*/ @@ -2636,6 +2638,435 @@ xtensa_return_in_msb (const_tree valtype) } +#define USEFUL_INSN_P(INSN) \ + (NONDEBUG_INSN_P (INSN) && GET_CODE (PATTERN (INSN)) != USE \ + && GET_CODE (PATTERN (INSN)) != CLOBBER) + +/* If INSN is a delayed branch sequence, return the first instruction + in the sequence, otherwise return INSN itself. */ +#define SEQ_BEGIN(INSN) \ + (INSN_P (INSN) && GET_CODE (PATTERN (INSN)) == SEQUENCE \ + ? as_a <rtx_insn *> (XVECEXP (PATTERN (INSN), 0, 0)) \ + : (INSN)) + +/* Likewise for the last instruction in a delayed branch sequence. */ +#define SEQ_END(INSN) \ + (INSN_P (INSN) && GET_CODE (PATTERN (INSN)) == SEQUENCE ? as_a <rtx_insn *> \ + (XVECEXP (PATTERN (INSN), 0, XVECLEN (PATTERN (INSN), 0) - 1)) : (INSN)) + + +/* Execute the following loop body with SUBINSN set to each instruction + between SEQ_BEGIN (INSN) and SEQ_END (INSN) inclusive. */ +#define FOR_EACH_SUBINSN(SUBINSN, INSN) \ + for ((SUBINSN) = SEQ_BEGIN (INSN); (SUBINSN) != NEXT_INSN (SEQ_END (INSN)); \ + (SUBINSN) = NEXT_INSN (SUBINSN)) + + +/* Xtensa does a load/store inversion when a load and a store to the same + address are found in the 5 affected stages of the pipeline: with a load done + _after_ the store in code, the Xtensa will move it _before_ the store in + execution. + Unfortunately, the ESP32 PSRAM cache mishandles these inverted accesses + when an interrupt occurs while they are in progress. This reorg step inserts + NOPs between loads and stores so this never occurs. + + The handling issue also shows up when doing a store to an 8- or 16-bit + memory location followed by a larger (16- or 32-bit) load from that + location within the time it takes to fetch a cache line from external RAM + (which is at least 80 cycles). The cache will confuse the load and store, + resulting in the bytes not written by the store being read as garbage. To + fix this, we insert a memory barrier after each 8/16-bit store that isn't + followed by another store.
*/ + +/* Affected piece of pipeline is 5 entries long; + The load/store itself fills one. */ +#define LOAD_STORE_OFF 4 + +static int insns_since_store = 0; +static rtx_insn *store_insn = NULL; +static rtx_insn *last_hiqi_store = NULL; + +static void +handle_fix_reorg_insn (rtx_insn *insn) +{ + enum attr_type attr_type = get_attr_type (insn); + if (attr_type == TYPE_STORE || attr_type == TYPE_FSTORE) + { + rtx x = XEXP (PATTERN (insn), 0); + /* Store */ + insns_since_store = 0; + store_insn = insn; + if (attr_type == TYPE_STORE + && (GET_MODE (x) == HImode || GET_MODE (x) == QImode)) + { + /* This is an 8/16-bit store, record it. */ + last_hiqi_store = insn; + } + else + { + /* 32-bit store. This store undoes the possibility of badness in + earlier 8/16-bit stores because it forces those stores to + finish. */ + last_hiqi_store = NULL; + } + } + else if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD) + { + /* Load */ + if (store_insn) + { + while (insns_since_store++ < LOAD_STORE_OFF) + { + emit_insn_before (gen_nop (), insn); + } + } + } + else if (attr_type == TYPE_JUMP || attr_type == TYPE_CALL) + { + enum attr_condjmp attr_condjmp = get_attr_condjmp (insn); + if (attr_condjmp == CONDJMP_UNCOND) + { + /* Pipeline gets cleared; any load is inconsequential. */ + store_insn = NULL; + } + } + else + { + insns_since_store++; + } + if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD + || attr_type == TYPE_JUMP || attr_type == TYPE_CALL) + { + if (last_hiqi_store) + { + /* Need to memory barrier the s8i/s16i instruction. 
*/ + emit_insn_after (gen_memory_barrier (), last_hiqi_store); + last_hiqi_store = NULL; + } + } +} + +static void +xtensa_psram_cache_fix_nop_reorg () +{ + rtx_insn *insn, *subinsn, *next_insn; + for (insn = get_insns (); insn != 0; insn = next_insn) + { + next_insn = NEXT_INSN (insn); + int length = get_attr_length (insn); + + if (USEFUL_INSN_P (insn) && length > 0) + { + FOR_EACH_SUBINSN (subinsn, insn) + { + handle_fix_reorg_insn (subinsn); + } + } + } +} + +/* Alternative fix to xtensa_psram_cache_fix_nop_reorg. Tries to solve the 32-bit + load/store inversion by explicitly inserting a memory barrier instead of + nops. + Slower than nops, but faster than just adding memws everywhere. */ + +static void +handle_fix_reorg_memw (rtx_insn *insn) +{ + enum attr_type attr_type = get_attr_type (insn); + rtx x = XEXP (PATTERN (insn), 0); + if (attr_type == TYPE_STORE || attr_type == TYPE_FSTORE) + { + /* Store */ + insns_since_store = 0; + store_insn = insn; + if (attr_type == TYPE_STORE + && (GET_MODE (x) == HImode || GET_MODE (x) == QImode)) + { + /* This is an 8/16-bit store, record it if it's not volatile + already. */ + if (!MEM_VOLATILE_P (x)) + last_hiqi_store = insn; + } + } + else if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD) + { + /* Load */ + if (MEM_P (x) && (!MEM_VOLATILE_P (x))) + { + if (store_insn) + { + emit_insn_before (gen_memory_barrier (), insn); + store_insn = NULL; + } + } + } + else if (attr_type == TYPE_JUMP || attr_type == TYPE_CALL) + { + enum attr_condjmp attr_condjmp = get_attr_condjmp (insn); + if (attr_condjmp == CONDJMP_UNCOND) + { + /* jump or return + Unconditional jumps seem to not clear the pipeline, and there may + be a load after. Need to memw if earlier code had a store.
*/ + if (store_insn) + { + emit_insn_before (gen_memory_barrier (), insn); + store_insn = NULL; + } + } + } + else + { + insns_since_store++; + } + if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD + || attr_type == TYPE_JUMP || attr_type == TYPE_CALL) + { + if (last_hiqi_store) + { + /* Need to memory barrier the s8i/s16i instruction. */ + emit_insn_after (gen_memory_barrier (), last_hiqi_store); + last_hiqi_store = NULL; + } + } +} + +static void +xtensa_psram_cache_fix_memw_reorg () +{ + rtx_insn *insn, *subinsn, *next_insn; + for (insn = get_insns (); insn != 0; insn = next_insn) + { + next_insn = NEXT_INSN (insn); + int length = get_attr_length (insn); + + if (USEFUL_INSN_P (insn) && length > 0) + { + FOR_EACH_SUBINSN (subinsn, insn) + { + handle_fix_reorg_memw (subinsn); + } + } + } +} + +/* Alternative fix to xtensa_psram_cache_fix_nop_reorg. Tries to solve the 32-bit + load/store inversion by explicitly inserting a load after every store. + + For now, the logic is: + - Instruction is s32i: + Insert l32i from that address to the source register immediately after, + plus a duplicated s32i after that. + - Instruction is s8i/s16i: + Note and insert a memw before a load. + (The same as xtensa_psram_cache_fix_memw_reorg) + - If any of the args are volatile, no touch: + The memw resulting from that will fix everything. + + Note: debug_rtx(insn) can dump an insn in lisp-like format. +*/ + +static void +handle_fix_dupldst_store (rtx_insn *insn, enum attr_type attr_type) +{ + rtx x = XEXP (PATTERN (insn), 0); + /* Store */ + if (attr_type == TYPE_STORE + && (GET_MODE (x) == HImode || GET_MODE (x) == QImode)) + { + /* This is an 8/16-bit store, record it if it's not volatile already. */ + if (!MEM_VOLATILE_P (x)) + last_hiqi_store = insn; + } + else + { + /* 32-bit store.
+ Add a load-after-store to fix psram issues *if* var is not volatile */ + if (MEM_P (x) && (!MEM_VOLATILE_P (x))) + { + rtx y = XEXP (PATTERN (insn), 1); + if (REG_P (y) && REGNO (y) == STACK_POINTER_REGNUM) + { + /* Store SP in mem? Can't movsi that back. + Insert memory barrier instead. */ + emit_insn_after (gen_memory_barrier (), insn); + } + else + { + /* Add the load/store. + Note: the instructions will be added in the OPPOSITE order as + the instructions are added between the s32i and the next + instruction: + 1: + s32i(insn), s32i; + 2: + s32i(insn), l32i, s32i; */ + /* Store again */ + emit_insn_after (gen_movsi (x, y), insn); + /* Load */ + emit_insn_after (gen_movsi (y, x), insn); + } + } + } +} + +static void +handle_fix_dupldst_reorg (rtx_insn *insn) +{ + enum attr_type attr_type = get_attr_type (insn); + if (attr_type == TYPE_STORE || attr_type == TYPE_FSTORE) + { + handle_fix_dupldst_store (insn, attr_type); + } + + if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD + || attr_type == TYPE_JUMP || attr_type == TYPE_CALL) + { + if (last_hiqi_store) + { + /* Need to memory barrier the s8i/s16i instruction. */ + emit_insn_after (gen_memory_barrier (), last_hiqi_store); + last_hiqi_store = NULL; + } + } +} + +static void +xtensa_psram_cache_fix_dupldst_reorg () +{ + rtx_insn *insn, *subinsn, *next_insn; + last_hiqi_store = NULL; + for (insn = get_insns (); insn != 0; insn = next_insn) + { + next_insn = NEXT_INSN (insn); + int length = get_attr_length (insn); + + if (USEFUL_INSN_P (insn) && length > 0) + { + FOR_EACH_SUBINSN (subinsn, insn) + { + handle_fix_dupldst_reorg (subinsn); + } + } + } +} + +/* Emits a memw after every store and before every load instruction. + Heavy-handed approach to get rid of any pipeline/memory issues...
*/ +static void +xtensa_insert_memw_reorg () +{ + rtx_insn *insn, *subinsn, *next_insn; + int had_memw = 0; + for (insn = get_insns (); insn != 0; insn = next_insn) + { + next_insn = NEXT_INSN (insn); + int length = get_attr_length (insn); + + if (USEFUL_INSN_P (insn) && length > 0) + { + FOR_EACH_SUBINSN (subinsn, insn) + { + rtx x = XEXP (PATTERN (subinsn), 0); + enum attr_type attr_type = get_attr_type (subinsn); + if (attr_type == TYPE_STORE) + { + if (MEM_P (x) && (!MEM_VOLATILE_P (x))) + { + emit_insn_after (gen_memory_barrier (), subinsn); + } + had_memw = 1; + } + else if (attr_type == TYPE_LOAD) + { + if (MEM_P (x) && (!MEM_VOLATILE_P (x)) && !had_memw) + { + emit_insn_before (gen_memory_barrier (), subinsn); + } + had_memw = 0; + } + else + { + had_memw = 0; + } + } + } + } +} + +static unsigned int +xtensa_machine_reorg (void) +{ + if (TARGET_ESP32_ALWAYS_MEMBARRIER) + { + xtensa_insert_memw_reorg (); + } + if (TARGET_ESP32_PSRAM_FIX_ENA) + { + if (esp32_psram_fix_strat == ESP32_PSRAM_FIX_DUPLDST) + { + xtensa_psram_cache_fix_dupldst_reorg (); + } + else if (esp32_psram_fix_strat == ESP32_PSRAM_FIX_MEMW) + { + xtensa_psram_cache_fix_memw_reorg (); + } + else if (esp32_psram_fix_strat == ESP32_PSRAM_FIX_NOPS) + { + xtensa_psram_cache_fix_nop_reorg (); + } + else + { + /* default to memw (note: 5.2.x defaulted to nops) */ + xtensa_psram_cache_fix_memw_reorg (); + } + } + return 0; +} + +namespace +{ + +const pass_data pass_data_xtensa_psram_nops = +{ + RTL_PASS, /* type */ + "xtensa-psram-adj", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_MACH_DEP, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_xtensa_psram_nops : public rtl_opt_pass +{ +public: + pass_xtensa_psram_nops (gcc::context *ctxt) + : rtl_opt_pass (pass_data_xtensa_psram_nops, ctxt) + { + } + + /* opt_pass methods: */ + virtual unsigned int + execute 
(function *) + { + return xtensa_machine_reorg (); + } + +}; /* class pass_xtensa_psram_nops */ + +} /* anon namespace */ + +rtl_opt_pass * +make_pass_xtensa_psram_nops (gcc::context *ctxt) +{ + return new pass_xtensa_psram_nops (ctxt); +} + + static void xtensa_option_override (void) { @@ -2707,6 +3138,19 @@ xtensa_option_override (void) if (flag_pic && !flag_pie) flag_shlib = 1; + /* Register machine specific reorg for the optional nop/memw insertion + that works around the psram cache bug on esp32 v0/v1 silicon */ + opt_pass *new_pass = make_pass_xtensa_psram_nops (g); + struct register_pass_info insert_pass_xtensa_psram_nops = + { + new_pass, /* pass */ + "dbr", /* reference_pass_name */ + 1, /* ref_pass_instance_number */ + PASS_POS_INSERT_AFTER /* pos_op */ + }; + register_pass (&insert_pass_xtensa_psram_nops); + + /* Hot/cold partitioning does not work on this architecture, because of constant pools (the load instruction cannot necessarily reach that far). Therefore disable it on this architecture. */ diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h index 16e3d55e896..21c038ca3d7 100644 --- a/gcc/config/xtensa/xtensa.h +++ b/gcc/config/xtensa/xtensa.h @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3. If not see /* Get Xtensa configuration settings */ #include "xtensa-config.h" +#include "xtensa-opts.h" /* External variables defined in xtensa.cc. */ diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index 608110c20bc..e8013987dbf 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -97,6 +97,10 @@ "unknown,none,QI,HI,SI,DI,SF,DF,BL" (const_string "unknown")) +(define_attr "condjmp" + "na,cond,uncond" + (const_string "na")) + (define_attr "length" "" (const_int 1)) ;; Describe a user's asm statement. @@ -115,14 +119,38 @@ ;; reservations in the pipeline description below. The Xtensa can ;; issue one instruction per cycle, so defining CPU units is unnecessary.
+(define_cpu_unit "loadstore") + (define_insn_reservation "xtensa_any_insn" 1 - (eq_attr "type" "!load,fload,rsr,mul16,mul32,fmadd,fconv") + (eq_attr "type" "!load,fload,store,fstore,rsr,mul16,mul32,fmadd,fconv") + "nothing") + +(define_insn_reservation "xtensa_memory_load" 2 + (and (not (match_test "TARGET_ESP32_PSRAM_FIX_ENA")) + (eq_attr "type" "load,fload")) "nothing") -(define_insn_reservation "xtensa_memory" 2 - (eq_attr "type" "load,fload") +(define_insn_reservation "xtensa_memory_store" 1 + (and (not (match_test "TARGET_ESP32_PSRAM_FIX_ENA")) + (eq_attr "type" "store,fstore")) "nothing") +;; If psram cache issue needs fixing, it's better to keep +;; stores far from loads from the same address. We cannot encode +;; that behaviour entirely here (or maybe we can, but at least +;; not easily), but we can try to get everything that smells like +;; load or store up to a pipeline length apart from each other. + +(define_insn_reservation "xtensa_memory_load_psram_fix" 2 + (and (match_test "TARGET_ESP32_PSRAM_FIX_ENA") + (eq_attr "type" "load,fload")) + "loadstore*5") + +(define_insn_reservation "xtensa_memory_store_psram_fix" 1 + (and (match_test "TARGET_ESP32_PSRAM_FIX_ENA") + (eq_attr "type" "store,fstore")) + "loadstore*5") + (define_insn_reservation "xtensa_sreg" 2 (eq_attr "type" "rsr") "nothing") @@ -1616,6 +1644,7 @@ } [(set_attr "type" "jump,jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "3,3")]) (define_insn "*ubtrue" @@ -1631,6 +1660,7 @@ } [(set_attr "type" "jump,jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "3,3")]) ;; Branch patterns for bit testing @@ -1665,6 +1695,7 @@ } [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "3")]) (define_insn "*masktrue" @@ -1686,6 +1717,7 @@ } [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "3")]) (define_insn "*masktrue_bitcmpl" @@ -1707,6 +1739,7 @@ 
} [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "3")]) (define_insn_and_split "*masktrue_const_bitcmpl" @@ -1932,6 +1965,7 @@ "loop\t%0, %l1_LEND" [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "3")]) (define_insn "zero_cost_loop_end" @@ -1949,6 +1983,7 @@ "#" [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "0")]) (define_insn "loop_end" @@ -1968,6 +2003,7 @@ } [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "0")]) (define_split @@ -2303,6 +2339,7 @@ } [(set_attr "type" "call") (set_attr "mode" "none") + (set_attr "condjmp" "uncond") (set_attr "length" "3")]) (define_expand "untyped_call" @@ -2347,6 +2384,7 @@ } [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "uncond") (set (attr "length") (if_then_else (match_test "TARGET_DENSITY") (const_int 2) @@ -2653,6 +2691,7 @@ } [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "3")]) (define_insn "*boolfalse" @@ -2671,6 +2710,7 @@ } [(set_attr "type" "jump") (set_attr "mode" "none") + (set_attr "condjmp" "cond") (set_attr "length" "3")]) diff --git a/gcc/config/xtensa/xtensa.opt b/gcc/config/xtensa/xtensa.opt index 08338e39060..3696a7dd5fe 100644 --- a/gcc/config/xtensa/xtensa.opt +++ b/gcc/config/xtensa/xtensa.opt @@ -18,6 +18,9 @@ ; along with GCC; see the file COPYING3. If not see ; . +HeaderInclude +config/xtensa/xtensa-opts.h + mconst16 Target Mask(CONST16) Use CONST16 instruction to load constants. @@ -60,3 +63,31 @@ Use call0 ABI. mabi=windowed Target RejectNegative Var(xtensa_windowed_abi, 1) Use windowed registers ABI. + +malways-memw +Target Mask(ESP32_ALWAYS_MEMBARRIER) +Always emit a MEMW before a load and after a store operation. Used to debug memory coherency issues. 
+ +mfix-esp32-psram-cache-issue +Target Mask(ESP32_PSRAM_FIX_ENA) +Work around a PSRAM cache issue in the ESP32 ECO1 chips. + +mfix-esp32-psram-cache-strategy= +Target RejectNegative JoinedOrMissing Enum(esp32_psram_fix_type) Var(esp32_psram_fix_strat) Init(ESP32_PSRAM_FIX_MEMW) +Specify a psram cache fix strategy. + +Enum +Name(esp32_psram_fix_type) Type(enum esp32_psram_fix_type) +Psram cache fix strategies (for use with -mfix-esp32-psram-cache-strategy= option): + +EnumValue +Enum(esp32_psram_fix_type) String(dupldst) Value(ESP32_PSRAM_FIX_DUPLDST) +Fix esp32 psram cache issue by duplicating stores and non-word loads. + +EnumValue +Enum(esp32_psram_fix_type) String(memw) Value(ESP32_PSRAM_FIX_MEMW) +Fix esp32 psram cache issue by inserting memory barriers in critical