From patchwork Wed Nov 15 14:10:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 165393 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b909:0:b0:403:3b70:6f57 with SMTP id t9csp2562412vqg; Wed, 15 Nov 2023 06:11:25 -0800 (PST) X-Google-Smtp-Source: AGHT+IFakfFyVXDsGT2qoEkITllKBLuD7+QD21VgDZ5TtpKII7RCnjsxtV1yDglUcoVdZ8kYe6Lv X-Received: by 2002:a05:620a:1a14:b0:773:a8d0:b33f with SMTP id bk20-20020a05620a1a1400b00773a8d0b33fmr6453202qkb.59.1700057484867; Wed, 15 Nov 2023 06:11:24 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1700057484; cv=pass; d=google.com; s=arc-20160816; b=IgeRh2JC4ydeAnGfG84YqQk9/RVXngOdc7ioS29BaXmGtH0RopcR58UfuRHL8rRIe/ nHEsunmbEtV9VZFp5zGZjP6yyIMg3aKDyEtaJ0JQ9ckaPBk18cYagqv56ZwxZiTqFRI/ Q5Q6h0b2KVXhULK9tdabC83kZk6Dwfpq7i3OgaoKqD9zofBE3p7jVQzrpwSbfj82f2sR 7PZjPxo4+AZNxSXB8qH7Cmfvk3rhb9bUQb73wyQvsKQF1msGkEm07GepbaphaTMHG7kJ 4kDcNoaOcNJNlDqrwM9LnoylcHUG9dyFN6tUaBSASc+apuVLjV9fesJ2Vcdj5KYHZ9xN +EOw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:to:content-language:subject :from:user-agent:mime-version:date:message-id:ironport-sdr :arc-filter:dmarc-filter:delivered-to; bh=kroqA6XnvKeveCPpNhvekblu/fACCtWfs3UE8AokLxg=; fh=XNn3asQvIblazGK92GBt13dVv+YmGV3pBS0JC29ZQco=; b=L8MeoRanN9sofjFO6uUcvGlkXS57EYq3kr1dCEH2u2MYow88vaCp6O9H0xterwcwmi mWZqNBkBDqT0Qku5DQGQQXjAzhRTgV1UIElazpXXz2mG5+vLPNV5fb76QeK9T3KJWaV6 yQ1UfmCpc928mdmb3JwHIide30xRy7DlAVNiVl3rfNYaBAAt4CIV+0ASoA4VMzA9x4Q+ +8Uno8vYhNMkrnwvxVRBj8SZypV+vQfHldcMXuO6pLIQptHjtJmSil3mtM4axN4zOOw4 GN0XErB9rbDRD/jJ4bDmBeo6bHCUAsRLiRlDvy92segZbydsPtPQ1RgEnpca+xUalpdv rNjg== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id oo17-20020a05620a531100b00767ccf42f41si8922150qkn.29.2023.11.15.06.11.24 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 Nov 2023 06:11:24 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 82F773858C56 for ; Wed, 15 Nov 2023 14:11:24 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id E4CEE3858D20 for ; Wed, 15 Nov 2023 14:10:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E4CEE3858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E4CEE3858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=68.232.129.153 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700057459; cv=none; b=GyHVMpcmCwWjXcJppi1r/DsYxJPFyJi78np2VZKnxuLe7AhWZqAA9OeNkfl0beM0DuCRzsU7sez8H9VX92pPzdv5UJYq84VYFYi51ssuFw/FBDJ6SUFDobte3HqKuwStMB5wf69jewesAlC7aI5TPaVcmD8yM7enVTLTVbZb/hc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700057459; c=relaxed/simple; bh=sy2gqDshC0CdvPd1pNclY6DTKsJfsgqFjtqbuo5mYqY=; h=Message-ID:Date:MIME-Version:From:Subject:To; b=wXwaud1qBG9a7oLELBSvPM2/RagJXOan2oqHjIIK25d4TQdjldV70n57dU3efwtHV5Ro3ZjmMeSRB1C1fjdIrxp70bZ/ab75nyF9kMr2iFO4X3JxKGlMbeAYIIwihk7ORVRqSt3I89wzDREKyYdlFORvmopDg5UO6PgJUQMGDk4= ARC-Authentication-Results: i=1; server2.sourceware.org X-CSE-ConnectionGUID: s1mJ9swpRBKhTPK0vFF18Q== X-CSE-MsgGUID: YPhxDYZ5RgSJSfT8iulsWQ== X-IronPort-AV: E=Sophos;i="6.03,305,1694764800"; d="scan'208";a="25804644" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 15 Nov 2023 06:10:52 -0800 IronPort-SDR: fSNiHtF6594jQEy+gWujJLTiaq8F3hjAHLCGzMETK+ZxNu3aOdp53yD+kR0VkM3qZ3zOSUJSUs TOVosaKf1z3s5YE4wg3rkPIpAmSS9plCux7X9xmWkBCWIAU4NSJaalj3HqSyi+yK234+yQD49g Ko03TBfN0ynfgHXWkFdY4uM53HbkwIDHScE2wn8ta933jZesbfYEiv1/3DNmbWJPQ0HEpPVwDR YCgDD+WpQ5nZMu0nOuGhew1I92MYrf3HJ5x8niW0shUI0+nLizYUBDEpsnRtsujK3HsODW09eW 5C8= Message-ID: Date: Wed, 15 Nov 2023 14:10:47 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird From: Andrew Stubbs Subject: [committed] amdgcn: Add Accelerator VGPR registers Content-Language: en-GB To: "gcc-patches@gcc.gnu.org" X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-10.0 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SCC_10_SHORT_WORD_LINES, SCC_20_SHORT_WORD_LINES, SCC_35_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1782639477484832651 X-GMAIL-MSGID: 1782639477484832651 AMD GPUs since CDNA1 have had a new register file with an additional 256 32-bit-by-64-lane vector registers. This doubles the number of vector registers on the device, compared to previous models. The way the hardware works is that the register file is divided between all the running threads, so a single thread cannot use all this capacity without limiting parallism; doubling the number makes this much nicer. The new registers can only be used for selected operations (mostly related to matrices), none of which GCC supports easily, but we can use them as spill space and avoid costly stack accesses for very large registers. In CDNA2 there were additional instruction encodings added for load and store to and from these new registers, so that opens up more possibilities for optimzations. This patch adds the new registers as CALL_USED (so they will never add to function call overhead), configures them as spill space and load/store targets (CDNA2 only), and provides the necessary move instructions. There are many tweaks to the target hooks to handle the new cases, but there are not intended to be any functional changes to any other registers or instructions. The original work was done by Andrew Jenner, and I've finished off the task with debug and tidy-up. Andrew amdgcn: Add Accelerator VGPR registers Add the new CDNA register file. We don't support any of the specialized instructions that use these registers, but they're useful to relieve register pressure without spilling to stack. Co-authored-by: Andrew Jenner gcc/ChangeLog: * config/gcn/constraints.md: Add "a" AVGPR constraint. * config/gcn/gcn-valu.md (*mov): Add AVGPR alternatives. (*mov_4reg): Likewise. (@mov_sgprbase): Likewise. (gather_insn_1offset): Likewise. (gather_insn_1offset_ds): Likewise. (gather_insn_2offsets): Likewise. (scatter_expr): Likewise. (scatter_insn_1offset_ds): Likewise. (scatter_insn_2offsets): Likewise. * config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define. (gcn_class_max_nregs): Handle AVGPR_REGS and ALL_VGPR_REGS. (gcn_hard_regno_mode_ok): Likewise. (gcn_regno_reg_class): Likewise. (gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA1_PLUS. (gcn_sgpr_move_p): Handle AVGPRs. (gcn_secondary_reload): Reload AVGPRs via VGPRs. (gcn_conditional_register_usage): Handle AVGPRs. (gcn_vgpr_equivalent_register_operand): New function. (gcn_valid_move_p): Check for validity of AVGPR moves. (gcn_compute_frame_offsets): Handle AVGPRs. (gcn_memory_move_cost): Likewise. (gcn_register_move_cost): Likewise. (gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI. (gcn_md_reorg): Handle AVGPRs. (gcn_hsa_declare_function_name): Likewise. (print_reg): Likewise. (gcn_dwarf_register_number): Likewise. * config/gcn/gcn.h (FIRST_AVGPR_REG): Define. (AVGPR_REGNO): Define. (LAST_AVGPR_REG): Define. (SOFT_ARG_REG): Update. (FRAME_POINTER_REGNUM): Update. (DWARF_LINK_REGISTER): Update. (FIRST_PSEUDO_REGISTER): Update. (AVGPR_REGNO_P): Define. (enum reg_class): Add AVGPR_REGS and ALL_VGPR_REGS. (REG_CLASS_CONTENTS): Add new register classes and add entries for AVGPRs to all classes. (REGISTER_NAMES): Add AVGPRs. * config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define. (AP_REGNUM, FP_REGNUM): Update. (define_attr "type"): Add vop3p_mai. (define_attr "unit"): Handle vop3p_mai. (define_attr "gcn_version"): Add "cdna2". (define_attr "enabled"): Handle cdna2. (*mov_insn): Add AVGPR alternatives. (*movti_insn): Likewise. * config/gcn/mkoffload.cc (isa_has_combined_avgprs): New. (process_asm): Process avgpr_count. * config/gcn/predicates.md (gcn_avgpr_register_operand): New. (gcn_avgpr_hard_register_operand): New. * doc/md.texi: Document the "a" constraint. gcc/testsuite/ChangeLog: * gcc.target/gcn/avgpr-mem-double.c: New test. * gcc.target/gcn/avgpr-mem-int.c: New test. * gcc.target/gcn/avgpr-mem-long.c: New test. * gcc.target/gcn/avgpr-mem-short.c: New test. * gcc.target/gcn/avgpr-spill-double.c: New test. * gcc.target/gcn/avgpr-spill-int.c: New test. * gcc.target/gcn/avgpr-spill-long.c: New test. * gcc.target/gcn/avgpr-spill-short.c: New test. libgomp/ChangeLog: * plugin/plugin-gcn.c (max_isa_vgprs): New. (run_kernel): CDNA2 devices have more VGPRs. diff --git a/gcc/config/gcn/constraints.md b/gcc/config/gcn/constraints.md index efe462a0bd6..b29dc5b6643 100644 --- a/gcc/config/gcn/constraints.md +++ b/gcc/config/gcn/constraints.md @@ -77,6 +77,9 @@ (define_constraint "Y" (define_register_constraint "v" "VGPR_REGS" "VGPR registers") +(define_register_constraint "a" "TARGET_CDNA1_PLUS ? AVGPR_REGS : NO_REGS" + "Accumulator VGPR registers") + (define_register_constraint "Sg" "SGPR_REGS" "SGPR registers") diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 8dc93e8c82e..23f2bbe454b 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -449,12 +449,16 @@ (define_insn "mov_unspec" (set_attr "length" "0")]) (define_insn "*mov" - [(set (match_operand:V_1REG 0 "nonimmediate_operand" "=v,v") - (match_operand:V_1REG 1 "general_operand" "vA,B"))] - "" - "v_mov_b32\t%0, %1" - [(set_attr "type" "vop1,vop1") - (set_attr "length" "4,8")]) + [(set (match_operand:V_1REG 0 "nonimmediate_operand") + (match_operand:V_1REG 1 "general_operand"))] + "" + {@ [cons: =0, 1; attrs: type, length, gcn_version] + [v ,vA;vop1 ,4,* ] v_mov_b32\t%0, %1 + [v ,B ;vop1 ,8,* ] ^ + [v ,a ;vop3p_mai,8,* ] v_accvgpr_read_b32\t%0, %1 + [$a ,v ;vop3p_mai,8,* ] v_accvgpr_write_b32\t%0, %1 + [a ,a ;vop1 ,4,cdna2] v_accvgpr_mov_b32\t%0, %1 + }) (define_insn "mov_exec" [(set (match_operand:V_1REG 0 "nonimmediate_operand") @@ -493,17 +497,29 @@ (define_insn "mov_exec" ; (set_attr "length" "4,8,16,16")]) (define_insn "*mov" - [(set (match_operand:V_2REG 0 "nonimmediate_operand" "=v") - (match_operand:V_2REG 1 "general_operand" "vDB"))] + [(set (match_operand:V_2REG 0 "nonimmediate_operand" "=v, v,$a,a") + (match_operand:V_2REG 1 "general_operand" "vDB,a, v,a"))] "" - { - if (!REG_P (operands[1]) || REGNO (operands[0]) <= REGNO (operands[1])) - return "v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1"; - else - return "v_mov_b32\t%H0, %H1\;v_mov_b32\t%L0, %L1"; - } - [(set_attr "type" "vmult") - (set_attr "length" "16")]) + "@ + * if (!REG_P (operands[1]) || REGNO (operands[0]) <= REGNO (operands[1])) \ + return \"v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\"; \ + else \ + return \"v_mov_b32\t%H0, %H1\;v_mov_b32\t%L0, %L1\"; + * if (REGNO (operands[0]) <= REGNO (operands[1])) \ + return \"v_accvgpr_read_b32\t%L0, %L1\;v_accvgpr_read_b32\t%H0, %H1\"; \ + else \ + return \"v_accvgpr_read_b32\t%H0, %H1\;v_accvgpr_read_b32\t%L0, %L1\"; + * if (REGNO (operands[0]) <= REGNO (operands[1])) \ + return \"v_accvgpr_write_b32\t%L0, %L1\;v_accvgpr_write_b32\t%H0, %H1\"; \ + else \ + return \"v_accvgpr_write_b32\t%H0, %H1\;v_accvgpr_write_b32\t%L0, %L1\"; + * if (REGNO (operands[0]) <= REGNO (operands[1])) \ + return \"v_accvgpr_mov_b32\t%L0, %L1\;v_accvgpr_mov_b32\t%H0, %H1\"; \ + else \ + return \"v_accvgpr_mov_b32\t%H0, %H1\;v_accvgpr_mov_b32\t%L0, %L1\";" + [(set_attr "type" "vmult,vmult,vmult,vmult") + (set_attr "length" "16,16,16,8") + (set_attr "gcn_version" "*,*,*,cdna2")]) (define_insn "mov_exec" [(set (match_operand:V_2REG 0 "nonimmediate_operand" "= v, v, v, v, m") @@ -546,17 +562,15 @@ (define_insn "mov_exec" (set_attr "length" "16,16,16,16,16")]) (define_insn "*mov_4reg" - [(set (match_operand:V_4REG 0 "nonimmediate_operand" "=v") - (match_operand:V_4REG 1 "general_operand" "vDB"))] + [(set (match_operand:V_4REG 0 "nonimmediate_operand") + (match_operand:V_4REG 1 "general_operand"))] "" - { - return "v_mov_b32\t%L0, %L1\;" - "v_mov_b32\t%H0, %H1\;" - "v_mov_b32\t%J0, %J1\;" - "v_mov_b32\t%K0, %K1\;"; - } - [(set_attr "type" "vmult") - (set_attr "length" "16")]) + {@ [cons: =0, 1; attrs: type, length, gcn_version] + [v,vDB;vmult,16,* ] v_mov_b32\t%L0, %L1\; v_mov_b32\t%H0, %H1\; v_mov_b32\t%J0, %J1\; v_mov_b32\t%K0, %K1 + [v,a ;vmult,32,* ] v_accvgpr_read_b32\t%L0, %L1\; v_accvgpr_read_b32\t%H0, %H1\; v_accvgpr_read_b32\t%J0, %J1\; v_accvgpr_read_b32\t%K0, %K1 + [a,v ;vmult,32,* ] v_accvgpr_write_b32\t%L0, %L1\;v_accvgpr_write_b32\t%H0, %H1\;v_accvgpr_write_b32\t%J0, %J1\;v_accvgpr_write_b32\t%K0, %K1 + [a,a ;vmult,32,cdna2] v_accvgpr_mov_b32\t%L0, %L1\; v_accvgpr_mov_b32\t%H0, %H1\; v_accvgpr_mov_b32\t%J0, %J1\; v_accvgpr_mov_b32\t%K0, %K1 + }) (define_insn "mov_exec" [(set (match_operand:V_4REG 0 "nonimmediate_operand" "= v, v, v, v, m") @@ -648,19 +662,21 @@ (define_insn "@mov_sgprbase" UNSPEC_SGPRBASE)) (clobber (match_operand: 2 "register_operand"))] "lra_in_progress || reload_completed" - {@ [cons: =0, 1, =2; attrs: type, length] - [v,vA,&v;vop1,4 ] v_mov_b32\t%0, %1 - [v,vB,&v;vop1,8 ] ^ - [v,m ,&v;* ,12] # - [m,v ,&v;* ,12] # + {@ [cons: =0, 1, =2; attrs: type, length, gcn_version] + [v,vA,&v;vop1,4 ,* ] v_mov_b32\t%0, %1 + [v,vB,&v;vop1,8 ,* ] ^ + [v,m ,&v;* ,12,* ] # + [m,v ,&v;* ,12,* ] # + [a,m ,&v;* ,12,cdna2] # + [m,a ,&v;* ,12,cdna2] # }) (define_insn "@mov_sgprbase" - [(set (match_operand:V_2REG 0 "nonimmediate_operand" "= v, v, m") + [(set (match_operand:V_2REG 0 "nonimmediate_operand" "= v, v, m, a, m") (unspec:V_2REG - [(match_operand:V_2REG 1 "general_operand" "vDB, m, v")] + [(match_operand:V_2REG 1 "general_operand" "vDB, m, v, m, a")] UNSPEC_SGPRBASE)) - (clobber (match_operand: 2 "register_operand" "=&v,&v,&v"))] + (clobber (match_operand: 2 "register_operand" "=&v,&v,&v,&v,&v"))] "lra_in_progress || reload_completed" "@ * if (!REG_P (operands[1]) || REGNO (operands[0]) <= REGNO (operands[1])) \ @@ -668,9 +684,12 @@ (define_insn "@mov_sgprbase" else \ return \"v_mov_b32\t%H0, %H1\;v_mov_b32\t%L0, %L1\"; # + # + # #" - [(set_attr "type" "vmult,*,*") - (set_attr "length" "8,12,12")]) + [(set_attr "type" "vmult,*,*,*,*") + (set_attr "length" "8,12,12,12,12") + (set_attr "gcn_version" "*,*,*,cdna2,cdna2")]) (define_insn "@mov_sgprbase" [(set (match_operand:V_4REG 0 "nonimmediate_operand") @@ -1126,13 +1145,13 @@ (define_expand "gather_expr" {}) (define_insn "gather_insn_1offset" - [(set (match_operand:V_MOV 0 "register_operand" "=v") + [(set (match_operand:V_MOV 0 "register_operand" "=v,a") (unspec:V_MOV - [(plus: (match_operand: 1 "register_operand" " v") + [(plus: (match_operand: 1 "register_operand" " v,v") (vec_duplicate: - (match_operand 2 "immediate_operand" " n"))) - (match_operand 3 "immediate_operand" " n") - (match_operand 4 "immediate_operand" " n") + (match_operand 2 "immediate_operand" " n,n"))) + (match_operand 3 "immediate_operand" " n,n") + (match_operand 4 "immediate_operand" " n,n") (mem:BLK (scratch))] UNSPEC_GATHER))] "(AS_FLAT_P (INTVAL (operands[3])) @@ -1162,16 +1181,17 @@ (define_insn "gather_insn_1offset" return buf; } [(set_attr "type" "flat") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "gcn_version" "*,cdna2")]) (define_insn "gather_insn_1offset_ds" - [(set (match_operand:V_MOV 0 "register_operand" "=v") + [(set (match_operand:V_MOV 0 "register_operand" "=v,a") (unspec:V_MOV - [(plus: (match_operand: 1 "register_operand" " v") + [(plus: (match_operand: 1 "register_operand" " v,v") (vec_duplicate: - (match_operand 2 "immediate_operand" " n"))) - (match_operand 3 "immediate_operand" " n") - (match_operand 4 "immediate_operand" " n") + (match_operand 2 "immediate_operand" " n,n"))) + (match_operand 3 "immediate_operand" " n,n") + (match_operand 4 "immediate_operand" " n,n") (mem:BLK (scratch))] UNSPEC_GATHER))] "(AS_ANY_DS_P (INTVAL (operands[3])) @@ -1184,20 +1204,22 @@ (define_insn "gather_insn_1offset_ds" return buf; } [(set_attr "type" "ds") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "gcn_version" "*,cdna2")]) (define_insn "gather_insn_2offsets" - [(set (match_operand:V_MOV 0 "register_operand" "=v") + [(set (match_operand:V_MOV 0 "register_operand" "=v,a") (unspec:V_MOV [(plus: (plus: (vec_duplicate: - (match_operand:DI 1 "register_operand" "Sv")) + (match_operand:DI 1 "register_operand" "Sv,Sv")) (sign_extend: - (match_operand: 2 "register_operand" " v"))) - (vec_duplicate: (match_operand 3 "immediate_operand" " n"))) - (match_operand 4 "immediate_operand" " n") - (match_operand 5 "immediate_operand" " n") + (match_operand: 2 "register_operand" " v,v"))) + (vec_duplicate: (match_operand 3 "immediate_operand" + " n,n"))) + (match_operand 4 "immediate_operand" " n,n") + (match_operand 5 "immediate_operand" " n,n") (mem:BLK (scratch))] UNSPEC_GATHER))] "(AS_GLOBAL_P (INTVAL (operands[4])) @@ -1216,7 +1238,8 @@ (define_insn "gather_insn_2offsets" return buf; } [(set_attr "type" "flat") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "gcn_version" "*,cdna2")]) (define_expand "scatter_store" [(match_operand:DI 0 "register_operand") @@ -1255,12 +1278,12 @@ (define_expand "scatter_expr" (define_insn "scatter_insn_1offset" [(set (mem:BLK (scratch)) (unspec:BLK - [(plus: (match_operand: 0 "register_operand" "v") + [(plus: (match_operand: 0 "register_operand" "v,v") (vec_duplicate: - (match_operand 1 "immediate_operand" "n"))) - (match_operand:V_MOV 2 "register_operand" "v") - (match_operand 3 "immediate_operand" "n") - (match_operand 4 "immediate_operand" "n")] + (match_operand 1 "immediate_operand" "n,n"))) + (match_operand:V_MOV 2 "register_operand" "v,a") + (match_operand 3 "immediate_operand" "n,n") + (match_operand 4 "immediate_operand" "n,n")] UNSPEC_SCATTER))] "(AS_FLAT_P (INTVAL (operands[3])) && (INTVAL(operands[1]) == 0 @@ -1288,17 +1311,18 @@ (define_insn "scatter_insn_1offset" return buf; } [(set_attr "type" "flat") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "gcn_version" "*,cdna2")]) (define_insn "scatter_insn_1offset_ds" [(set (mem:BLK (scratch)) (unspec:BLK - [(plus: (match_operand: 0 "register_operand" "v") + [(plus: (match_operand: 0 "register_operand" "v,v") (vec_duplicate: - (match_operand 1 "immediate_operand" "n"))) - (match_operand:V_MOV 2 "register_operand" "v") - (match_operand 3 "immediate_operand" "n") - (match_operand 4 "immediate_operand" "n")] + (match_operand 1 "immediate_operand" "n,n"))) + (match_operand:V_MOV 2 "register_operand" "v,a") + (match_operand 3 "immediate_operand" "n,n") + (match_operand 4 "immediate_operand" "n,n")] UNSPEC_SCATTER))] "(AS_ANY_DS_P (INTVAL (operands[3])) && ((unsigned HOST_WIDE_INT)INTVAL(operands[1]) < 0x10000))" @@ -1310,7 +1334,8 @@ (define_insn "scatter_insn_1offset_ds" return buf; } [(set_attr "type" "ds") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "gcn_version" "*,cdna2")]) (define_insn "scatter_insn_2offsets" [(set (mem:BLK (scratch)) @@ -1318,13 +1343,13 @@ (define_insn "scatter_insn_2offsets" [(plus: (plus: (vec_duplicate: - (match_operand:DI 0 "register_operand" "Sv")) + (match_operand:DI 0 "register_operand" "Sv,Sv")) (sign_extend: - (match_operand: 1 "register_operand" " v"))) - (vec_duplicate: (match_operand 2 "immediate_operand" " n"))) - (match_operand:V_MOV 3 "register_operand" " v") - (match_operand 4 "immediate_operand" " n") - (match_operand 5 "immediate_operand" " n")] + (match_operand: 1 "register_operand" "v,v"))) + (vec_duplicate: (match_operand 2 "immediate_operand" "n,n"))) + (match_operand:V_MOV 3 "register_operand" "v,a") + (match_operand 4 "immediate_operand" "n,n") + (match_operand 5 "immediate_operand" "n,n")] UNSPEC_SCATTER))] "(AS_GLOBAL_P (INTVAL (operands[4])) && (((unsigned HOST_WIDE_INT)INTVAL(operands[2]) + 0x1000) < 0x2000))" @@ -1341,7 +1366,8 @@ (define_insn "scatter_insn_2offsets" return buf; } [(set_attr "type" "flat") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "gcn_version" "*,cdna2")]) ;; }}} ;; {{{ Permutations diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 28065c50bfd..52c8a0e409c 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -96,6 +96,7 @@ static hash_map lds_allocs; #define MAX_NORMAL_SGPR_COUNT 62 // i.e. 64 with VCC #define MAX_NORMAL_VGPR_COUNT 24 +#define MAX_NORMAL_AVGPR_COUNT 24 /* }}} */ /* {{{ Initialization and options. */ @@ -483,7 +484,8 @@ gcn_class_max_nregs (reg_class_t rclass, machine_mode mode) { /* Scalar registers are 32bit, vector registers are in fact tuples of 64 lanes. */ - if (rclass == VGPR_REGS) + if (rclass == VGPR_REGS || rclass == AVGPR_REGS + || rclass == ALL_VGPR_REGS) { if (vgpr_1reg_mode_p (mode)) return 1; @@ -583,7 +585,7 @@ gcn_hard_regno_mode_ok (unsigned int regno, machine_mode mode) return (sgpr_1reg_mode_p (mode) || (!((regno - FIRST_SGPR_REG) & 1) && sgpr_2reg_mode_p (mode)) || (((regno - FIRST_SGPR_REG) & 3) == 0 && mode == TImode)); - if (VGPR_REGNO_P (regno)) + if (VGPR_REGNO_P (regno) || (AVGPR_REGNO_P (regno) && TARGET_CDNA1_PLUS)) /* Vector instructions do not care about the alignment of register pairs, but where there is no 64-bit instruction, many of the define_split do not work if the input and output registers partially @@ -623,6 +625,8 @@ gcn_regno_reg_class (int regno) } if (VGPR_REGNO_P (regno)) return VGPR_REGS; + if (AVGPR_REGNO_P (regno)) + return AVGPR_REGS; if (SGPR_REGNO_P (regno)) return SGPR_REGS; if (regno < FIRST_VGPR_REG) @@ -813,7 +817,7 @@ gcn_spill_class (reg_class_t c, machine_mode /*mode */ ) || c == VCC_CONDITIONAL_REG || c == EXEC_MASK_REG) return SGPR_REGS; else - return NO_REGS; + return c == VGPR_REGS && TARGET_CDNA1_PLUS ? AVGPR_REGS : NO_REGS; } /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS. @@ -2348,12 +2352,15 @@ gcn_sgpr_move_p (rtx op0, rtx op1) return true; if (MEM_P (op1) && AS_SCALAR_FLAT_P (MEM_ADDR_SPACE (op1))) return true; - if (!REG_P (op0) || REGNO (op0) >= FIRST_PSEUDO_REGISTER - || VGPR_REGNO_P (REGNO (op0))) + if (!REG_P (op0) + || REGNO (op0) >= FIRST_PSEUDO_REGISTER + || VGPR_REGNO_P (REGNO (op0)) + || AVGPR_REGNO_P (REGNO (op0))) return false; if (REG_P (op1) && REGNO (op1) < FIRST_PSEUDO_REGISTER - && !VGPR_REGNO_P (REGNO (op1))) + && !VGPR_REGNO_P (REGNO (op1)) + && !AVGPR_REGNO_P (REGNO (op1))) return true; return immediate_operand (op1, VOIDmode) || memory_operand (op1, VOIDmode); } @@ -2424,6 +2431,11 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t rclass, result = (rclass == VGPR_REGS ? NO_REGS : VGPR_REGS); break; } + + /* CDNA1 doesn't have an instruction for going between the accumulator + registers and memory. Go via a VGPR in this case. */ + if (TARGET_CDNA1 && rclass == AVGPR_REGS && result != VGPR_REGS) + result = VGPR_REGS; } if (dump_file && (dump_flags & TDF_DETAILS)) @@ -2445,7 +2457,8 @@ gcn_conditional_register_usage (void) if (cfun->machine->normal_function) { - /* Restrict the set of SGPRs and VGPRs used by non-kernel functions. */ + /* Restrict the set of SGPRs, VGPRs and AVGPRs used by non-kernel + functions. */ for (int i = SGPR_REGNO (MAX_NORMAL_SGPR_COUNT); i <= LAST_SGPR_REG; i++) fixed_regs[i] = 1, call_used_regs[i] = 1; @@ -2454,6 +2467,9 @@ gcn_conditional_register_usage (void) i <= LAST_VGPR_REG; i++) fixed_regs[i] = 1, call_used_regs[i] = 1; + for (int i = AVGPR_REGNO (MAX_NORMAL_AVGPR_COUNT); + i <= LAST_AVGPR_REG; i++) + fixed_regs[i] = 1, call_used_regs[i] = 1; return; } @@ -2507,6 +2523,16 @@ gcn_conditional_register_usage (void) fixed_regs[cfun->machine->args.reg[WORK_ITEM_ID_Z_ARG]] = 1; } +static bool +gcn_vgpr_equivalent_register_operand (rtx x, machine_mode mode) +{ + if (gcn_vgpr_register_operand (x, mode)) + return true; + if (TARGET_CDNA2_PLUS && gcn_avgpr_register_operand (x, mode)) + return true; + return false; +} + /* Determine if a load or store is valid, according to the register classes and address space. Used primarily by the machine description to decide when to split a move into two steps. */ @@ -2515,21 +2541,36 @@ bool gcn_valid_move_p (machine_mode mode, rtx dest, rtx src) { if (!MEM_P (dest) && !MEM_P (src)) - return true; + { + if (gcn_vgpr_register_operand (src, mode) + && gcn_avgpr_register_operand (dest, mode)) + return true; + if (gcn_avgpr_register_operand (src, mode) + && gcn_vgpr_register_operand (dest, mode)) + return true; + if (TARGET_CDNA2_PLUS + && gcn_avgpr_register_operand (src, mode) + && gcn_avgpr_register_operand (dest, mode)) + return true; + if (gcn_avgpr_hard_register_operand (src, mode) + || gcn_avgpr_hard_register_operand (dest, mode)) + return false; + return true; + } if (MEM_P (dest) && AS_FLAT_P (MEM_ADDR_SPACE (dest)) && (gcn_flat_address_p (XEXP (dest, 0), mode) || GET_CODE (XEXP (dest, 0)) == SYMBOL_REF || GET_CODE (XEXP (dest, 0)) == LABEL_REF) - && gcn_vgpr_register_operand (src, mode)) + && gcn_vgpr_equivalent_register_operand (src, mode)) return true; else if (MEM_P (src) && AS_FLAT_P (MEM_ADDR_SPACE (src)) && (gcn_flat_address_p (XEXP (src, 0), mode) || GET_CODE (XEXP (src, 0)) == SYMBOL_REF || GET_CODE (XEXP (src, 0)) == LABEL_REF) - && gcn_vgpr_register_operand (dest, mode)) + && gcn_vgpr_equivalent_register_operand (dest, mode)) return true; if (MEM_P (dest) @@ -2537,14 +2578,14 @@ gcn_valid_move_p (machine_mode mode, rtx dest, rtx src) && (gcn_global_address_p (XEXP (dest, 0)) || GET_CODE (XEXP (dest, 0)) == SYMBOL_REF || GET_CODE (XEXP (dest, 0)) == LABEL_REF) - && gcn_vgpr_register_operand (src, mode)) + && gcn_vgpr_equivalent_register_operand (src, mode)) return true; else if (MEM_P (src) && AS_GLOBAL_P (MEM_ADDR_SPACE (src)) && (gcn_global_address_p (XEXP (src, 0)) || GET_CODE (XEXP (src, 0)) == SYMBOL_REF || GET_CODE (XEXP (src, 0)) == LABEL_REF) - && gcn_vgpr_register_operand (dest, mode)) + && gcn_vgpr_equivalent_register_operand (dest, mode)) return true; if (MEM_P (dest) @@ -2565,12 +2606,12 @@ gcn_valid_move_p (machine_mode mode, rtx dest, rtx src) if (MEM_P (dest) && AS_ANY_DS_P (MEM_ADDR_SPACE (dest)) && gcn_ds_address_p (XEXP (dest, 0)) - && gcn_vgpr_register_operand (src, mode)) + && gcn_vgpr_equivalent_register_operand (src, mode)) return true; else if (MEM_P (src) && AS_ANY_DS_P (MEM_ADDR_SPACE (src)) && gcn_ds_address_p (XEXP (src, 0)) - && gcn_vgpr_register_operand (dest, mode)) + && gcn_vgpr_equivalent_register_operand (dest, mode)) return true; return false; @@ -3006,7 +3047,8 @@ gcn_compute_frame_offsets (void) if ((df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno)) || ((regno & ~1) == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)) - offsets->callee_saves += (VGPR_REGNO_P (regno) ? 256 : 4); + offsets->callee_saves += (VGPR_REGNO_P (regno) + || AVGPR_REGNO_P (regno) ? 256 : 4); /* Round up to 64-bit boundary to maintain stack alignment. */ offsets->callee_saves = (offsets->callee_saves + 7) & ~7; @@ -3949,6 +3991,11 @@ gcn_memory_move_cost (machine_mode mode, reg_class_t regclass, bool in) if (in) return (LOAD_COST + 2) * nregs; return STORE_COST * nregs; + case AVGPR_REGS: + case ALL_VGPR_REGS: + if (in) + return (LOAD_COST + (TARGET_CDNA2_PLUS ? 2 : 4)) * nregs; + return (STORE_COST + (TARGET_CDNA2_PLUS ? 0 : 2)) * nregs; case ALL_REGS: case ALL_GPR_REGS: case SRCDST_REGS: @@ -3968,6 +4015,15 @@ gcn_memory_move_cost (machine_mode mode, reg_class_t regclass, bool in) static int gcn_register_move_cost (machine_mode, reg_class_t dst, reg_class_t src) { + if (src == AVGPR_REGS) + { + if (dst == AVGPR_REGS) + return TARGET_CDNA1 ? 6 : 2; + if (dst != VGPR_REGS) + return 6; + } + if (dst == AVGPR_REGS && src != VGPR_REGS) + return 6; /* Increase cost of moving from and to vector registers. While this is fast in hardware (I think), it has hidden cost of setting up the exec flags. */ @@ -5674,6 +5730,7 @@ gcn_vmem_insn_p (attr_type type) case TYPE_MUBUF: case TYPE_MTBUF: case TYPE_FLAT: + case TYPE_VOP3P_MAI: return true; case TYPE_UNKNOWN: case TYPE_SOP1: @@ -5913,7 +5970,8 @@ gcn_md_reorg (void) FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST) { const_rtx x = *iter; - if (REG_P (x) && VGPR_REGNO_P (REGNO (x))) + if (REG_P (x) && (VGPR_REGNO_P (REGNO (x)) + || AVGPR_REGNO_P (REGNO (x)))) { if (VECTOR_MODE_P (GET_MODE (x))) { @@ -6069,17 +6127,16 @@ gcn_md_reorg (void) if (!prev_insn->insn) continue; + HARD_REG_SET depregs = prev_insn->writes & ireads; + /* VALU writes SGPR followed by VMEM reading the same SGPR requires 5 wait states. */ if ((prev_insn->age + nops_rqd) < 5 && prev_insn->unit == UNIT_VECTOR - && gcn_vmem_insn_p (itype)) - { - HARD_REG_SET regs = prev_insn->writes & ireads; - if (hard_reg_set_intersect_p - (regs, reg_class_contents[(int) SGPR_REGS])) - nops_rqd = 5 - prev_insn->age; - } + && gcn_vmem_insn_p (itype) + && hard_reg_set_intersect_p + (depregs, reg_class_contents[(int) SGPR_REGS])) + nops_rqd = 5 - prev_insn->age; /* VALU sets VCC/EXEC followed by VALU uses VCCZ/EXECZ requires 5 wait states. */ @@ -6101,15 +6158,12 @@ gcn_md_reorg (void) SGPR/VCC as lane select requires 4 wait states. */ if ((prev_insn->age + nops_rqd) < 4 && prev_insn->unit == UNIT_VECTOR - && get_attr_laneselect (insn) == LANESELECT_YES) - { - HARD_REG_SET regs = prev_insn->writes & ireads; - if (hard_reg_set_intersect_p - (regs, reg_class_contents[(int) SGPR_REGS]) + && get_attr_laneselect (insn) == LANESELECT_YES + && (hard_reg_set_intersect_p + (depregs, reg_class_contents[(int) SGPR_REGS]) || hard_reg_set_intersect_p - (regs, reg_class_contents[(int) VCC_CONDITIONAL_REG])) - nops_rqd = 4 - prev_insn->age; - } + (depregs, reg_class_contents[(int) VCC_CONDITIONAL_REG]))) + nops_rqd = 4 - prev_insn->age; /* VALU writes VGPR followed by VALU_DPP reading that VGPR requires 2 wait states. */ @@ -6117,9 +6171,8 @@ gcn_md_reorg (void) && prev_insn->unit == UNIT_VECTOR && itype == TYPE_VOP_DPP) { - HARD_REG_SET regs = prev_insn->writes & ireads; if (hard_reg_set_intersect_p - (regs, reg_class_contents[(int) VGPR_REGS])) + (depregs, reg_class_contents[(int) VGPR_REGS])) nops_rqd = 2 - prev_insn->age; } @@ -6138,6 +6191,35 @@ gcn_md_reorg (void) (prev_insn->writes, reg_class_contents[(int)VCC_CONDITIONAL_REG]))) nops_rqd = ivccwait - prev_insn->age; + + /* CDNA1: write VGPR before v_accvgpr_write reads it. */ + if (TARGET_CDNA1 + && (prev_insn->age + nops_rqd) < 2 + && hard_reg_set_intersect_p + (depregs, reg_class_contents[(int) VGPR_REGS]) + && hard_reg_set_intersect_p + (iwrites, reg_class_contents[(int) AVGPR_REGS])) + nops_rqd = 2 - prev_insn->age; + + /* CDNA1: v_accvgpr_write writes AVGPR before v_accvgpr_read. */ + if (TARGET_CDNA1 + && (prev_insn->age + nops_rqd) < 3 + && hard_reg_set_intersect_p + (depregs, reg_class_contents[(int) AVGPR_REGS]) + && hard_reg_set_intersect_p + (iwrites, reg_class_contents[(int) VGPR_REGS])) + nops_rqd = 3 - prev_insn->age; + + /* CDNA1: Undocumented(?!) read-after-write when restoring values + from AVGPRs to VGPRS. Observed problem was for address register + of flat_load instruction, but others may be affected? */ + if (TARGET_CDNA1 + && (prev_insn->age + nops_rqd) < 2 + && hard_reg_set_intersect_p + (prev_insn->reads, reg_class_contents[(int) AVGPR_REGS]) + && hard_reg_set_intersect_p + (depregs, reg_class_contents[(int) VGPR_REGS])) + nops_rqd = 2 - prev_insn->age; } /* Insert the required number of NOPs. */ @@ -6429,7 +6511,7 @@ output_file_start (void) void gcn_hsa_declare_function_name (FILE *file, const char *name, tree) { - int sgpr, vgpr; + int sgpr, vgpr, avgpr; bool xnack_enabled = TARGET_XNACK; fputs ("\n\n", file); @@ -6454,6 +6536,12 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) if (df_regs_ever_live_p (FIRST_VGPR_REG + vgpr)) break; vgpr++; + for (avgpr = 255; avgpr >= 0; avgpr--) + if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr)) + break; + avgpr++; + vgpr = (vgpr + 3) & ~3; + avgpr = (avgpr + 3) & ~3; if (!leaf_function_p ()) { @@ -6462,6 +6550,8 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) vgpr = MAX_NORMAL_VGPR_COUNT; if (sgpr < MAX_NORMAL_SGPR_COUNT) sgpr = MAX_NORMAL_SGPR_COUNT; + if (avgpr < MAX_NORMAL_AVGPR_COUNT) + avgpr = MAX_NORMAL_AVGPR_COUNT; } /* The gfx90a accum_offset field can't represent 0 registers. */ @@ -6519,6 +6609,11 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) ? 2 : cfun->machine->args.requested & (1 << WORK_ITEM_ID_Y_ARG) ? 1 : 0); + int next_free_vgpr = vgpr; + if (TARGET_CDNA1 && avgpr > vgpr) + next_free_vgpr = avgpr; + if (TARGET_CDNA2_PLUS) + next_free_vgpr += avgpr; fprintf (file, "\t .amdhsa_next_free_vgpr\t%i\n" "\t .amdhsa_next_free_sgpr\t%i\n" @@ -6529,7 +6624,7 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) "\t .amdhsa_group_segment_fixed_size\t%u\n" "\t .amdhsa_float_denorm_mode_32\t3\n" "\t .amdhsa_float_denorm_mode_16_64\t3\n", - vgpr, + next_free_vgpr, sgpr, xnack_enabled, LDS_SIZE); @@ -6537,7 +6632,7 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) fprintf (file, "\t .amdhsa_accum_offset\t%i\n" "\t .amdhsa_tg_split\t0\n", - (vgpr+3)&~3); // I think this means the AGPRs come after the VGPRs + vgpr); /* The AGPRs come after the VGPRs. */ fputs ("\t.end_amdhsa_kernel\n", file); #if 1 @@ -6564,9 +6659,9 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) cfun->machine->kernarg_segment_byte_size, cfun->machine->kernarg_segment_alignment, LDS_SIZE, - sgpr, vgpr); - if (gcn_arch == PROCESSOR_GFX90a) - fprintf (file, " .agpr_count: 0\n"); // AGPRs are not used, yet + sgpr, next_free_vgpr); + if (gcn_arch == PROCESSOR_GFX90a || gcn_arch == PROCESSOR_GFX908) + fprintf (file, " .agpr_count: %i\n", avgpr); fputs (" .end_amdgpu_metadata\n", file); #endif @@ -6662,6 +6757,9 @@ print_reg (FILE *file, rtx x) else if (VGPR_REGNO_P (REGNO (x))) fprintf (file, "v[%i:%i]", REGNO (x) - FIRST_VGPR_REG, REGNO (x) - FIRST_VGPR_REG + 1); + else if (AVGPR_REGNO_P (REGNO (x))) + fprintf (file, "a[%i:%i]", REGNO (x) - FIRST_AVGPR_REG, + REGNO (x) - FIRST_AVGPR_REG + 1); else if (REGNO (x) == FLAT_SCRATCH_REG) fprintf (file, "flat_scratch"); else if (REGNO (x) == EXEC_REG) @@ -6680,6 +6778,9 @@ print_reg (FILE *file, rtx x) else if (VGPR_REGNO_P (REGNO (x))) fprintf (file, "v[%i:%i]", REGNO (x) - FIRST_VGPR_REG, REGNO (x) - FIRST_VGPR_REG + 3); + else if (AVGPR_REGNO_P (REGNO (x))) + fprintf (file, "a[%i:%i]", REGNO (x) - FIRST_AVGPR_REG, + REGNO (x) - FIRST_AVGPR_REG + 3); else gcc_unreachable (); } @@ -7603,6 +7704,8 @@ gcn_dwarf_register_number (unsigned int regno) } else if (VGPR_REGNO_P (regno)) return (regno - FIRST_VGPR_REG + 2560); + else if (AVGPR_REGNO_P (regno)) + return (regno - FIRST_AVGPR_REG + 3072); /* Otherwise, there's nothing sensible to do. */ return regno + 100000; diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h index 6372f49d379..cb52be7a3a1 100644 --- a/gcc/config/gcn/gcn.h +++ b/gcc/config/gcn/gcn.h @@ -146,6 +146,9 @@ #define FIRST_VGPR_REG 160 #define VGPR_REGNO(N) ((N)+FIRST_VGPR_REG) #define LAST_VGPR_REG 415 +#define FIRST_AVGPR_REG 416 +#define AVGPR_REGNO(N) ((N)+FIRST_AVGPR_REG) +#define LAST_AVGPR_REG 671 /* Frame Registers, and other registers */ @@ -157,10 +160,10 @@ #define RETURN_VALUE_REG 168 /* Must be divisible by 4. */ #define STATIC_CHAIN_REGNUM 30 #define WORK_ITEM_ID_Z_REG 162 -#define SOFT_ARG_REG 416 -#define FRAME_POINTER_REGNUM 418 -#define DWARF_LINK_REGISTER 420 -#define FIRST_PSEUDO_REGISTER 421 +#define SOFT_ARG_REG 672 +#define FRAME_POINTER_REGNUM 674 +#define DWARF_LINK_REGISTER 676 +#define FIRST_PSEUDO_REGISTER 677 #define FIRST_PARM_REG (FIRST_SGPR_REG + 24) #define FIRST_VPARM_REG (FIRST_VGPR_REG + 8) @@ -176,6 +179,7 @@ #define SGPR_OR_VGPR_REGNO_P(N) ((N)>=FIRST_VGPR_REG && (N) <= LAST_SGPR_REG) #define SGPR_REGNO_P(N) ((N) <= LAST_SGPR_REG) #define VGPR_REGNO_P(N) ((N)>=FIRST_VGPR_REG && (N) <= LAST_VGPR_REG) +#define AVGPR_REGNO_P(N) ((N)>=FIRST_AVGPR_REG && (N) <= LAST_AVGPR_REG) #define SSRC_REGNO_P(N) ((N) <= SCC_REG && (N) != VCCZ_REG) #define SDST_REGNO_P(N) ((N) <= EXEC_HI_REG && (N) != VCCZ_REG) #define CC_REG_P(X) (REG_P (X) && CC_REGNO_P (REGNO (X))) @@ -206,7 +210,7 @@ 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, \ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ - /* VGRPs */ \ + /* VGPRs */ \ 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ @@ -223,6 +227,23 @@ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + /* Accumulation VGPRs */ \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ /* Other registers. */ \ 1, 1, 1, 1, 1 \ } @@ -244,7 +265,7 @@ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ - /* VGRPs */ \ + /* VGPRs */ \ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ @@ -261,6 +282,23 @@ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ + /* Accumulation VGPRs */ \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \ /* Other registers. */ \ 1, 1, 1, 1, 1 \ } @@ -320,6 +358,8 @@ enum reg_class SGPR_SRC_REGS, GENERAL_REGS, VGPR_REGS, + AVGPR_REGS, + ALL_VGPR_REGS, ALL_GPR_REGS, SRCDST_REGS, AFP_REGS, @@ -345,6 +385,8 @@ enum reg_class "SGPR_SRC_REGS", \ "GENERAL_REGS", \ "VGPR_REGS", \ + "AVGPR_REGS", \ + "ALL_VGPR_REGS", \ "ALL_GPR_REGS", \ "SRCDST_REGS", \ "AFP_REGS", \ @@ -357,40 +399,58 @@ enum reg_class #define REG_CLASS_CONTENTS { \ /* NO_REGS. */ \ {0, 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* SCC_CONDITIONAL_REG. */ \ {0, 0, 0, 0, \ NAMED_REG_MASK2 (SCC_REG), 0, 0, 0, \ - 0, 0, 0, 0, 0}, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0}, \ /* VCCZ_CONDITIONAL_REG. */ \ {0, 0, 0, NAMED_REG_MASK (VCCZ_REG), \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* VCC_CONDITIONAL_REG. */ \ {0, 0, 0, NAMED_REG_MASK (VCC_LO_REG)|NAMED_REG_MASK (VCC_HI_REG), \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* EXECZ_CONDITIONAL_REG. */ \ {0, 0, 0, 0, \ NAMED_REG_MASK2 (EXECZ_REG), 0, 0, 0, \ - 0, 0, 0, 0, 0}, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0}, \ /* ALL_CONDITIONAL_REGS. */ \ {0, 0, 0, NAMED_REG_MASK (VCCZ_REG), \ NAMED_REG_MASK2 (EXECZ_REG) | NAMED_REG_MASK2 (SCC_REG), 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* EXEC_MASK_REG. */ \ {0, 0, 0, NAMED_REG_MASK (EXEC_LO_REG) | NAMED_REG_MASK (EXEC_HI_REG), \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* SGPR_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, 0xf1, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* SGPR_EXEC_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, \ 0xf1 | NAMED_REG_MASK (EXEC_LO_REG) | NAMED_REG_MASK (EXEC_HI_REG), \ 0, 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* SGPR_VOP_SRC_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, \ @@ -398,12 +458,16 @@ enum reg_class -NAMED_REG_MASK (EXEC_LO_REG) \ -NAMED_REG_MASK (EXEC_HI_REG), \ NAMED_REG_MASK2 (SCC_REG), 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* SGPR_MEM_SRC_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, \ 0xffffffff-NAMED_REG_MASK (VCCZ_REG)-NAMED_REG_MASK (M0_REG) \ -NAMED_REG_MASK (EXEC_LO_REG)-NAMED_REG_MASK (EXEC_HI_REG), \ 0, 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* SGPR_DST_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, \ @@ -413,30 +477,56 @@ enum reg_class /* SGPR_SRC_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ NAMED_REG_MASK2 (EXECZ_REG) | NAMED_REG_MASK2 (SCC_REG), 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* GENERAL_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, 0xf1, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0}, \ /* VGPR_REGS. */ \ {0, 0, 0, 0, \ 0, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0}, \ + /* AVGPR_REGS. */ \ + {0, 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0}, \ + /* ALL_VGPR_REGS. */ \ + {0, 0, 0, 0, \ + 0, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0}, \ /* ALL_GPR_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, 0xf1, \ 0, 0xffffffff, 0xffffffff, 0xffffffff, \ - 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0}, \ + 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0}, \ /* SRCDST_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, \ 0xffffffff-NAMED_REG_MASK (VCCZ_REG), \ 0, 0xffffffff, 0xffffffff, 0xffffffff, \ - 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0}, \ + 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0, 0, 0, \ + 0, 0, 0, 0, 0, 0}, \ /* AFP_REGS. */ \ {0, 0, 0, 0, \ + 0, 0, 0, 0, \ + 0, 0, 0, 0, \ 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0xf}, \ /* ALL_REGS. */ \ {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ + 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, \ 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0 }} @@ -541,6 +631,34 @@ enum gcn_address_spaces "v236", "v237", "v238", "v239", "v240", "v241", "v242", "v243", "v244", \ "v245", "v246", "v247", "v248", "v249", "v250", "v251", "v252", "v253", \ "v254", "v255", \ + "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "a10", \ + "a11", "a12", "a13", "a14", "a15", "a16", "a17", "a18", "a19", "a20", \ + "a21", "a22", "a23", "a24", "a25", "a26", "a27", "a28", "a29", "a30", \ + "a31", "a32", "a33", "a34", "a35", "a36", "a37", "a38", "a39", "a40", \ + "a41", "a42", "a43", "a44", "a45", "a46", "a47", "a48", "a49", "a50", \ + "a51", "a52", "a53", "a54", "a55", "a56", "a57", "a58", "a59", "a60", \ + "a61", "a62", "a63", "a64", "a65", "a66", "a67", "a68", "a69", "a70", \ + "a71", "a72", "a73", "a74", "a75", "a76", "a77", "a78", "a79", "a80", \ + "a81", "a82", "a83", "a84", "a85", "a86", "a87", "a88", "a89", "a90", \ + "a91", "a92", "a93", "a94", "a95", "a96", "a97", "a98", "a99", "a100", \ + "a101", "a102", "a103", "a104", "a105", "a106", "a107", "a108", "a109", \ + "a110", "a111", "a112", "a113", "a114", "a115", "a116", "a117", "a118", \ + "a119", "a120", "a121", "a122", "a123", "a124", "a125", "a126", "a127", \ + "a128", "a129", "a130", "a131", "a132", "a133", "a134", "a135", "a136", \ + "a137", "a138", "a139", "a140", "a141", "a142", "a143", "a144", "a145", \ + "a146", "a147", "a148", "a149", "a150", "a151", "a152", "a153", "a154", \ + "a155", "a156", "a157", "a158", "a159", "a160", "a161", "a162", "a163", \ + "a164", "a165", "a166", "a167", "a168", "a169", "a170", "a171", "a172", \ + "a173", "a174", "a175", "a176", "a177", "a178", "a179", "a180", "a181", \ + "a182", "a183", "a184", "a185", "a186", "a187", "a188", "a189", "a190", \ + "a191", "a192", "a193", "a194", "a195", "a196", "a197", "a198", "a199", \ + "a200", "a201", "a202", "a203", "a204", "a205", "a206", "a207", "a208", \ + "a209", "a210", "a211", "a212", "a213", "a214", "a215", "a216", "a217", \ + "a218", "a219", "a220", "a221", "a222", "a223", "a224", "a225", "a226", \ + "a227", "a228", "a229", "a230", "a231", "a232", "a233", "a234", "a235", \ + "a236", "a237", "a238", "a239", "a240", "a241", "a242", "a243", "a244", \ + "a245", "a246", "a247", "a248", "a249", "a250", "a251", "a252", "a253", \ + "a254", "a255", \ "?ap0", "?ap1", "?fp0", "?fp1", "?dwlr" } #define PRINT_OPERAND(FILE, X, CODE) print_operand(FILE, X, CODE) diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index e6a9ac60b57..b7fbbaf830b 100644 --- a/gcc/config/gcn/gcn.md +++ b/gcc/config/gcn/gcn.md @@ -51,13 +51,15 @@ (define_constants (EXECZ_REG 128) (SCC_REG 129) (FIRST_VGPR_REG 160) - (LAST_VGPR_REG 415)]) + (LAST_VGPR_REG 415) + (FIRST_AVGPR_REG 416) + (LAST_AVGPR_REG 671)]) (define_constants [(SP_REGNUM 16) (LR_REGNUM 18) - (AP_REGNUM 416) - (FP_REGNUM 418)]) + (AP_REGNUM 672) + (FP_REGNUM 674)]) (define_c_enum "unspecv" [ UNSPECV_PROLOGUE_USE @@ -171,6 +173,11 @@ (define_c_enum "unspec" [ ; vdst: vgpr0-255 ; sdst: sgpr0-103/vcc/tba/tma/ttmp0-11 ; +; vop3p_mai - vector, three inputs, one vector output +; vsrc0,vsrc1,vsrc2: inline constant -16 to -64, fp inline immediate, +; (acc or arch) vgpr0-255 +; vdst: (acc or arch) vgpr0-255 +; ; vop_sdwa - second dword for vop1/vop2/vopc for specifying sub-dword address ; src0: vgpr0-255 ; dst_sel: BYTE_0-3, WORD_0-1, DWORD @@ -229,7 +236,8 @@ (define_c_enum "unspec" [ (define_attr "type" "unknown,sop1,sop2,sopk,sopc,sopp,smem,ds,vop2,vop1,vopc, - vop3a,vop3b,vop_sdwa,vop_dpp,mubuf,mtbuf,flat,mult,vmult" + vop3a,vop3b,vop3p_mai,vop_sdwa,vop_dpp,mubuf,mtbuf,flat,mult, + vmult" (const_string "unknown")) ; Set if instruction is executed in scalar or vector unit @@ -237,7 +245,7 @@ (define_attr "type" (define_attr "unit" "unknown,scalar,vector" (cond [(eq_attr "type" "sop1,sop2,sopk,sopc,sopp,smem,mult") (const_string "scalar") - (eq_attr "type" "vop2,vop1,vopc,vop3a,vop3b,ds, + (eq_attr "type" "vop2,vop1,vopc,vop3a,vop3b,ds,vop3p_mai, vop_sdwa,vop_dpp,flat,vmult") (const_string "vector")] (const_string "unknown"))) @@ -284,7 +292,7 @@ (define_attr "length" "" ; Disable alternatives that only apply to specific ISA variants. -(define_attr "gcn_version" "gcn3,gcn5" (const_string "gcn3")) +(define_attr "gcn_version" "gcn3,gcn5,cdna2" (const_string "gcn3")) (define_attr "rdna" "any,no,yes" (const_string "any")) (define_attr "enabled" "" @@ -297,6 +305,9 @@ (define_attr "enabled" "" (eq_attr "gcn_version" "gcn3") (const_int 1) (and (eq_attr "gcn_version" "gcn5") (ne (symbol_ref "TARGET_GCN5_PLUS") (const_int 0))) + (const_int 1) + (and (eq_attr "gcn_version" "cdna2") + (ne (symbol_ref "TARGET_CDNA2_PLUS") (const_int 0))) (const_int 1)] (const_int 0))) @@ -552,25 +563,32 @@ (define_insn "*mov_insn" [(set (match_operand:SISF 0 "nonimmediate_operand") (match_operand:SISF 1 "gcn_load_operand"))] "" - {@ [cons: =0, 1; attrs: type, exec, length] - [SD ,SSA ;sop1 ,* ,4 ] s_mov_b32\t%0, %1 - [SD ,J ;sopk ,* ,4 ] s_movk_i32\t%0, %1 - [SD ,B ;sop1 ,* ,8 ] s_mov_b32\t%0, %1 - [SD ,RB ;smem ,* ,12] s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0) - [RB ,Sm ;smem ,* ,12] s_buffer_store%s1\t%1, s[0:3], %0 - [Sm ,RS ;smem ,* ,12] s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0) - [RS ,Sm ;smem ,* ,12] s_store_dword\t%1, %A0 - [v ,v ;vop1 ,* ,4 ] v_mov_b32\t%0, %1 - [Sg ,v ;vop3a,none,8 ] v_readlane_b32\t%0, %1, 0 - [v ,Sv ;vop3a,none,8 ] v_writelane_b32\t%0, %1, 0 - [v ,RF ;flat ,* ,12] flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0 - [RF ,v ;flat ,* ,12] flat_store_dword\t%A0, %1%O0%g0 - [v ,B ;vop1 ,* ,8 ] v_mov_b32\t%0, %1 - [RLRG,v ;ds ,* ,12] ds_write_b32\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) - [v ,RLRG;ds ,* ,12] ds_read_b32\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) - [SD ,Y ;sop1 ,* ,8 ] s_mov_b32\t%0, %1 - [v ,RM ;flat ,* ,12] global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) - [RM ,v ;flat ,* ,12] global_store_dword\t%A0, %1%O0%g0 + {@ [cons: =0, 1; attrs: type, exec, length, gcn_version] + [SD ,SSA ;sop1 ,* ,4 ,* ] s_mov_b32\t%0, %1 + [SD ,J ;sopk ,* ,4 ,* ] s_movk_i32\t%0, %1 + [SD ,B ;sop1 ,* ,8 ,* ] s_mov_b32\t%0, %1 + [SD ,RB ;smem ,* ,12,* ] s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0) + [RB ,Sm ;smem ,* ,12,* ] s_buffer_store%s1\t%1, s[0:3], %0 + [Sm ,RS ;smem ,* ,12,* ] s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0) + [RS ,Sm ;smem ,* ,12,* ] s_store_dword\t%1, %A0 + [v ,v ;vop1 ,* ,4 ,* ] v_mov_b32\t%0, %1 + [Sg ,v ;vop3a,none,8 ,* ] v_readlane_b32\t%0, %1, 0 + [v ,Sv ;vop3a,none,8 ,* ] v_writelane_b32\t%0, %1, 0 + [v ,^a ;vop3p_mai,*,8,* ] v_accvgpr_read_b32\t%0, %1 + [a ,v ;vop3p_mai,*,8,* ] v_accvgpr_write_b32\t%0, %1 + [a ,a ;vop1 ,* ,4,cdna2] v_accvgpr_mov_b32\t%0, %1 + [v ,RF ;flat ,* ,12,* ] flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0 + [^a ,RF ;flat ,* ,12,cdna2] ^ + [RF ,v ;flat ,* ,12,* ] flat_store_dword\t%A0, %1%O0%g0 + [RF ,a ;flat ,* ,12,cdna2] ^ + [v ,B ;vop1 ,* ,8 ,* ] v_mov_b32\t%0, %1 + [RLRG,v ;ds ,* ,12,* ] ds_write_b32\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) + [v ,RLRG;ds ,* ,12,* ] ds_read_b32\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) + [SD ,Y ;sop1 ,* ,8 ,* ] s_mov_b32\t%0, %1 + [v ,RM ;flat ,* ,12,* ] global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) + [^a ,RM ;flat ,* ,12,cdna2] ^ + [RM ,v ;flat ,* ,12,* ] global_store_dword\t%A0, %1%O0%g0 + [RM ,a ;flat ,* ,12,cdna2] ^ }) ; 8/16bit move pattern @@ -580,20 +598,27 @@ (define_insn "*mov_insn" [(set (match_operand:QIHI 0 "nonimmediate_operand") (match_operand:QIHI 1 "gcn_load_operand"))] "gcn_valid_move_p (mode, operands[0], operands[1])" - {@ [cons: =0, 1; attrs: type, exec, length] - [SD ,SSA ;sop1 ,* ,4 ] s_mov_b32\t%0, %1 - [SD ,J ;sopk ,* ,4 ] s_movk_i32\t%0, %1 - [SD ,B ;sop1 ,* ,8 ] s_mov_b32\t%0, %1 - [v ,v ;vop1 ,* ,4 ] v_mov_b32\t%0, %1 - [Sg ,v ;vop3a,none,4 ] v_readlane_b32\t%0, %1, 0 - [v ,Sv ;vop3a,none,4 ] v_writelane_b32\t%0, %1, 0 - [v ,RF ;flat ,* ,12] flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0 - [RF ,v ;flat ,* ,12] flat_store%s0\t%A0, %1%O0%g0 - [v ,B ;vop1 ,* ,8 ] v_mov_b32\t%0, %1 - [RLRG,v ;ds ,* ,12] ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) - [v ,RLRG;ds ,* ,12] ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) - [v ,RM ;flat ,* ,12] global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) - [RM ,v ;flat ,* ,12] global_store%s0\t%A0, %1%O0%g0 + {@ [cons: =0, 1; attrs: type, exec, length, gcn_version] + [SD ,SSA ;sop1 ,* ,4 ,* ] s_mov_b32\t%0, %1 + [SD ,J ;sopk ,* ,4 ,* ] s_movk_i32\t%0, %1 + [SD ,B ;sop1 ,* ,8 ,* ] s_mov_b32\t%0, %1 + [v ,v ;vop1 ,* ,4 ,* ] v_mov_b32\t%0, %1 + [Sg ,v ;vop3a,none,4 ,* ] v_readlane_b32\t%0, %1, 0 + [v ,Sv ;vop3a,none,4 ,* ] v_writelane_b32\t%0, %1, 0 + [v ,^a ;vop3p_mai,*,8,* ] v_accvgpr_read_b32\t%0, %1 + [a ,v ;vop3p_mai,*,8,* ] v_accvgpr_write_b32\t%0, %1 + [a ,a ;vop1 ,* ,8,cdna2] v_accvgpr_mov_b32\t%0, %1 + [v ,RF ;flat ,* ,12,* ] flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0 + [^a ,RF ;flat ,* ,12,cdna2] ^ + [RF ,v ;flat ,* ,12,* ] flat_store%s0\t%A0, %1%O0%g0 + [RF ,a ;flat ,* ,12,cdna2] ^ + [v ,B ;vop1 ,* ,8 ,* ] v_mov_b32\t%0, %1 + [RLRG,v ;ds ,* ,12,* ] ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) + [v ,RLRG;ds ,* ,12,* ] ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) + [v ,RM ;flat ,* ,12,* ] global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) + [^a ,RM ;flat ,* ,12,cdna2] ^ + [RM ,v ;flat ,* ,12,* ] global_store%s0\t%A0, %1%O0%g0 + [RM ,a ;flat ,* ,12,cdna2] ^ }) ; 64bit move pattern @@ -602,22 +627,29 @@ (define_insn_and_split "*mov_insn" [(set (match_operand:DIDF 0 "nonimmediate_operand") (match_operand:DIDF 1 "general_operand"))] "GET_CODE(operands[1]) != SYMBOL_REF" - {@ [cons: =0, 1; attrs: type, length] - [SD ,SSA ;sop1 ,4 ] s_mov_b64\t%0, %1 - [SD ,C ;sop1 ,8 ] ^ - [SD ,DB ;mult ,* ] # - [RS ,Sm ;smem ,12] s_store_dwordx2\t%1, %A0 - [Sm ,RS ;smem ,12] s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0) - [v ,v ;vmult,* ] # - [v ,DB ;vmult,* ] # - [Sg ,v ;vmult,* ] # - [v ,Sv ;vmult,* ] # - [v ,RF ;flat ,12] flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0 - [RF ,v ;flat ,12] flat_store_dwordx2\t%A0, %1%O0%g0 - [RLRG,v ;ds ,12] ds_write_b64\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) - [v ,RLRG;ds ,12] ds_read_b64\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) - [v ,RM ;flat ,12] global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) - [RM ,v ;flat ,12] global_store_dwordx2\t%A0, %1%O0%g0 + {@ [cons: =0, 1; attrs: type, length, gcn_version] + [SD ,SSA ;sop1 ,4 ,* ] s_mov_b64\t%0, %1 + [SD ,C ;sop1 ,8 ,* ] ^ + [SD ,DB ;mult ,* ,* ] # + [RS ,Sm ;smem ,12,* ] s_store_dwordx2\t%1, %A0 + [Sm ,RS ;smem ,12,* ] s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0) + [v ,v ;vmult,* ,* ] # + [v ,DB ;vmult,* ,* ] # + [Sg ,v ;vmult,* ,* ] # + [v ,Sv ;vmult,* ,* ] # + [v ,^a ;vmult,* ,* ] # + [a ,v ;vmult,* ,* ] # + [a ,a ;vmult,* ,cdna2] # + [v ,RF ;flat ,12,* ] flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0 + [^a ,RF ;flat ,12,cdna2] ^ + [RF ,v ;flat ,12,* ] flat_store_dwordx2\t%A0, %1%O0%g0 + [RF ,a ;flat ,12,cdna2] ^ + [RLRG,v ;ds ,12,* ] ds_write_b64\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) + [v ,RLRG;ds ,12,* ] ds_read_b64\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) + [v ,RM ;flat ,12,* ] global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) + [^a ,RM ;flat ,12,cdna2] ^ + [RM ,v ;flat ,12,* ] global_store_dwordx2\t%A0, %1%O0%g0 + [RM ,a ;flat ,12,cdna2] ^ } "reload_completed && ((!MEM_P (operands[0]) && !MEM_P (operands[1]) @@ -655,19 +687,26 @@ (define_insn_and_split "*movti_insn" [(set (match_operand:TI 0 "nonimmediate_operand") (match_operand:TI 1 "general_operand" ))] "" - {@ [cons: =0, 1; attrs: type, delayeduse, length] - [SD,SSB;mult ,* ,* ] # - [RS,Sm ;smem ,* ,12] s_store_dwordx4\t%1, %A0 - [Sm,RS ;smem ,yes,12] s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0) - [RF,v ;flat ,* ,12] flat_store_dwordx4\t%A0, %1%O0%g0 - [v ,RF ;flat ,* ,12] flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0 - [v ,v ;vmult,* ,* ] # - [v ,Sv ;vmult,* ,* ] # - [SD,v ;vmult,* ,* ] # - [RM,v ;flat ,yes,12] global_store_dwordx4\t%A0, %1%O0%g0 - [v ,RM ;flat ,* ,12] global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) - [RL,v ;ds ,* ,12] ds_write_b128\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) - [v ,RL ;ds ,* ,12] ds_read_b128\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) + {@ [cons: =0, 1; attrs: type, delayeduse, length, gcn_version] + [SD,SSB;mult ,* ,* ,* ] # + [RS,Sm ;smem ,* ,12,* ] s_store_dwordx4\t%1, %A0 + [Sm,RS ;smem ,yes,12,* ] s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0) + [RF,v ;flat ,* ,12,* ] flat_store_dwordx4\t%A0, %1%O0%g0 + [RF,a ;flat ,* ,12,cdna2] ^ + [v ,RF ;flat ,* ,12,* ] flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0 + [^a,RF ;flat ,* ,12,cdna2] ^ + [v ,v ;vmult,* ,* ,* ] # + [v ,Sv ;vmult,* ,* ,* ] # + [SD,v ;vmult,* ,* ,* ] # + [RM,v ;flat ,yes,12,* ] global_store_dwordx4\t%A0, %1%O0%g0 + [RM,a ;flat ,yes,12,cdna2] ^ + [v ,RM ;flat ,* ,12,* ] global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) + [^a,RM ;flat ,* ,12,cdna2] ^ + [RL,v ;ds ,* ,12,* ] ds_write_b128\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) + [v ,RL ;ds ,* ,12,* ] ds_read_b128\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) + [v ,^a ;vmult,* ,* ,* ] # + [a ,v ;vmult,* ,* ,* ] # + [a ,a ;vmult,* ,* ,cdna2] # } "reload_completed && REG_P (operands[0]) diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index 0e224ca8f65..ee8bde38150 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -471,6 +471,26 @@ copy_early_debug_info (const char *infile, const char *outfile) return true; } +/* CDNA2 devices have twice as many VGPRs compared to older devices, + but the AVGPRS are allocated from the same pool. */ + +static int +isa_has_combined_avgprs (int isa) +{ + switch (isa) + { + case EF_AMDGPU_MACH_AMDGCN_GFX803: + case EF_AMDGPU_MACH_AMDGCN_GFX900: + case EF_AMDGPU_MACH_AMDGCN_GFX906: + case EF_AMDGPU_MACH_AMDGCN_GFX908: + case EF_AMDGPU_MACH_AMDGCN_GFX1030: + return false; + case EF_AMDGPU_MACH_AMDGCN_GFX90a: + return true; + } + fatal_error (input_location, "unhandled ISA in isa_has_combined_avgprs"); +} + /* Parse an input assembler file, extract the offload tables etc., and output (1) the assembler code, minus the tables (which can contain problematic relocations), and (2) a C file with the offload tables @@ -496,6 +516,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile) { int sgpr_count; int vgpr_count; + int avgpr_count; char *kernel_name; } regcount = { -1, -1, NULL }; @@ -543,6 +564,12 @@ process_asm (FILE *in, FILE *out, FILE *cfile) gcc_assert (regcount.kernel_name); break; } + else if (sscanf (buf, " .agpr_count: %d\n", + ®count.avgpr_count) == 1) + { + gcc_assert (regcount.kernel_name); + break; + } break; } @@ -685,6 +712,8 @@ process_asm (FILE *in, FILE *out, FILE *cfile) { sgpr_count = regcounts[j].sgpr_count; vgpr_count = regcounts[j].vgpr_count; + if (isa_has_combined_avgprs (elf_arch)) + vgpr_count += regcounts[j].avgpr_count; break; } diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md index 5554a06b63b..d3bf83de166 100644 --- a/gcc/config/gcn/predicates.md +++ b/gcc/config/gcn/predicates.md @@ -70,6 +70,30 @@ (define_predicate "gcn_vgpr_register_operand" return VGPR_REGNO_P (REGNO (op)) || REGNO (op) >= FIRST_PSEUDO_REGISTER; }) +(define_predicate "gcn_avgpr_register_operand" + (match_operand 0 "register_operand") + { + if (GET_CODE (op) == SUBREG) + op = SUBREG_REG (op); + + if (!REG_P (op)) + return false; + + return AVGPR_REGNO_P (REGNO (op)) || REGNO (op) >= FIRST_PSEUDO_REGISTER; +}) + +(define_predicate "gcn_avgpr_hard_register_operand" + (match_operand 0 "register_operand") + { + if (GET_CODE (op) == SUBREG) + op = SUBREG_REG (op); + + if (!REG_P (op)) + return false; + + return AVGPR_REGNO_P (REGNO (op)); +}) + (define_predicate "gcn_inline_immediate_operand" (match_code "const_int,const_double,const_vector") { diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 5d86152e5dd..e01cdcbe22c 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -2010,6 +2010,9 @@ Any @code{symbol_ref} or @code{label_ref} @item v VGPR register +@item a +Accelerator VGPR register (CDNA1 onwards) + @item Sg SGPR register diff --git a/gcc/testsuite/gcc.target/gcn/avgpr-mem-double.c b/gcc/testsuite/gcc.target/gcn/avgpr-mem-double.c new file mode 100644 index 00000000000..ce089fb198d --- /dev/null +++ b/gcc/testsuite/gcc.target/gcn/avgpr-mem-double.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=gfx90a -O1" } */ +/* { dg-skip-if "incompatible ISA" { *-*-* } { "-march=gfx90[068]" } } */ +/* { dg-final { scan-assembler {load[^\n]*a[0-9[]} } } */ +/* { dg-final { scan-assembler {store[^\n]*a[0-9[]} } } */ + +#define TYPE double + +#include "avgpr-mem-int.c" diff --git a/gcc/testsuite/gcc.target/gcn/avgpr-mem-int.c b/gcc/testsuite/gcc.target/gcn/avgpr-mem-int.c new file mode 100644 index 00000000000..03d81486466 --- /dev/null +++ b/gcc/testsuite/gcc.target/gcn/avgpr-mem-int.c @@ -0,0 +1,116 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=gfx90a -O1" } */ +/* { dg-skip-if "incompatible ISA" { *-*-* } { "-march=gfx90[068]" } } */ +/* { dg-final { scan-assembler {load[^\n]*a[0-9[]} } } */ +/* { dg-final { scan-assembler {store[^\n]*a[0-9[]} } } */ + +#ifndef TYPE +#define TYPE int +#endif + +TYPE a[50]; + +int f() +{ + __asm__ volatile ("; fake -> %0" :: "va"(a[0])); + __asm__ volatile ("; fake -> %0" :: "va"(a[1])); + __asm__ volatile ("; fake -> %0" :: "va"(a[2])); + __asm__ volatile ("; fake -> %0" :: "va"(a[3])); + __asm__ volatile ("; fake -> %0" :: "va"(a[4])); + __asm__ volatile ("; fake -> %0" :: "va"(a[5])); + __asm__ volatile ("; fake -> %0" :: "va"(a[6])); + __asm__ volatile ("; fake -> %0" :: "va"(a[7])); + __asm__ volatile ("; fake -> %0" :: "va"(a[8])); + __asm__ volatile ("; fake -> %0" :: "va"(a[9])); + __asm__ volatile ("; fake -> %0" :: "va"(a[10])); + __asm__ volatile ("; fake -> %0" :: "va"(a[11])); + __asm__ volatile ("; fake -> %0" :: "va"(a[12])); + __asm__ volatile ("; fake -> %0" :: "va"(a[13])); + __asm__ volatile ("; fake -> %0" :: "va"(a[14])); + __asm__ volatile ("; fake -> %0" :: "va"(a[15])); + __asm__ volatile ("; fake -> %0" :: "va"(a[16])); + __asm__ volatile ("; fake -> %0" :: "va"(a[17])); + __asm__ volatile ("; fake -> %0" :: "va"(a[18])); + __asm__ volatile ("; fake -> %0" :: "va"(a[19])); + __asm__ volatile ("; fake -> %0" :: "va"(a[20])); + __asm__ volatile ("; fake -> %0" :: "va"(a[21])); + __asm__ volatile ("; fake -> %0" :: "va"(a[22])); + __asm__ volatile ("; fake -> %0" :: "va"(a[23])); + __asm__ volatile ("; fake -> %0" :: "va"(a[24])); + __asm__ volatile ("; fake -> %0" :: "va"(a[25])); + __asm__ volatile ("; fake -> %0" :: "va"(a[26])); + __asm__ volatile ("; fake -> %0" :: "va"(a[27])); + __asm__ volatile ("; fake -> %0" :: "va"(a[28])); + __asm__ volatile ("; fake -> %0" :: "va"(a[29])); + __asm__ volatile ("; fake -> %0" :: "va"(a[30])); + __asm__ volatile ("; fake -> %0" :: "va"(a[31])); + __asm__ volatile ("; fake -> %0" :: "va"(a[32])); + __asm__ volatile ("; fake -> %0" :: "va"(a[33])); + __asm__ volatile ("; fake -> %0" :: "va"(a[34])); + __asm__ volatile ("; fake -> %0" :: "va"(a[35])); + __asm__ volatile ("; fake -> %0" :: "va"(a[36])); + __asm__ volatile ("; fake -> %0" :: "va"(a[37])); + __asm__ volatile ("; fake -> %0" :: "va"(a[38])); + __asm__ volatile ("; fake -> %0" :: "va"(a[39])); + __asm__ volatile ("; fake -> %0" :: "va"(a[40])); + __asm__ volatile ("; fake -> %0" :: "va"(a[41])); + __asm__ volatile ("; fake -> %0" :: "va"(a[42])); + __asm__ volatile ("; fake -> %0" :: "va"(a[43])); + __asm__ volatile ("; fake -> %0" :: "va"(a[44])); + __asm__ volatile ("; fake -> %0" :: "va"(a[45])); + __asm__ volatile ("; fake -> %0" :: "va"(a[46])); + __asm__ volatile ("; fake -> %0" :: "va"(a[47])); + __asm__ volatile ("; fake -> %0" :: "va"(a[48])); + __asm__ volatile ("; fake -> %0" :: "va"(a[49])); + + __asm__ volatile ("; fake <- %0" : "+va"(a[0])); + __asm__ volatile ("; fake <- %0" : "+va"(a[1])); + __asm__ volatile ("; fake <- %0" : "+va"(a[2])); + __asm__ volatile ("; fake <- %0" : "+va"(a[3])); + __asm__ volatile ("; fake <- %0" : "+va"(a[4])); + __asm__ volatile ("; fake <- %0" : "+va"(a[5])); + __asm__ volatile ("; fake <- %0" : "+va"(a[6])); + __asm__ volatile ("; fake <- %0" : "+va"(a[7])); + __asm__ volatile ("; fake <- %0" : "+va"(a[8])); + __asm__ volatile ("; fake <- %0" : "+va"(a[9])); + __asm__ volatile ("; fake <- %0" : "+va"(a[10])); + __asm__ volatile ("; fake <- %0" : "+va"(a[11])); + __asm__ volatile ("; fake <- %0" : "+va"(a[12])); + __asm__ volatile ("; fake <- %0" : "+va"(a[13])); + __asm__ volatile ("; fake <- %0" : "+va"(a[14])); + __asm__ volatile ("; fake <- %0" : "+va"(a[15])); + __asm__ volatile ("; fake <- %0" : "+va"(a[16])); + __asm__ volatile ("; fake <- %0" : "+va"(a[17])); + __asm__ volatile ("; fake <- %0" : "+va"(a[18])); + __asm__ volatile ("; fake <- %0" : "+va"(a[19])); + __asm__ volatile ("; fake <- %0" : "+va"(a[20])); + __asm__ volatile ("; fake <- %0" : "+va"(a[21])); + __asm__ volatile ("; fake <- %0" : "+va"(a[22])); + __asm__ volatile ("; fake <- %0" : "+va"(a[23])); + __asm__ volatile ("; fake <- %0" : "+va"(a[24])); + __asm__ volatile ("; fake <- %0" : "+va"(a[25])); + __asm__ volatile ("; fake <- %0" : "+va"(a[26])); + __asm__ volatile ("; fake <- %0" : "+va"(a[27])); + __asm__ volatile ("; fake <- %0" : "+va"(a[28])); + __asm__ volatile ("; fake <- %0" : "+va"(a[29])); + __asm__ volatile ("; fake <- %0" : "+va"(a[30])); + __asm__ volatile ("; fake <- %0" : "+va"(a[31])); + __asm__ volatile ("; fake <- %0" : "+va"(a[32])); + __asm__ volatile ("; fake <- %0" : "+va"(a[33])); + __asm__ volatile ("; fake <- %0" : "+va"(a[34])); + __asm__ volatile ("; fake <- %0" : "+va"(a[35])); + __asm__ volatile ("; fake <- %0" : "+va"(a[36])); + __asm__ volatile ("; fake <- %0" : "+va"(a[37])); + __asm__ volatile ("; fake <- %0" : "+va"(a[38])); + __asm__ volatile ("; fake <- %0" : "+va"(a[39])); + __asm__ volatile ("; fake <- %0" : "+va"(a[40])); + __asm__ volatile ("; fake <- %0" : "+va"(a[41])); + __asm__ volatile ("; fake <- %0" : "+va"(a[42])); + __asm__ volatile ("; fake <- %0" : "+va"(a[43])); + __asm__ volatile ("; fake <- %0" : "+va"(a[44])); + __asm__ volatile ("; fake <- %0" : "+va"(a[45])); + __asm__ volatile ("; fake <- %0" : "+va"(a[46])); + __asm__ volatile ("; fake <- %0" : "+va"(a[47])); + __asm__ volatile ("; fake <- %0" : "+va"(a[48])); + __asm__ volatile ("; fake <- %0" : "+va"(a[49])); +} diff --git a/gcc/testsuite/gcc.target/gcn/avgpr-mem-long.c b/gcc/testsuite/gcc.target/gcn/avgpr-mem-long.c new file mode 100644 index 00000000000..dcfb483f3f3 --- /dev/null +++ b/gcc/testsuite/gcc.target/gcn/avgpr-mem-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=gfx90a -O1" } */ +/* { dg-skip-if "incompatible ISA" { *-*-* } { "-march=gfx90[068]" } } */ +/* { dg-final { scan-assembler {load[^\n]*a[0-9[]} } } */ +/* { dg-final { scan-assembler {store[^\n]*a[0-9[]} } } */ + +#define TYPE long + +#include "avgpr-mem-int.c" diff --git a/gcc/testsuite/gcc.target/gcn/avgpr-mem-short.c b/gcc/testsuite/gcc.target/gcn/avgpr-mem-short.c new file mode 100644 index 00000000000..91cc14ef181 --- /dev/null +++ b/gcc/testsuite/gcc.target/gcn/avgpr-mem-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=gfx90a -O1" } */ +/* { dg-skip-if "incompatible ISA" { *-*-* } { "-march=gfx90[068]" } } */ +/* { dg-final { scan-assembler {load[^\n]*a[0-9[]} } } */ +/* { dg-final { scan-assembler {store[^\n]*a[0-9[]} } } */ + +#define TYPE short + +#include "avgpr-mem-int.c" diff --git a/gcc/testsuite/gcc.target/gcn/avgpr-spill-double.c b/gcc/testsuite/gcc.target/gcn/avgpr-spill-double.c new file mode 100644 index 00000000000..3e9996d3d10 --- /dev/null +++ b/gcc/testsuite/gcc.target/gcn/avgpr-spill-double.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=gfx908 -O1" } */ +/* { dg-skip-if "incompatible ISA" { *-*-* } { "-march=gfx90[06]" } } */ +/* { dg-final { scan-assembler "accvgpr" } } */ + +#define TYPE double + +#include "avgpr-spill-int.c" diff --git a/gcc/testsuite/gcc.target/gcn/avgpr-spill-int.c b/gcc/testsuite/gcc.target/gcn/avgpr-spill-int.c new file mode 100644 index 00000000000..0b64c8ec176 --- /dev/null +++ b/gcc/testsuite/gcc.target/gcn/avgpr-spill-int.c @@ -0,0 +1,115 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=gfx908 -O1" } */ +/* { dg-skip-if "incompatible ISA" { *-*-* } { "-march=gfx90[06]" } } */ +/* { dg-final { scan-assembler "accvgpr" } } */ + +#ifndef TYPE +#define TYPE int +#endif + +TYPE a[50]; + +int f() +{ + __asm__ volatile ("; fake <- %0" : "=v"(a[0])); + __asm__ volatile ("; fake <- %0" : "=v"(a[1])); + __asm__ volatile ("; fake <- %0" : "=v"(a[2])); + __asm__ volatile ("; fake <- %0" : "=v"(a[3])); + __asm__ volatile ("; fake <- %0" : "=v"(a[4])); + __asm__ volatile ("; fake <- %0" : "=v"(a[5])); + __asm__ volatile ("; fake <- %0" : "=v"(a[6])); + __asm__ volatile ("; fake <- %0" : "=v"(a[7])); + __asm__ volatile ("; fake <- %0" : "=v"(a[8])); + __asm__ volatile ("; fake <- %0" : "=v"(a[9])); + __asm__ volatile ("; fake <- %0" : "=v"(a[10])); + __asm__ volatile ("; fake <- %0" : "=v"(a[11])); + __asm__ volatile ("; fake <- %0" : "=v"(a[12])); + __asm__ volatile ("; fake <- %0" : "=v"(a[13])); + __asm__ volatile ("; fake <- %0" : "=v"(a[14])); + __asm__ volatile ("; fake <- %0" : "=v"(a[15])); + __asm__ volatile ("; fake <- %0" : "=v"(a[16])); + __asm__ volatile ("; fake <- %0" : "=v"(a[17])); + __asm__ volatile ("; fake <- %0" : "=v"(a[18])); + __asm__ volatile ("; fake <- %0" : "=v"(a[19])); + __asm__ volatile ("; fake <- %0" : "=v"(a[20])); + __asm__ volatile ("; fake <- %0" : "=v"(a[21])); + __asm__ volatile ("; fake <- %0" : "=v"(a[22])); + __asm__ volatile ("; fake <- %0" : "=v"(a[23])); + __asm__ volatile ("; fake <- %0" : "=v"(a[24])); + __asm__ volatile ("; fake <- %0" : "=v"(a[25])); + __asm__ volatile ("; fake <- %0" : "=v"(a[26])); + __asm__ volatile ("; fake <- %0" : "=v"(a[27])); + __asm__ volatile ("; fake <- %0" : "=v"(a[28])); + __asm__ volatile ("; fake <- %0" : "=v"(a[29])); + __asm__ volatile ("; fake <- %0" : "=v"(a[30])); + __asm__ volatile ("; fake <- %0" : "=v"(a[31])); + __asm__ volatile ("; fake <- %0" : "=v"(a[32])); + __asm__ volatile ("; fake <- %0" : "=v"(a[33])); + __asm__ volatile ("; fake <- %0" : "=v"(a[34])); + __asm__ volatile ("; fake <- %0" : "=v"(a[35])); + __asm__ volatile ("; fake <- %0" : "=v"(a[36])); + __asm__ volatile ("; fake <- %0" : "=v"(a[37])); + __asm__ volatile ("; fake <- %0" : "=v"(a[38])); + __asm__ volatile ("; fake <- %0" : "=v"(a[39])); + __asm__ volatile ("; fake <- %0" : "=v"(a[40])); + __asm__ volatile ("; fake <- %0" : "=v"(a[41])); + __asm__ volatile ("; fake <- %0" : "=v"(a[42])); + __asm__ volatile ("; fake <- %0" : "=v"(a[43])); + __asm__ volatile ("; fake <- %0" : "=v"(a[44])); + __asm__ volatile ("; fake <- %0" : "=v"(a[45])); + __asm__ volatile ("; fake <- %0" : "=v"(a[46])); + __asm__ volatile ("; fake <- %0" : "=v"(a[47])); + __asm__ volatile ("; fake <- %0" : "=v"(a[48])); + __asm__ volatile ("; fake <- %0" : "=v"(a[49])); + + __asm__ volatile ("; fake -> %0" :: "v"(a[0])); + __asm__ volatile ("; fake -> %0" :: "v"(a[1])); + __asm__ volatile ("; fake -> %0" :: "v"(a[2])); + __asm__ volatile ("; fake -> %0" :: "v"(a[3])); + __asm__ volatile ("; fake -> %0" :: "v"(a[4])); + __asm__ volatile ("; fake -> %0" :: "v"(a[5])); + __asm__ volatile ("; fake -> %0" :: "v"(a[6])); + __asm__ volatile ("; fake -> %0" :: "v"(a[7])); + __asm__ volatile ("; fake -> %0" :: "v"(a[8])); + __asm__ volatile ("; fake -> %0" :: "v"(a[9])); + __asm__ volatile ("; fake -> %0" :: "v"(a[10])); + __asm__ volatile ("; fake -> %0" :: "v"(a[11])); + __asm__ volatile ("; fake -> %0" :: "v"(a[12])); + __asm__ volatile ("; fake -> %0" :: "v"(a[13])); + __asm__ volatile ("; fake -> %0" :: "v"(a[14])); + __asm__ volatile ("; fake -> %0" :: "v"(a[15])); + __asm__ volatile ("; fake -> %0" :: "v"(a[16])); + __asm__ volatile ("; fake -> %0" :: "v"(a[17])); + __asm__ volatile ("; fake -> %0" :: "v"(a[18])); + __asm__ volatile ("; fake -> %0" :: "v"(a[19])); + __asm__ volatile ("; fake -> %0" :: "v"(a[20])); + __asm__ volatile ("; fake -> %0" :: "v"(a[21])); + __asm__ volatile ("; fake -> %0" :: "v"(a[22])); + __asm__ volatile ("; fake -> %0" :: "v"(a[23])); + __asm__ volatile ("; fake -> %0" :: "v"(a[24])); + __asm__ volatile ("; fake -> %0" :: "v"(a[25])); + __asm__ volatile ("; fake -> %0" :: "v"(a[26])); + __asm__ volatile ("; fake -> %0" :: "v"(a[27])); + __asm__ volatile ("; fake -> %0" :: "v"(a[28])); + __asm__ volatile ("; fake -> %0" :: "v"(a[29])); + __asm__ volatile ("; fake -> %0" :: "v"(a[30])); + __asm__ volatile ("; fake -> %0" :: "v"(a[31])); + __asm__ volatile ("; fake -> %0" :: "v"(a[32])); + __asm__ volatile ("; fake -> %0" :: "v"(a[33])); + __asm__ volatile ("; fake -> %0" :: "v"(a[34])); + __asm__ volatile ("; fake -> %0" :: "v"(a[35])); + __asm__ volatile ("; fake -> %0" :: "v"(a[36])); + __asm__ volatile ("; fake -> %0" :: "v"(a[37])); + __asm__ volatile ("; fake -> %0" :: "v"(a[38])); + __asm__ volatile ("; fake -> %0" :: "v"(a[39])); + __asm__ volatile ("; fake -> %0" :: "v"(a[40])); + __asm__ volatile ("; fake -> %0" :: "v"(a[41])); + __asm__ volatile ("; fake -> %0" :: "v"(a[42])); + __asm__ volatile ("; fake -> %0" :: "v"(a[43])); + __asm__ volatile ("; fake -> %0" :: "v"(a[44])); + __asm__ volatile ("; fake -> %0" :: "v"(a[45])); + __asm__ volatile ("; fake -> %0" :: "v"(a[46])); + __asm__ volatile ("; fake -> %0" :: "v"(a[47])); + __asm__ volatile ("; fake -> %0" :: "v"(a[48])); + __asm__ volatile ("; fake -> %0" :: "v"(a[49])); +} diff --git a/gcc/testsuite/gcc.target/gcn/avgpr-spill-long.c b/gcc/testsuite/gcc.target/gcn/avgpr-spill-long.c new file mode 100644 index 00000000000..516890de14c --- /dev/null +++ b/gcc/testsuite/gcc.target/gcn/avgpr-spill-long.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=gfx908 -O1" } */ +/* { dg-skip-if "incompatible ISA" { *-*-* } { "-march=gfx90[06]" } } */ +/* { dg-final { scan-assembler "accvgpr" } } */ + +#define TYPE long + +#include "avgpr-spill-int.c" diff --git a/gcc/testsuite/gcc.target/gcn/avgpr-spill-short.c b/gcc/testsuite/gcc.target/gcn/avgpr-spill-short.c new file mode 100644 index 00000000000..1e556840e0f --- /dev/null +++ b/gcc/testsuite/gcc.target/gcn/avgpr-spill-short.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=gfx908 -O1" } */ +/* { dg-skip-if "incompatible ISA" { *-*-* } { "-march=gfx90[06]" } } */ +/* { dg-final { scan-assembler "accvgpr" } } */ + +#define TYPE short + +#include "avgpr-spill-int.c" diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 7e7e2d6edfe..8aabbd99881 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -1702,6 +1702,25 @@ isa_code(const char *isa) { return -1; } +/* CDNA2 devices have twice as many VGPRs compared to older devices. */ + +static int +max_isa_vgprs (int isa) +{ + switch (isa) + { + case EF_AMDGPU_MACH_AMDGCN_GFX803: + case EF_AMDGPU_MACH_AMDGCN_GFX900: + case EF_AMDGPU_MACH_AMDGCN_GFX906: + case EF_AMDGPU_MACH_AMDGCN_GFX908: + case EF_AMDGPU_MACH_AMDGCN_GFX1030: + return 256; + case EF_AMDGPU_MACH_AMDGCN_GFX90a: + return 512; + } + GOMP_PLUGIN_fatal ("unhandled ISA in max_isa_vgprs"); +} + /* }}} */ /* {{{ Run */ @@ -2143,6 +2162,7 @@ run_kernel (struct kernel_info *kernel, void *vars, struct GOMP_kernel_launch_attributes *kla, struct goacc_asyncqueue *aq, bool module_locked) { + struct agent_info *agent = kernel->agent; GCN_DEBUG ("SGPRs: %d, VGPRs: %d\n", kernel->description->sgpr_count, kernel->description->vpgr_count); @@ -2150,8 +2170,9 @@ run_kernel (struct kernel_info *kernel, void *vars, VGPRs available to run the kernels together. */ if (kla->ndim == 3 && kernel->description->vpgr_count > 0) { + int max_vgprs = max_isa_vgprs (agent->device_isa); int granulated_vgprs = (kernel->description->vpgr_count + 3) & ~3; - int max_threads = (256 / granulated_vgprs) * 4; + int max_threads = (max_vgprs / granulated_vgprs) * 4; if (kla->gdims[2] > max_threads) { GCN_WARNING ("Too many VGPRs required to support %d threads/workers" @@ -2188,7 +2209,6 @@ run_kernel (struct kernel_info *kernel, void *vars, DEBUG_PRINT ("]\n"); DEBUG_FLUSH (); - struct agent_info *agent = kernel->agent; if (!module_locked && pthread_rwlock_rdlock (&agent->module_rwlock)) GOMP_PLUGIN_fatal ("Unable to read-lock a GCN agent rwlock");