From patchwork Wed Jan 31 15:59:28 2024
X-Patchwork-Submitter: Alexandre Ghiti
X-Patchwork-Id: 194830
From: Alexandre Ghiti <alexghiti@rivosinc.com>
To: Catalin Marinas, Will Deacon, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrew Morton, Ved Shanbhogue, Matt Evans, Dylan Jhong,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-riscv@lists.infradead.org, linux-mm@kvack.org
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Subject: [PATCH RFC/RFT v2 3/4] riscv: Stop emitting preventive sfence.vma for new vmalloc mappings
Date: Wed, 31 Jan 2024 16:59:28 +0100
Message-Id: <20240131155929.169961-4-alexghiti@rivosinc.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240131155929.169961-1-alexghiti@rivosinc.com>
References: <20240131155929.169961-1-alexghiti@rivosinc.com>

In 6.5, we removed the vmalloc fault path because that can't work (see [1] [2]).
Then, in order to make sure that new page table entries were seen by the
page table walker, we had to preventively emit an sfence.vma on all harts
[3], but this solution is very costly since it relies on IPIs. Even then,
we could end up in a loop of vmalloc faults if a vmalloc allocation is
done in the IPI path (for example if it is traced, see [4]), which could
result in a kernel stack overflow.

Those preventive sfence.vma needed to be emitted because:

- if the uarch caches invalid entries, the new mapping may not be
  observed by the page table walker and an invalidation may be needed.
- if the uarch does not cache invalid entries, a reordered access could
  "miss" the new mapping and trap: in that case, we only need to retry
  the access, no sfence.vma is required.

So this patch removes those preventive sfence.vma and instead handles the
possible (and unlikely) exceptions. And since the kernel stack mappings
lie in the vmalloc area, this handling must be done very early when the
trap is taken, at the very beginning of handle_exception: this also rules
out vmalloc allocations in the fault path.

Link: https://lore.kernel.org/linux-riscv/20230531093817.665799-1-bjorn@kernel.org/ [1]
Link: https://lore.kernel.org/linux-riscv/20230801090927.2018653-1-dylan@andestech.com [2]
Link: https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/ [3]
Link: https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/ [4]
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
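
For review purposes, here is a rough C equivalent of the new_vmalloc_check
assembly macro added below. This is a sketch only, not part of the patch:
new_vmalloc_check_sketch is a hypothetical name, new_vmalloc and the
EXC_*_PAGE_FAULT constants are those used by this series, and the real
check must stay in assembly since it runs before we know whether the
kernel stack is usable.

#include <linux/atomic.h>
#include <linux/bits.h>
#include <linux/compiler.h>
#include <linux/smp.h>
#include <linux/types.h>
#include <asm/csr.h>
#include <asm/tlbflush.h>

/* Sketch only: C rendering of the new_vmalloc_check asm macro. */
static bool new_vmalloc_check_sketch(unsigned long cause, unsigned long tval)
{
	unsigned int cpu = smp_processor_id();
	u64 mask = BIT_MASK(cpu);
	u64 *word = &new_vmalloc[BIT_WORD(cpu)];

	/* Only load/store/instruction page faults are of interest */
	if (cause != EXC_LOAD_PAGE_FAULT && cause != EXC_STORE_PAGE_FAULT &&
	    cause != EXC_INST_PAGE_FAULT)
		return false;

	/* Only kernel addresses (sign bit set) can be missed vmalloc mappings */
	if ((long)tval >= 0)
		return false;

	/* Did flush_cache_vmap() publish a new mapping since our last flush? */
	if (!(READ_ONCE(*word) & mask))
		return false;

	/* Clear our bit (the asm does this atomically with an amoxor)... */
	atomic64_andnot(mask, (atomic64_t *)word);
	/* ...and flush, unless the uarch never caches invalid entries (Svvptc) */
	local_flush_tlb_all();

	return true;	/* simply retry the faulting access */
}
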
 arch/riscv/include/asm/cacheflush.h  | 18 +++++-
 arch/riscv/include/asm/thread_info.h |  5 ++
 arch/riscv/kernel/asm-offsets.c      |  5 ++
 arch/riscv/kernel/entry.S            | 84 ++++++++++++++++++++++++++++
 arch/riscv/mm/init.c                 |  2 +
 5 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index a129dac4521d..b0d631701757 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -37,7 +37,23 @@ static inline void flush_dcache_page(struct page *page)
 	flush_icache_mm(vma->vm_mm, 0)
 
 #ifdef CONFIG_64BIT
-#define flush_cache_vmap(start, end)		flush_tlb_kernel_range(start, end)
+extern u64 new_vmalloc[NR_CPUS / sizeof(u64) + 1];
+extern char _end[];
+#define flush_cache_vmap flush_cache_vmap
+static inline void flush_cache_vmap(unsigned long start, unsigned long end)
+{
+	if (is_vmalloc_or_module_addr((void *)start)) {
+		int i;
+
+		/*
+		 * We don't care if concurrently a cpu resets this value since
+		 * the only place this can happen is in handle_exception() where
+		 * an sfence.vma is emitted.
+		 */
+		for (i = 0; i < ARRAY_SIZE(new_vmalloc); ++i)
+			new_vmalloc[i] = -1ULL;
+	}
+}
 #define flush_cache_vmap_early(start, end)	local_flush_tlb_kernel_range(start, end)
 #endif
 
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 5d473343634b..32631acdcdd4 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -60,6 +60,11 @@ struct thread_info {
 	void *scs_base;
 	void *scs_sp;
 #endif
+	/*
+	 * Used in handle_exception() to save a0, a1 and a2 before knowing if we
+	 * can access the kernel stack.
+	 */
+	unsigned long a0, a1, a2;
 };
 
 #ifdef CONFIG_SHADOW_CALL_STACK
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index a03129f40c46..939ddc0e3c6e 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -35,6 +35,8 @@ void asm_offsets(void)
 	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
 	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
 	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
+
+	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
 	OFFSET(TASK_TI_FLAGS, task_struct, thread_info.flags);
 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
 	OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
@@ -42,6 +44,9 @@ void asm_offsets(void)
 #ifdef CONFIG_SHADOW_CALL_STACK
 	OFFSET(TASK_TI_SCS_SP, task_struct, thread_info.scs_sp);
 #endif
+	OFFSET(TASK_TI_A0, task_struct, thread_info.a0);
+	OFFSET(TASK_TI_A1, task_struct, thread_info.a1);
+	OFFSET(TASK_TI_A2, task_struct, thread_info.a2);
 	OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu);
 
 	OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 9d1a305d5508..c1ffaeaba7aa 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -19,6 +19,78 @@
 
 	.section .irqentry.text, "ax"
 
+.macro new_vmalloc_check
+	REG_S a0, TASK_TI_A0(tp)
+	REG_S a1, TASK_TI_A1(tp)
+	REG_S a2, TASK_TI_A2(tp)
+
+	csrr a0, CSR_CAUSE
+	/* Exclude IRQs */
+	blt a0, zero, _new_vmalloc_restore_context
+	/* Only check new_vmalloc if we are in page/protection fault */
+	li a1, EXC_LOAD_PAGE_FAULT
+	beq a0, a1, _new_vmalloc_kernel_address
+	li a1, EXC_STORE_PAGE_FAULT
+	beq a0, a1, _new_vmalloc_kernel_address
+	li a1, EXC_INST_PAGE_FAULT
+	bne a0, a1, _new_vmalloc_restore_context
+
+_new_vmalloc_kernel_address:
+	/* Is it a kernel address? */
+	csrr a0, CSR_TVAL
+	bge a0, zero, _new_vmalloc_restore_context
+
+	/* Check if a new vmalloc mapping appeared that could explain the trap */
+
+	/*
+	 * Computes:
+	 * a0 = &new_vmalloc[BIT_WORD(cpu)]
+	 * a1 = BIT_MASK(cpu)
+	 */
+	REG_L a2, TASK_TI_CPU(tp)
+	/*
+	 * Compute the new_vmalloc element position:
+	 * (cpu / 64) * 8 = (cpu >> 6) << 3
+	 */
+	srli a1, a2, 6
+	slli a1, a1, 3
+	la a0, new_vmalloc
+	add a0, a0, a1
+	/*
+	 * Compute the bit position in the new_vmalloc element:
+	 * bit_pos = cpu % 64 = cpu - (cpu / 64) * 64 = cpu - (cpu >> 6) << 6
+	 *	   = cpu - ((cpu >> 6) << 3) << 3
+	 */
+	slli a1, a1, 3
+	sub a1, a2, a1
+	/* Compute the "get mask": 1 << bit_pos */
+	li a2, 1
+	sll a1, a2, a1
+
+	/* Check the value of new_vmalloc for this cpu */
+	REG_L a2, 0(a0)
+	and a2, a2, a1
+	beq a2, zero, _new_vmalloc_restore_context
+
+	/* Atomically reset the current cpu bit in new_vmalloc */
+	amoxor.w a0, a1, (a0)
+
+	/* Only emit a sfence.vma if the uarch caches invalid entries */
+	ALTERNATIVE("sfence.vma", "nop", 0, RISCV_ISA_EXT_SVVPTC, 1)
+
+	REG_L a0, TASK_TI_A0(tp)
+	REG_L a1, TASK_TI_A1(tp)
+	REG_L a2, TASK_TI_A2(tp)
+	csrw CSR_SCRATCH, x0
+	sret
+
+_new_vmalloc_restore_context:
+	REG_L a0, TASK_TI_A0(tp)
+	REG_L a1, TASK_TI_A1(tp)
+	REG_L a2, TASK_TI_A2(tp)
+.endm
+
+
 SYM_CODE_START(handle_exception)
 	/*
 	 * If coming from userspace, preserve the user thread pointer and load
@@ -30,6 +102,18 @@ SYM_CODE_START(handle_exception)
 
 .Lrestore_kernel_tpsp:
 	csrr tp, CSR_SCRATCH
+
+	/*
+	 * The RISC-V kernel does not eagerly emit a sfence.vma after each
+	 * new vmalloc mapping, which may result in exceptions:
+	 * - if the uarch caches invalid entries, the new mapping would not be
+	 *   observed by the page table walker and an invalidation is needed.
+	 * - if the uarch does not cache invalid entries, a reordered access
+	 *   could "miss" the new mapping and traps: in that case, we only need
+	 *   to retry the access, no sfence.vma is required.
+	 */
+	new_vmalloc_check
+
 	REG_S sp, TASK_TI_KERNEL_SP(tp)
 
 #ifdef CONFIG_VMAP_STACK
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index eafc4c2200f2..54c9fdeda11e 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -36,6 +36,8 @@
 
 #include "../kernel/head.h"
 
+u64 new_vmalloc[NR_CPUS / sizeof(u64) + 1];
+
 struct kernel_mapping kernel_map __ro_after_init;
 EXPORT_SYMBOL(kernel_map);
 #ifdef CONFIG_XIP_KERNEL
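
For completeness, a purely illustrative way to exercise the new path (a
hypothetical test module, not part of this series): touching a fresh
vmalloc mapping right after it is created may now fault once and be
transparently retried via new_vmalloc_check instead of relying on a
preventive sfence.vma.

/* Hypothetical test module, only to illustrate the exercised path. */
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/vmalloc.h>

static int __init touch_new_vmalloc_init(void)
{
	unsigned long *p = vmalloc(PAGE_SIZE);

	if (!p)
		return -ENOMEM;

	/*
	 * Without the preventive sfence.vma, this store may take a single
	 * page fault that handle_exception() resolves and retries.
	 */
	*p = 0xdead;

	vfree(p);
	return 0;
}
module_init(touch_new_vmalloc_init);

MODULE_LICENSE("GPL");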