Message ID | 20221116165655.2649475-3-oliver.upton@linux.dev |
---|---|
State | New |
Headers |
From: Oliver Upton <oliver.upton@linux.dev>
To: Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>, Alexandru Elisei <alexandru.elisei@arm.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Oliver Upton <oliver.upton@linux.dev>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvmarm@lists.linux.dev, Marek Szyprowski <m.szyprowski@samsung.com>, linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/2] KVM: arm64: Don't acquire RCU read lock for exclusive table walks
Date: Wed, 16 Nov 2022 16:56:55 +0000
Message-Id: <20221116165655.2649475-3-oliver.upton@linux.dev>
In-Reply-To: <20221116165655.2649475-1-oliver.upton@linux.dev>
References: <20221116165655.2649475-1-oliver.upton@linux.dev>
Series |
[v3,1/2] KVM: arm64: Take a pointer to walker data in kvm_dereference_pteref()
|
|
Commit Message
Oliver Upton
Nov. 16, 2022, 4:56 p.m. UTC
Marek reported a BUG resulting from the recent parallel faults changes,
as the hyp stage-1 map walker attempted to allocate table memory while
holding the RCU read lock:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:274
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by swapper/0/1:
#0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at:
__create_hyp_mappings+0x80/0xc4
#1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at:
kvm_pgtable_walk+0x0/0x1f4
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
Hardware name: Raspberry Pi 3 Model B (DT)
Call trace:
dump_backtrace.part.0+0xe4/0xf0
show_stack+0x18/0x40
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
__might_resched+0x178/0x220
__might_sleep+0x48/0xa0
prepare_alloc_pages+0x178/0x1a0
__alloc_pages+0x9c/0x109c
alloc_page_interleave+0x1c/0xc4
alloc_pages+0xec/0x160
get_zeroed_page+0x1c/0x44
kvm_hyp_zalloc_page+0x14/0x20
hyp_map_walker+0xd4/0x134
kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
__kvm_pgtable_walk+0x1a4/0x220
kvm_pgtable_walk+0x104/0x1f4
kvm_pgtable_hyp_map+0x80/0xc4
__create_hyp_mappings+0x9c/0xc4
kvm_mmu_init+0x144/0x1cc
kvm_arch_init+0xe4/0xef4
kvm_init+0x3c/0x3d0
arm_init+0x20/0x30
do_one_initcall+0x74/0x400
kernel_init_freeable+0x2e0/0x350
kernel_init+0x24/0x130
ret_from_fork+0x10/0x20
Since the hyp stage-1 table walkers are serialized by kvm_hyp_pgd_mutex,
RCU protection really doesn't add anything. Don't acquire the RCU read
lock for an exclusive walk. While at it, add a warning which codifies
the lack of support for shared walks in the hypervisor code.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/include/asm/kvm_pgtable.h | 22 ++++++++++++++++------
arch/arm64/kvm/hyp/pgtable.c | 4 ++--
2 files changed, 18 insertions(+), 8 deletions(-)
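The locking rule the patch encodes — take the RCU read lock only for shared walks, and rely on the mutex for exclusive ones — can be modeled in plain userspace C. This is a sketch only: the walker struct is reduced to its flags field, and rcu_read_lock()/rcu_read_unlock() are stubbed with a nesting counter, since the real primitives exist only in the kernel.

```c
#include <assert.h>

#define KVM_PGTABLE_WALK_SHARED	(1U << 0)

struct kvm_pgtable_walker { unsigned int flags; };

/* Stand-in for the RCU read-side nesting depth; sleeping (e.g. a table
 * page allocation) is only legal when this is zero. */
static int rcu_depth;

static void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
{
	/* Exclusive walks are serialized by a mutex, so they skip RCU
	 * entirely and stay free to sleep in their visitor callbacks. */
	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
		rcu_depth++;
}

static void kvm_pgtable_walk_end(struct kvm_pgtable_walker *walker)
{
	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
		rcu_depth--;
}
```

With this shape, the hyp stage-1 map walker from the splat above (an exclusive walk) never enters an RCU read-side critical section, so its table allocation may sleep without triggering the might_sleep() check.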
Comments
On Wed, Nov 16, 2022 at 04:56:55PM +0000, Oliver Upton wrote:
> Marek reported a BUG resulting from the recent parallel faults changes,
> as the hyp stage-1 map walker attempted to allocate table memory while
> holding the RCU read lock:

[...]

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index f23af693e3c5..a07fc5e35a8c 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -229,8 +229,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(struct kvm_pgtable_walker *walke
>  	return pteref;
>  }
>
> -static inline void kvm_pgtable_walk_begin(void) {}
> -static inline void kvm_pgtable_walk_end(void) {}
> +static inline void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
> +{
> +	/*
> +	 * Due to the lack of RCU (or a similar protection scheme), only
> +	 * non-shared table walkers are allowed in the hypervisor.
> +	 */
> +	WARN_ON(walker->flags & KVM_PGTABLE_WALK_SHARED);
> +}

I think it would be better to propagate the error to the caller rather
than WARN here. Since you're rejigging things anyway, can you have this
function return int?

Will
Hi Will,

Thanks for having a look.

On Thu, Nov 17, 2022 at 05:49:52PM +0000, Will Deacon wrote:
> On Wed, Nov 16, 2022 at 04:56:55PM +0000, Oliver Upton wrote:

[...]

> > -static inline void kvm_pgtable_walk_begin(void) {}
> > -static inline void kvm_pgtable_walk_end(void) {}
> > +static inline void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
> > +{
> > +	/*
> > +	 * Due to the lack of RCU (or a similar protection scheme), only
> > +	 * non-shared table walkers are allowed in the hypervisor.
> > +	 */
> > +	WARN_ON(walker->flags & KVM_PGTABLE_WALK_SHARED);
> > +}
>
> I think it would be better to propagate the error to the caller rather
> than WARN here.

I'd really like to warn somewhere though since we're rather fscked at
this point. Keeping that WARN close to the exceptional condition would
help w/ debugging.

Were you envisioning bubbling the error all the way back up (i.e. early
return from kvm_pgtable_walk())? I had really only intended these to
indirect lock acquisition/release, so the error handling on the caller
side feels weird:

  static inline int kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
  {
  	if (WARN_ON(walker->flags & KVM_PGTABLE_WALK_SHARED))
  		return -EPERM;

  	return 0;
  }

  r = kvm_pgtable_walk_begin()
  if (r)
  	return r;

  r = _kvm_pgtable_walk();
  kvm_pgtable_walk_end();

> Since you're rejigging things anyway, can you have this
> function return int?

If having this is a strong motivator I can do a v4.

--
Thanks,
Oliver
On Thu, Nov 17, 2022 at 06:23:23PM +0000, Oliver Upton wrote:
> On Thu, Nov 17, 2022 at 05:49:52PM +0000, Will Deacon wrote:
> > I think it would be better to propagate the error to the caller rather
> > than WARN here.
>
> I'd really like to warn somewhere though since we're rather fscked at
> this point. Keeping that WARN close to the exceptional condition would
> help w/ debugging.
>
> Were you envisioning bubbling the error all the way back up (i.e. early
> return from kvm_pgtable_walk())?

Yes, that's what I had in mind. WARN is fatal at EL2, so I think it's
better to fail the pgtable operation rather than bring down the entire
machine by default. Now, it _might_ be fatal anyway (e.g. if we were
handling a host stage-2 fault w/ pKVM), but the caller is in a better
position to decide the severity.

> I had really only intended these to indirect lock acquisition/release,
> so the error handling on the caller side feels weird:
>
>   static inline int kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
>   {
>   	if (WARN_ON(walker->flags & KVM_PGTABLE_WALK_SHARED))
>   		return -EPERM;
>
>   	return 0;
>   }
>
>   r = kvm_pgtable_walk_begin()
>   if (r)
>   	return r;
>
>   r = _kvm_pgtable_walk();
>   kvm_pgtable_walk_end();

This doesn't look particularly weird to me (modulo dropping the WARN, or
moving it to _end()), but maybe I've lost my sense of taste.

> > Since you're rejigging things anyway, can you have this
> > function return int?
>
> If having this is a strong motivator I can do a v4.

It's a really minor point, so I'll leave it up to you guys.

Will
On Fri, Nov 18, 2022 at 12:19:50PM +0000, Will Deacon wrote:
> On Thu, Nov 17, 2022 at 06:23:23PM +0000, Oliver Upton wrote:

[...]

> > Were you envisioning bubbling the error all the way back up (i.e. early
> > return from kvm_pgtable_walk())?
>
> Yes, that's what I had in mind. WARN is fatal at EL2, so I think it's
> better to fail the pgtable operation rather than bring down the entire
> machine by default.

Duh, I forgot WARNs really do go boom at EL2. Yeah, in that case it'd be
best to let the caller clean up the mess.

> > If having this is a strong motivator I can do a v4.
>
> It's a really minor point, so I'll leave it up to you guys.

Sold (sorry I wasn't following before). v4 on the way.

--
Thanks,
Oliver
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index f23af693e3c5..a07fc5e35a8c 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -229,8 +229,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(struct kvm_pgtable_walker *walke
 	return pteref;
 }
 
-static inline void kvm_pgtable_walk_begin(void) {}
-static inline void kvm_pgtable_walk_end(void) {}
+static inline void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
+{
+	/*
+	 * Due to the lack of RCU (or a similar protection scheme), only
+	 * non-shared table walkers are allowed in the hypervisor.
+	 */
+	WARN_ON(walker->flags & KVM_PGTABLE_WALK_SHARED);
+}
+
+static inline void kvm_pgtable_walk_end(struct kvm_pgtable_walker *walker) {}
 
 static inline bool kvm_pgtable_walk_lock_held(void)
 {
@@ -247,14 +255,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(struct kvm_pgtable_walker *walke
 	return rcu_dereference_check(pteref, !(walker->flags & KVM_PGTABLE_WALK_SHARED));
 }
 
-static inline void kvm_pgtable_walk_begin(void)
+static inline void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
 {
-	rcu_read_lock();
+	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
+		rcu_read_lock();
 }
 
-static inline void kvm_pgtable_walk_end(void)
+static inline void kvm_pgtable_walk_end(struct kvm_pgtable_walker *walker)
 {
-	rcu_read_unlock();
+	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
+		rcu_read_unlock();
 }
 
 static inline bool kvm_pgtable_walk_lock_held(void)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b5b91a882836..d6f3753cb87e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -289,9 +289,9 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	};
 	int r;
 
-	kvm_pgtable_walk_begin();
+	kvm_pgtable_walk_begin(walker);
 	r = _kvm_pgtable_walk(pgt, &walk_data);
-	kvm_pgtable_walk_end();
+	kvm_pgtable_walk_end(walker);
 
 	return r;
 }
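For reference, the error-propagating shape the thread converges on for v4 can also be sketched in userspace C. This is hypothetical — the v4 patch itself is not part of this thread — and it borrows the -EPERM return from Oliver's pseudocode above, with WARN_ON() stubbed out so the example compiles outside the kernel.

```c
#include <assert.h>
#include <errno.h>

#define KVM_PGTABLE_WALK_SHARED	(1U << 0)

struct kvm_pgtable_walker { unsigned int flags; };

/* Userspace stub: the real WARN_ON() logs a backtrace (and is fatal at
 * EL2); here it just evaluates its condition. */
#define WARN_ON(cond) (cond)

/* Hyp flavor: there is no RCU at EL2, so reject shared walkers with an
 * error instead of a fatal WARN, letting the caller pick the severity. */
static int kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
{
	if (WARN_ON(walker->flags & KVM_PGTABLE_WALK_SHARED))
		return -EPERM;

	return 0;
}

static int kvm_pgtable_walk(struct kvm_pgtable_walker *walker)
{
	int r = kvm_pgtable_walk_begin(walker);

	if (r)
		return r;	/* early return, as discussed in the thread */

	/* ... the actual table walk would run here ... */
	return 0;
}
```

The point of this shape is that a shared walk at EL2 fails the whole pgtable operation instead of bringing down the machine, while exclusive walks proceed unchanged.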