Message ID: 20221217015435.73889-3-bhe@redhat.com
State: New
Headers:
From: Baoquan He <bhe@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, urezki@gmail.com, stephen.s.brennan@oracle.com, willy@infradead.org, akpm@linux-foundation.org, hch@infradead.org, Baoquan He <bhe@redhat.com>
Subject: [PATCH v2 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area
Date: Sat, 17 Dec 2022 09:54:30 +0800
Message-Id: <20221217015435.73889-3-bhe@redhat.com>
In-Reply-To: <20221217015435.73889-1-bhe@redhat.com>
References: <20221217015435.73889-1-bhe@redhat.com>
Series: mm/vmalloc.c: allow vread() to read out vm_map_ram areas
Commit Message
Baoquan He
Dec. 17, 2022, 1:54 a.m. UTC
Through the vmalloc API, a virtual kernel area is reserved for physical
address mapping. A vmap_area is used to track it, while a vm_struct is
allocated to associate with the vmap_area, store more information, and be
passed out.

However, an area reserved via vm_map_ram() is an exception: it doesn't have
a vm_struct associated with its vmap_area. And we can't recognize a
vmap_area with '->vm == NULL' as a vm_map_ram() area, because the normal
freeing path sets va->vm = NULL before unmapping; please see function
remove_vm_area().

Meanwhile, there are two types of vm_map_ram areas. In one, the whole
vmap_area is reserved and mapped at one time; in the other, a whole
vmap_area of VMAP_BLOCK_SIZE is reserved, then mapped into smaller split
regions several times via vb_alloc().

To mark an area reserved through vm_map_ram(), add a flags field to
struct vmap_area. Bit 0 indicates whether it's a vm_map_ram area, while
bit 1 indicates whether it's a vmap_block type of vm_map_ram area.

This is a preparation for later use.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
include/linux/vmalloc.h | 1 +
mm/vmalloc.c | 22 +++++++++++++++++-----
2 files changed, 18 insertions(+), 5 deletions(-)
Comments
On Sat, Dec 17, 2022 at 09:54:30AM +0800, Baoquan He wrote:
> @@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count)
> 		return;
> 	}
>
> -	va = find_vmap_area(addr);
> +	spin_lock(&vmap_area_lock);
> +	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
> 	BUG_ON(!va);
> +	if (va)
> +		va->flags &= ~VMAP_RAM;
> +	spin_unlock(&vmap_area_lock);
> 	debug_check_no_locks_freed((void *)va->va_start,
> 				   (va->va_end - va->va_start));
> 	free_unmap_vmap_area(va);

Would it be better to perform the BUG_ON() after the lock is released? You
already check if va exists before unmasking so it's safe.

Also, do we want to clear VMAP_BLOCK here?
On 12/17/22 at 11:44am, Lorenzo Stoakes wrote:
> On Sat, Dec 17, 2022 at 09:54:30AM +0800, Baoquan He wrote:
> > @@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count)
> > [...]
> > -	va = find_vmap_area(addr);
> > +	spin_lock(&vmap_area_lock);
> > +	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
> > 	BUG_ON(!va);
> > +	if (va)
> > +		va->flags &= ~VMAP_RAM;
> > +	spin_unlock(&vmap_area_lock);
> > [...]
>
> Would it be better to perform the BUG_ON() after the lock is released? You
> already check if va exists before unmasking so it's safe.

It's a little unclear to me why we care whether BUG_ON() is performed before
or after the lock is released. We won't have a stable kernel after BUG_ON(),
right?

> Also, do we want to clear VMAP_BLOCK here?

I do, but I don't find a good place to clear VMAP_BLOCK.

In v1, I tried to clear it in free_vmap_area_noflush() as below, and
Uladzislau disliked it, so I removed it. My thinking is that when we unmap
and free the vmap area, the vmap_area is moved from vmap_area_root into
free_vmap_area_root. When we allocate a new vmap_area via alloc_vmap_area(),
we allocate a new va by kmem_cache_alloc_node(), so the new va->flags must
be 0. It seems not initializing the old one to 0 won't impact anything.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5d3fd3e6fe09..d6f376060d83 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1815,6 +1815,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 
 	spin_lock(&vmap_area_lock);
 	unlink_va(va, &vmap_area_root);
+	va->flags = 0;
 	spin_unlock(&vmap_area_lock);
 
 	nr_lazy = atomic_long_add_return((va->va_end - va->va_start) >>
On Mon, Dec 19, 2022 at 04:01:00PM +0800, Baoquan He wrote:
> On 12/17/22 at 11:44am, Lorenzo Stoakes wrote:
> > Would it be better to perform the BUG_ON() after the lock is released?
> > You already check if va exists before unmasking so it's safe.
>
> It's a little unclear to me why we care whether BUG_ON() is performed
> before or after the lock is released. We won't have a stable kernel after
> BUG_ON(), right?

BUG_ON()'s can be recoverable in user context, and it would be a very simple
change that would not fundamentally alter anything to simply place the added
lines prior to the BUG_ON().

The code as-is doesn't really make sense anyway: you BUG_ON(!va), then check
if va is non-null, then immediately afterwards the function passes va around
as if it were not null. So I think it'd also be an aesthetic and logical
improvement :)

> > Also, do we want to clear VMAP_BLOCK here?
>
> I do, but I don't find a good place to clear VMAP_BLOCK.
>
> In v1, I tried to clear it in free_vmap_area_noflush() as below, and
> Uladzislau disliked it, so I removed it. [...] It seems not initializing
> it to 0 won't impact anything.

You are at this point clearing the VMAP_RAM flag though, so if it is
unimportant what the flags are after this call, why are you clearing this
one?

It is just a little confusing. I wonder whether the VMAP_BLOCK flag is
necessary at all; is it possible to just treat a non-VMAP_BLOCK VMAP_RAM
area as if it were simply a fully occupied block? Do we gain much by the
distinction?

> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 5d3fd3e6fe09..d6f376060d83 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1815,6 +1815,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
> [...]
> +	va->flags = 0;
> [...]
On 12/19/22 at 09:09am, Lorenzo Stoakes wrote:
> On Mon, Dec 19, 2022 at 04:01:00PM +0800, Baoquan He wrote:
> > It's a little unclear to me why we care whether BUG_ON() is performed
> > before or after the lock is released. We won't have a stable kernel
> > after BUG_ON(), right?
>
> BUG_ON()'s can be recoverable in user context and it would be a very
> simple change that would not fundamentally alter anything to simply place
> the added lines prior to the BUG_ON().
>
> The code as-is doesn't really make sense anyway, you BUG_ON(!va) then
> check if va is non-null, then immediately the function afterwards passes
> va around as if it were not null, so I think it'd also be an aesthetic and
> logical improvement :)

In fact, I should not do the checking, but do the clearing anyway. If I
change it as below, does it look better to you?

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5e578563784a..369b848d097a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2253,8 +2253,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
 	spin_lock(&vmap_area_lock);
 	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
 	BUG_ON(!va);
-	if (va)
-		va->flags &= ~VMAP_RAM;
+	va->flags &= ~VMAP_RAM;
 	spin_unlock(&vmap_area_lock);
 	debug_check_no_locks_freed((void *)va->va_start,
 				   (va->va_end - va->va_start));

> You are at this point clearing the VMAP_RAM flag though, so if it is
> unimportant what the flags are after this call, why are you clearing this
> one?

With my understanding, we had better do the clearing. Currently, from the
code, not doing the clearing won't cause an issue. If possible, I would
like to clear it as in the draft code below.

> It is just a little confusing, I wonder whether the VMAP_BLOCK flag is
> necessary at all, is it possible to just treat a non-VMAP_BLOCK VMAP_RAM
> area as if it were simply a fully occupied block? Do we gain much by the
> distinction?

Yeah, the VMAP_BLOCK flag is necessary. A vmap_block contains used regions,
or dirty/free regions, while a non-vmap_block vm_map_ram area is similar to
a non-vm_map_ram area. When reading out vm_map_ram regions, vmap_block
regions need to be treated differently.

> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 5d3fd3e6fe09..d6f376060d83 100644
> > [...]
> > +	va->flags = 0;
> > [...]
On Mon, Dec 19, 2022 at 08:24:47PM +0800, Baoquan He wrote:
> In fact, I should not do the checking, but do the clearing anyway. If I
> change it as below, does it look better to you?
>
> @@ -2253,8 +2253,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
>  	spin_lock(&vmap_area_lock);
>  	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
>  	BUG_ON(!va);
> -	if (va)
> -		va->flags &= ~VMAP_RAM;
> +	va->flags &= ~VMAP_RAM;
>  	spin_unlock(&vmap_area_lock);
>  	debug_check_no_locks_freed((void *)va->va_start,
>  				   (va->va_end - va->va_start));

This is better as it avoids the slightly contradictory situation of checking
for a condition we've asserted is not the case, but I would still far prefer
keeping this as-is and placing the unlock before the BUG_ON().

This avoids explicitly and knowingly holding a lock over a BUG_ON() and is
simple to implement, e.g.:-

	spin_lock(&vmap_area_lock);
	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
	if (va)
		va->flags &= ~VMAP_RAM;
	spin_unlock(&vmap_area_lock);
	BUG_ON(!va);

> > You are at this point clearing the VMAP_RAM flag though, so if it is
> > unimportant what the flags are after this call, why are you clearing
> > this one?
>
> With my understanding, we had better do the clearing. Currently, from
> the code, not doing the clearing won't cause an issue. If possible, I
> would like to clear it as in the draft code below.

Sure, it seems appropriate to clear it, I'm just unsure as to why you aren't
just clearing both flags? Perhaps just set va->flags = 0?

> > It is just a little confusing, I wonder whether the VMAP_BLOCK flag is
> > necessary at all, is it possible to just treat a non-VMAP_BLOCK VMAP_RAM
> > area as if it were simply a fully occupied block? Do we gain much by the
> > distinction?
>
> Yeah, the VMAP_BLOCK flag is necessary. A vmap_block contains used
> regions, or dirty/free regions, while a non-vmap_block vm_map_ram area is
> similar to a non-vm_map_ram area. When reading out vm_map_ram regions,
> vmap_block regions need to be treated differently.

OK, looking through again closely I see you're absolutely right. I wondered
whether you could somehow make a non-VMAP_BLOCK vread() operation be
equivalent to a block one (only across the whole mapping), but I don't think
you can.
On 12/19/22 at 01:01pm, Lorenzo Stoakes wrote:
> This is better as it avoids the slightly contradictory situation of
> checking for a condition we've asserted is not the case, but I would still
> far prefer keeping this as-is and placing the unlock before the BUG_ON().
>
> This avoids explicitly and knowingly holding a lock over a BUG_ON() and is
> simple to implement, e.g.:-
>
> 	spin_lock(&vmap_area_lock);
> 	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
> 	if (va)
> 		va->flags &= ~VMAP_RAM;
> 	spin_unlock(&vmap_area_lock);
> 	BUG_ON(!va);

OK, I will change it like this.

> Sure, it seems appropriate to clear it, I'm just unsure as to why you
> aren't just clearing both flags? Perhaps just set va->flags = 0?

Hmm, for the two kinds of vm_map_ram areas, their code paths are different.
For unmapping a vmap_block vm_map_ram area, it goes through vb_free(); I can
only do the clearing in free_vmap_block():

  -->vm_unmap_ram()
    -->vb_free()
      -->free_vmap_block()

For a non-vmap_block vm_map_ram area, I can do the clearing in
vm_unmap_ram():

  -->vm_unmap_ram()
    -->__find_vmap_area()
    -->free_unmap_vmap_area()

As said earlier, clearing va->flags when unmapping a vm_map_ram area, or
clearing va->vm in remove_vm_area(), had better be done. However, not
clearing them won't cause an issue currently: the old vmap_area is inserted
into free_vmap_area_root, and when we allocate a new vmap_area through
alloc_vmap_area(), we get a new vmap_area from vmap_area_cachep, so the old
va->flags or va->vm won't be carried into the new vmap_area. Clearing them
is good to have, just in case.

Rethinking this, I may need to do the clearing when freeing a vmap_block.
Otherwise, people will be confused about why the clearing is not done.

@@ -1815,6 +1815,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 
 	spin_lock(&vmap_area_lock);
 	unlink_va(va, &vmap_area_root);
+	va->flags = 0;
 	spin_unlock(&vmap_area_lock);
 
 	nr_lazy = atomic_long_add_return((va->va_end - va->va_start) >>

> > It is just a little confusing, I wonder whether the VMAP_BLOCK flag is
> > necessary at all [...]
>
> OK looking through again closely I see you're absolutely right, I wondered
> whether you could somehow make a non-VMAP_BLOCK vread() operation be
> equivalent to a block one (only across the whole mapping), but I don't
> think you can.

Right, it's much easier to do the same handling on a non-VMAP_BLOCK
vm_map_ram area as on a normal vmap area.
On Tue, Dec 20, 2022 at 08:14:15PM +0800, Baoquan He wrote:
> Hmm, for the two kinds of vm_map_ram areas, their code paths are
> different. For unmapping a vmap_block vm_map_ram area, it goes through
> vb_free(); I can only do the clearing in free_vmap_block():
>
>   -->vm_unmap_ram()
>     -->vb_free()
>       -->free_vmap_block()
>
> For a non-vmap_block vm_map_ram area, I can do the clearing in
> vm_unmap_ram():
>
>   -->vm_unmap_ram()
>     -->__find_vmap_area()
>     -->free_unmap_vmap_area()
>
> As said earlier, clearing va->flags when unmapping a vm_map_ram area, or
> clearing va->vm in remove_vm_area(), had better be done. However, not
> clearing them won't cause an issue currently. [...] Clearing them is good
> to have, just in case.

Sure, this is more so about avoiding confusion and perhaps some future
change which might not take into account that flag state could be invalid.

I guess logically speaking, this is still a block when you are unmapping
ram, so it's not unreasonable to retain the VMAP_BLOCK flag. In that case
I'd say we're good simply clearing VMAP_RAM here. Thanks for the
explanations!

> Rethinking this, I may need to do the clearing when freeing a vmap_block.
> Otherwise, people will be confused about why the clearing is not done.
>
> @@ -1815,6 +1815,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
> [...]
> +	va->flags = 0;
> [...]

That sounds like a good idea!
> Through vmalloc API, a virtual kernel area is reserved for physical
> address mapping. And vmap_area is used to track them, while vm_struct
> is allocated to associate with the vmap_area to store more information
> and passed out.
>
> However, area reserved via vm_map_ram() is an exception. It doesn't have
> vm_struct to associate with vmap_area. And we can't recognize the
> vmap_area with '->vm == NULL' as a vm_map_ram() area because the normal
> freeing path will set va->vm = NULL before unmapping, please see
> function remove_vm_area().

A normal "free" path sets it to NULL in order to prevent a double-free of
the same VA. We can avoid touching va->vm if needed and do an unlink on
entry in remove_vm_area() when a lock is taken to find an area.

Will it help you?

> Meanwhile, there are two types of vm_map_ram area. One is the whole
> vmap_area being reserved and mapped at one time; the other is the
> whole vmap_area with VMAP_BLOCK_SIZE size being reserved, while mapped
> into split regions with smaller size several times via vb_alloc().
>
> To mark the area reserved through vm_map_ram(), add flags field into
> struct vmap_area. Bit 0 indicates whether it's a vm_map_ram area,
> while bit 1 indicates whether it's a vmap_block type of vm_map_ram
> area.
>
> This is a preparation for later use.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  include/linux/vmalloc.h |  1 +
>  mm/vmalloc.c            | 22 +++++++++++++++++-----
>  2 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 096d48aa3437..69250efa03d1 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -76,6 +76,7 @@ struct vmap_area {
>  		unsigned long subtree_max_size; /* in "free" tree */
>  		struct vm_struct *vm;           /* in "busy" tree */
>  	};
> +	unsigned long flags; /* mark type of vm_map_ram area */
>  };
>
>  /* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 5d3fd3e6fe09..190f29bbaaa7 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1586,7 +1586,8 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
>  static struct vmap_area *alloc_vmap_area(unsigned long size,
>  				unsigned long align,
>  				unsigned long vstart, unsigned long vend,
> -				int node, gfp_t gfp_mask)
> +				int node, gfp_t gfp_mask,
> +				unsigned long va_flags)
>  {
>  	struct vmap_area *va;
>  	unsigned long freed;
> @@ -1630,6 +1631,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	va->va_start = addr;
>  	va->va_end = addr + size;
>  	va->vm = NULL;
> +	va->flags = va_flags;
>
>  	spin_lock(&vmap_area_lock);
>  	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
> @@ -1887,6 +1889,10 @@ struct vmap_area *find_vmap_area(unsigned long addr)
>
>  #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
>
> +#define VMAP_RAM		0x1
> +#define VMAP_BLOCK		0x2
> +#define VMAP_FLAGS_MASK	0x3

Maybe rename VMAP_BLOCK to something like VMAP_BLOCK_RESERVED or
VMAP_PER_CPU_BLOCK?

>  struct vmap_block_queue {
>  	spinlock_t lock;
>  	struct list_head free;
> @@ -1962,7 +1968,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
>
>  	va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
>  					VMALLOC_START, VMALLOC_END,
> -					node, gfp_mask);
> +					node, gfp_mask,
> +					VMAP_RAM|VMAP_BLOCK);

A new_vmap_block() is for a per-cpu path. As far as I see, the VMAP_BLOCK
flag is used to mark a VA that corresponds to a reserved per-cpu free area,
whereas VMAP_RAM is for a VA that was obtained not over the per-cpu path but
over alloc_vmap_area(), thus a VA that should be read out over the "busy"
tree directly.

Why do you need to set both VMAP_RAM and VMAP_BLOCK here?

>  	if (IS_ERR(va)) {
>  		kfree(vb);
>  		return ERR_CAST(va);
> @@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count)
>  		return;
>  	}
>
> -	va = find_vmap_area(addr);
> +	spin_lock(&vmap_area_lock);
> +	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
>  	BUG_ON(!va);
> +	if (va)
> +		va->flags &= ~VMAP_RAM;
> +	spin_unlock(&vmap_area_lock);
>  	debug_check_no_locks_freed((void *)va->va_start,

Agree with Lorenzo. BUG_ON() should be out of the spinlock. Furthermore, I
think it makes sense to go with WARN_ON_ONCE() and not kill the system;
instead, emit a warning and bail out.

What do you think? Maybe a separate patch for it?

>  				   (va->va_end - va->va_start));
>  	free_unmap_vmap_area(va);
> @@ -2265,7 +2276,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
>  	} else {
>  		struct vmap_area *va;
>  		va = alloc_vmap_area(size, PAGE_SIZE,
> -				VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
> +				VMALLOC_START, VMALLOC_END,
> +				node, GFP_KERNEL, VMAP_RAM);
>  		if (IS_ERR(va))
>  			return NULL;
>
> @@ -2505,7 +2517,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
>  	if (!(flags & VM_NO_GUARD))
>  		size += PAGE_SIZE;
>
> -	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
> +	va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0);
>  	if (IS_ERR(va)) {
>  		kfree(area);
>  		return NULL;

I know we have already discussed the new parameter. But what if we just use
an atomic_set operation to mark a VA as either vmap-ram or vmap-block?

As for alloc_vmap_area(), we set it just as zero.

--
Uladzislau Rezki
On 12/20/22 at 05:55pm, Uladzislau Rezki wrote: > > Through vmalloc API, a virtual kernel area is reserved for physical > > address mapping. And vmap_area is used to track them, while vm_struct > > is allocated to associate with the vmap_area to store more information > > and passed out. > > > > However, area reserved via vm_map_ram() is an exception. It doesn't have > > vm_struct to associate with vmap_area. And we can't recognize the > > vmap_area with '->vm == NULL' as a vm_map_ram() area because the normal > > freeing path will set va->vm = NULL before unmapping, please see > > function remove_vm_area(). > > > A normal "free" path sets it to NULL in order to prevent a double-free > of same VA. We can avoid of touching the va->vm if needed and do an unlink > on entry in the remove_vm_area() when a lock is taken to find an area. > > Will it help you? Sorry, this mail sneaked out of my sight until I notice it now. My mutt client makes it look like in the thread I talked with Lorenzo. Yes, as I replied to your v2 patch, that is very helpful, thanks. > > > Meanwhile, there are two types of vm_map_ram area. One is the whole > > vmap_area being reserved and mapped at one time; the other is the > > whole vmap_area with VMAP_BLOCK_SIZE size being reserved, while mapped > > into split regions with smaller size several times via vb_alloc(). > > > > To mark the area reserved through vm_map_ram(), add flags field into > > struct vmap_area. Bit 0 indicates whether it's a vm_map_ram area, > > while bit 1 indicates whether it's a vmap_block type of vm_map_ram > > area. > > > > This is a preparatoin for later use. 
> > > > Signed-off-by: Baoquan He <bhe@redhat.com> > > --- > > include/linux/vmalloc.h | 1 + > > mm/vmalloc.c | 22 +++++++++++++++++----- > > 2 files changed, 18 insertions(+), 5 deletions(-) > > > > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h > > index 096d48aa3437..69250efa03d1 100644 > > --- a/include/linux/vmalloc.h > > +++ b/include/linux/vmalloc.h > > @@ -76,6 +76,7 @@ struct vmap_area { > > unsigned long subtree_max_size; /* in "free" tree */ > > struct vm_struct *vm; /* in "busy" tree */ > > }; > > + unsigned long flags; /* mark type of vm_map_ram area */ > > }; > > > > /* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */ > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > index 5d3fd3e6fe09..190f29bbaaa7 100644 > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -1586,7 +1586,8 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node) > > static struct vmap_area *alloc_vmap_area(unsigned long size, > > unsigned long align, > > unsigned long vstart, unsigned long vend, > > - int node, gfp_t gfp_mask) > > + int node, gfp_t gfp_mask, > > + unsigned long va_flags) > > { > > struct vmap_area *va; > > unsigned long freed; > > @@ -1630,6 +1631,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, > > va->va_start = addr; > > va->va_end = addr + size; > > va->vm = NULL; > > + va->flags = va_flags; > > > > spin_lock(&vmap_area_lock); > > insert_vmap_area(va, &vmap_area_root, &vmap_area_list); > > @@ -1887,6 +1889,10 @@ struct vmap_area *find_vmap_area(unsigned long addr) > > > > #define VMAP_BLOCK_SIZE (VMAP_BBMAP_BITS * PAGE_SIZE) > > > > +#define VMAP_RAM 0x1 > > +#define VMAP_BLOCK 0x2 > > +#define VMAP_FLAGS_MASK 0x3 > > > Maybe to rename a VMAP_BLOCK to something like VMAP_BLOCK_RESERVED or > VMAP_PER_CPU_BLOCK? Both VMAP_BLOCK or VMAP_PER_CPU_BLOCK look good to me, please see my explanation at below. 
> > > struct vmap_block_queue { > > spinlock_t lock; > > struct list_head free; > > @@ -1962,7 +1968,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) > > > > va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE, > > VMALLOC_START, VMALLOC_END, > > - node, gfp_mask); > > + node, gfp_mask, > > + VMAP_RAM|VMAP_BLOCK); > > > A new_vmap_block() is for a per-cpu path. As far as i see the VMAP_BLOCK > flag is used to mark a VA that corresponds to a reserved per-cpu free area. > > Whereas a VMAP_RAM is for VA that was obtained over per-cpu path but > over alloc_vmap_area() thus a VA should be read out over "busy" tree > directly. > > Why do you need to set here both VMAP_RAM and VMAP_BLOCK? My understanding is that vm_map_ram areas come in two types: one is the vb percpu area allocated via vb_alloc(); the other is allocated via alloc_vmap_area(). Both of them are obtained through the vm_map_ram() interface, and that is the main point distinguishing a vm_map_ram area from a normal vmalloc area; it is also why a vm_map_ram area does not own a va->vm pointer. So here I use the flag VMAP_RAM to mark vm_map_ram areas of both types, and additionally VMAP_BLOCK to mark out the vb percpu areas. I understand people could take a different view, e.g. as you said, using VMAP_RAM only for vm_map_ram areas allocated through alloc_vmap_area(), and VMAP_PER_CPU_BLOCK for vb percpu areas from vb_alloc(). In that case we would need to rename VMAP_RAM to reflect that it covers the alloc_vmap_area() case only. Either way is fine with me. 
> > > if (IS_ERR(va)) { > > kfree(vb); > > return ERR_CAST(va); > > @@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count) > > return; > > } > > > > - va = find_vmap_area(addr); > > + spin_lock(&vmap_area_lock); > > + va = __find_vmap_area((unsigned long)addr, &vmap_area_root); > > BUG_ON(!va); > > + if (va) > > + va->flags &= ~VMAP_RAM; > > + spin_unlock(&vmap_area_lock); > > debug_check_no_locks_freed((void *)va->va_start, > > > Agree with Lorenzo. BUG_ON() should be out of spinlock(). Furthermore > i think it makes sense to go with WARN_ON_ONCE() and do not kill a system. > Instead emit a warning and bailout. > > What do you think? Maybe separate patch for it? Agree, your patch looks great to me. Thanks. > > > (va->va_end - va->va_start)); > > free_unmap_vmap_area(va); > > @@ -2265,7 +2276,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node) > > } else { > > struct vmap_area *va; > > va = alloc_vmap_area(size, PAGE_SIZE, > > - VMALLOC_START, VMALLOC_END, node, GFP_KERNEL); > > + VMALLOC_START, VMALLOC_END, > > + node, GFP_KERNEL, VMAP_RAM); > > if (IS_ERR(va)) > > return NULL; > > > > @@ -2505,7 +2517,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size, > > if (!(flags & VM_NO_GUARD)) > > size += PAGE_SIZE; > > > > - va = alloc_vmap_area(size, align, start, end, node, gfp_mask); > > + va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0); > > if (IS_ERR(va)) { > > kfree(area); > > return NULL; > > > I know we have already discussed the new parameter. But what if we just > use atomic_set operation to mark VA as either vmap-ram or vmap-block? > > As for alloc_vmap_area() we set it just as zero. Sorry, I may not get your point clearly, could you be more specific?
Hi Uladzislau Rezki, On 12/23/22 at 12:14pm, Baoquan He wrote: > On 12/20/22 at 05:55pm, Uladzislau Rezki wrote: ...... > > spin_lock(&vmap_area_lock); > > > insert_vmap_area(va, &vmap_area_root, &vmap_area_list); > > > @@ -1887,6 +1889,10 @@ struct vmap_area *find_vmap_area(unsigned long addr) > > > > > > #define VMAP_BLOCK_SIZE (VMAP_BBMAP_BITS * PAGE_SIZE) > > > > > > +#define VMAP_RAM 0x1 > > > +#define VMAP_BLOCK 0x2 > > > +#define VMAP_FLAGS_MASK 0x3 > > > > > Maybe to rename a VMAP_BLOCK to something like VMAP_BLOCK_RESERVED or > > VMAP_PER_CPU_BLOCK? > > Both VMAP_BLOCK or VMAP_PER_CPU_BLOCK look good to me, please see my > explanation at below. > > > > > > struct vmap_block_queue { > > > spinlock_t lock; > > > struct list_head free; > > > @@ -1962,7 +1968,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) > > > > > > va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE, > > > VMALLOC_START, VMALLOC_END, > > > - node, gfp_mask); > > > + node, gfp_mask, > > > + VMAP_RAM|VMAP_BLOCK); > > > > > A new_vmap_block() is for a per-cpu path. As far as i see the VMAP_BLOCK > > flag is used to mark a VA that corresponds to a reserved per-cpu free area. > > > > Whereas a VMAP_RAM is for VA that was obtained over per-cpu path but > > over alloc_vmap_area() thus a VA should be read out over "busy" tree > > directly. Rethinking about the vmap->flags and the bit0->VMAP_RAM, bit1->VMAP_BLOCK correspondence, it looks better to use bit0->VMAP_RAM to indicate the vm_map_ram area, no matter how it's handled inside vm_map_ram() interface; and use bit1->VMAP_BLOCK to mark out the special vm_map_ram area which is further subdivided and managed by struct vmap_block. With these, you can see that we can identify vm_map_ram area and treat it as one type of vmalloc area, e.g in vread(), s_show(). 
Means when we are talking about vm_map_ram areas, we use (vmap->flags & VMAP_RAM) to recognize them; when we need to differentiate and handle vm_map_ram areas respectively, we use (vmap->flags & VMAP_BLOCK) to pick out the area which is further managed by vmap_block. Please help check if this is OK to you. > > > > Why do you need to set here both VMAP_RAM and VMAP_BLOCK? > > My understanding is that the vm_map_ram area has two types, one is > the vb percpu area via vb_alloc(), the other is allocated via > alloc_vmap_area(). While both of them is got from vm_map_ram() > interface, this is the main point that distinguishes the vm_map_ram area > than the normal vmalloc area, and this makes vm_map_ram area not owning > va->vm pointer. So here, I use flag VMAP_RAM to mark the vm_map_ram > area, including the two types; meanwhile, I add VMAP_BLOCK to mark out > the vb percpu area. > > I understand people could have different view about them, e.g as you > said, use VMAP_RAM to mark the type of vm_map_ram area allocated through > alloc_vmap_area(), while use VMAP_PER_CPU_BLOCK to mark vb percpu area > from vb_alloc. In this way, we may need to rename VMAP_RAM to reflect > the area allocated from alloc_vmap_area() only. Both is fine to me. > > > > > > if (IS_ERR(va)) { > > > kfree(vb); > > > return ERR_CAST(va); > > > @@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count) > > > return; > > > } > > > > > > - va = find_vmap_area(addr); > > > + spin_lock(&vmap_area_lock); > > > + va = __find_vmap_area((unsigned long)addr, &vmap_area_root); > > > BUG_ON(!va); > > > + if (va) > > > + va->flags &= ~VMAP_RAM; > > > + spin_unlock(&vmap_area_lock); > > > debug_check_no_locks_freed((void *)va->va_start, > > > > > Agree with Lorenzo. BUG_ON() should be out of spinlock(). Furthermore > > i think it makes sense to go with WARN_ON_ONCE() and do not kill a system. > > Instead emit a warning and bailout. > > > > What do you think? Maybe separate patch for it? 
> > Agree, your patch looks great to me. Thanks. > > > > > > (va->va_end - va->va_start)); > > > free_unmap_vmap_area(va); > > > @@ -2265,7 +2276,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node) > > > } else { > > > struct vmap_area *va; > > > va = alloc_vmap_area(size, PAGE_SIZE, > > > - VMALLOC_START, VMALLOC_END, node, GFP_KERNEL); > > > + VMALLOC_START, VMALLOC_END, > > > + node, GFP_KERNEL, VMAP_RAM); > > > if (IS_ERR(va)) > > > return NULL; > > > > > > @@ -2505,7 +2517,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size, > > > if (!(flags & VM_NO_GUARD)) > > > size += PAGE_SIZE; > > > > > > - va = alloc_vmap_area(size, align, start, end, node, gfp_mask); > > > + va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0); > > > if (IS_ERR(va)) { > > > kfree(area); > > > return NULL; > > > > > I know we have already discussed the new parameter. But what if we just > > use atomic_set operation to mark VA as either vmap-ram or vmap-block? As I replied above, I take vm_map_ram areas as one kind of vmalloc area, and mark out the percpu vmap_block handling within them. Passing the flags in through a function parameter seems better to me. I am not sure whether I got your suggestion correctly or whether my code change is appropriate. I have sent v3 according to your and Lorenzo's comments and suggestions, and my rethinking after reading your words; I made some adjustments to try to remove misunderstanding or confusion when reading the patch and code. Please help check whether it's OK.
On Fri, Jan 13, 2023 at 11:55:07AM +0800, Baoquan He wrote: > Hi Uladzislau Rezki, > > On 12/23/22 at 12:14pm, Baoquan He wrote: > > On 12/20/22 at 05:55pm, Uladzislau Rezki wrote: > ...... > > > spin_lock(&vmap_area_lock); > > > > insert_vmap_area(va, &vmap_area_root, &vmap_area_list); > > > > @@ -1887,6 +1889,10 @@ struct vmap_area *find_vmap_area(unsigned long addr) > > > > > > > > #define VMAP_BLOCK_SIZE (VMAP_BBMAP_BITS * PAGE_SIZE) > > > > > > > > +#define VMAP_RAM 0x1 > > > > +#define VMAP_BLOCK 0x2 > > > > +#define VMAP_FLAGS_MASK 0x3 > > > > > > > Maybe to rename a VMAP_BLOCK to something like VMAP_BLOCK_RESERVED or > > > VMAP_PER_CPU_BLOCK? > > > > Both VMAP_BLOCK or VMAP_PER_CPU_BLOCK look good to me, please see my > > explanation at below. > > > > > > > > > struct vmap_block_queue { > > > > spinlock_t lock; > > > > struct list_head free; > > > > @@ -1962,7 +1968,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) > > > > > > > > va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE, > > > > VMALLOC_START, VMALLOC_END, > > > > - node, gfp_mask); > > > > + node, gfp_mask, > > > > + VMAP_RAM|VMAP_BLOCK); > > > > > > > A new_vmap_block() is for a per-cpu path. As far as i see the VMAP_BLOCK > > > flag is used to mark a VA that corresponds to a reserved per-cpu free area. > > > > > > Whereas a VMAP_RAM is for VA that was obtained over per-cpu path but > > > over alloc_vmap_area() thus a VA should be read out over "busy" tree > > > directly. > > Rethinking about the vmap->flags and the bit0->VMAP_RAM, > bit1->VMAP_BLOCK correspondence, it looks better to use bit0->VMAP_RAM > to indicate the vm_map_ram area, no matter how it's handled inside > vm_map_ram() interface; and use bit1->VMAP_BLOCK to mark out the special > vm_map_ram area which is further subdivided and managed by struct > vmap_block. With these, you can see that we can identify vm_map_ram area > and treat it as one type of vmalloc area, e.g in vread(), s_show(). 
> > Means when we are talking about vm_map_ram areas, we use > (vmap->flags & VMAP_RAM) to recognize them; when we need to > differentiate and handle vm_map_ram areas respectively, we use > (vmap->flags & VMAP_BLOCK) to pick out the area which is further managed > by vmap_block. Please help check if this is OK to you. > > > > > > > Why do you need to set here both VMAP_RAM and VMAP_BLOCK? > > > > My understanding is that the vm_map_ram area has two types, one is > > the vb percpu area via vb_alloc(), the other is allocated via > > alloc_vmap_area(). While both of them is got from vm_map_ram() > > interface, this is the main point that distinguishes the vm_map_ram area > > than the normal vmalloc area, and this makes vm_map_ram area not owning > > va->vm pointer. So here, I use flag VMAP_RAM to mark the vm_map_ram > > area, including the two types; meanwhile, I add VMAP_BLOCK to mark out > > the vb percpu area. > > > > I understand people could have different view about them, e.g as you > > said, use VMAP_RAM to mark the type of vm_map_ram area allocated through > > alloc_vmap_area(), while use VMAP_PER_CPU_BLOCK to mark vb percpu area > > from vb_alloc. In this way, we may need to rename VMAP_RAM to reflect > > the area allocated from alloc_vmap_area() only. Both is fine to me. > > > > > > > > > if (IS_ERR(va)) { > > > > kfree(vb); > > > > return ERR_CAST(va); > > > > @@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count) > > > > return; > > > > } > > > > > > > > - va = find_vmap_area(addr); > > > > + spin_lock(&vmap_area_lock); > > > > + va = __find_vmap_area((unsigned long)addr, &vmap_area_root); > > > > BUG_ON(!va); > > > > + if (va) > > > > + va->flags &= ~VMAP_RAM; > > > > + spin_unlock(&vmap_area_lock); > > > > debug_check_no_locks_freed((void *)va->va_start, > > > > > > > Agree with Lorenzo. BUG_ON() should be out of spinlock(). Furthermore > > > i think it makes sense to go with WARN_ON_ONCE() and do not kill a system. 
> > > Instead emit a warning and bailout. > > > > > > What do you think? Maybe separate patch for it? > > > > Agree, your patch looks great to me. Thanks. > > > > > > > > > (va->va_end - va->va_start)); > > > > free_unmap_vmap_area(va); > > > > @@ -2265,7 +2276,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node) > > > > } else { > > > > struct vmap_area *va; > > > > va = alloc_vmap_area(size, PAGE_SIZE, > > > > - VMALLOC_START, VMALLOC_END, node, GFP_KERNEL); > > > > + VMALLOC_START, VMALLOC_END, > > > > + node, GFP_KERNEL, VMAP_RAM); > > > > if (IS_ERR(va)) > > > > return NULL; > > > > > > > > @@ -2505,7 +2517,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size, > > > > if (!(flags & VM_NO_GUARD)) > > > > size += PAGE_SIZE; > > > > > > > > - va = alloc_vmap_area(size, align, start, end, node, gfp_mask); > > > > + va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0); > > > > if (IS_ERR(va)) { > > > > kfree(area); > > > > return NULL; > > > > > > > I know we have already discussed the new parameter. But what if we just > > > use atomic_set operation to mark VA as either vmap-ram or vmap-block? > > As I replied at above, I take the vm_map_ram as one kind of vmalloc > area, and mark out the percpu vmap block handling of vm_map_ram area. > Seems the passing in the flags through function parameter is better. Not > sure if I got your suggestion correctly, and my code change is > appropriate. I have sent v3 according to your and Lorenzo's comments and > suggestion, and my rethinking after reading your words. I make some > adjustment to try to remove misundersanding or confusion when reading > patch and code. Please help check if it's OK. > OK, if we decided to go with a parameter it is OK, it is not a big deal and complexity. If needed it can be adjusted later on if there is a need. Thanks! -- Uladzislau Rezki
On 01/16/23 at 06:54pm, Uladzislau Rezki wrote: > On Fri, Jan 13, 2023 at 11:55:07AM +0800, Baoquan He wrote: > > Hi Uladzislau Rezki, > > > > On 12/23/22 at 12:14pm, Baoquan He wrote: > > > On 12/20/22 at 05:55pm, Uladzislau Rezki wrote: > > ...... > > > > spin_lock(&vmap_area_lock); > > > > > insert_vmap_area(va, &vmap_area_root, &vmap_area_list); > > > > > @@ -1887,6 +1889,10 @@ struct vmap_area *find_vmap_area(unsigned long addr) > > > > > > > > > > #define VMAP_BLOCK_SIZE (VMAP_BBMAP_BITS * PAGE_SIZE) > > > > > > > > > > +#define VMAP_RAM 0x1 > > > > > +#define VMAP_BLOCK 0x2 > > > > > +#define VMAP_FLAGS_MASK 0x3 > > > > > > > > > Maybe to rename a VMAP_BLOCK to something like VMAP_BLOCK_RESERVED or > > > > VMAP_PER_CPU_BLOCK? > > > > > > Both VMAP_BLOCK or VMAP_PER_CPU_BLOCK look good to me, please see my > > > explanation at below. > > > > > > > > > > > > struct vmap_block_queue { > > > > > spinlock_t lock; > > > > > struct list_head free; > > > > > @@ -1962,7 +1968,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) > > > > > > > > > > va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE, > > > > > VMALLOC_START, VMALLOC_END, > > > > > - node, gfp_mask); > > > > > + node, gfp_mask, > > > > > + VMAP_RAM|VMAP_BLOCK); > > > > > > > > > A new_vmap_block() is for a per-cpu path. As far as i see the VMAP_BLOCK > > > > flag is used to mark a VA that corresponds to a reserved per-cpu free area. > > > > > > > > Whereas a VMAP_RAM is for VA that was obtained over per-cpu path but > > > > over alloc_vmap_area() thus a VA should be read out over "busy" tree > > > > directly. > > > > Rethinking about the vmap->flags and the bit0->VMAP_RAM, > > bit1->VMAP_BLOCK correspondence, it looks better to use bit0->VMAP_RAM > > to indicate the vm_map_ram area, no matter how it's handled inside > > vm_map_ram() interface; and use bit1->VMAP_BLOCK to mark out the special > > vm_map_ram area which is further subdivided and managed by struct > > vmap_block. 
With these, you can see that we can identify vm_map_ram area > > and treat it as one type of vmalloc area, e.g in vread(), s_show(). > > > > Means when we are talking about vm_map_ram areas, we use > > (vmap->flags & VMAP_RAM) to recognize them; when we need to > > differentiate and handle vm_map_ram areas respectively, we use > > (vmap->flags & VMAP_BLOCK) to pick out the area which is further managed > > by vmap_block. Please help check if this is OK to you. > > > > > > > > > > Why do you need to set here both VMAP_RAM and VMAP_BLOCK? > > > > > > My understanding is that the vm_map_ram area has two types, one is > > > the vb percpu area via vb_alloc(), the other is allocated via > > > alloc_vmap_area(). While both of them is got from vm_map_ram() > > > interface, this is the main point that distinguishes the vm_map_ram area > > > than the normal vmalloc area, and this makes vm_map_ram area not owning > > > va->vm pointer. So here, I use flag VMAP_RAM to mark the vm_map_ram > > > area, including the two types; meanwhile, I add VMAP_BLOCK to mark out > > > the vb percpu area. > > > > > > I understand people could have different view about them, e.g as you > > > said, use VMAP_RAM to mark the type of vm_map_ram area allocated through > > > alloc_vmap_area(), while use VMAP_PER_CPU_BLOCK to mark vb percpu area > > > from vb_alloc. In this way, we may need to rename VMAP_RAM to reflect > > > the area allocated from alloc_vmap_area() only. Both is fine to me. 
> > > > > > > > > > > > if (IS_ERR(va)) { > > > > > kfree(vb); > > > > > return ERR_CAST(va); > > > > > @@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count) > > > > > return; > > > > > } > > > > > > > > > > - va = find_vmap_area(addr); > > > > > + spin_lock(&vmap_area_lock); > > > > > + va = __find_vmap_area((unsigned long)addr, &vmap_area_root); > > > > > BUG_ON(!va); > > > > > + if (va) > > > > > + va->flags &= ~VMAP_RAM; > > > > > + spin_unlock(&vmap_area_lock); > > > > > debug_check_no_locks_freed((void *)va->va_start, > > > > > > > > > Agree with Lorenzo. BUG_ON() should be out of spinlock(). Furthermore > > > > i think it makes sense to go with WARN_ON_ONCE() and do not kill a system. > > > > Instead emit a warning and bailout. > > > > > > > > What do you think? Maybe separate patch for it? > > > > > > Agree, your patch looks great to me. Thanks. > > > > > > > > > > > > (va->va_end - va->va_start)); > > > > > free_unmap_vmap_area(va); > > > > > @@ -2265,7 +2276,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node) > > > > > } else { > > > > > struct vmap_area *va; > > > > > va = alloc_vmap_area(size, PAGE_SIZE, > > > > > - VMALLOC_START, VMALLOC_END, node, GFP_KERNEL); > > > > > + VMALLOC_START, VMALLOC_END, > > > > > + node, GFP_KERNEL, VMAP_RAM); > > > > > if (IS_ERR(va)) > > > > > return NULL; > > > > > > > > > > @@ -2505,7 +2517,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size, > > > > > if (!(flags & VM_NO_GUARD)) > > > > > size += PAGE_SIZE; > > > > > > > > > > - va = alloc_vmap_area(size, align, start, end, node, gfp_mask); > > > > > + va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0); > > > > > if (IS_ERR(va)) { > > > > > kfree(area); > > > > > return NULL; > > > > > > > > > I know we have already discussed the new parameter. But what if we just > > > > use atomic_set operation to mark VA as either vmap-ram or vmap-block? 
> > > > As I replied at above, I take the vm_map_ram as one kind of vmalloc > > area, and mark out the percpu vmap block handling of vm_map_ram area. > > Seems the passing in the flags through function parameter is better. Not > > sure if I got your suggestion correctly, and my code change is > > appropriate. I have sent v3 according to your and Lorenzo's comments and > > suggestion, and my rethinking after reading your words. I make some > > adjustment to try to remove misundersanding or confusion when reading > > patch and code. Please help check if it's OK. > > > OK, if we decided to go with a parameter it is OK, it is not a big deal > > and complexity. If needed it can be adjusted later on if there is a > > need. My preference for passing the flags as a function parameter is that readers then don't need an atomic read when checking va->flags. On the setting side, however, the atomic_set() approach would be simpler than adding a function parameter: flags = atomic_read(&va->flags); if (flags & VMAP_RAM) { ... } I checked the code and feel there isn't much difference either way, so I will keep the current code. If there's anything I didn't think of, we can still change it. Thanks.
On Wed, Jan 18, 2023 at 11:09:44AM +0800, Baoquan He wrote: > On 01/16/23 at 06:54pm, Uladzislau Rezki wrote: > > On Fri, Jan 13, 2023 at 11:55:07AM +0800, Baoquan He wrote: > > > Hi Uladzislau Rezki, > > > > > > On 12/23/22 at 12:14pm, Baoquan He wrote: > > > > On 12/20/22 at 05:55pm, Uladzislau Rezki wrote: > > > ...... > > > > > spin_lock(&vmap_area_lock); > > > > > > insert_vmap_area(va, &vmap_area_root, &vmap_area_list); > > > > > > @@ -1887,6 +1889,10 @@ struct vmap_area *find_vmap_area(unsigned long addr) > > > > > > > > > > > > #define VMAP_BLOCK_SIZE (VMAP_BBMAP_BITS * PAGE_SIZE) > > > > > > > > > > > > +#define VMAP_RAM 0x1 > > > > > > +#define VMAP_BLOCK 0x2 > > > > > > +#define VMAP_FLAGS_MASK 0x3 > > > > > > > > > > > Maybe to rename a VMAP_BLOCK to something like VMAP_BLOCK_RESERVED or > > > > > VMAP_PER_CPU_BLOCK? > > > > > > > > Both VMAP_BLOCK or VMAP_PER_CPU_BLOCK look good to me, please see my > > > > explanation at below. > > > > > > > > > > > > > > > struct vmap_block_queue { > > > > > > spinlock_t lock; > > > > > > struct list_head free; > > > > > > @@ -1962,7 +1968,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) > > > > > > > > > > > > va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE, > > > > > > VMALLOC_START, VMALLOC_END, > > > > > > - node, gfp_mask); > > > > > > + node, gfp_mask, > > > > > > + VMAP_RAM|VMAP_BLOCK); > > > > > > > > > > > A new_vmap_block() is for a per-cpu path. As far as i see the VMAP_BLOCK > > > > > flag is used to mark a VA that corresponds to a reserved per-cpu free area. > > > > > > > > > > Whereas a VMAP_RAM is for VA that was obtained over per-cpu path but > > > > > over alloc_vmap_area() thus a VA should be read out over "busy" tree > > > > > directly. 
> > > > > > Rethinking about the vmap->flags and the bit0->VMAP_RAM, > > > bit1->VMAP_BLOCK correspondence, it looks better to use bit0->VMAP_RAM > > > to indicate the vm_map_ram area, no matter how it's handled inside > > > vm_map_ram() interface; and use bit1->VMAP_BLOCK to mark out the special > > > vm_map_ram area which is further subdivided and managed by struct > > > vmap_block. With these, you can see that we can identify vm_map_ram area > > > and treat it as one type of vmalloc area, e.g in vread(), s_show(). > > > > > > Means when we are talking about vm_map_ram areas, we use > > > (vmap->flags & VMAP_RAM) to recognize them; when we need to > > > differentiate and handle vm_map_ram areas respectively, we use > > > (vmap->flags & VMAP_BLOCK) to pick out the area which is further managed > > > by vmap_block. Please help check if this is OK to you. > > > > > > > > > > > > > Why do you need to set here both VMAP_RAM and VMAP_BLOCK? > > > > > > > > My understanding is that the vm_map_ram area has two types, one is > > > > the vb percpu area via vb_alloc(), the other is allocated via > > > > alloc_vmap_area(). While both of them is got from vm_map_ram() > > > > interface, this is the main point that distinguishes the vm_map_ram area > > > > than the normal vmalloc area, and this makes vm_map_ram area not owning > > > > va->vm pointer. So here, I use flag VMAP_RAM to mark the vm_map_ram > > > > area, including the two types; meanwhile, I add VMAP_BLOCK to mark out > > > > the vb percpu area. > > > > > > > > I understand people could have different view about them, e.g as you > > > > said, use VMAP_RAM to mark the type of vm_map_ram area allocated through > > > > alloc_vmap_area(), while use VMAP_PER_CPU_BLOCK to mark vb percpu area > > > > from vb_alloc. In this way, we may need to rename VMAP_RAM to reflect > > > > the area allocated from alloc_vmap_area() only. Both is fine to me. 
> > > > > > > > > > > > > > > if (IS_ERR(va)) { > > > > > > kfree(vb); > > > > > > return ERR_CAST(va); > > > > > > @@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count) > > > > > > return; > > > > > > } > > > > > > > > > > > > - va = find_vmap_area(addr); > > > > > > + spin_lock(&vmap_area_lock); > > > > > > + va = __find_vmap_area((unsigned long)addr, &vmap_area_root); > > > > > > BUG_ON(!va); > > > > > > + if (va) > > > > > > + va->flags &= ~VMAP_RAM; > > > > > > + spin_unlock(&vmap_area_lock); > > > > > > debug_check_no_locks_freed((void *)va->va_start, > > > > > > > > > > > Agree with Lorenzo. BUG_ON() should be out of spinlock(). Furthermore > > > > > i think it makes sense to go with WARN_ON_ONCE() and do not kill a system. > > > > > Instead emit a warning and bailout. > > > > > > > > > > What do you think? Maybe separate patch for it? > > > > > > > > Agree, your patch looks great to me. Thanks. > > > > > > > > > > > > > > > (va->va_end - va->va_start)); > > > > > > free_unmap_vmap_area(va); > > > > > > @@ -2265,7 +2276,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node) > > > > > > } else { > > > > > > struct vmap_area *va; > > > > > > va = alloc_vmap_area(size, PAGE_SIZE, > > > > > > - VMALLOC_START, VMALLOC_END, node, GFP_KERNEL); > > > > > > + VMALLOC_START, VMALLOC_END, > > > > > > + node, GFP_KERNEL, VMAP_RAM); > > > > > > if (IS_ERR(va)) > > > > > > return NULL; > > > > > > > > > > > > @@ -2505,7 +2517,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size, > > > > > > if (!(flags & VM_NO_GUARD)) > > > > > > size += PAGE_SIZE; > > > > > > > > > > > > - va = alloc_vmap_area(size, align, start, end, node, gfp_mask); > > > > > > + va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0); > > > > > > if (IS_ERR(va)) { > > > > > > kfree(area); > > > > > > return NULL; > > > > > > > > > > > I know we have already discussed the new parameter. 
But what if we just > > > > > use atomic_set operation to mark VA as either vmap-ram or vmap-block? > > > > > > As I replied at above, I take the vm_map_ram as one kind of vmalloc > > > area, and mark out the percpu vmap block handling of vm_map_ram area. > > > Seems the passing in the flags through function parameter is better. Not > > > sure if I got your suggestion correctly, and my code change is > > > appropriate. I have sent v3 according to your and Lorenzo's comments and > > > suggestion, and my rethinking after reading your words. I make some > > > adjustment to try to remove misundersanding or confusion when reading > > > patch and code. Please help check if it's OK. > > > > > OK, if we decided to go with a parameter it is OK, it is not a big deal > > and complexity. If needed it can be adjusted later on if there is a > > need. > > My preference for function parameter passing is we don't need do the > atomic reading when we want to check va->flags. However, in va->flags > setting side, atomic_set() code is simpler than function parameter. > > flags = atomic_read(&va->flags); > if (flags & VMAP_RAM) { > > } > > I checked code, and feel it doesn't have much difference, so keep the > current code. If there's other thing I didn't think of, we can still > change. Thanks. > That is fine. -- Uladzislau Rezki
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..69250efa03d1 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -76,6 +76,7 @@ struct vmap_area {
 		unsigned long subtree_max_size; /* in "free" tree */
 		struct vm_struct *vm;           /* in "busy" tree */
 	};
+	unsigned long flags; /* mark type of vm_map_ram area */
 };
 
 /* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5d3fd3e6fe09..190f29bbaaa7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1586,7 +1586,8 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
 static struct vmap_area *alloc_vmap_area(unsigned long size,
 				unsigned long align,
 				unsigned long vstart, unsigned long vend,
-				int node, gfp_t gfp_mask)
+				int node, gfp_t gfp_mask,
+				unsigned long va_flags)
 {
 	struct vmap_area *va;
 	unsigned long freed;
@@ -1630,6 +1631,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	va->va_start = addr;
 	va->va_end = addr + size;
 	va->vm = NULL;
+	va->flags = va_flags;
 
 	spin_lock(&vmap_area_lock);
 	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
@@ -1887,6 +1889,10 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 
 #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
 
+#define VMAP_RAM		0x1
+#define VMAP_BLOCK		0x2
+#define VMAP_FLAGS_MASK		0x3
+
 struct vmap_block_queue {
 	spinlock_t lock;
 	struct list_head free;
@@ -1962,7 +1968,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 
 	va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
 					VMALLOC_START, VMALLOC_END,
-					node, gfp_mask);
+					node, gfp_mask,
+					VMAP_RAM|VMAP_BLOCK);
 	if (IS_ERR(va)) {
 		kfree(vb);
 		return ERR_CAST(va);
@@ -2229,8 +2236,12 @@ void vm_unmap_ram(const void *mem, unsigned int count)
 		return;
 	}
 
-	va = find_vmap_area(addr);
+	spin_lock(&vmap_area_lock);
+	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
 	BUG_ON(!va);
+	if (va)
+		va->flags &= ~VMAP_RAM;
+	spin_unlock(&vmap_area_lock);
 	debug_check_no_locks_freed((void *)va->va_start,
 				   (va->va_end - va->va_start));
 	free_unmap_vmap_area(va);
@@ -2265,7 +2276,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
 	} else {
 		struct vmap_area *va;
 		va = alloc_vmap_area(size, PAGE_SIZE,
-				VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
+				VMALLOC_START, VMALLOC_END,
+				node, GFP_KERNEL, VMAP_RAM);
 		if (IS_ERR(va))
 			return NULL;
 
@@ -2505,7 +2517,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 	if (!(flags & VM_NO_GUARD))
 		size += PAGE_SIZE;
 
-	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
+	va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0);
 	if (IS_ERR(va)) {
 		kfree(area);
 		return NULL;