Message ID: 20230613160950.3554675-1-ryan.roberts@arm.com
Headers:
From: Ryan Roberts <ryan.roberts@arm.com>
To: Jonathan Corbet <corbet@lwn.net>, Andrew Morton <akpm@linux-foundation.org>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, Yu Zhao <yuzhao@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: [PATCH v1 0/2] Report on physically contiguous memory in smaps
Date: Tue, 13 Jun 2023 17:09:48 +0100
Series: Report on physically contiguous memory in smaps
Message
Ryan Roberts
June 13, 2023, 4:09 p.m. UTC
Hi All,

I thought I would try my luck with this pair of patches...

This series adds new entries to /proc/pid/smaps[_rollup] to report on
physically contiguous runs of memory. The first patch reports on the sizes of
the runs by binning them into power-of-2 blocks and reporting how much memory
is in each bin. The second patch reports on how much of the memory is
contpte-mapped in the page table (a hint that arm64 supports to tell the HW
that a range of PTEs maps physically contiguous memory).

With filesystems now supporting large folios in the page cache, this provides
a useful way to see what sizes are actually getting mapped. And with the
prospect of large folios for anonymous memory and contpte mapping for
conformant large folios on the horizon, this reporting will become useful to
aid application performance optimization.

Perhaps I should really be submitting these patches as part of my large anon
folios and contpte sets (which I plan to post soon), but given that this
touches the user ABI, I thought it was sensible to post it early and
separately to get feedback.

It would specifically be good to get feedback on:

- The exact set of new fields depends on the system it is being run on
  (specifically, the bins are determined based on PAGE_SIZE and PMD_SIZE).
  Does this cause problems for compat?
- The ContPTEMapped field is effectively arm64-specific. What is the
  preferred way to handle arch-specific values, if not here?

The patches are based on mm-unstable (dd69ce3382a2). Some minor conflicts
will need to be resolved if rebasing to Linus's tree. I have a branch at [1].
I've tested on Ampere Altra (arm64) only.
[1] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/granule_perf/folio_smap-lkml_v1

Thanks,
Ryan

Ryan Roberts (2):
  mm: /proc/pid/smaps: Report large folio mappings
  mm: /proc/pid/smaps: Report contpte mappings

 Documentation/filesystems/proc.rst |  31 +++++++
 fs/proc/task_mmu.c                 | 134 ++++++++++++++++++++++++++++-
 2 files changed, 161 insertions(+), 4 deletions(-)

--
2.25.1
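The power-of-2 binning the first patch describes can be sketched roughly as
follows. This is not the patch's actual implementation — the function name,
bin layout, and the choice of "largest power-of-2 bin not exceeding the run
size" are illustrative assumptions; the real patch derives its bins from
PAGE_SIZE and PMD_SIZE while walking the page table:

```python
# Illustrative sketch: attribute each physically contiguous run (length in
# pages) to a power-of-2 bin, accounting the run's full size in bytes.
# PAGE_SHIFT/PMD_ORDER values assume 4K pages with a 2M PMD (x86-64/arm64-4K).
PAGE_SHIFT = 12   # 4K pages (assumption)
PMD_ORDER = 9     # a PMD maps 2**9 pages of 4K (assumption)

def bin_runs(run_lengths):
    """Return bytes accounted per bin order 0..PMD_ORDER, placing each run
    in the largest power-of-2 bin that is <= its length (capped at PMD)."""
    bins = [0] * (PMD_ORDER + 1)
    for pages in run_lengths:
        order = min(pages.bit_length() - 1, PMD_ORDER)
        bins[order] += pages << PAGE_SHIFT
    return bins

# Runs of 1, 4 and 512 pages land in the order-0, order-2 and order-9 bins.
```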
Comments
On Tue, Jun 13, 2023 at 05:09:48PM +0100, Ryan Roberts wrote:
> Hi All,
>
> I thought I would try my luck with this pair of patches...

Ack on the idea.

Actually I have a script to do just this, but it's based on pagemap
(attaching the script at the end).

> This series adds new entries to /proc/pid/smaps[_rollup] to report on physically
> contiguous runs of memory. The first patch reports on the sizes of the runs by
> binning into power-of-2 blocks and reporting how much memory is in which bin.
> The second patch reports on how much of the memory is contpte-mapped in the page
> table (this is a hint that arm64 supports to tell the HW that a range of ptes
> map physically contiguous memory).
>
> With filesystems now supporting large folios in the page cache, this provides a
> useful way to see what sizes are actually getting mapped. And with the prospect
> of large folios for anonymous memory and contpte mapping for conformant large
> folios on the horizon, this reporting will become useful to aid application
> performance optimization.
>
> Perhaps I should really be submitting these patches as part of my large anon
> folios and contpte sets (which I plan to post soon), but given this touches
> the user ABI, I thought it was sensible to post it early and separately to get
> feedback.
>
> It would specifically be good to get feedback on:
>
> - The exact set of new fields depend on the system that its being run on. Does
>   this cause problem for compat? (specifically the bins are determined based
>   on PAGE_SIZE and PMD_SIZE).
> - The ContPTEMapped field is effectively arm64-specific. What is the preferred
>   way to handle arch-specific values if not here?

No strong opinions here.

===

$ cat memory-histogram/mem_hist.py
"""Script that scans VMAs, outputting histograms regarding memory allocations.

Example usage:
    python3 mem_hist.py --omit-file-backed --omit-unfaulted-vmas

For every process on the system, this script scans each VMA, counting the
number of order n allocations for 0 <= n <= MAX_ORDER. An order n allocation
is a region of memory aligned to a PAGE_SIZE * (2 ^ n) sized region consisting
of 2 ^ n pages in which every page is present (according to the data in
/proc/<pid>/pagemap).

VMA information as in /proc/<pid>/maps is output for all scanned VMAs along
with a histogram of allocation orders. For example, this histogram states
that there are 12 order 0 allocations, 4 order 1 allocations, 5 order 2
allocations, and so on:

    [12, 4, 5, 9, 5, 10, 6, 2, 2, 4, 3, 4]

In addition to per-VMA histograms, per-process histograms are printed.
Per-process histograms are the sum of the histograms of all VMAs contained
within the process, allowing for an overview of the memory allocation
patterns of the process as a whole.

Processes, and VMAs under each process, are printed sorted in
reverse-lexicographic order of histograms. That is, VMAs containing more
high-order allocations will be printed after ones containing more low-order
allocations. The output can thus be easily visually scanned to find VMAs in
which hugepage use shows the most potential benefit.

To reduce output clutter, the option --omit-file-backed exists to omit VMAs
that are file backed (which, outside of tmpfs, don't support transparent
hugepages on Linux). Additionally, the option --omit-unfaulted-vmas exists to
omit VMAs containing zero resident pages.
"""
import argparse
import functools
import re
import struct
import subprocess
import sys

ALL_PIDS_CMD = "ps --no-headers -e | awk '{ print $1 }'"

# Maximum order the script creates histograms up to. This is by default 9
# since the usual hugepage size on x86 is 2MB which is 2**9 4KB pages
MAX_ORDER = 9
PAGE_SIZE = 2**12
BLANK_HIST = [0] * (MAX_ORDER + 1)


class Vma:
    """Represents a virtual memory area.

    Attributes:
        proc: Process object in which this VMA is contained
        start_vaddr: Start virtual address of VMA
        end_vaddr: End virtual address of VMA
        perms: Permission string of VMA as in /proc/<pid>/maps (eg. rw-p)
        mapped_file: Path to file backing this VMA from /proc/<pid>/maps,
            empty string if not file backed. Note there are some cases in
            Linux where this may be nonempty and the VMA not file backed
            (eg. memfds)
        hist: This VMA's histogram as a list of integers
    """

    def __init__(self, proc, start_vaddr, end_vaddr, perms, mapped_file):
        self.proc = proc
        self.start_vaddr = start_vaddr
        self.end_vaddr = end_vaddr
        self.perms = perms
        self.mapped_file = mapped_file

    def is_file_backed(self):
        """Returns true if this VMA is file backed, false otherwise."""
        # The output printed for memfds (eg. /memfd:crosvm) also happens to
        # be a valid file path on *nix, so special case them
        return (bool(re.match("(?:/[^/]+)+", self.mapped_file)) and
                not bool(re.match("^/memfd:", self.mapped_file)))

    @staticmethod
    def bitmask(hi, lo):
        """Returns a bitmask with the bits from index lo to hi-1 set."""
        return ((1 << (hi - lo)) - 1) << lo

    @property
    @functools.lru_cache(maxsize=50000)
    def hist(self):
        """Returns this VMA's histogram as a list."""
        hist = BLANK_HIST[:]
        pagemap_file = safe_open_procfile(self.proc.pid, "pagemap", "rb")
        if not pagemap_file:
            err_print(
                "Cannot open /proc/{0}/pagemap, not generating histogram".format(
                    self.proc.pid))
            return hist
        # Page index of start/end VMA virtual addresses
        vma_start_page_i = self.start_vaddr // PAGE_SIZE
        vma_end_page_i = self.end_vaddr // PAGE_SIZE
        for order in range(0, MAX_ORDER + 1):
            # If there are less than two previous order pages, there can be
            # no more pages of a higher order so just break out to save time
            if order > 0 and hist[order - 1] < 2:
                break
            # First and last pages aligned to 2**order bytes in this VMA
            first_aligned_page = (vma_start_page_i &
                                  self.bitmask(64, order)) + 2**order
            last_aligned_page = vma_end_page_i & self.bitmask(64, order)
            # Iterate over all order-sized and order-aligned chunks in this
            # VMA
            for start_page_i in range(first_aligned_page, last_aligned_page,
                                      2**order):
                if self._is_region_present(pagemap_file, start_page_i,
                                           start_page_i + 2**order):
                    hist[order] += 1
                    # Subtract two lower order VMAs so that we don't
                    # double-count order n VMAs as two order n-1 VMAs as well
                    if order > 0:
                        hist[order - 1] -= 2
        pagemap_file.close()
        return hist

    def _is_region_present(self, pagemap_file, start_page_i, end_page_i):
        """Returns True if all pages in the given range are resident.

        Args:
            pagemap_file: Opened /proc/<pid>/pagemap file for this process
            start_page_i: Start page index for range
            end_page_i: End page index for range

        Returns:
            True if all pages from page index start_page_i to end_page_i are
            present according to the pagemap file, False otherwise.
        """
        pagemap_file.seek(start_page_i * 8)
        for _ in range(start_page_i, end_page_i):
            # /proc/<pid>/pagemap contains an 8 byte value for every page
            page_info, = struct.unpack("Q", pagemap_file.read(8))
            # Bit 63 is set if the page is present
            if not page_info & (1 << 63):
                return False
        return True

    def __str__(self):
        return ("{start:016x}-{end:016x} {size:<8} {perms:<4} {hist:<50} "
                "{mapped_file:<40}").format(
                    start=self.start_vaddr,
                    end=self.end_vaddr,
                    size="%dk" % ((self.end_vaddr - self.start_vaddr) // 1024),
                    perms=self.perms,
                    hist=str(self.hist),
                    mapped_file=str(self.mapped_file))


class Process:
    """Represents a running process.

    Attributes:
        vmas: List of VMA objects representing this process's VMAs
        pid: Process PID
        name: Name of process (read from /proc/<pid>/status)
    """

    _MAPS_LINE_REGEX = ("([0-9a-f]+)-([0-9a-f]+) ([r-][w-][x-][ps-]) "
                        "[0-9a-f]+ [0-9a-f]+:[0-9a-f]+ [0-9]+[ ]*(.*)")

    def __init__(self, pid):
        self.vmas = []
        self.pid = pid
        self.name = None
        self._read_name()
        self._read_vma_info()

    def _read_name(self):
        """Reads this Process's name from /proc/<pid>/status."""
        get_name_sp = subprocess.Popen(
            "grep Name: /proc/%d/status | awk '{ print $2 }'" % self.pid,
            shell=True,
            stdout=subprocess.PIPE)
        self.name = get_name_sp.communicate()[0].decode("ascii").strip()

    def _read_vma_info(self):
        """Populates this Process's VMA list."""
        f = safe_open_procfile(self.pid, "maps", "r")
        if not f:
            err_print("Could not read maps for process {0}".format(self.pid))
            return
        for line in f:
            match = re.match(Process._MAPS_LINE_REGEX, line)
            start_vaddr = int(match.group(1), 16)
            end_vaddr = int(match.group(2), 16)
            perms = match.group(3)
            mapped_file = match.group(4) if match.lastindex == 4 else None
            self.vmas.append(
                Vma(self, start_vaddr, end_vaddr, perms, mapped_file))
        f.close()

    @property
    @functools.lru_cache(maxsize=50000)
    def hist(self):
        """The process-level memory allocation histogram.

        This is the sum of all VMA histograms for every VMA in this process.
        For example, if a process had two VMAs with the following histograms:

            [1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0]
            [0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0]

        This would return:

            [1, 3, 5, 3, 0, 0, 0, 0, 0, 0, 0]
        """
        return [sum(x) for x in zip(*[vma.hist for vma in self.vmas])]

    def __str__(self):
        return "process {pid:<18} {name:<25} {hist:<50}".format(
            pid=self.pid, name=str(self.name), hist=str(self.hist))


def safe_open_procfile(pid, file_name, mode):
    """Safely open the given file under /proc/<pid>.

    This catches a variety of common errors bound to happen when using this
    script (eg. permission denied, process already exited).

    Args:
        pid: Pid of process (used to construct /proc/<pid>/)
        file_name: File directly under /proc/<pid>/ to open
        mode: Mode to pass to open (eg. "w", "r")

    Returns:
        File object corresponding to the file requested, or None if there was
        an error
    """
    full_path = "/proc/{0}/{1}".format(pid, file_name)
    try:
        return open(full_path, mode)
    except PermissionError:
        err_print("Not accessing {0} (permission denied)".format(full_path))
    except FileNotFoundError:
        err_print(
            "Not opening {0} (does not exist, process {1} likely exited)".format(
                full_path, pid))


def err_print(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)


def print_hists(args):
    """Prints all process and VMA histograms as per the module documentation."""
    pid_list_sp = subprocess.Popen(
        ALL_PIDS_CMD, shell=True, stdout=subprocess.PIPE)
    pid_list = map(int, pid_list_sp.communicate()[0].splitlines())
    procs = []
    for pid in pid_list:
        procs.append(Process(pid))
    for proc in sorted(procs, key=lambda p: p.hist[::-1]):
        # Don't print info on kernel threads or processes we couldn't collect
        # info on due to insufficient permissions
        if not proc.vmas:
            continue
        print(proc)
        for vma in sorted(proc.vmas, key=lambda v: v.hist[::-1]):
            if args.no_unfaulted_vmas and vma.hist == BLANK_HIST:
                continue
            elif args.omit_file_backed and vma.is_file_backed():
                continue
            print(" ", vma)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=("Create per-process and per-VMA "
                     "histograms of contiguous virtual "
                     "memory allocations"))
    parser.add_argument(
        "--omit-unfaulted-vmas",
        dest="no_unfaulted_vmas",
        action="store_true",
        help="Omit VMAs containing 0 present pages from output")
    parser.add_argument(
        "--omit-file-backed",
        dest="omit_file_backed",
        action="store_true",
        help="Omit VMAs corresponding to mmaped files")
    print_hists(parser.parse_args())
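The alignment arithmetic at the heart of the script's hist property can be
exercised in isolation. The sketch below extracts bitmask and the
first/last-aligned-page computation verbatim (aligned_chunks is a wrapper
name introduced here for illustration, not part of the script):

```python
# bitmask(hi, lo) builds a mask with bits lo..hi-1 set; masking a page index
# with bitmask(64, order) rounds it down to a 2**order boundary. The script
# then walks every 2**order-aligned chunk fully inside the VMA's page range.
def bitmask(hi, lo):
    return ((1 << (hi - lo)) - 1) << lo

def aligned_chunks(start_page, end_page, order):
    """Start page indices of 2**order-aligned chunks within [start, end),
    mirroring first_aligned_page/last_aligned_page in mem_hist.py. Note the
    script conservatively skips the chunk containing start_page even when
    start_page is already aligned."""
    first = (start_page & bitmask(64, order)) + 2**order
    last = end_page & bitmask(64, order)
    return list(range(first, last, 2**order))
```

For example, for pages 3..19 at order 2 (4-page chunks), the chunks scanned
start at pages 4, 8, 12 and 16.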
On 13/06/2023 19:44, Yu Zhao wrote:
> On Tue, Jun 13, 2023 at 05:09:48PM +0100, Ryan Roberts wrote:
>> Hi All,
>>
>> I thought I would try my luck with this pair of patches...
>
> Ack on the idea.
>
> Actually I have a script to do just this, but it's based on pagemap
> (attaching the script at the end).

I did consider that approach, but it was much more code to write the script
than to modify smaps ;-). Longer term, I think it would be good to have it in
smaps because it's more accessible.

For the contpte case we would need to add a bit to every pagemap entry to
express that. I'm not sure how palatable that would be for an arch-specific
thing?

Thanks for the script anyway!
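For context on why adding a contpte bit to pagemap is tight on space: each
64-bit pagemap entry already dedicates most of its bits, per
Documentation/admin-guide/mm/pagemap.rst. A decoder for the documented flags
(the hypothetical contpte bit would have to claim one of the few remaining
bits, 57-60):

```python
# Decode the documented fields of a /proc/<pid>/pagemap entry. Bit layout
# per Documentation/admin-guide/mm/pagemap.rst: bits 0-54 PFN (if present),
# bit 55 soft-dirty, bit 56 exclusively mapped, bit 61 file-page or
# shared-anon, bit 62 swapped, bit 63 present.
def decode_pagemap_entry(entry):
    return {
        "present": bool(entry & (1 << 63)),
        "swapped": bool(entry & (1 << 62)),
        "file_or_shared_anon": bool(entry & (1 << 61)),
        "exclusive": bool(entry & (1 << 56)),
        "soft_dirty": bool(entry & (1 << 55)),
        "pfn": entry & ((1 << 55) - 1),  # only meaningful when present
    }
```

The mem_hist.py script above only tests bit 63; a contpte-aware version
would additionally test whichever spare bit such a patch defined.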