[v17,2/6] ring-buffer: Introducing ring-buffer mapping functions
Commit Message
In preparation for allowing user-space to map a ring-buffer, add
a set of mapping functions:

  ring_buffer_{map,unmap}()
  ring_buffer_map_fault()

And controls on the ring-buffer:

  ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

  A unique ID for each subbuf of the ring-buffer. Currently, subbufs
  are only identified through their in-kernel VA.

  A meta-page, which stores ring-buffer statistics and a description
  of the current reader.

The linear mapping exposes the meta-page, followed by each subbuf of
the ring-buffer, ordered by their unique ID. IDs are assigned during
the first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size remains unmodified, and the splice-enabling functions simply
memcpy the data instead of swapping subbufs.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
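To make the linear layout described in the commit message concrete, here is a minimal user-space sketch of the offset arithmetic. It assumes the meta-page sits at the start of the mapping and occupies exactly meta_page_size bytes, and that sub-buffers follow back to back in ID order. The struct is a trimmed-down stand-in for struct trace_buffer_meta, and the helper names (meta_layout, subbuf_offset, pgoff_to_subbuf) are hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of struct trace_buffer_meta; only the fields needed here. */
struct meta_layout {
	uint32_t meta_page_size;  /* size of the meta-page, in bytes */
	uint32_t subbuf_size;     /* size of each sub-buffer, in bytes */
	uint32_t nr_subbufs;      /* number of sub-buffers, reader included */
};

/*
 * Byte offset of sub-buffer @id from the start of the mapping, or
 * (size_t)-1 if @id is out of range. Assumes the meta-page comes first
 * and the sub-buffers follow, ordered by ID.
 */
static size_t subbuf_offset(const struct meta_layout *m, uint32_t id)
{
	if (id >= m->nr_subbufs)
		return (size_t)-1;
	return (size_t)m->meta_page_size + (size_t)id * m->subbuf_size;
}

/*
 * Inverse direction, modelled on the arithmetic in ring_buffer_map_fault():
 * split a page offset into a sub-buffer ID and a page index inside that
 * sub-buffer. Returns 0 on success, -1 for the meta-page (pgoff == 0) or
 * for an offset past the last sub-buffer.
 */
static int pgoff_to_subbuf(unsigned long pgoff, unsigned int subbuf_order,
			   uint32_t nr_subbufs, unsigned long *id,
			   unsigned long *page_in_subbuf)
{
	if (!pgoff)
		return -1;

	pgoff--;	/* skip the meta-page */

	*id = pgoff >> subbuf_order;
	if (*id >= nr_subbufs)
		return -1;

	*page_in_subbuf = pgoff & ((1UL << subbuf_order) - 1);
	return 0;
}
```

A reader would typically take reader.id from the meta-page and pass it to subbuf_offset() to locate the current reader sub-buffer inside the mapping.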
Comments
On Tue, 13 Feb 2024 11:49:41 +0000
Vincent Donnefort <vdonnefort@google.com> wrote:
Did you test with lockdep?
> +static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
> + struct ring_buffer_per_cpu *cpu_buffer,
> + bool inc)
> +{
> + unsigned long flags;
> +
> + lockdep_assert_held(cpu_buffer->mapping_lock);
/work/git/linux-trace.git/kernel/trace/ring_buffer.c: In function ‘__rb_inc_dec_mapped’:
/work/git/linux-trace.git/include/linux/lockdep.h:234:61: error: invalid type argument of ‘->’ (have ‘struct mutex’)
234 | #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
| ^~
/work/git/linux-trace.git/include/asm-generic/bug.h:123:32: note: in definition of macro ‘WARN_ON’
123 | int __ret_warn_on = !!(condition); \
| ^~~~~~~~~
/work/git/linux-trace.git/include/linux/lockdep.h:267:9: note: in expansion of macro ‘lockdep_assert’
267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
| ^~~~~~~~~~~~~~
/work/git/linux-trace.git/include/linux/lockdep.h:267:24: note: in expansion of macro ‘lockdep_is_held’
267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
| ^~~~~~~~~~~~~~~
/work/git/linux-trace.git/kernel/trace/ring_buffer.c:6167:9: note: in expansion of macro ‘lockdep_assert_held’
6167 | lockdep_assert_held(cpu_buffer->mapping_lock);
| ^~~~~~~~~~~~~~~~~~~
I believe that is supposed to be:
lockdep_assert_held(&cpu_buffer->mapping_lock);
-- Steve
> +
> + if (inc && cpu_buffer->mapped == UINT_MAX)
> + return -EBUSY;
> +
> + if (WARN_ON(!inc && cpu_buffer->mapped == 0))
> + return -EINVAL;
> +
> + mutex_lock(&buffer->mutex);
> + raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> +
> + if (inc)
> + cpu_buffer->mapped++;
> + else
> + cpu_buffer->mapped--;
> +
> + raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> + mutex_unlock(&buffer->mutex);
> +
> + return 0;
> +}
On Tue, 13 Feb 2024 15:53:09 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 13 Feb 2024 11:49:41 +0000
> Vincent Donnefort <vdonnefort@google.com> wrote:
>
> Did you test with lockdep?
>
> > +static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
> > + struct ring_buffer_per_cpu *cpu_buffer,
> > + bool inc)
> > +{
> > + unsigned long flags;
> > +
> > + lockdep_assert_held(cpu_buffer->mapping_lock);
>
> /work/git/linux-trace.git/kernel/trace/ring_buffer.c: In function ‘__rb_inc_dec_mapped’:
> /work/git/linux-trace.git/include/linux/lockdep.h:234:61: error: invalid type argument of ‘->’ (have ‘struct mutex’)
> 234 | #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
> | ^~
> /work/git/linux-trace.git/include/asm-generic/bug.h:123:32: note: in definition of macro ‘WARN_ON’
> 123 | int __ret_warn_on = !!(condition); \
> | ^~~~~~~~~
> /work/git/linux-trace.git/include/linux/lockdep.h:267:9: note: in expansion of macro ‘lockdep_assert’
> 267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
> | ^~~~~~~~~~~~~~
> /work/git/linux-trace.git/include/linux/lockdep.h:267:24: note: in expansion of macro ‘lockdep_is_held’
> 267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
> | ^~~~~~~~~~~~~~~
> /work/git/linux-trace.git/kernel/trace/ring_buffer.c:6167:9: note: in expansion of macro ‘lockdep_assert_held’
> 6167 | lockdep_assert_held(cpu_buffer->mapping_lock);
> | ^~~~~~~~~~~~~~~~~~~
>
> I believe that is supposed to be:
>
> lockdep_assert_held(&cpu_buffer->mapping_lock);
If this is the only issue with this series, I may just fix up the patch
myself.
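For readers following the build error, the failure can be reproduced outside the kernel with a toy model. The sketch below only mirrors the shape of the lockdep_is_held() macro quoted in the compiler output. All toy_* names are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel types; every name here is hypothetical. */
struct toy_dep_map { bool held; };
struct toy_mutex { struct toy_dep_map dep_map; };
struct toy_cpu_buffer { struct toy_mutex mapping_lock; };

static bool toy_lock_is_held(const struct toy_dep_map *map)
{
	return map->held;
}

/* Same shape as lockdep_is_held(): '->' is applied to the macro argument. */
#define toy_lockdep_is_held(lock) toy_lock_is_held(&(lock)->dep_map)

static bool mapping_lock_held(const struct toy_cpu_buffer *cpu_buffer)
{
	/*
	 * mapping_lock is embedded by value, so its address must be taken.
	 * Dropping the '&' would expand to
	 * &(cpu_buffer->mapping_lock)->dep_map, which applies '->' to a
	 * struct and does not compile.
	 */
	return toy_lockdep_is_held(&cpu_buffer->mapping_lock);
}
```

Since the macro argument is used with '->', any lock embedded by value in its parent structure has to be passed by address, which is the one-character fix suggested above.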
Hi Vincent,
kernel test robot noticed the following build errors:
[auto build test ERROR on ca185770db914869ff9fe773bac5e0e5e4165b83]
url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Donnefort/ring-buffer-Zero-ring-buffer-sub-buffers/20240213-195302
base: ca185770db914869ff9fe773bac5e0e5e4165b83
patch link: https://lore.kernel.org/r/20240213114945.3528801-3-vdonnefort%40google.com
patch subject: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions
config: i386-buildonly-randconfig-001-20240214 (https://download.01.org/0day-ci/archive/20240214/202402140910.TFs9k0YR-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240214/202402140910.TFs9k0YR-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202402140910.TFs9k0YR-lkp@intel.com/
All errors (new ones prefixed by >>):
>> kernel/trace/ring_buffer.c:6185:2: error: member reference type 'struct mutex' is not a pointer; did you mean to use '.'?
6185 | lockdep_assert_held(cpu_buffer->mapping_lock);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:267:17: note: expanded from macro 'lockdep_assert_held'
267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:234:52: note: expanded from macro 'lockdep_is_held'
234 | #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
| ^
include/linux/lockdep.h:261:32: note: expanded from macro 'lockdep_assert'
261 | do { WARN_ON(debug_locks && !(cond)); } while (0)
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
include/asm-generic/bug.h:123:25: note: expanded from macro 'WARN_ON'
123 | int __ret_warn_on = !!(condition); \
| ^~~~~~~~~
>> kernel/trace/ring_buffer.c:6185:2: error: cannot take the address of an rvalue of type 'struct lockdep_map'
6185 | lockdep_assert_held(cpu_buffer->mapping_lock);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:267:17: note: expanded from macro 'lockdep_assert_held'
267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:234:45: note: expanded from macro 'lockdep_is_held'
234 | #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
| ^
include/linux/lockdep.h:261:32: note: expanded from macro 'lockdep_assert'
261 | do { WARN_ON(debug_locks && !(cond)); } while (0)
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
include/asm-generic/bug.h:123:25: note: expanded from macro 'WARN_ON'
123 | int __ret_warn_on = !!(condition); \
| ^~~~~~~~~
2 errors generated.
vim +6185 kernel/trace/ring_buffer.c
6174
6175 /*
6176 * Fast-path for rb_buffer_(un)map(). Called whenever the meta-page doesn't need
6177 * to be set-up or torn-down.
6178 */
6179 static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
6180 struct ring_buffer_per_cpu *cpu_buffer,
6181 bool inc)
6182 {
6183 unsigned long flags;
6184
> 6185 lockdep_assert_held(cpu_buffer->mapping_lock);
6186
6187 if (inc && cpu_buffer->mapped == UINT_MAX)
6188 return -EBUSY;
6189
6190 if (WARN_ON(!inc && cpu_buffer->mapped == 0))
6191 return -EINVAL;
6192
6193 mutex_lock(&buffer->mutex);
6194 raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
6195
6196 if (inc)
6197 cpu_buffer->mapped++;
6198 else
6199 cpu_buffer->mapped--;
6200
6201 raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
6202 mutex_unlock(&buffer->mutex);
6203
6204 return 0;
6205 }
6206
Hi Vincent,
kernel test robot noticed the following build errors:
[auto build test ERROR on ca185770db914869ff9fe773bac5e0e5e4165b83]
url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Donnefort/ring-buffer-Zero-ring-buffer-sub-buffers/20240213-195302
base: ca185770db914869ff9fe773bac5e0e5e4165b83
patch link: https://lore.kernel.org/r/20240213114945.3528801-3-vdonnefort%40google.com
patch subject: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions
config: x86_64-randconfig-161-20240214 (https://download.01.org/0day-ci/archive/20240214/202402141856.fVl4pCHi-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240214/202402141856.fVl4pCHi-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202402141856.fVl4pCHi-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/x86/include/asm/bug.h:87,
from include/linux/bug.h:5,
from include/linux/jump_label.h:256,
from arch/x86/include/asm/string_64.h:6,
from arch/x86/include/asm/string.h:5,
from include/linux/string.h:61,
from include/linux/bitmap.h:12,
from include/linux/cpumask.h:12,
from include/linux/interrupt.h:8,
from include/linux/trace_recursion.h:5,
from kernel/trace/ring_buffer.c:7:
kernel/trace/ring_buffer.c: In function '__rb_inc_dec_mapped':
>> include/linux/lockdep.h:234:52: error: invalid type argument of '->' (have 'struct mutex')
234 | #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
| ^~
include/asm-generic/bug.h:123:25: note: in definition of macro 'WARN_ON'
123 | int __ret_warn_on = !!(condition); \
| ^~~~~~~~~
include/linux/lockdep.h:267:2: note: in expansion of macro 'lockdep_assert'
267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
| ^~~~~~~~~~~~~~
include/linux/lockdep.h:267:17: note: in expansion of macro 'lockdep_is_held'
267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
| ^~~~~~~~~~~~~~~
kernel/trace/ring_buffer.c:6185:2: note: in expansion of macro 'lockdep_assert_held'
6185 | lockdep_assert_held(cpu_buffer->mapping_lock);
| ^~~~~~~~~~~~~~~~~~~
vim +234 include/linux/lockdep.h
f607c668577481 Peter Zijlstra 2009-07-20 233
f8319483f57f1c Peter Zijlstra 2016-11-30 @234 #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
f8319483f57f1c Peter Zijlstra 2016-11-30 235 #define lockdep_is_held_type(lock, r) lock_is_held_type(&(lock)->dep_map, (r))
f607c668577481 Peter Zijlstra 2009-07-20 236
@@ -6,6 +6,8 @@
#include <linux/seq_file.h>
#include <linux/poll.h>
+#include <uapi/linux/trace_mmap.h>
+
struct trace_buffer;
struct ring_buffer_iter;
@@ -221,4 +223,9 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
#define trace_rb_cpu_prepare NULL
#endif
+int ring_buffer_map(struct trace_buffer *buffer, int cpu);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu,
+ unsigned long pgoff);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
#endif /* _LINUX_RING_BUFFER_H */
new file mode 100644
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include <linux/types.h>
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size: Size of this meta-page.
+ * @meta_struct_len: Size of this structure.
+ * @subbuf_size: Size of each sub-buffer.
+ * @nr_subbufs: Number of subbufs in the ring-buffer, including the reader.
+ * @reader.lost_events: Number of events lost at the time of the reader swap.
+ * @reader.id: subbuf ID of the current reader. ID range [0 : @nr_subbufs - 1]
+ * @reader.read: Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries: Number of entries in the ring-buffer.
+ * @overrun: Number of entries lost in the ring-buffer.
+ * @read: Number of entries that have been read.
+ * @Reserved1: Reserved for future use.
+ * @Reserved2: Reserved for future use.
+ */
+struct trace_buffer_meta {
+ __u32 meta_page_size;
+ __u32 meta_struct_len;
+
+ __u32 subbuf_size;
+ __u32 nr_subbufs;
+
+ struct {
+ __u64 lost_events;
+ __u32 id;
+ __u32 read;
+ } reader;
+
+ __u64 flags;
+
+ __u64 entries;
+ __u64 overrun;
+ __u64 read;
+
+ __u64 Reserved1;
+ __u64 Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
@@ -9,6 +9,7 @@
#include <linux/ring_buffer.h>
#include <linux/trace_clock.h>
#include <linux/sched/clock.h>
+#include <linux/cacheflush.h>
#include <linux/trace_seq.h>
#include <linux/spinlock.h>
#include <linux/irq_work.h>
@@ -338,6 +339,7 @@ struct buffer_page {
local_t entries; /* entries on this page */
unsigned long real_end; /* real end of data */
unsigned order; /* order of the page */
+ u32 id; /* ID for external mapping */
struct buffer_data_page *page; /* Actual data page */
};
@@ -484,6 +486,12 @@ struct ring_buffer_per_cpu {
u64 read_stamp;
/* pages removed since last reset */
unsigned long pages_removed;
+
+ unsigned int mapped;
+ struct mutex mapping_lock;
+ unsigned long *subbuf_ids; /* ID to subbuf addr */
+ struct trace_buffer_meta *meta_page;
+
/* ring buffer pages to update, > 0 to add, < 0 to remove */
long nr_pages_to_update;
struct list_head new_pages; /* new pages to add */
@@ -1548,6 +1556,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
init_waitqueue_head(&cpu_buffer->irq_work.waiters);
init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
+ mutex_init(&cpu_buffer->mapping_lock);
bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
GFP_KERNEL, cpu_to_node(cpu));
@@ -1738,8 +1747,6 @@ bool ring_buffer_time_stamp_abs(struct trace_buffer *buffer)
return buffer->time_stamp_abs;
}
-static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
-
static inline unsigned long rb_page_entries(struct buffer_page *bpage)
{
return local_read(&bpage->entries) & RB_WRITE_MASK;
@@ -5160,6 +5167,22 @@ static void rb_clear_buffer_page(struct buffer_page *page)
page->read = 0;
}
+static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ struct trace_buffer_meta *meta = cpu_buffer->meta_page;
+
+ meta->reader.read = cpu_buffer->reader_page->read;
+ meta->reader.id = cpu_buffer->reader_page->id;
+ meta->reader.lost_events = cpu_buffer->lost_events;
+
+ meta->entries = local_read(&cpu_buffer->entries);
+ meta->overrun = local_read(&cpu_buffer->overrun);
+ meta->read = cpu_buffer->read;
+
+ /* Some archs do not have data cache coherency between kernel and user-space */
+ flush_dcache_folio(virt_to_folio(cpu_buffer->meta_page));
+}
+
static void
rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
{
@@ -5204,6 +5227,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->lost_events = 0;
cpu_buffer->last_overrun = 0;
+ if (cpu_buffer->mapped)
+ rb_update_meta_page(cpu_buffer);
+
rb_head_page_activate(cpu_buffer);
cpu_buffer->pages_removed = 0;
}
@@ -5418,6 +5444,12 @@ int ring_buffer_swap_cpu(struct trace_buffer *buffer_a,
cpu_buffer_a = buffer_a->buffers[cpu];
cpu_buffer_b = buffer_b->buffers[cpu];
+ /* It's up to the callers to not try to swap mapped buffers */
+ if (WARN_ON_ONCE(cpu_buffer_a->mapped || cpu_buffer_b->mapped)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
/* At least make sure the two buffers are somewhat the same */
if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
goto out;
@@ -5682,7 +5714,8 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
* Otherwise, we can simply swap the page with the one passed in.
*/
if (read || (len < (commit - read)) ||
- cpu_buffer->reader_page == cpu_buffer->commit_page) {
+ cpu_buffer->reader_page == cpu_buffer->commit_page ||
+ cpu_buffer->mapped) {
struct buffer_data_page *rpage = cpu_buffer->reader_page->page;
unsigned int rpos = read;
unsigned int pos = 0;
@@ -5901,6 +5934,11 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
cpu_buffer = buffer->buffers[cpu];
+ if (cpu_buffer->mapped) {
+ err = -EBUSY;
+ goto error;
+ }
+
/* Update the number of pages to match the new size */
nr_pages = old_size * buffer->buffers[cpu]->nr_pages;
nr_pages = DIV_ROUND_UP(nr_pages, buffer->subbuf_size);
@@ -6002,6 +6040,338 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
}
EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_set);
+#define subbuf_page(off, start) \
+ virt_to_page((void *)(start + (off << PAGE_SHIFT)))
+
+#define foreach_subbuf_page(sub_order, start, page) \
+ page = subbuf_page(0, (start)); \
+ for (int __off = 0; __off < (1 << (sub_order)); \
+ __off++, page = subbuf_page(__off, (start)))
+
+static inline void subbuf_map_prepare(unsigned long subbuf_start, int order)
+{
+ struct page *page;
+
+ /*
+ * When allocating order > 0 pages, only the first struct page has a
+ * refcount > 1. Increasing the refcount here ensures none of the struct
+ * pages composing the sub-buffer is freed when the mapping is closed.
+ */
+ foreach_subbuf_page(order, subbuf_start, page)
+ page_ref_inc(page);
+}
+
+static inline void subbuf_unmap(unsigned long subbuf_start, int order)
+{
+ struct page *page;
+
+ foreach_subbuf_page(order, subbuf_start, page) {
+ page_ref_dec(page);
+ page->mapping = NULL;
+ }
+}
+
+static void rb_free_subbuf_ids(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ int sub_id;
+
+ for (sub_id = 0; sub_id < cpu_buffer->nr_pages + 1; sub_id++)
+ subbuf_unmap(cpu_buffer->subbuf_ids[sub_id],
+ cpu_buffer->buffer->subbuf_order);
+
+ kfree(cpu_buffer->subbuf_ids);
+ cpu_buffer->subbuf_ids = NULL;
+}
+
+static int rb_alloc_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ struct page *page;
+
+ if (cpu_buffer->meta_page)
+ return 0;
+
+ page = alloc_page(GFP_USER | __GFP_ZERO);
+ if (!page)
+ return -ENOMEM;
+
+ cpu_buffer->meta_page = page_to_virt(page);
+
+ return 0;
+}
+
+static void rb_free_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ unsigned long addr = (unsigned long)cpu_buffer->meta_page;
+
+ if (!addr)
+ return;
+
+ virt_to_page((void *)addr)->mapping = NULL;
+ free_page(addr);
+ cpu_buffer->meta_page = NULL;
+}
+
+static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu *cpu_buffer,
+ unsigned long *subbuf_ids)
+{
+ struct trace_buffer_meta *meta = cpu_buffer->meta_page;
+ unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
+ struct buffer_page *first_subbuf, *subbuf;
+ int id = 0;
+
+ subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page;
+ subbuf_map_prepare(subbuf_ids[id], cpu_buffer->buffer->subbuf_order);
+ cpu_buffer->reader_page->id = id++;
+
+ first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
+ do {
+ if (WARN_ON(id >= nr_subbufs))
+ break;
+
+ subbuf_ids[id] = (unsigned long)subbuf->page;
+ subbuf->id = id;
+ subbuf_map_prepare(subbuf_ids[id], cpu_buffer->buffer->subbuf_order);
+
+ rb_inc_page(&subbuf);
+ id++;
+ } while (subbuf != first_subbuf);
+
+ /* install subbuf ID to kern VA translation */
+ cpu_buffer->subbuf_ids = subbuf_ids;
+
+ meta->meta_page_size = PAGE_SIZE;
+ meta->meta_struct_len = sizeof(*meta);
+ meta->nr_subbufs = nr_subbufs;
+ meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+
+ rb_update_meta_page(cpu_buffer);
+}
+
+static inline struct ring_buffer_per_cpu *
+rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return ERR_PTR(-EINVAL);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ mutex_lock(&cpu_buffer->mapping_lock);
+
+ if (!cpu_buffer->mapped) {
+ mutex_unlock(&cpu_buffer->mapping_lock);
+ return ERR_PTR(-ENODEV);
+ }
+
+ return cpu_buffer;
+}
+
+static inline void rb_put_mapped_buffer(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ mutex_unlock(&cpu_buffer->mapping_lock);
+}
+
+/*
+ * Fast-path for ring_buffer_(un)map(). Called whenever the meta-page doesn't
+ * need to be set up or torn down.
+ */
+static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
+ struct ring_buffer_per_cpu *cpu_buffer,
+ bool inc)
+{
+ unsigned long flags;
+
+ lockdep_assert_held(&cpu_buffer->mapping_lock);
+
+ if (inc && cpu_buffer->mapped == UINT_MAX)
+ return -EBUSY;
+
+ if (WARN_ON(!inc && cpu_buffer->mapped == 0))
+ return -EINVAL;
+
+ mutex_lock(&buffer->mutex);
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+ if (inc)
+ cpu_buffer->mapped++;
+ else
+ cpu_buffer->mapped--;
+
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ mutex_unlock(&buffer->mutex);
+
+ return 0;
+}
+
+int ring_buffer_map(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ unsigned long flags, *subbuf_ids;
+ int err = 0;
+
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -EINVAL;
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ mutex_lock(&cpu_buffer->mapping_lock);
+
+ if (cpu_buffer->mapped) {
+ err = __rb_inc_dec_mapped(buffer, cpu_buffer, true);
+ mutex_unlock(&cpu_buffer->mapping_lock);
+ return err;
+ }
+
+ /* prevent another thread from changing buffer/sub-buffer sizes */
+ mutex_lock(&buffer->mutex);
+
+ err = rb_alloc_meta_page(cpu_buffer);
+ if (err)
+ goto unlock;
+
+ /* subbuf_ids include the reader while nr_pages does not */
+ subbuf_ids = kcalloc(cpu_buffer->nr_pages + 1, sizeof(*subbuf_ids), GFP_KERNEL);
+ if (!subbuf_ids) {
+ rb_free_meta_page(cpu_buffer);
+ err = -ENOMEM;
+ goto unlock;
+ }
+
+ atomic_inc(&cpu_buffer->resize_disabled);
+
+ /*
+ * Lock all readers to block any subbuf swap until the subbuf IDs are
+ * assigned.
+ */
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+ rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
+ cpu_buffer->mapped = 1;
+
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+unlock:
+ mutex_unlock(&buffer->mutex);
+ mutex_unlock(&cpu_buffer->mapping_lock);
+
+ return err;
+}
+
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ unsigned long flags;
+ int err = 0;
+
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -EINVAL;
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ mutex_lock(&cpu_buffer->mapping_lock);
+
+ if (!cpu_buffer->mapped) {
+ err = -ENODEV;
+ goto out;
+ } else if (cpu_buffer->mapped > 1) {
+ __rb_inc_dec_mapped(buffer, cpu_buffer, false);
+ goto out;
+ }
+
+ mutex_lock(&buffer->mutex);
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+ cpu_buffer->mapped = 0;
+
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+ rb_free_subbuf_ids(cpu_buffer);
+ rb_free_meta_page(cpu_buffer);
+ atomic_dec(&cpu_buffer->resize_disabled);
+
+ mutex_unlock(&buffer->mutex);
+out:
+ mutex_unlock(&cpu_buffer->mapping_lock);
+
+ return err;
+}
+
+/*
+ * +--------------+ pgoff == 0
+ * | meta page |
+ * +--------------+ pgoff == 1
+ * | subbuffer 0 |
+ * +--------------+ pgoff == 1 + (1 << subbuf_order)
+ * | subbuffer 1 |
+ * ...
+ */
+struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu,
+ unsigned long pgoff)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ unsigned long subbuf_id, subbuf_offset, addr;
+ struct page *page;
+
+ if (!pgoff)
+ return virt_to_page((void *)cpu_buffer->meta_page);
+
+ pgoff--;
+
+ subbuf_id = pgoff >> buffer->subbuf_order;
+ if (subbuf_id > cpu_buffer->nr_pages)
+ return NULL;
+
+ subbuf_offset = pgoff & ((1UL << buffer->subbuf_order) - 1);
+ addr = cpu_buffer->subbuf_ids[subbuf_id] + (subbuf_offset * PAGE_SIZE);
+ page = virt_to_page((void *)addr);
+
+ return page;
+}
+
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ unsigned long reader_size;
+ unsigned long flags;
+
+ cpu_buffer = rb_get_mapped_buffer(buffer, cpu);
+ if (IS_ERR(cpu_buffer))
+ return (int)PTR_ERR(cpu_buffer);
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+consume:
+ if (rb_per_cpu_empty(cpu_buffer))
+ goto out;
+
+ reader_size = rb_page_size(cpu_buffer->reader_page);
+
+ /*
+ * There is data to be read on the current reader page, so we can
+ * return to the caller. Before that, assume the caller will read
+ * everything and update the kernel reader accordingly.
+ */
+ if (cpu_buffer->reader_page->read < reader_size) {
+ while (cpu_buffer->reader_page->read < reader_size)
+ rb_advance_reader(cpu_buffer);
+ goto out;
+ }
+
+ if (WARN_ON(!rb_get_reader_page(cpu_buffer)))
+ goto out;
+
+ goto consume;
+out:
+ /* Some archs do not have data cache coherency between kernel and user-space */
+ flush_dcache_folio(virt_to_folio(cpu_buffer->reader_page->page));
+
+ rb_update_meta_page(cpu_buffer);
+
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ rb_put_mapped_buffer(cpu_buffer);
+
+ return 0;
+}
+
/*
* We only allocate new buffers, never free them if the CPU goes down.
* If we were to free the buffer, then the user would lose any trace that was in