[v6,5/5] KVM: selftests: Allowing running dirty_log_perf_test on specific CPUs

Message ID 20221021211816.1525201-6-vipinsh@google.com
State New
Series dirty_log_perf_test vCPU pinning

Commit Message

Vipin Sharma Oct. 21, 2022, 9:18 p.m. UTC
Add a command line option, -c, to pin vCPUs to physical CPUs (pCPUs),
i.e. to force vCPUs to run on specific pCPUs.

The requirement for this feature came up in the discussion of the patch
"Make page tables for eager page splitting NUMA aware":
https://lore.kernel.org/lkml/YuhPT2drgqL+osLl@google.com/

This feature is useful as it provides a way to analyze performance based
on where the vCPUs and the dirty log worker run, e.g. on different NUMA
nodes or on the same NUMA node.

To keep things simple, the implementation is intentionally very limited:
either all of the vCPUs are pinned, optionally followed by the main
thread, or nothing is pinned.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: David Matlack <dmatlack@google.com>
---
 .../selftests/kvm/dirty_log_perf_test.c       | 22 ++++++-
 .../selftests/kvm/include/perf_test_util.h    |  6 ++
 .../selftests/kvm/lib/perf_test_util.c        | 65 ++++++++++++++++++-
 3 files changed, 90 insertions(+), 3 deletions(-)
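
For readers who want to try the -c list format from the commit message
(one pCPU per vCPU, then an optional pCPU for the main thread), here is a
minimal standalone sketch. It is not the patch's code: parse_pcpu_list()
and its main_pcpu out-parameter are hypothetical names, and it returns an
error code instead of using the selftests' TEST_ASSERT().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "p0,p1,...,pN[,main]" into vcpu_to_pcpu[].
 * *main_pcpu is set to -1 when there is no trailing entry for the main
 * thread. Returns 0 on success, -1 on malformed input.
 */
static int parse_pcpu_list(const char *str, uint32_t vcpu_to_pcpu[],
			   int nr_vcpus, int *main_pcpu)
{
	char *list, *tok, *save = NULL, *end;
	int i, ret = -1;

	assert(str && vcpu_to_pcpu && main_pcpu);

	list = strdup(str);	/* strtok_r() modifies its input */
	if (!list)
		return -1;

	*main_pcpu = -1;
	tok = strtok_r(list, ",", &save);

	/* Entries 0..nr_vcpus-1 are vCPUs, entry nr_vcpus is the main thread. */
	for (i = 0; i <= nr_vcpus && tok; i++) {
		long val;

		errno = 0;
		val = strtol(tok, &end, 10);
		if (errno || end == tok || *end != '\0' ||
		    val < 0 || val > INT_MAX)
			goto out;

		if (i < nr_vcpus)
			vcpu_to_pcpu[i] = (uint32_t)val;
		else
			*main_pcpu = (int)val;

		tok = strtok_r(NULL, ",", &save);
	}

	/* Too few entries, or extra entries after the main thread's pCPU. */
	if (i < nr_vcpus || tok)
		goto out;

	ret = 0;
out:
	free(list);
	return ret;
}
```

With nr_vcpus = 2, "3,4" maps vCPU0 to pCPU 3 and vCPU1 to pCPU 4 and
leaves the main thread unpinned, while "3,4,7" additionally requests
pCPU 7 for the main thread.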
  

Comments

Wang, Wei W Oct. 26, 2022, 2:27 a.m. UTC | #1
On Saturday, October 22, 2022 5:18 AM, Vipin Sharma wrote:
> +static void pin_this_task_to_pcpu(uint32_t pcpu) {
> +	cpu_set_t mask;
> +	int r;
> +
> +	CPU_ZERO(&mask);
> +	CPU_SET(pcpu, &mask);
> +	r = sched_setaffinity(0, sizeof(mask), &mask);
> +	TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
> +}
> +
>  static void *vcpu_thread_main(void *data)  {
> +	struct perf_test_vcpu_args *vcpu_args;
>  	struct vcpu_thread *vcpu = data;
> 
> +	vcpu_args = &perf_test_args.vcpu_args[vcpu->vcpu_idx];
> +
> +	if (perf_test_args.pin_vcpus)
> +		pin_this_task_to_pcpu(vcpu_args->pcpu);
> +

I think it would be better to pin the thread at the time it is
created, by providing a pthread_attr_t attr, e.g.:

pthread_attr_t attr;

CPU_SET(vcpu->pcpu, &cpu_set);
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpu_set);
pthread_create(thread, attr,...);

Also, pinning a vCPU thread to a pCPU is a general operation
which other users would need. I think we could make it more general and
put it in kvm_util, e.g. by adding it to the helper function that I'm trying to create:

+ * Create a vcpu thread with user provided attribute and the name in
+ * "vcpu-##id" format.
+ */
+void __vcpu_thread_create(struct kvm_vcpu *vcpu, const pthread_attr_t *attr,
+		   void *(*start_routine)(void *), uint32_t private_data_size)

(https://lore.kernel.org/kvm/20221024113445.1022147-1-wei.w.wang@intel.com/T/#m0ceed820278a9deb199871ee6da7d6ec54d065f4)
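
For reference, a self-contained sketch of the attribute-based approach
suggested above. Unlike the short snippet in the mail, it initializes the
attribute and the CPU set and passes &attr to pthread_create(). The names
create_thread_on_pcpu() and report_affinity() are made up for illustration
and are not selftest APIs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/*
 * Hypothetical worker: returns the single CPU in its affinity mask, or -1
 * if the thread is allowed to run on more (or fewer) than one CPU.
 */
static void *report_affinity(void *arg)
{
	cpu_set_t mask;
	int i;

	(void)arg;
	if (sched_getaffinity(0, sizeof(mask), &mask) || CPU_COUNT(&mask) != 1)
		return (void *)(intptr_t)-1;

	for (i = 0; i < CPU_SETSIZE; i++)
		if (CPU_ISSET(i, &mask))
			return (void *)(intptr_t)i;

	return (void *)(intptr_t)-1;
}

/*
 * Hypothetical helper: create a thread whose initial affinity is restricted
 * to 'pcpu'. Returns 0 on success or a positive errno value.
 */
static int create_thread_on_pcpu(pthread_t *thread, int pcpu,
				 void *(*fn)(void *), void *arg)
{
	pthread_attr_t attr;
	cpu_set_t cpu_set;
	int r;

	assert(pcpu >= 0 && pcpu < CPU_SETSIZE);

	r = pthread_attr_init(&attr);
	if (r)
		return r;

	CPU_ZERO(&cpu_set);
	CPU_SET(pcpu, &cpu_set);

	r = pthread_attr_setaffinity_np(&attr, sizeof(cpu_set), &cpu_set);
	if (!r)
		r = pthread_create(thread, &attr, fn, arg);

	pthread_attr_destroy(&attr);
	return r;
}
```

pthread_attr_setaffinity_np() is a GNU extension, hence the _GNU_SOURCE
define.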
  
Sean Christopherson Oct. 26, 2022, 3:44 p.m. UTC | #2
On Wed, Oct 26, 2022, Wang, Wei W wrote:
> On Saturday, October 22, 2022 5:18 AM, Vipin Sharma wrote:
> > +static void pin_this_task_to_pcpu(uint32_t pcpu) {
> > +	cpu_set_t mask;
> > +	int r;
> > +
> > +	CPU_ZERO(&mask);
> > +	CPU_SET(pcpu, &mask);
> > +	r = sched_setaffinity(0, sizeof(mask), &mask);
> > +	TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
> > +}
> > +
> >  static void *vcpu_thread_main(void *data)  {
> > +	struct perf_test_vcpu_args *vcpu_args;
> >  	struct vcpu_thread *vcpu = data;
> > 
> > +	vcpu_args = &perf_test_args.vcpu_args[vcpu->vcpu_idx];
> > +
> > +	if (perf_test_args.pin_vcpus)
> > +		pin_this_task_to_pcpu(vcpu_args->pcpu);
> > +
> 
> I think it would be better to do the thread pinning at the time when the
> thread is created by providing a pthread_attr_t attr, e.g. :
> 
> pthread_attr_t attr;
> 
> CPU_SET(vcpu->pcpu, &cpu_set);
> pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpu_set);
> pthread_create(thread, attr,...);
> 
> Also, pinning a vCPU thread to a pCPU is a general operation
> which other users would need. I think we could make it more general and
> put it to kvm_util.

We could, but taking advantage of the pinning functionality would require
plumbing a command line option for every test, or alternatively adding partial
command line parsing with a "hidden" global struct to kvm_selftest_init(), though
handling error checking for a truly generic case would be a mess.  Either way,
extending pinning to other tests would require non-trivial effort, and can be
done on top of this series.

That said, it's also trivial to extract the pinning helpers to common code, and I
can't think of any reason not to do that straightaway.

Vipin, any objection to squashing the below diff with patch 5?

>  e.g. adding it to the helper function that I'm trying to create

If we go this route in the future, we'd need to add a worker trampoline as the
pinning needs to happen in the worker task itself to guarantee that the pinning
takes effect before the worker does anything useful.  That should be very doable.

I do like the idea of extending __vcpu_thread_create(), but we can do that once
__vcpu_thread_create() lands to avoid further delaying this series.
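
To make the "worker trampoline" idea concrete, here is a rough sketch of
what such a wrapper could look like: the new thread pins itself first and
only then calls the caller's start routine. All names here (struct
pinned_worker, create_pinned_worker(), count_allowed_cpus()) are
hypothetical, and this is not code from this series or from
__vcpu_thread_create().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bundle handed to the trampoline; not a selftest type. */
struct pinned_worker {
	void *(*fn)(void *);
	void *arg;
	uint32_t pcpu;
};

/* Runs in the new thread: pin first, then call the real start routine. */
static void *pinned_worker_trampoline(void *data)
{
	struct pinned_worker w = *(struct pinned_worker *)data;
	cpu_set_t mask;

	free(data);

	CPU_ZERO(&mask);
	CPU_SET(w.pcpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask))
		abort();	/* a selftest would use TEST_ASSERT() here */

	return w.fn(w.arg);
}

/* Hypothetical helper. Returns 0 on success or a positive errno value. */
static int create_pinned_worker(pthread_t *thread, uint32_t pcpu,
				void *(*fn)(void *), void *arg)
{
	struct pinned_worker *w;
	int r;

	assert(thread && fn);

	w = malloc(sizeof(*w));
	if (!w)
		return ENOMEM;

	w->fn = fn;
	w->arg = arg;
	w->pcpu = pcpu;

	r = pthread_create(thread, NULL, pinned_worker_trampoline, w);
	if (r)
		free(w);	/* the trampoline never ran, so free here */
	return r;
}

/* Example start routine: returns how many CPUs the thread may run on. */
static void *count_allowed_cpus(void *arg)
{
	cpu_set_t mask;

	(void)arg;
	if (sched_getaffinity(0, sizeof(mask), &mask))
		return (void *)(intptr_t)-1;
	return (void *)(intptr_t)CPU_COUNT(&mask);
}
```

Because the affinity call happens before w.fn() is invoked, the start
routine never runs with the creator's affinity.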

---
 .../selftests/kvm/dirty_log_perf_test.c       |  7 ++-
 .../selftests/kvm/include/kvm_util_base.h     |  4 ++
 .../selftests/kvm/include/perf_test_util.h    |  8 +--
 tools/testing/selftests/kvm/lib/kvm_util.c    | 54 ++++++++++++++++++
 .../selftests/kvm/lib/perf_test_util.c        | 57 +------------------
 5 files changed, 68 insertions(+), 62 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index 35504b36b126..a82fc51d57ca 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -471,8 +471,11 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	if (pcpu_list)
-		perf_test_setup_pinning(pcpu_list, nr_vcpus);
+	if (pcpu_list) {
+		kvm_parse_vcpu_pinning(pcpu_list, perf_test_args.vcpu_to_pcpu,
+				       nr_vcpus);
+		perf_test_args.pin_vcpus = true;
+	}
 
 	TEST_ASSERT(p.iterations >= 2, "The test should have at least two iterations");
 
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index e42a09cd24a0..3bf2333ef95d 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -688,6 +688,10 @@ static inline struct kvm_vm *vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
 
 struct kvm_vcpu *vm_recreate_with_one_vcpu(struct kvm_vm *vm);
 
+void kvm_pin_this_task_to_pcpu(uint32_t pcpu);
+void kvm_parse_vcpu_pinning(const char *pcpus_string, uint32_t vcpu_to_pcpu[],
+			    int nr_vcpus);
+
 unsigned long vm_compute_max_gfn(struct kvm_vm *vm);
 unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size);
 unsigned int vm_num_host_pages(enum vm_guest_mode mode, unsigned int num_guest_pages);
diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
index ccfe3b9dc6bd..85320e0640fc 100644
--- a/tools/testing/selftests/kvm/include/perf_test_util.h
+++ b/tools/testing/selftests/kvm/include/perf_test_util.h
@@ -27,8 +27,6 @@ struct perf_test_vcpu_args {
 	/* Only used by the host userspace part of the vCPU thread */
 	struct kvm_vcpu *vcpu;
 	int vcpu_idx;
-	/* The pCPU to which this vCPU is pinned. Only valid if pin_vcpus is true. */
-	uint32_t pcpu;
 };
 
 struct perf_test_args {
@@ -43,8 +41,12 @@ struct perf_test_args {
 	bool nested;
 	/* True if all vCPUs are pinned to pCPUs */
 	bool pin_vcpus;
+	/* The vCPU=>pCPU pinning map. Only valid if pin_vcpus is true. */
+	uint32_t vcpu_to_pcpu[KVM_MAX_VCPUS];
 
 	struct perf_test_vcpu_args vcpu_args[KVM_MAX_VCPUS];
+
+
 };
 
 extern struct perf_test_args perf_test_args;
@@ -64,6 +66,4 @@ void perf_test_guest_code(uint32_t vcpu_id);
 uint64_t perf_test_nested_pages(int nr_vcpus);
 void perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[]);
 
-void perf_test_setup_pinning(const char *pcpus_string, int nr_vcpus);
-
 #endif /* SELFTEST_KVM_PERF_TEST_UTIL_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index f1cb1627161f..8292af9d7660 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -11,6 +11,7 @@
 #include "processor.h"
 
 #include <assert.h>
+#include <sched.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -443,6 +444,59 @@ struct kvm_vcpu *vm_recreate_with_one_vcpu(struct kvm_vm *vm)
 	return vm_vcpu_recreate(vm, 0);
 }
 
+void kvm_pin_this_task_to_pcpu(uint32_t pcpu)
+{
+	cpu_set_t mask;
+	int r;
+
+	CPU_ZERO(&mask);
+	CPU_SET(pcpu, &mask);
+	r = sched_setaffinity(0, sizeof(mask), &mask);
+	TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
+}
+
+static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
+{
+	uint32_t pcpu = atoi_non_negative(cpu_str);
+
+	TEST_ASSERT(CPU_ISSET(pcpu, allowed_mask),
+		    "Not allowed to run on pCPU '%d', check cgroups?\n", pcpu);
+	return pcpu;
+}
+
+void kvm_parse_vcpu_pinning(const char *pcpus_string, uint32_t vcpu_to_pcpu[],
+			    int nr_vcpus)
+{
+	cpu_set_t allowed_mask;
+	char *cpu, *cpu_list;
+	char delim[2] = ",";
+	int i, r;
+
+	cpu_list = strdup(pcpus_string);
+	TEST_ASSERT(cpu_list, "strdup() allocation failed.\n");
+
+	r = sched_getaffinity(0, sizeof(allowed_mask), &allowed_mask);
+	TEST_ASSERT(!r, "sched_getaffinity() failed");
+
+	cpu = strtok(cpu_list, delim);
+
+	/* 1. Get all pcpus for vcpus. */
+	for (i = 0; i < nr_vcpus; i++) {
+		TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'\n", i);
+		vcpu_to_pcpu[i] = parse_pcpu(cpu, &allowed_mask);
+		cpu = strtok(NULL, delim);
+	}
+
+	/* 2. Check if the main worker needs to be pinned. */
+	if (cpu) {
+		kvm_pin_this_task_to_pcpu(parse_pcpu(cpu, &allowed_mask));
+		cpu = strtok(NULL, delim);
+	}
+
+	TEST_ASSERT(!cpu, "pCPU list contains trailing garbage characters '%s'", cpu);
+	free(cpu_list);
+}
+
 /*
  * Userspace Memory Region Find
  *
diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
index 520d1f896d61..1d133007d7de 100644
--- a/tools/testing/selftests/kvm/lib/perf_test_util.c
+++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
@@ -5,7 +5,6 @@
 #define _GNU_SOURCE
 
 #include <inttypes.h>
-#include <sched.h>
 
 #include "kvm_util.h"
 #include "perf_test_util.h"
@@ -243,17 +242,6 @@ void __weak perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_v
 	exit(KSFT_SKIP);
 }
 
-static void pin_this_task_to_pcpu(uint32_t pcpu)
-{
-	cpu_set_t mask;
-	int r;
-
-	CPU_ZERO(&mask);
-	CPU_SET(pcpu, &mask);
-	r = sched_setaffinity(0, sizeof(mask), &mask);
-	TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
-}
-
 static void *vcpu_thread_main(void *data)
 {
 	struct perf_test_vcpu_args *vcpu_args;
@@ -262,7 +250,7 @@ static void *vcpu_thread_main(void *data)
 	vcpu_args = &perf_test_args.vcpu_args[vcpu->vcpu_idx];
 
 	if (perf_test_args.pin_vcpus)
-		pin_this_task_to_pcpu(vcpu_args->pcpu);
+		kvm_pin_this_task_to_pcpu(perf_test_args.vcpu_to_pcpu[vcpu->vcpu_idx]);
 
 	WRITE_ONCE(vcpu->running, true);
 
@@ -312,46 +300,3 @@ void perf_test_join_vcpu_threads(int nr_vcpus)
 	for (i = 0; i < nr_vcpus; i++)
 		pthread_join(vcpu_threads[i].thread, NULL);
 }
-
-static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
-{
-	uint32_t pcpu = atoi_non_negative(cpu_str);
-
-	TEST_ASSERT(CPU_ISSET(pcpu, allowed_mask),
-		    "Not allowed to run on pCPU '%d', check cgroups?\n", pcpu);
-	return pcpu;
-}
-
-void perf_test_setup_pinning(const char *pcpus_string, int nr_vcpus)
-{
-	cpu_set_t allowed_mask;
-	char *cpu, *cpu_list;
-	char delim[2] = ",";
-	int i, r;
-
-	cpu_list = strdup(pcpus_string);
-	TEST_ASSERT(cpu_list, "strdup() allocation failed.\n");
-
-	r = sched_getaffinity(0, sizeof(allowed_mask), &allowed_mask);
-	TEST_ASSERT(!r, "sched_getaffinity() failed");
-
-	cpu = strtok(cpu_list, delim);
-
-	/* 1. Get all pcpus for vcpus. */
-	for (i = 0; i < nr_vcpus; i++) {
-		TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'\n", i);
-		perf_test_args.vcpu_args[i].pcpu = parse_pcpu(cpu, &allowed_mask);
-		cpu = strtok(NULL, delim);
-	}
-
-	perf_test_args.pin_vcpus = true;
-
-	/* 2. Check if the main worker needs to be pinned. */
-	if (cpu) {
-		pin_this_task_to_pcpu(parse_pcpu(cpu, &allowed_mask));
-		cpu = strtok(NULL, delim);
-	}
-
-	TEST_ASSERT(!cpu, "pCPU list contains trailing garbage characters '%s'", cpu);
-	free(cpu_list);
-}

base-commit: 076ac4ca97225d6b8698a9b066153b556e97be7c
--
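
The allowed-mask check in parse_pcpu() combined with the pinning helper
can be tried outside the selftest framework with something like the
following. This is only a sketch: pin_self_if_allowed() is a hypothetical
name, and it reports failure through its return value where the diff above
uses TEST_ASSERT().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/*
 * Hypothetical helper: pin the calling task to 'pcpu', but only if that CPU
 * is in the task's current allowed set (which cgroups/cpusets may restrict).
 * Returns 0 on success, -1 otherwise.
 */
static int pin_self_if_allowed(uint32_t pcpu)
{
	cpu_set_t allowed, mask;

	if (sched_getaffinity(0, sizeof(allowed), &allowed))
		return -1;

	/* With glibc, CPU_ISSET() is 0 for CPU numbers >= CPU_SETSIZE. */
	if (!CPU_ISSET(pcpu, &allowed))
		return -1;

	CPU_ZERO(&mask);
	CPU_SET(pcpu, &mask);
	return sched_setaffinity(0, sizeof(mask), &mask) ? -1 : 0;
}
```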
  
Vipin Sharma Oct. 26, 2022, 6:17 p.m. UTC | #3
On Wed, Oct 26, 2022 at 8:44 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Oct 26, 2022, Wang, Wei W wrote:
> > On Saturday, October 22, 2022 5:18 AM, Vipin Sharma wrote:
> > > +static void pin_this_task_to_pcpu(uint32_t pcpu) {
> > > +   cpu_set_t mask;
> > > +   int r;
> > > +
> > > +   CPU_ZERO(&mask);
> > > +   CPU_SET(pcpu, &mask);
> > > +   r = sched_setaffinity(0, sizeof(mask), &mask);
> > > +   TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
> > > +}
> > > +
> > >  static void *vcpu_thread_main(void *data)  {
> > > +   struct perf_test_vcpu_args *vcpu_args;
> > >     struct vcpu_thread *vcpu = data;
> > >
> > > +   vcpu_args = &perf_test_args.vcpu_args[vcpu->vcpu_idx];
> > > +
> > > +   if (perf_test_args.pin_vcpus)
> > > +           pin_this_task_to_pcpu(vcpu_args->pcpu);
> > > +
> >
> > I think it would be better to do the thread pinning at the time when the
> > thread is created by providing a pthread_attr_t attr, e.g. :
> >
> > pthread_attr_t attr;
> >
> > CPU_SET(vcpu->pcpu, &cpu_set);
> > pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpu_set);
> > pthread_create(thread, attr,...);
> >
> > Also, pinning a vCPU thread to a pCPU is a general operation
> > which other users would need. I think we could make it more general and
> > put it to kvm_util.
>
> We could, but taking advantage of the pinning functionality would require
> plumbing a command line option for every test, or alternatively adding partial
> command line parsing with a "hidden" global struct to kvm_selftest_init(), though
> handling error checking for a truly generic case would be a mess.  Either way,
> extending pinning to other tests would require non-trivial effort, and can be
> done on top of this series.
>
> That said, it's also trivial to extract the pinning helpers to common code, and I
> can't think of any reason not to do that straightaway.
>
> Vipin, any objection to squashing the below diff with patch 5?
>

Looks fine to me, I will send v7 with this change.

> >  e.g. adding it to the helper function that I'm trying to create
>
> If we go this route in the future, we'd need to add a worker trampoline as the
> pinning needs to happen in the worker task itself to guarantee that the pinning
> takes effect before the worker does anything useful.  That should be very doable.
>
> I do like the idea of extending __vcpu_thread_create(), but we can do that once
> __vcpu_thread_create() lands to avoid further delaying this series.
>
> ---
>  .../selftests/kvm/dirty_log_perf_test.c       |  7 ++-
>  .../selftests/kvm/include/kvm_util_base.h     |  4 ++
>  .../selftests/kvm/include/perf_test_util.h    |  8 +--
>  tools/testing/selftests/kvm/lib/kvm_util.c    | 54 ++++++++++++++++++
>  .../selftests/kvm/lib/perf_test_util.c        | 57 +------------------
>  5 files changed, 68 insertions(+), 62 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> index 35504b36b126..a82fc51d57ca 100644
> --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> @@ -471,8 +471,11 @@ int main(int argc, char *argv[])
>                 }
>         }
>
> -       if (pcpu_list)
> -               perf_test_setup_pinning(pcpu_list, nr_vcpus);
> +       if (pcpu_list) {
> +               kvm_parse_vcpu_pinning(pcpu_list, perf_test_args.vcpu_to_pcpu,
> +                                      nr_vcpus);
> +               perf_test_args.pin_vcpus = true;
> +       }
>
>         TEST_ASSERT(p.iterations >= 2, "The test should have at least two iterations");
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index e42a09cd24a0..3bf2333ef95d 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -688,6 +688,10 @@ static inline struct kvm_vm *vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
>
>  struct kvm_vcpu *vm_recreate_with_one_vcpu(struct kvm_vm *vm);
>
> +void kvm_pin_this_task_to_pcpu(uint32_t pcpu);
> +void kvm_parse_vcpu_pinning(const char *pcpus_string, uint32_t vcpu_to_pcpu[],
> +                           int nr_vcpus);
> +
>  unsigned long vm_compute_max_gfn(struct kvm_vm *vm);
>  unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size);
>  unsigned int vm_num_host_pages(enum vm_guest_mode mode, unsigned int num_guest_pages);
> diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
> index ccfe3b9dc6bd..85320e0640fc 100644
> --- a/tools/testing/selftests/kvm/include/perf_test_util.h
> +++ b/tools/testing/selftests/kvm/include/perf_test_util.h
> @@ -27,8 +27,6 @@ struct perf_test_vcpu_args {
>         /* Only used by the host userspace part of the vCPU thread */
>         struct kvm_vcpu *vcpu;
>         int vcpu_idx;
> -       /* The pCPU to which this vCPU is pinned. Only valid if pin_vcpus is true. */
> -       uint32_t pcpu;
>  };
>
>  struct perf_test_args {
> @@ -43,8 +41,12 @@ struct perf_test_args {
>         bool nested;
>         /* True if all vCPUs are pinned to pCPUs */
>         bool pin_vcpus;
> +       /* The vCPU=>pCPU pinning map. Only valid if pin_vcpus is true. */
> +       uint32_t vcpu_to_pcpu[KVM_MAX_VCPUS];
>
>         struct perf_test_vcpu_args vcpu_args[KVM_MAX_VCPUS];
> +
> +
>  };
>
>  extern struct perf_test_args perf_test_args;
> @@ -64,6 +66,4 @@ void perf_test_guest_code(uint32_t vcpu_id);
>  uint64_t perf_test_nested_pages(int nr_vcpus);
>  void perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[]);
>
> -void perf_test_setup_pinning(const char *pcpus_string, int nr_vcpus);
> -
>  #endif /* SELFTEST_KVM_PERF_TEST_UTIL_H */
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index f1cb1627161f..8292af9d7660 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -11,6 +11,7 @@
>  #include "processor.h"
>
>  #include <assert.h>
> +#include <sched.h>
>  #include <sys/mman.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> @@ -443,6 +444,59 @@ struct kvm_vcpu *vm_recreate_with_one_vcpu(struct kvm_vm *vm)
>         return vm_vcpu_recreate(vm, 0);
>  }
>
> +void kvm_pin_this_task_to_pcpu(uint32_t pcpu)
> +{
> +       cpu_set_t mask;
> +       int r;
> +
> +       CPU_ZERO(&mask);
> +       CPU_SET(pcpu, &mask);
> +       r = sched_setaffinity(0, sizeof(mask), &mask);
> +       TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
> +}
> +
> +static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
> +{
> +       uint32_t pcpu = atoi_non_negative(cpu_str);
> +
> +       TEST_ASSERT(CPU_ISSET(pcpu, allowed_mask),
> +                   "Not allowed to run on pCPU '%d', check cgroups?\n", pcpu);
> +       return pcpu;
> +}
> +
> +void kvm_parse_vcpu_pinning(const char *pcpus_string, uint32_t vcpu_to_pcpu[],
> +                           int nr_vcpus)
> +{
> +       cpu_set_t allowed_mask;
> +       char *cpu, *cpu_list;
> +       char delim[2] = ",";
> +       int i, r;
> +
> +       cpu_list = strdup(pcpus_string);
> +       TEST_ASSERT(cpu_list, "strdup() allocation failed.\n");
> +
> +       r = sched_getaffinity(0, sizeof(allowed_mask), &allowed_mask);
> +       TEST_ASSERT(!r, "sched_getaffinity() failed");
> +
> +       cpu = strtok(cpu_list, delim);
> +
> +       /* 1. Get all pcpus for vcpus. */
> +       for (i = 0; i < nr_vcpus; i++) {
> +               TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'\n", i);
> +               vcpu_to_pcpu[i] = parse_pcpu(cpu, &allowed_mask);
> +               cpu = strtok(NULL, delim);
> +       }
> +
> +       /* 2. Check if the main worker needs to be pinned. */
> +       if (cpu) {
> +               kvm_pin_this_task_to_pcpu(parse_pcpu(cpu, &allowed_mask));
> +               cpu = strtok(NULL, delim);
> +       }
> +
> +       TEST_ASSERT(!cpu, "pCPU list contains trailing garbage characters '%s'", cpu);
> +       free(cpu_list);
> +}
> +
>  /*
>   * Userspace Memory Region Find
>   *
> diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
> index 520d1f896d61..1d133007d7de 100644
> --- a/tools/testing/selftests/kvm/lib/perf_test_util.c
> +++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
> @@ -5,7 +5,6 @@
>  #define _GNU_SOURCE
>
>  #include <inttypes.h>
> -#include <sched.h>
>
>  #include "kvm_util.h"
>  #include "perf_test_util.h"
> @@ -243,17 +242,6 @@ void __weak perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_v
>         exit(KSFT_SKIP);
>  }
>
> -static void pin_this_task_to_pcpu(uint32_t pcpu)
> -{
> -       cpu_set_t mask;
> -       int r;
> -
> -       CPU_ZERO(&mask);
> -       CPU_SET(pcpu, &mask);
> -       r = sched_setaffinity(0, sizeof(mask), &mask);
> -       TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
> -}
> -
>  static void *vcpu_thread_main(void *data)
>  {
>         struct perf_test_vcpu_args *vcpu_args;
> @@ -262,7 +250,7 @@ static void *vcpu_thread_main(void *data)
>         vcpu_args = &perf_test_args.vcpu_args[vcpu->vcpu_idx];
>
>         if (perf_test_args.pin_vcpus)
> -               pin_this_task_to_pcpu(vcpu_args->pcpu);
> +               kvm_pin_this_task_to_pcpu(perf_test_args.vcpu_to_pcpu[vcpu->vcpu_idx]);
>
>         WRITE_ONCE(vcpu->running, true);
>
> @@ -312,46 +300,3 @@ void perf_test_join_vcpu_threads(int nr_vcpus)
>         for (i = 0; i < nr_vcpus; i++)
>                 pthread_join(vcpu_threads[i].thread, NULL);
>  }
> -
> -static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
> -{
> -       uint32_t pcpu = atoi_non_negative(cpu_str);
> -
> -       TEST_ASSERT(CPU_ISSET(pcpu, allowed_mask),
> -                   "Not allowed to run on pCPU '%d', check cgroups?\n", pcpu);
> -       return pcpu;
> -}
> -
> -void perf_test_setup_pinning(const char *pcpus_string, int nr_vcpus)
> -{
> -       cpu_set_t allowed_mask;
> -       char *cpu, *cpu_list;
> -       char delim[2] = ",";
> -       int i, r;
> -
> -       cpu_list = strdup(pcpus_string);
> -       TEST_ASSERT(cpu_list, "strdup() allocation failed.\n");
> -
> -       r = sched_getaffinity(0, sizeof(allowed_mask), &allowed_mask);
> -       TEST_ASSERT(!r, "sched_getaffinity() failed");
> -
> -       cpu = strtok(cpu_list, delim);
> -
> -       /* 1. Get all pcpus for vcpus. */
> -       for (i = 0; i < nr_vcpus; i++) {
> -               TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'\n", i);
> -               perf_test_args.vcpu_args[i].pcpu = parse_pcpu(cpu, &allowed_mask);
> -               cpu = strtok(NULL, delim);
> -       }
> -
> -       perf_test_args.pin_vcpus = true;
> -
> -       /* 2. Check if the main worker needs to be pinned. */
> -       if (cpu) {
> -               pin_this_task_to_pcpu(parse_pcpu(cpu, &allowed_mask));
> -               cpu = strtok(NULL, delim);
> -       }
> -
> -       TEST_ASSERT(!cpu, "pCPU list contains trailing garbage characters '%s'", cpu);
> -       free(cpu_list);
> -}
>
> base-commit: 076ac4ca97225d6b8698a9b066153b556e97be7c
> --
>
  
Wang, Wei W Oct. 27, 2022, 12:03 p.m. UTC | #4
On Wednesday, October 26, 2022 11:44 PM, Sean Christopherson wrote:
> > I think it would be better to do the thread pinning at the time when
> > the thread is created by providing a pthread_attr_t attr, e.g. :
> >
> > pthread_attr_t attr;
> >
> > CPU_SET(vcpu->pcpu, &cpu_set);
> > pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpu_set);
> > pthread_create(thread, attr,...);
> >
> > Also, pinning a vCPU thread to a pCPU is a general operation which
> > other users would need. I think we could make it more general and put
> > it to kvm_util.
> 
> We could, but taking advantage of the pinning functionality would require
> plumbing a command line option for every test, 

I think we could make this "pinning" optional (no need to force everyone
to use it).

> or alternatively adding partial
> command line parsing with a "hidden" global struct to kvm_selftest_init(),
> though handling error checking for a truly generic case would be a mess.  Either
> way, extending pinning to other tests would require non-trivial effort, and can be
> done on top of this series.
> 
> That said, it's also trivial to extract the pinning helpers to common code, and I
> can't think of any reason not to do that straightaway.
> 
> Vipin, any objection to squashing the below diff with patch 5?
> 
> >  e.g. adding it to the helper function that I'm trying to create
> 
> If we go this route in the future, we'd need to add a worker trampoline as the
> pinning needs to happen in the worker task itself to guarantee that the pinning
> takes effect before the worker does anything useful.  That should be very
> doable.

The alternative way is the one I shared before, using this:

/* Thread created with attribute ATTR will be limited to run only on
   the processors represented in CPUSET.  */
extern int pthread_attr_setaffinity_np (pthread_attr_t *__attr,
                                 size_t __cpusetsize,
                                 const cpu_set_t *__cpuset)

Basically, the thread is created on the pCPU that the user specified.
I think this is better than "creating the thread on an arbitrary pCPU
and then pinning it to the user-specified pCPU in the thread's start routine".

> 
> I do like the idea of extending __vcpu_thread_create(), but we can do that once
> __vcpu_thread_create() lands to avoid further delaying this series.

Sounds good. I can move some of those to vcpu_thread_create() once it's ready later.

>  struct perf_test_args {
> @@ -43,8 +41,12 @@ struct perf_test_args {
>  	bool nested;
>  	/* True if all vCPUs are pinned to pCPUs */
>  	bool pin_vcpus;
> +	/* The vCPU=>pCPU pinning map. Only valid if pin_vcpus is true. */
> +	uint32_t vcpu_to_pcpu[KVM_MAX_VCPUS];

How about putting the pcpu id in "struct kvm_vcpu"? (Please see the code
posted below, which shows how that works.) This is helpful when we later make this
more generic, as kvm_vcpu is used by everyone.

We probably also don't need "bool pin_vcpus". We could initialize
pcpu_id to -1 to indicate that the vcpu doesn't need pinning (this is also what I meant
above by pinning being optional for other users).
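
A small sketch of the "-1 means no pinning" convention, in isolation from
the selftest code. struct vcpu_stub is a hypothetical stand-in for the
relevant part of struct kvm_vcpu, and vcpu_affinity_mask() is an
illustrative name, not something the proposed diff defines.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct kvm_vcpu. */
struct vcpu_stub {
	int pcpu_id;	/* -1: no pinning requested */
};

/*
 * Fill 'mask' with the vCPU's pCPU. Returns false, leaving 'mask'
 * untouched, if the vCPU does not ask to be pinned.
 */
static bool vcpu_affinity_mask(const struct vcpu_stub *vcpu, cpu_set_t *mask)
{
	assert(vcpu && mask);

	if (vcpu->pcpu_id < 0)
		return false;

	CPU_ZERO(mask);
	CPU_SET(vcpu->pcpu_id, mask);
	return true;
}
```

A caller would only invoke sched_setaffinity() (or set up a
pthread_attr_t) when the function returns true, so tests that never set
pcpu_id keep their current behavior.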

Putting all the changes together (tested and working fine), FYI:

diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c
index b30500cc197e..2829c98078d0 100644
--- a/tools/testing/selftests/kvm/access_tracking_perf_test.c
+++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c
@@ -304,7 +304,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
        int nr_vcpus = params->nr_vcpus;

        vm = perf_test_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1,
-                                params->backing_src, !overlap_memory_access);
+                                params->backing_src, !overlap_memory_access, NULL);

        perf_test_start_vcpu_threads(nr_vcpus, vcpu_thread_main);

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index dcdb6964b1dc..e19c3ce32c62 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -286,7 +286,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
        int r, i;

        vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1,
-                                p->src_type, p->partition_vcpu_memory_access);
+                                p->src_type, p->partition_vcpu_memory_access, NULL);

        demand_paging_size = get_backing_src_pagesz(p->src_type);

diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index 35504b36b126..cbe7de28e094 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -132,6 +132,7 @@ struct test_params {
        bool partition_vcpu_memory_access;
        enum vm_mem_backing_src_type backing_src;
        int slots;
+       char *pcpu_list;
 };

 static void toggle_dirty_logging(struct kvm_vm *vm, int slots, bool enable)
@@ -223,7 +224,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)

        vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size,
                                 p->slots, p->backing_src,
-                                p->partition_vcpu_memory_access);
+                                p->partition_vcpu_memory_access,
+                                p->pcpu_list);

        perf_test_set_wr_fract(vm, p->wr_fract);

@@ -401,13 +403,13 @@ static void help(char *name)
 int main(int argc, char *argv[])
 {
        int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
-       const char *pcpu_list = NULL;
        struct test_params p = {
                .iterations = TEST_HOST_LOOP_N,
                .wr_fract = 1,
                .partition_vcpu_memory_access = true,
                .backing_src = DEFAULT_VM_MEM_SRC,
                .slots = 1,
+               .pcpu_list = NULL,
        };
        int opt;
@@ -424,7 +426,7 @@ int main(int argc, char *argv[])
                        guest_percpu_mem_size = parse_size(optarg);
                        break;
                case 'c':
-                       pcpu_list = optarg;
+                       p.pcpu_list = optarg;
                        break;
                case 'e':
                        /* 'e' is for evil. */
@@ -471,9 +473,6 @@ int main(int argc, char *argv[])
                }
        }

-       if (pcpu_list)
-               perf_test_setup_pinning(pcpu_list, nr_vcpus);
-
        TEST_ASSERT(p.iterations >= 2, "The test should have at least two iterations");

        pr_info("Test iterations: %"PRIu64"\n", p.iterations);
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index c9286811a4cb..d403b374bae5 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -7,7 +7,11 @@
 #ifndef SELFTEST_KVM_UTIL_H
 #define SELFTEST_KVM_UTIL_H

+#include <pthread.h>
+
 #include "kvm_util_base.h"
 #include "ucall_common.h"

+void kvm_parse_vcpu_pinning(struct kvm_vcpu **vcpu, int nr_vcpus, const char *pcpus_string);
+
 #endif /* SELFTEST_KVM_UTIL_H */
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index e42a09cd24a0..79867a478a81 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -47,6 +47,7 @@ struct userspace_mem_region {
 struct kvm_vcpu {
        struct list_head list;
        uint32_t id;
+       int pcpu_id;
        int fd;
        struct kvm_vm *vm;
        struct kvm_run *run;
diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
index ccfe3b9dc6bd..81428022bdb5 100644
--- a/tools/testing/selftests/kvm/include/perf_test_util.h
+++ b/tools/testing/selftests/kvm/include/perf_test_util.h
@@ -52,7 +52,8 @@ extern struct perf_test_args perf_test_args;
 struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int nr_vcpus,
                                   uint64_t vcpu_memory_bytes, int slots,
                                   enum vm_mem_backing_src_type backing_src,
-                                  bool partition_vcpu_memory_access);
+                                  bool partition_vcpu_memory_access,
+                                  char *pcpu_list);
 void perf_test_destroy_vm(struct kvm_vm *vm);

 void perf_test_set_wr_fract(struct kvm_vm *vm, int wr_fract);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index f1cb1627161f..8acee6d4ccbe 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1114,6 +1114,7 @@ struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)

        vcpu->vm = vm;
        vcpu->id = vcpu_id;
+       vcpu->pcpu_id = -1;
        vcpu->fd = __vm_ioctl(vm, KVM_CREATE_VCPU, (void *)(unsigned long)vcpu_id);
        TEST_ASSERT(vcpu->fd >= 0, KVM_IOCTL_ERROR(KVM_CREATE_VCPU, vcpu->fd));

@@ -2021,3 +2022,58 @@ void __vm_get_stat(struct kvm_vm *vm, const char *stat_name, uint64_t *data,
                break;
        }
 }
+
+void kvm_pin_this_task_to_pcpu(uint32_t pcpu)
+{
+       cpu_set_t mask;
+       int r;
+
+       CPU_ZERO(&mask);
+       CPU_SET(pcpu, &mask);
+       r = sched_setaffinity(0, sizeof(mask), &mask);
+       TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
+}
+
+static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
+{
+       uint32_t pcpu = atoi_non_negative(cpu_str);
+
+       TEST_ASSERT(CPU_ISSET(pcpu, allowed_mask),
+                   "Not allowed to run on pCPU '%d', check cgroups?\n", pcpu);
+       return pcpu;
+}
+
+void kvm_parse_vcpu_pinning(struct kvm_vcpu **vcpu, int nr_vcpus, const char *pcpus_string)
+{
+       cpu_set_t allowed_mask;
+       char *cpu, *cpu_list;
+       char delim[2] = ",";
+       int i, r;
+
+       if (!pcpus_string)
+               return;
+
+       cpu_list = strdup(pcpus_string);
+       TEST_ASSERT(cpu_list, "strdup() allocation failed.\n");
+
+       r = sched_getaffinity(0, sizeof(allowed_mask), &allowed_mask);
+       TEST_ASSERT(!r, "sched_getaffinity() failed");
+
+       cpu = strtok(cpu_list, delim);
+
+       /* 1. Get all pcpus for vcpus. */
+       for (i = 0; i < nr_vcpus; i++) {
+               TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'\n", i);
+               vcpu[i]->pcpu_id = parse_pcpu(cpu, &allowed_mask);
+               cpu = strtok(NULL, delim);
+       }
+
+       /* 2. Check if the main worker needs to be pinned. */
+       if (cpu) {
+               kvm_pin_this_task_to_pcpu(parse_pcpu(cpu, &allowed_mask));
+               cpu = strtok(NULL, delim);
+       }
+
+       TEST_ASSERT(!cpu, "pCPU list contains trailing garbage characters '%s'", cpu);
+       free(cpu_list);
+}
diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
index 520d1f896d61..95166c5a77f7 100644
--- a/tools/testing/selftests/kvm/lib/perf_test_util.c
+++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
@@ -112,7 +112,8 @@ void perf_test_setup_vcpus(struct kvm_vm *vm, int nr_vcpus,
 struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int nr_vcpus,
                                   uint64_t vcpu_memory_bytes, int slots,
                                   enum vm_mem_backing_src_type backing_src,
-                                  bool partition_vcpu_memory_access)
+                                  bool partition_vcpu_memory_access,
+                                  char *pcpu_list)
 {
        struct perf_test_args *pta = &perf_test_args;
        struct kvm_vm *vm;
@@ -157,7 +158,7 @@ struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int nr_vcpus,
         */
        vm = __vm_create_with_vcpus(mode, nr_vcpus, slot0_pages + guest_num_pages,
                                    perf_test_guest_code, vcpus);
-
+       kvm_parse_vcpu_pinning(vcpus, nr_vcpus, pcpu_list);
        pta->vm = vm;

        /* Put the test region at the top guest physical memory. */
@@ -284,17 +285,29 @@ void perf_test_start_vcpu_threads(int nr_vcpus,
                                  void (*vcpu_fn)(struct perf_test_vcpu_args *))
 {
        int i;
+       pthread_attr_t attr, *attr_p;
+       cpu_set_t cpuset;

        vcpu_thread_fn = vcpu_fn;
        WRITE_ONCE(all_vcpu_threads_running, false);

        for (i = 0; i < nr_vcpus; i++) {
                struct vcpu_thread *vcpu = &vcpu_threads[i];
+               attr_p = NULL;

                vcpu->vcpu_idx = i;
                WRITE_ONCE(vcpu->running, false);

-               pthread_create(&vcpu->thread, NULL, vcpu_thread_main, vcpu);
+               if (vcpus[i]->pcpu_id != -1) {
+                       CPU_ZERO(&cpuset);
+                       CPU_SET(vcpus[i]->pcpu_id, &cpuset);
+                       pthread_attr_init(&attr);
+                       pthread_attr_setaffinity_np(&attr,
+                                               sizeof(cpu_set_t), &cpuset);
+                       attr_p = &attr;
+               }
+
+               pthread_create(&vcpu->thread, attr_p, vcpu_thread_main, vcpu);
        }
        for (i = 0; i < nr_vcpus; i++) {
diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
index 9968800ec2ec..5dbe09537b2d 100644
--- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c
+++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
@@ -99,7 +99,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)

        vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1,
                                 VM_MEM_SRC_ANONYMOUS,
-                                p->partition_vcpu_memory_access);
+                                p->partition_vcpu_memory_access, NULL);

        pr_info("Finished creating vCPUs\n");
  
Sean Christopherson Oct. 27, 2022, 3:56 p.m. UTC | #5
On Thu, Oct 27, 2022, Wang, Wei W wrote:
> On Wednesday, October 26, 2022 11:44 PM, Sean Christopherson wrote:
> > > I think it would be better to do the thread pinning at the time when
> > > the thread is created by providing a pthread_attr_t attr, e.g. :
> > >
> > > pthread_attr_t attr;
> > > cpu_set_t cpu_set;
> > >
> > > CPU_ZERO(&cpu_set);
> > > CPU_SET(vcpu->pcpu, &cpu_set);
> > > pthread_attr_init(&attr);
> > > pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpu_set);
> > > pthread_create(thread, &attr,...);
> > >
> > > Also, pinning a vCPU thread to a pCPU is a general operation which
> > > other users would need. I think we could make it more general and put
> > > it to kvm_util.
> > 
> > We could, but taking advantage of the pinning functionality would require
> > plumbing a command line option for every test, 
> 
> I think we could make this "pinning" be optional (no need to force everyone
> to use it).

Heh, it's definitely optional.

> > If we go this route in the future, we'd need to add a worker trampoline as the
> > pinning needs to happen in the worker task itself to guarantee that the pinning
> > takes effect before the worker does anything useful.  That should be very
> > doable.
> 
> The alternative way is the one I shared before, using this:
> 
> /* Thread created with attribute ATTR will be limited to run only on
>    the processors represented in CPUSET.  */
> extern int pthread_attr_setaffinity_np (pthread_attr_t *__attr,
>                                  size_t __cpusetsize,
>                                  const cpu_set_t *__cpuset)
> 
> Basically, the thread is created on the pCPU as user specified.
> I think this is better than "creating the thread on an arbitrary pCPU
> and then pinning it to the user specified pCPU in the thread's start routine".

Ah, yeah, that's better.
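
A minimal standalone sketch of that attribute-based approach, for reference.
It assumes Linux with glibc; worker(), first_allowed_cpu(),
create_pinned_thread() and pinned_thread_demo() are made-up names for
illustration and are not part of the selftests:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical worker: records the CPU it is running on. */
static void *worker(void *arg)
{
	int *cpu_seen = arg;

	*cpu_seen = sched_getcpu();
	return NULL;
}

/* Return the first CPU the calling task may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
	cpu_set_t allowed;
	int cpu;

	if (sched_getaffinity(0, sizeof(allowed), &allowed))
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, &allowed))
			return cpu;
	}
	return -1;
}

/*
 * Create a thread whose affinity is limited to @pcpu via its attributes.
 * Returns 0 on success, an error number otherwise.
 */
static int create_pinned_thread(pthread_t *thread, int pcpu, void *arg)
{
	pthread_attr_t attr;
	cpu_set_t cpuset;
	int r;

	assert(pcpu >= 0 && pcpu < CPU_SETSIZE);

	CPU_ZERO(&cpuset);
	CPU_SET(pcpu, &cpuset);

	r = pthread_attr_init(&attr);
	if (r)
		return r;

	r = pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset);
	if (!r)
		r = pthread_create(thread, &attr, worker, arg);

	pthread_attr_destroy(&attr);
	return r;
}

/* Returns 0 if the worker observed itself on the requested pCPU. */
static int pinned_thread_demo(void)
{
	int pcpu = first_allowed_cpu();
	int cpu_seen = -1;
	pthread_t thread;

	if (pcpu < 0 || create_pinned_thread(&thread, pcpu, &cpu_seen))
		return -1;

	pthread_join(thread, NULL);
	return cpu_seen == pcpu ? 0 : -1;
}
```

Calling pinned_thread_demo() from a main() (build with -pthread on older
glibc) should return 0 when the worker ran on the requested pCPU.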

> > I do like the idea of extending __vcpu_thread_create(), but we can do that once
> > __vcpu_thread_create() lands to avoid further delaying this series.
> 
> Sounds good. I can move some of those to vcpu_thread_create() once it's ready later.
> 
> >  struct perf_test_args {
> > @@ -43,8 +41,12 @@ struct perf_test_args {
> >  	bool nested;
> >  	/* True if all vCPUs are pinned to pCPUs */
> >  	bool pin_vcpus;
> > +	/* The vCPU=>pCPU pinning map. Only valid if pin_vcpus is true. */
> > +	uint32_t vcpu_to_pcpu[KVM_MAX_VCPUS];
> 
> How about putting the pcpu id in "struct kvm_vcpu"? (Please see the code below,
> posted to show how that works.) This is helpful when we later make this more generic,
> as kvm_vcpu is used by everyone.

I don't think "pcpu" belongs in kvm_vcpu, even in the long run.  The vast, vast
majority of tests will never care about pinning, which means that vcpu->pcpu can't
be used for anything except the actual pinning.   And for pinning, the "pcpu"
doesn't need to be persistent information, i.e. doesn't need to live in kvm_vcpu.

> Probably we also don't need "bool pin_vcpus".

Yeah, but for selftests shaving bytes is not exactly top priority, and having a
dedicated flag avoids the need for magic numbers.  If Vipin had used -1, I'd
probably be fine with that, but I'm also totally fine using a dedicated flag too.

> We could initialize pcpu_id to -1 to indicate that the vcpu doesn't need
> pinning (this is also what I meant above optional for other users).
> 
> Put the whole changes together (tested and worked fine), FYI:

The big downside of this is forcing all callers of perf_test_create_vm() to pass
in NULL.  I really want to move away from this pattern as it makes what should be
simple code rather difficult to read due to having a bunch of "dead" params
dangling off the end of function calls.
  
Vipin Sharma Oct. 27, 2022, 8:02 p.m. UTC | #6
On Thu, Oct 27, 2022 at 8:56 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Oct 27, 2022, Wang, Wei W wrote:
> > On Wednesday, October 26, 2022 11:44 PM, Sean Christopherson wrote:
> > > If we go this route in the future, we'd need to add a worker trampoline as the
> > > pinning needs to happen in the worker task itself to guarantee that the pinning
> > > takes effect before the worker does anything useful.  That should be very
> > > doable.
> >
> > The alternative way is the one I shared before, using this:
> >
> > /* Thread created with attribute ATTR will be limited to run only on
> >    the processors represented in CPUSET.  */
> > extern int pthread_attr_setaffinity_np (pthread_attr_t *__attr,
> >                                  size_t __cpusetsize,
> >                                  const cpu_set_t *__cpuset)
> >
> > Basically, the thread is created on the pCPU as user specified.
> > I think this is better than "creating the thread on an arbitrary pCPU
> > and then pinning it to the user specified pCPU in the thread's start routine".
>
> Ah, yeah, that's better.
>

pthread_create() will internally call the sched_setaffinity() syscall
after creating the thread on a random CPU. So, from the performance
side, there is not much difference between the two approaches.

However, we will still need pin_this_task_to_pcpu()/sched_setaffinity()
to move the main thread to a specific pCPU. Therefore, I am thinking
of keeping the current approach unless there is a strong objection to
it.

> > Probably we also don't need "bool pin_vcpus".
>
> Yeah, but for selftests shaving bytes is not exactly top priority, and having a
> dedicated flag avoids the need for magic numbers.  If Vipin had used -1, I'd
> probably be fine with that, but I'm also totally fine using a dedicated flag too.
>

Same here. This is not performance critical, so there is no need to add a magic -1.
  
Sean Christopherson Oct. 27, 2022, 8:23 p.m. UTC | #7
On Thu, Oct 27, 2022, Vipin Sharma wrote:
> On Thu, Oct 27, 2022 at 8:56 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Oct 27, 2022, Wang, Wei W wrote:
> > > On Wednesday, October 26, 2022 11:44 PM, Sean Christopherson wrote:
> > > > If we go this route in the future, we'd need to add a worker trampoline as the
> > > > pinning needs to happen in the worker task itself to guarantee that the pinning
> > > > takes effect before the worker does anything useful.  That should be very
> > > > doable.
> > >
> > > The alternative way is the one I shared before, using this:
> > >
> > > /* Thread created with attribute ATTR will be limited to run only on
> > >    the processors represented in CPUSET.  */
> > > extern int pthread_attr_setaffinity_np (pthread_attr_t *__attr,
> > >                                  size_t __cpusetsize,
> > >                                  const cpu_set_t *__cpuset)
> > >
> > > Basically, the thread is created on the pCPU as user specified.
> > > I think this is better than "creating the thread on an arbitrary pCPU
> > > and then pinning it to the user specified pCPU in the thread's start routine".
> >
> > Ah, yeah, that's better.
> >
> 
> pthread_create() will internally call sched_setaffinity() syscall
> after creation of a thread on a random CPU. So, from the performance
> side there is not much difference between the two approaches.
> 
> However, we will still need pin_this_task_to_pcpu()/sched_setaffinity()
> to move the main thread to a specific pCPU, therefore, 

Heh, that's a good point too.

> I am thinking of keeping the current approach unless there is a strong objection
> to it.

No objection here, I don't see an obvious way to make that helper go away.
  
Wang, Wei W Oct. 28, 2022, 2:11 a.m. UTC | #8
On Friday, October 28, 2022 4:03 AM, Vipin Sharma wrote:
> pthread_create() will internally call sched_setaffinity() syscall after creation of a
> thread on a random CPU. So, from the performance side there is not much
> difference between the two approaches.

The main difference I see is that the vCPU thread could initially be created on
one NUMA node by default and then get pinned to another NUMA node.

> 
> However, we will still need pin_this_task_to_pcpu()/sched_setaffinity()
> to move the main thread to a specific pCPU, therefore, I am thinking of keeping
> the current approach unless there is a strong objection to it.

I also don't have strong objections; it's up to you for now.
I will revisit this later, after the code consolidation patchset has landed,
and see how this could be better consolidated from all users' perspective.
  
Vipin Sharma Oct. 28, 2022, 5:37 p.m. UTC | #9
On Thu, Oct 27, 2022 at 7:11 PM Wang, Wei W <wei.w.wang@intel.com> wrote:
>
> On Friday, October 28, 2022 4:03 AM, Vipin Sharma wrote:
> > pthread_create() will internally call sched_setaffinity() syscall after creation of a
> > thread on a random CPU. So, from the performance side there is not much
> > difference between the two approaches.
>
> The main difference I see is that the vcpu could be created on one NUMA node by
> default initially and then gets pinned to another NUMA node.
>

pthread_create(..., &attr, ...) calls clone() and then
sched_setaffinity(). This is no different from calling
pthread_create(..., NULL, ...) and then having the user explicitly call
sched_setaffinity(). A vCPU being created on one NUMA node and then
getting pinned to another NUMA node is equally probable in both
approaches.
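
To make the comparison concrete, here is a rough sketch of the second
variant, i.e. pthread_create() with NULL attributes and the new thread
pinning itself before doing any real work. Linux and glibc are assumed;
struct pinned_worker, pin_this_thread(), pin_trampoline(), report_cpu() and
trampoline_demo() are names invented for this example:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical descriptor: where to pin, and what to run once pinned. */
struct pinned_worker {
	int pcpu;
	void *(*fn)(void *);
	void *arg;
};

/* Pin the calling thread to @pcpu; returns 0 on success, -1 on error. */
static int pin_this_thread(int pcpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(pcpu, &mask);
	/* A pid of 0 targets the calling thread. */
	return sched_setaffinity(0, sizeof(mask), &mask);
}

/* Start routine: pin first, then hand off to the real worker function. */
static void *pin_trampoline(void *data)
{
	struct pinned_worker *w = data;

	if (pin_this_thread(w->pcpu))
		return (void *)(intptr_t)-1;

	return w->fn(w->arg);
}

/* Example worker: returns the CPU it is running on, encoded as a pointer. */
static void *report_cpu(void *arg)
{
	(void)arg;
	return (void *)(intptr_t)sched_getcpu();
}

/* Returns 0 if the worker ran on the pCPU the trampoline pinned it to. */
static int trampoline_demo(void)
{
	struct pinned_worker w = { .fn = report_cpu, .arg = NULL };
	cpu_set_t allowed;
	pthread_t thread;
	void *ret;
	int cpu;

	/* Use the highest-numbered allowed CPU as the target. */
	if (sched_getaffinity(0, sizeof(allowed), &allowed))
		return -1;
	w.pcpu = -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, &allowed))
			w.pcpu = cpu;
	}
	assert(w.pcpu >= 0);

	if (pthread_create(&thread, NULL, pin_trampoline, &w))
		return -1;
	pthread_join(thread, &ret);

	return (int)(intptr_t)ret == w.pcpu ? 0 : -1;
}
```

The worker function only runs after sched_setaffinity() has returned, so
by the time it does anything useful it is already on the requested pCPU.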
  

Patch

diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index 618598ddd993..35504b36b126 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -353,7 +353,7 @@  static void help(char *name)
 	puts("");
 	printf("usage: %s [-h] [-i iterations] [-p offset] [-g] "
 	       "[-m mode] [-n] [-b vcpu bytes] [-v vcpus] [-o] [-s mem type]"
-	       "[-x memslots]\n", name);
+	       "[-x memslots] [-c physical cpus to run test on]\n", name);
 	puts("");
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
 	       TEST_HOST_LOOP_N);
@@ -383,6 +383,17 @@  static void help(char *name)
 	backing_src_help("-s");
 	printf(" -x: Split the memory region into this number of memslots.\n"
 	       "     (default: 1)\n");
+	printf(" -c: Pin tasks to physical CPUs.  Takes a list of comma separated\n"
+	       "     values (target pCPU), one for each vCPU, plus an optional\n"
+	       "     entry for the main application task (specified via entry\n"
+	       "     <nr_vcpus + 1>).  If used, entries must be provided for all\n"
+	       "     vCPUs, i.e. pinning vCPUs is all or nothing.\n\n"
+	       "     E.g. to create 3 vCPUs, pin vCPU0=>pCPU22, vCPU1=>pCPU23,\n"
+	       "     vCPU2=>pCPU24, and pin the application task to pCPU50:\n\n"
+	       "         ./dirty_log_perf_test -v 3 -c 22,23,24,50\n\n"
+	       "     To leave the application task unpinned, drop the final entry:\n\n"
+	       "         ./dirty_log_perf_test -v 3 -c 22,23,24\n\n"
+	       "     (default: no pinning)\n");
 	puts("");
 	exit(0);
 }
@@ -390,6 +401,7 @@  static void help(char *name)
 int main(int argc, char *argv[])
 {
 	int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
+	const char *pcpu_list = NULL;
 	struct test_params p = {
 		.iterations = TEST_HOST_LOOP_N,
 		.wr_fract = 1,
@@ -406,11 +418,14 @@  int main(int argc, char *argv[])
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "b:ef:ghi:m:nop:s:v:x:")) != -1) {
+	while ((opt = getopt(argc, argv, "b:c:ef:ghi:m:nop:s:v:x:")) != -1) {
 		switch (opt) {
 		case 'b':
 			guest_percpu_mem_size = parse_size(optarg);
 			break;
+		case 'c':
+			pcpu_list = optarg;
+			break;
 		case 'e':
 			/* 'e' is for evil. */
 			run_vcpus_while_disabling_dirty_logging = true;
@@ -456,6 +471,9 @@  int main(int argc, char *argv[])
 		}
 	}
 
+	if (pcpu_list)
+		perf_test_setup_pinning(pcpu_list, nr_vcpus);
+
 	TEST_ASSERT(p.iterations >= 2, "The test should have at least two iterations");
 
 	pr_info("Test iterations: %"PRIu64"\n",	p.iterations);
diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
index eaa88df0555a..ccfe3b9dc6bd 100644
--- a/tools/testing/selftests/kvm/include/perf_test_util.h
+++ b/tools/testing/selftests/kvm/include/perf_test_util.h
@@ -27,6 +27,8 @@  struct perf_test_vcpu_args {
 	/* Only used by the host userspace part of the vCPU thread */
 	struct kvm_vcpu *vcpu;
 	int vcpu_idx;
+	/* The pCPU to which this vCPU is pinned. Only valid if pin_vcpus is true. */
+	uint32_t pcpu;
 };
 
 struct perf_test_args {
@@ -39,6 +41,8 @@  struct perf_test_args {
 
 	/* Run vCPUs in L2 instead of L1, if the architecture supports it. */
 	bool nested;
+	/* True if all vCPUs are pinned to pCPUs */
+	bool pin_vcpus;
 
 	struct perf_test_vcpu_args vcpu_args[KVM_MAX_VCPUS];
 };
@@ -60,4 +64,6 @@  void perf_test_guest_code(uint32_t vcpu_id);
 uint64_t perf_test_nested_pages(int nr_vcpus);
 void perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[]);
 
+void perf_test_setup_pinning(const char *pcpus_string, int nr_vcpus);
+
 #endif /* SELFTEST_KVM_PERF_TEST_UTIL_H */
diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
index 9618b37c66f7..520d1f896d61 100644
--- a/tools/testing/selftests/kvm/lib/perf_test_util.c
+++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
@@ -2,7 +2,10 @@ 
 /*
  * Copyright (C) 2020, Google LLC.
  */
+#define _GNU_SOURCE
+
 #include <inttypes.h>
+#include <sched.h>
 
 #include "kvm_util.h"
 #include "perf_test_util.h"
@@ -240,10 +243,27 @@  void __weak perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_v
 	exit(KSFT_SKIP);
 }
 
+static void pin_this_task_to_pcpu(uint32_t pcpu)
+{
+	cpu_set_t mask;
+	int r;
+
+	CPU_ZERO(&mask);
+	CPU_SET(pcpu, &mask);
+	r = sched_setaffinity(0, sizeof(mask), &mask);
+	TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
+}
+
 static void *vcpu_thread_main(void *data)
 {
+	struct perf_test_vcpu_args *vcpu_args;
 	struct vcpu_thread *vcpu = data;
 
+	vcpu_args = &perf_test_args.vcpu_args[vcpu->vcpu_idx];
+
+	if (perf_test_args.pin_vcpus)
+		pin_this_task_to_pcpu(vcpu_args->pcpu);
+
 	WRITE_ONCE(vcpu->running, true);
 
 	/*
@@ -255,7 +275,7 @@  static void *vcpu_thread_main(void *data)
 	while (!READ_ONCE(all_vcpu_threads_running))
 		;
 
-	vcpu_thread_fn(&perf_test_args.vcpu_args[vcpu->vcpu_idx]);
+	vcpu_thread_fn(vcpu_args);
 
 	return NULL;
 }
@@ -292,3 +312,46 @@  void perf_test_join_vcpu_threads(int nr_vcpus)
 	for (i = 0; i < nr_vcpus; i++)
 		pthread_join(vcpu_threads[i].thread, NULL);
 }
+
+static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
+{
+	uint32_t pcpu = atoi_non_negative(cpu_str);
+
+	TEST_ASSERT(CPU_ISSET(pcpu, allowed_mask),
+		    "Not allowed to run on pCPU '%d', check cgroups?\n", pcpu);
+	return pcpu;
+}
+
+void perf_test_setup_pinning(const char *pcpus_string, int nr_vcpus)
+{
+	cpu_set_t allowed_mask;
+	char *cpu, *cpu_list;
+	char delim[2] = ",";
+	int i, r;
+
+	cpu_list = strdup(pcpus_string);
+	TEST_ASSERT(cpu_list, "strdup() allocation failed.\n");
+
+	r = sched_getaffinity(0, sizeof(allowed_mask), &allowed_mask);
+	TEST_ASSERT(!r, "sched_getaffinity() failed");
+
+	cpu = strtok(cpu_list, delim);
+
+	/* 1. Get all pcpus for vcpus. */
+	for (i = 0; i < nr_vcpus; i++) {
+		TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'\n", i);
+		perf_test_args.vcpu_args[i].pcpu = parse_pcpu(cpu, &allowed_mask);
+		cpu = strtok(NULL, delim);
+	}
+
+	perf_test_args.pin_vcpus = true;
+
+	/* 2. Check if the main worker needs to be pinned. */
+	if (cpu) {
+		pin_this_task_to_pcpu(parse_pcpu(cpu, &allowed_mask));
+		cpu = strtok(NULL, delim);
+	}
+
+	TEST_ASSERT(!cpu, "pCPU list contains trailing garbage characters '%s'", cpu);
+	free(cpu_list);
+}
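
For experimenting with the -c list format outside the selftests, the parsing
logic above can be approximated standalone as follows. This is only a sketch
under stated assumptions: parse_one_pcpu() and parse_pcpu_list() are
hypothetical names, errors are returned instead of asserted, the allowed mask
is passed in by the caller, and strtol() stands in for atoi_non_negative():

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>

/* Parse one decimal pCPU number; returns it, or -1 if invalid/disallowed. */
static int parse_one_pcpu(const char *tok, const cpu_set_t *allowed)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(tok, &end, 10);
	if (errno || end == tok || *end != '\0')
		return -1;
	if (val < 0 || val >= CPU_SETSIZE || !CPU_ISSET((int)val, allowed))
		return -1;
	return (int)val;
}

/*
 * Parse "p0,p1,...[,main]": one pCPU per vCPU, plus an optional final entry
 * for the main task (*main_pcpu is set to -1 if it is absent).
 * Returns 0 on success, -1 on a missing, invalid, or extra entry.
 * As with strtok() in the patch, empty fields are silently skipped.
 */
static int parse_pcpu_list(const char *str, int nr_vcpus,
			   const cpu_set_t *allowed, int *pcpus, int *main_pcpu)
{
	char *list, *tok;
	int i, ret = -1;

	assert(str && allowed && pcpus && main_pcpu);

	/* strtok() modifies its input, so work on a copy. */
	list = strdup(str);
	if (!list)
		return -1;

	*main_pcpu = -1;
	tok = strtok(list, ",");

	/* 1. Every vCPU must have an entry. */
	for (i = 0; i < nr_vcpus; i++) {
		if (!tok)
			goto out;
		pcpus[i] = parse_one_pcpu(tok, allowed);
		if (pcpus[i] < 0)
			goto out;
		tok = strtok(NULL, ",");
	}

	/* 2. Optional entry for the main task. */
	if (tok) {
		*main_pcpu = parse_one_pcpu(tok, allowed);
		if (*main_pcpu < 0)
			goto out;
		tok = strtok(NULL, ",");
	}

	/* 3. Anything left over is trailing garbage. */
	ret = tok ? -1 : 0;
out:
	free(list);
	return ret;
}
```

With an allowed mask covering pCPUs 0-63 and nr_vcpus = 3, "22,23,24,50"
yields vCPU targets 22, 23, 24 and a main-task target of 50, while "22,23,24"
leaves the main-task target at -1.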