Message ID | 20221212-uvc-race-v4-0-38d7075b03f5@chromium.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:e747:0:0:0:0:0 with SMTP id c7csp330970wrn; Wed, 14 Dec 2022 08:37:46 -0800 (PST) X-Google-Smtp-Source: AA0mqf7AEZ4/I1fsoyRDQ8ZuBcY9GLV78vSP92hYcoqbyseMQoAG+vvXLZRu1s6aL+jfP69Gk6gT X-Received: by 2002:a17:902:f14c:b0:18f:ac9f:2a02 with SMTP id d12-20020a170902f14c00b0018fac9f2a02mr9485407plb.10.1671035865822; Wed, 14 Dec 2022 08:37:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1671035865; cv=none; d=google.com; s=arc-20160816; b=Z7dV3AZFj3QrIzA+T8NLGYYGP18dcvdBNoJqB5dGrCcrDqb3B2+4u0E+7F2JanHmJR KAdGHo/SkWI7w79LPEsG6t4Fn5WhD3v5FYuwDhxIVg1lO3Igwpfngisutraul9XkkE+2 9kYNZ97TsYLe4Xa+LTaAM3QJCBmGoY78duOjJ5zYcHqO5VuTYga6t5gUvuc1YCtAovlP t5xosiBDT+MG6lMAd1c8QjdQ8Ws5ym2v6vMXpdx200ySO2BcxK+Z2QwfdRbE4dWFE0jL kO2jF5HXuMtH/bVcPe333hKLCJ2VfWXW+qT+G0cHu2UxnzPCgmtPfMEXb3Q4WnOLNzkf Q70g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:message-id:content-transfer-encoding :mime-version:subject:date:from:dkim-signature; bh=MVrlltRx17nfXTGhP7yPJwRU3y4M8/9HTbuxEEs49iU=; b=jiTfjYXb5fIpw4X9CX009WsZuB0ccPyAz4rCVdGW+YPaJ7QHVBeliIcJvhR9f0qlQR +e2ulu/hfXNq1FzQGd8Gi2VhGJzZOUVviC3ScD8+fh6jecXGs1GSv5Mbfk+oWCbrHr4b 8ffD/QKi9+1Q7m01JlRRySfuOPI8f8GtoRMGASbEQ5VmHFJXMTHSvpFwQCA0hVtUBfrx beVoTW+vQKymkAJ5/m5QFXjadMHOSk1fSbse2TZij6apZU1FjMUYR1KzOPHoZF+dV1OG J4PylOH7QVhK7uFhQHB4i80mXFMIa8Vheny9Tc+fSht5HV45tOtz1/xTae2bR2456C2K THoQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=aXv6T3ty; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q7-20020a170902dac700b00178380f525bsi3603609plx.547.2022.12.14.08.37.32; Wed, 14 Dec 2022 08:37:45 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=aXv6T3ty; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239038AbiLNQeK (ORCPT <rfc822;jeantsuru.cumc.mandola@gmail.com> + 99 others); Wed, 14 Dec 2022 11:34:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49732 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238982AbiLNQdk (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 14 Dec 2022 11:33:40 -0500 Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73DFF2A436 for <linux-kernel@vger.kernel.org>; Wed, 14 Dec 2022 08:32:22 -0800 (PST) Received: by mail-ed1-x52c.google.com with SMTP id i15so23292947edf.2 for <linux-kernel@vger.kernel.org>; Wed, 14 Dec 2022 08:32:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=cc:to:message-id:content-transfer-encoding:mime-version:subject :date:from:from:to:cc:subject:date:message-id:reply-to; bh=MVrlltRx17nfXTGhP7yPJwRU3y4M8/9HTbuxEEs49iU=; b=aXv6T3ty9OZD4OYRcWL5+4mKME8v/e1WDdI9AA+KQJzPhv03AdE8drDDUqdiblA3yP s3/zhNBDtfTEYSkUNDB0AoiAz16cuPGibNxtKMzRSH9ilTXaJbDoVQ2FCuMzaGR2CKlx ERlbbfHgFDDGKgxdZXQ7Jy8oTG0/idZ4EWhqs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:message-id:content-transfer-encoding:mime-version:subject :date:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=MVrlltRx17nfXTGhP7yPJwRU3y4M8/9HTbuxEEs49iU=; b=2Pps8jqq63aWJWU5exAAYio4JldIxeRctdJcYKmljXYkuE+7xYrRt9hdceCQx1ViRQ TbM3YBSuSdmGfjqucpTAIOcE1Z3PYbjOAL9RN3tntop7EhlwmNeM7wpUiGpfzxjB4jxg WQ0CvtNBDeYkOaeX8wgPIItUWIW//Agc8y3q3bKyQArVC9mt59qTJ9zml9QL6EYGTRUk TSAZQ7Ji8sTOmxTR4Veh19TuK0cObb6YXAqJfN2/qYBNnOe/hSMYrdKxnvr3uHMSQ2/r XJalbakLD7hiYPC1ts1JSAlKxEK/SkCjLlpl91DWZdSKaYWUqo26CHMfxGlHrxNk0NZv a3Tg== X-Gm-Message-State: ANoB5pnUJttQNWnZZaYuAemFD2TU8UYta8gA+rRFPQU7l2bf9qam318e OX8/G7HW9GaqPcBfAc9Zr9wLAg== X-Received: by 2002:aa7:cd46:0:b0:46d:e3f8:4ed4 with SMTP id v6-20020aa7cd46000000b0046de3f84ed4mr17653967edw.21.1671035541045; Wed, 14 Dec 2022 08:32:21 -0800 (PST) Received: from alco.roam.corp.google.com ([100.104.168.209]) by smtp.gmail.com with ESMTPSA id ee48-20020a056402293000b004615f7495e0sm6395908edb.8.2022.12.14.08.32.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Dec 2022 08:32:20 -0800 (PST) From: Ricardo Ribalda <ribalda@chromium.org> Date: Wed, 14 Dec 2022 17:31:55 +0100 Subject: [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20221212-uvc-race-v4-0-38d7075b03f5@chromium.org> To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>, Sergey Senozhatsky <senozhatsky@chromium.org>, Mauro Carvalho Chehab <mchehab@kernel.org>, Max Staudt <mstaudt@google.com> Cc: linux-media@vger.kernel.org, stable@vger.kernel.org, Yunke Cao <yunkec@chromium.org>, Ricardo Ribalda <ribalda@chromium.org>, linux-kernel@vger.kernel.org X-Mailer: b4 0.11.0-dev-696ae X-Developer-Signature: v=1; a=openpgp-sha256; l=4414; i=ribalda@chromium.org; h=from:subject:message-id; bh=ogVdjybS9LEoywx0YwN/okaYfuH/y03AYQsWkydp3kQ=; b=owEBbQKS/ZANAwAKAdE30T7POsSIAcsmYgBjmfqNKqeDuIPruv3HRUGHzLrKhw0xXerfWXVR/oWJ Mh7m06eJAjMEAAEKAB0WIQREDzjr+/4oCDLSsx7RN9E+zzrEiAUCY5n6jQAKCRDRN9E+zzrEiGEBD/ 9paXuRWZxtIqzclORW4HCZGYi73FtH4rtSGo1FNlEq1R7vGpC1gQ796/GNRy0vTNoPO+QxaG6poI5J 8vAFLwLc3YCxAUXzcN/QuRqSznplWWU+TiGxWm/I2x+dlNbE0tk30rBktBkJ0rYvQev/gODgWCgghN JKnkI1XV3lxO9nCa/7WllWLHsNpjcGE0reWbaPRdpmwgQxUZyQsn0v1UjsGkI2EPfhhCBA9pLym2ie qJguxwajBM0LmK/ds6p5rVXdEmdz1BoOTa91s5QN89Qfu2n9gPljroekcxObhSEAR7A+TH4JCenRYJ zfstu4W4xhtU5LdCrdBumpkkOmxjw4mDZR4yYtH/asavDzAMEM98ceXIkSRT4ipCkbcQCeDfcD3339 e7BLYFB+fh7dO0R2dEZdw/i0vmr0gxk79TPbMbp/21RZutJPl/Ht2JMQfGOLfv/tojT7MT35xsEjqi ZTlGsBuRhEjLFLnjSEmAKs+AITV2Js6CV0xeCPwkwVkyyS/D4qhsv3NNxGuKjqvutOAcTKVaKNx7OD M75UhkjEUu9w2EF9WyCuIrjRO3YLkRQUrDrN9vqCGLuElX71HVPqqCTn9zxpYlYwQ2RnzYpt+fQe7k 4LxOvbbN2nLIE8NqmuMp9hiq5+moqyLr+SNq2O3Nlg0YoGQiXc424GQnJdlA== X-Developer-Key: i=ribalda@chromium.org; a=openpgp; fpr=9EC3BB66E2FC129A6F90B39556A0D81F9F782DA9 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1752208104024059594?= X-GMAIL-MSGID: =?utf-8?q?1752208104024059594?= |
Series |
[v4] media: uvcvideo: Fix race condition with usb_kill_urb
|
|
Commit Message
Ricardo Ribalda
Dec. 14, 2022, 4:31 p.m. UTC
usb_kill_urb warranties that all the handlers are finished when it returns, but does not protect against threads that might be handling asynchronously the urb. For UVC, the function uvc_ctrl_status_event_async() takes care of control changes asynchronously. If the code is executed in the following order: CPU 0 CPU 1 ===== ===== uvc_status_complete() uvc_status_stop() uvc_ctrl_status_event_work() uvc_status_start() -> FAIL Then uvc_status_start will keep failing and this error will be shown: <4>[ 5.540139] URB 0000000000000000 submitted while active drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528 Let's improve the current situation, by not re-submiting the urb if we are stopping the status event. Also process the queued work (if any) during stop. CPU 0 CPU 1 ===== ===== uvc_status_complete() uvc_status_stop() uvc_status_start() uvc_ctrl_status_event_work() -> FAIL Hopefully, with the usb layer protection this should be enough to cover all the cases. Cc: stable@vger.kernel.org Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") Reviewed-by: Yunke Cao <yunkec@chromium.org> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> --- uvc: Fix race condition on uvc Make sure that all the async work is finished when we stop the status urb. To: Yunke Cao <yunkec@chromium.org> To: Sergey Senozhatsky <senozhatsky@chromium.org> To: Max Staudt <mstaudt@google.com> To: Laurent Pinchart <laurent.pinchart@ideasonboard.com> To: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-media@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- Changes in v4: - Replace bool with atomic_t to avoid compiler reordering - First complete the async work and then kill the urb to avoid race (Thanks Laurent!) - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org Changes in v3: - Remove the patch for dev->status, makes more sense in another series, and makes the zero day less nervous. - Update reviewed-by (thanks Yunke!). - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org Changes in v2: - Add a patch for not kalloc dev->status - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!) - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org --- drivers/media/usb/uvc/uvc_ctrl.c | 3 +++ drivers/media/usb/uvc/uvc_status.c | 6 ++++++ drivers/media/usb/uvc/uvcvideo.h | 1 + 3 files changed, 10 insertions(+) --- base-commit: 0ec5a38bf8499f403f81cb81a0e3a60887d1993c change-id: 20221212-uvc-race-09276ea68bf8 Best regards,
Comments
Hi Ricardo, Thank you for the patch. On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote: > usb_kill_urb warranties that all the handlers are finished when it > returns, but does not protect against threads that might be handling > asynchronously the urb. > > For UVC, the function uvc_ctrl_status_event_async() takes care of > control changes asynchronously. > > If the code is executed in the following order: > > CPU 0 CPU 1 > ===== ===== > uvc_status_complete() > uvc_status_stop() > uvc_ctrl_status_event_work() > uvc_status_start() -> FAIL > > Then uvc_status_start will keep failing and this error will be shown: > > <4>[ 5.540139] URB 0000000000000000 submitted while active > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528 > > Let's improve the current situation, by not re-submiting the urb if > we are stopping the status event. Also process the queued work > (if any) during stop. > > CPU 0 CPU 1 > ===== ===== > uvc_status_complete() > uvc_status_stop() > uvc_status_start() > uvc_ctrl_status_event_work() -> FAIL > > Hopefully, with the usb layer protection this should be enough to cover > all the cases. > > Cc: stable@vger.kernel.org > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") > Reviewed-by: Yunke Cao <yunkec@chromium.org> > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> > --- > uvc: Fix race condition on uvc > > Make sure that all the async work is finished when we stop the status urb. > > To: Yunke Cao <yunkec@chromium.org> > To: Sergey Senozhatsky <senozhatsky@chromium.org> > To: Max Staudt <mstaudt@google.com> > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > To: Mauro Carvalho Chehab <mchehab@kernel.org> > Cc: linux-media@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > --- > Changes in v4: > - Replace bool with atomic_t to avoid compiler reordering > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!) > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org > > Changes in v3: > - Remove the patch for dev->status, makes more sense in another series, and makes > the zero day less nervous. > - Update reviewed-by (thanks Yunke!). > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org > > Changes in v2: > - Add a patch for not kalloc dev->status > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!) > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org > --- > drivers/media/usb/uvc/uvc_ctrl.c | 3 +++ > drivers/media/usb/uvc/uvc_status.c | 6 ++++++ > drivers/media/usb/uvc/uvcvideo.h | 1 + > 3 files changed, 10 insertions(+) > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c > index c95a2229f4fa..1be6897a7d6d 100644 > --- a/drivers/media/usb/uvc/uvc_ctrl.c > +++ b/drivers/media/usb/uvc/uvc_ctrl.c > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work) > > uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > + if (atomic_read(&dev->flush_status)) > + return; > + > /* Resubmit the URB. */ > w->urb->interval = dev->int_ep->desc.bInterval; > ret = usb_submit_urb(w->urb, GFP_KERNEL); > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c > index 7518ffce22ed..4a95850cdc1b 100644 > --- a/drivers/media/usb/uvc/uvc_status.c > +++ b/drivers/media/usb/uvc/uvc_status.c > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags) > if (dev->int_urb == NULL) > return 0; > > + atomic_set(&dev->flush_status, 0); > return usb_submit_urb(dev->int_urb, flags); > } > > void uvc_status_stop(struct uvc_device *dev) > { > + struct uvc_ctrl_work *w = &dev->async_ctrl; > + > + atomic_set(&dev->flush_status, 1); Note that atomic_read() and atomic_set() do no imply any memory barrier on most architectures, as far as I can tell. They essentially become READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler will not merge or split loads or stores, or reorder them with respect to load and stores to the *same* memory location, but nothing else. I think you need to add proper barriers, and you can then probably also drop usage of atomic_t. > + if (cancel_work_sync(&w->work)) > + uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > usb_kill_urb(dev->int_urb); This should get rid of the main race (possibly save the barrier issue), but it's not the most efficient option, and can still be problematic. Consider the following case: CPU0 CPU1 ---- ---- void uvc_status_stop(struct uvc_device *dev) uvc_ctrl_status_event_async() { { ... atomic_set(&dev->flush_status, 1); ... if (cancel_work_sync()) ... schedule_work(); usb_kill_urb(); } } The usb_kill_urb() call ensures that uvc_ctrl_status_event_async() completes before uvc_status_stop() returns, but there will still be work scheduled in that case. uvc_ctrl_status_event_work() will be run later, and as flush_status is set to 1, the function will not resubmit the URB. That fixes the main race, but leaves the asynchronous control status event handling for after uvc_status_stop() returns, which isn't great. Now, if we consider that uvc_status_start() could be called shortly after uvc_status_stop(), we may get in a situation where uvc_status_start() will reset flush_status to 0 before uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async() and uvc_status_start() will thus submit the same URB. You can't fix this by first killing the URB and then cancelling the work, as there would then be a risk that uvc_ctrl_status_event_work() would be running in parallel, going past the flush_status check before flush_status gets set to 1 in uvc_status_stop(), and submitting the URB after usb_kill_urb() is called. I think a good fix would be to check flush_status in uvc_ctrl_status_event_async() and avoid the schedule_work() call in that case. You could then also move the atomic_set(..., 0) call from uvc_status_start() to the end of uvc_status_stop() (again with proper barriers). Could you please check the memory barriers and test the above proposal (after double-checking it of course, I may have missed something) ? > } > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h > index df93db259312..1274691f157f 100644 > --- a/drivers/media/usb/uvc/uvcvideo.h > +++ b/drivers/media/usb/uvc/uvcvideo.h > @@ -560,6 +560,7 @@ struct uvc_device { > struct usb_host_endpoint *int_ep; > struct urb *int_urb; > u8 *status; > + atomic_t flush_status; > struct input_dev *input; > char input_phys[64]; >
On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote: > Hi Ricardo, > > Thank you for the patch. > > On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote: > > usb_kill_urb warranties that all the handlers are finished when it > > returns, but does not protect against threads that might be handling > > asynchronously the urb. > > > > For UVC, the function uvc_ctrl_status_event_async() takes care of > > control changes asynchronously. > > > > If the code is executed in the following order: > > > > CPU 0 CPU 1 > > ===== ===== > > uvc_status_complete() > > uvc_status_stop() > > uvc_ctrl_status_event_work() > > uvc_status_start() -> FAIL > > > > Then uvc_status_start will keep failing and this error will be shown: > > > > <4>[ 5.540139] URB 0000000000000000 submitted while active > > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528 > > > > Let's improve the current situation, by not re-submiting the urb if > > we are stopping the status event. Also process the queued work > > (if any) during stop. > > > > CPU 0 CPU 1 > > ===== ===== > > uvc_status_complete() > > uvc_status_stop() > > uvc_status_start() > > uvc_ctrl_status_event_work() -> FAIL > > > > Hopefully, with the usb layer protection this should be enough to cover > > all the cases. > > > > Cc: stable@vger.kernel.org > > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") > > Reviewed-by: Yunke Cao <yunkec@chromium.org> > > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> > > --- > > uvc: Fix race condition on uvc > > > > Make sure that all the async work is finished when we stop the status urb. > > > > To: Yunke Cao <yunkec@chromium.org> > > To: Sergey Senozhatsky <senozhatsky@chromium.org> > > To: Max Staudt <mstaudt@google.com> > > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > > To: Mauro Carvalho Chehab <mchehab@kernel.org> > > Cc: linux-media@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > --- > > Changes in v4: > > - Replace bool with atomic_t to avoid compiler reordering > > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!) > > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org > > > > Changes in v3: > > - Remove the patch for dev->status, makes more sense in another series, and makes > > the zero day less nervous. > > - Update reviewed-by (thanks Yunke!). > > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org > > > > Changes in v2: > > - Add a patch for not kalloc dev->status > > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!) > > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org > > --- > > drivers/media/usb/uvc/uvc_ctrl.c | 3 +++ > > drivers/media/usb/uvc/uvc_status.c | 6 ++++++ > > drivers/media/usb/uvc/uvcvideo.h | 1 + > > 3 files changed, 10 insertions(+) > > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c > > index c95a2229f4fa..1be6897a7d6d 100644 > > --- a/drivers/media/usb/uvc/uvc_ctrl.c > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c > > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work) > > > > uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > > > + if (atomic_read(&dev->flush_status)) > > + return; > > + > > /* Resubmit the URB. */ > > w->urb->interval = dev->int_ep->desc.bInterval; > > ret = usb_submit_urb(w->urb, GFP_KERNEL); > > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c > > index 7518ffce22ed..4a95850cdc1b 100644 > > --- a/drivers/media/usb/uvc/uvc_status.c > > +++ b/drivers/media/usb/uvc/uvc_status.c > > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags) > > if (dev->int_urb == NULL) > > return 0; > > > > + atomic_set(&dev->flush_status, 0); > > return usb_submit_urb(dev->int_urb, flags); > > } > > > > void uvc_status_stop(struct uvc_device *dev) > > { > > + struct uvc_ctrl_work *w = &dev->async_ctrl; > > + > > + atomic_set(&dev->flush_status, 1); > > Note that atomic_read() and atomic_set() do no imply any memory barrier > on most architectures, as far as I can tell. They essentially become > READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler > will not merge or split loads or stores, or reorder them with respect to > load and stores to the *same* memory location, but nothing else. I think > you need to add proper barriers, and you can then probably also drop > usage of atomic_t. > > > + if (cancel_work_sync(&w->work)) > > + uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > usb_kill_urb(dev->int_urb); > > This should get rid of the main race (possibly save the barrier issue), > but it's not the most efficient option, and can still be problematic. > Consider the following case: > > CPU0 CPU1 > ---- ---- > > void uvc_status_stop(struct uvc_device *dev) uvc_ctrl_status_event_async() > { { > ... > atomic_set(&dev->flush_status, 1); ... > if (cancel_work_sync()) > ... > schedule_work(); > usb_kill_urb(); } > } > > The usb_kill_urb() call ensures that uvc_ctrl_status_event_async() > completes before uvc_status_stop() returns, but there will still be work > scheduled in that case. uvc_ctrl_status_event_work() will be run later, > and as flush_status is set to 1, the function will not resubmit the URB. > That fixes the main race, but leaves the asynchronous control status > event handling for after uvc_status_stop() returns, which isn't great. > > Now, if we consider that uvc_status_start() could be called shortly > after uvc_status_stop(), we may get in a situation where > uvc_status_start() will reset flush_status to 0 before > uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async() > and uvc_status_start() will thus submit the same URB. > > You can't fix this by first killing the URB and then cancelling the > work, as there would then be a risk that uvc_ctrl_status_event_work() > would be running in parallel, going past the flush_status check before > flush_status gets set to 1 in uvc_status_stop(), and submitting the URB > after usb_kill_urb() is called. > > I think a good fix would be to check flush_status in > uvc_ctrl_status_event_async() and avoid the schedule_work() call in that > case. You could then also move the atomic_set(..., 0) call from > uvc_status_start() to the end of uvc_status_stop() (again with proper > barriers). Also, as all of this is tricky, comments in appropriate places in the code would be very helpful to avoid breaking things later. > Could you please check the memory barriers and test the above proposal > (after double-checking it of course, I may have missed something) ? > > > } > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h > > index df93db259312..1274691f157f 100644 > > --- a/drivers/media/usb/uvc/uvcvideo.h > > +++ b/drivers/media/usb/uvc/uvcvideo.h > > @@ -560,6 +560,7 @@ struct uvc_device { > > struct usb_host_endpoint *int_ep; > > struct urb *int_urb; > > u8 *status; > > + atomic_t flush_status; > > struct input_dev *input; > > char input_phys[64]; > >
Hi Laurent Thanks for your review! On Tue, 27 Dec 2022 at 15:37, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote: > > Hi Ricardo, > > > > Thank you for the patch. > > > > On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote: > > > usb_kill_urb warranties that all the handlers are finished when it > > > returns, but does not protect against threads that might be handling > > > asynchronously the urb. > > > > > > For UVC, the function uvc_ctrl_status_event_async() takes care of > > > control changes asynchronously. > > > > > > If the code is executed in the following order: > > > > > > CPU 0 CPU 1 > > > ===== ===== > > > uvc_status_complete() > > > uvc_status_stop() > > > uvc_ctrl_status_event_work() > > > uvc_status_start() -> FAIL > > > > > > Then uvc_status_start will keep failing and this error will be shown: > > > > > > <4>[ 5.540139] URB 0000000000000000 submitted while active > > > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528 > > > > > > Let's improve the current situation, by not re-submiting the urb if > > > we are stopping the status event. Also process the queued work > > > (if any) during stop. > > > > > > CPU 0 CPU 1 > > > ===== ===== > > > uvc_status_complete() > > > uvc_status_stop() > > > uvc_status_start() > > > uvc_ctrl_status_event_work() -> FAIL > > > > > > Hopefully, with the usb layer protection this should be enough to cover > > > all the cases. > > > > > > Cc: stable@vger.kernel.org > > > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") > > > Reviewed-by: Yunke Cao <yunkec@chromium.org> > > > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> > > > --- > > > uvc: Fix race condition on uvc > > > > > > Make sure that all the async work is finished when we stop the status urb. > > > > > > To: Yunke Cao <yunkec@chromium.org> > > > To: Sergey Senozhatsky <senozhatsky@chromium.org> > > > To: Max Staudt <mstaudt@google.com> > > > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > > > To: Mauro Carvalho Chehab <mchehab@kernel.org> > > > Cc: linux-media@vger.kernel.org > > > Cc: linux-kernel@vger.kernel.org > > > --- > > > Changes in v4: > > > - Replace bool with atomic_t to avoid compiler reordering > > > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!) > > > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org > > > > > > Changes in v3: > > > - Remove the patch for dev->status, makes more sense in another series, and makes > > > the zero day less nervous. > > > - Update reviewed-by (thanks Yunke!). > > > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org > > > > > > Changes in v2: > > > - Add a patch for not kalloc dev->status > > > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!) > > > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org > > > --- > > > drivers/media/usb/uvc/uvc_ctrl.c | 3 +++ > > > drivers/media/usb/uvc/uvc_status.c | 6 ++++++ > > > drivers/media/usb/uvc/uvcvideo.h | 1 + > > > 3 files changed, 10 insertions(+) > > > > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c > > > index c95a2229f4fa..1be6897a7d6d 100644 > > > --- a/drivers/media/usb/uvc/uvc_ctrl.c > > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c > > > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work) > > > > > > uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > > > > > + if (atomic_read(&dev->flush_status)) > > > + return; > > > + > > > /* Resubmit the URB. */ > > > w->urb->interval = dev->int_ep->desc.bInterval; > > > ret = usb_submit_urb(w->urb, GFP_KERNEL); > > > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c > > > index 7518ffce22ed..4a95850cdc1b 100644 > > > --- a/drivers/media/usb/uvc/uvc_status.c > > > +++ b/drivers/media/usb/uvc/uvc_status.c > > > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags) > > > if (dev->int_urb == NULL) > > > return 0; > > > > > > + atomic_set(&dev->flush_status, 0); > > > return usb_submit_urb(dev->int_urb, flags); > > > } > > > > > > void uvc_status_stop(struct uvc_device *dev) > > > { > > > + struct uvc_ctrl_work *w = &dev->async_ctrl; > > > + > > > + atomic_set(&dev->flush_status, 1); > > > > Note that atomic_read() and atomic_set() do no imply any memory barrier > > on most architectures, as far as I can tell. They essentially become > > READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler > > will not merge or split loads or stores, or reorder them with respect to > > load and stores to the *same* memory location, but nothing else. I think > > you need to add proper barriers, and you can then probably also drop > > usage of atomic_t. You are absolutely right! Only a subset of atomic implies memory barriers. Found this doc particularly good: https://www.kernel.org/doc/html/v5.10/core-api/atomic_ops.html > > > > > + if (cancel_work_sync(&w->work)) > > > + uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > > usb_kill_urb(dev->int_urb); > > > > This should get rid of the main race (possibly save the barrier issue), > > but it's not the most efficient option, and can still be problematic. > > Consider the following case: > > > > CPU0 CPU1 > > ---- ---- > > > > void uvc_status_stop(struct uvc_device *dev) uvc_ctrl_status_event_async() > > { { > > ... > > atomic_set(&dev->flush_status, 1); ... > > if (cancel_work_sync()) > > ... > > schedule_work(); > > usb_kill_urb(); } > > } > > > > The usb_kill_urb() call ensures that uvc_ctrl_status_event_async() > > completes before uvc_status_stop() returns, but there will still be work > > scheduled in that case. uvc_ctrl_status_event_work() will be run later, > > and as flush_status is set to 1, the function will not resubmit the URB. > > That fixes the main race, but leaves the asynchronous control status > > event handling for after uvc_status_stop() returns, which isn't great. > > > > Now, if we consider that uvc_status_start() could be called shortly > > after uvc_status_stop(), we may get in a situation where > > uvc_status_start() will reset flush_status to 0 before > > uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async() > > and uvc_status_start() will thus submit the same URB. > > > > You can't fix this by first killing the URB and then cancelling the > > work, as there would then be a risk that uvc_ctrl_status_event_work() > > would be running in parallel, going past the flush_status check before > > flush_status gets set to 1 in uvc_status_stop(), and submitting the URB > > after usb_kill_urb() is called. > > > > I think a good fix would be to check flush_status in > > uvc_ctrl_status_event_async() and avoid the schedule_work() call in that > > case. If we do that, then we will be losing an event. I would rather call cancel_work_sync() again after usb_kill_urb(). > >You could then also move the atomic_set(..., 0) call from > > uvc_status_start() to the end of uvc_status_stop() (again with proper > > barriers). Will do that, it is more "elegant". > > Also, as all of this is tricky, comments in appropriate places in the > code would be very helpful to avoid breaking things later. > > > Could you please check the memory barriers and test the above proposal > > (after double-checking it of course, I may have missed something) ? > > > > > } > > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h > > > index df93db259312..1274691f157f 100644 > > > --- a/drivers/media/usb/uvc/uvcvideo.h > > > +++ b/drivers/media/usb/uvc/uvcvideo.h > > > @@ -560,6 +560,7 @@ struct uvc_device { > > > struct usb_host_endpoint *int_ep; > > > struct urb *int_urb; > > > u8 *status; > > > + atomic_t flush_status; > > > struct input_dev *input; > > > char input_phys[64]; > > > > > -- > Regards, > > Laurent Pinchart
Hi Ricardo, On Mon, Jan 02, 2023 at 01:45:06PM +0100, Ricardo Ribalda wrote: > On Tue, 27 Dec 2022 at 15:37, Laurent Pinchart wrote: > > On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote: > > > On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote: > > > > usb_kill_urb warranties that all the handlers are finished when it > > > > returns, but does not protect against threads that might be handling > > > > asynchronously the urb. > > > > > > > > For UVC, the function uvc_ctrl_status_event_async() takes care of > > > > control changes asynchronously. > > > > > > > > If the code is executed in the following order: > > > > > > > > CPU 0 CPU 1 > > > > ===== ===== > > > > uvc_status_complete() > > > > uvc_status_stop() > > > > uvc_ctrl_status_event_work() > > > > uvc_status_start() -> FAIL > > > > > > > > Then uvc_status_start will keep failing and this error will be shown: > > > > > > > > <4>[ 5.540139] URB 0000000000000000 submitted while active > > > > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528 > > > > > > > > Let's improve the current situation, by not re-submiting the urb if > > > > we are stopping the status event. Also process the queued work > > > > (if any) during stop. > > > > > > > > CPU 0 CPU 1 > > > > ===== ===== > > > > uvc_status_complete() > > > > uvc_status_stop() > > > > uvc_status_start() > > > > uvc_ctrl_status_event_work() -> FAIL > > > > > > > > Hopefully, with the usb layer protection this should be enough to cover > > > > all the cases. > > > > > > > > Cc: stable@vger.kernel.org > > > > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") > > > > Reviewed-by: Yunke Cao <yunkec@chromium.org> > > > > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> > > > > --- > > > > uvc: Fix race condition on uvc > > > > > > > > Make sure that all the async work is finished when we stop the status urb. > > > > > > > > To: Yunke Cao <yunkec@chromium.org> > > > > To: Sergey Senozhatsky <senozhatsky@chromium.org> > > > > To: Max Staudt <mstaudt@google.com> > > > > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > > > > To: Mauro Carvalho Chehab <mchehab@kernel.org> > > > > Cc: linux-media@vger.kernel.org > > > > Cc: linux-kernel@vger.kernel.org > > > > --- > > > > Changes in v4: > > > > - Replace bool with atomic_t to avoid compiler reordering > > > > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!) > > > > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org > > > > > > > > Changes in v3: > > > > - Remove the patch for dev->status, makes more sense in another series, and makes > > > > the zero day less nervous. > > > > - Update reviewed-by (thanks Yunke!). > > > > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org > > > > > > > > Changes in v2: > > > > - Add a patch for not kalloc dev->status > > > > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!) > > > > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org > > > > --- > > > > drivers/media/usb/uvc/uvc_ctrl.c | 3 +++ > > > > drivers/media/usb/uvc/uvc_status.c | 6 ++++++ > > > > drivers/media/usb/uvc/uvcvideo.h | 1 + > > > > 3 files changed, 10 insertions(+) > > > > > > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c > > > > index c95a2229f4fa..1be6897a7d6d 100644 > > > > --- a/drivers/media/usb/uvc/uvc_ctrl.c > > > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c > > > > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work) > > > > > > > > uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > > > > > > > + if (atomic_read(&dev->flush_status)) > > > > + return; > > > > + > > > > /* Resubmit the URB. */ > > > > w->urb->interval = dev->int_ep->desc.bInterval; > > > > ret = usb_submit_urb(w->urb, GFP_KERNEL); > > > > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c > > > > index 7518ffce22ed..4a95850cdc1b 100644 > > > > --- a/drivers/media/usb/uvc/uvc_status.c > > > > +++ b/drivers/media/usb/uvc/uvc_status.c > > > > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags) > > > > if (dev->int_urb == NULL) > > > > return 0; > > > > > > > > + atomic_set(&dev->flush_status, 0); > > > > return usb_submit_urb(dev->int_urb, flags); > > > > } > > > > > > > > void uvc_status_stop(struct uvc_device *dev) > > > > { > > > > + struct uvc_ctrl_work *w = &dev->async_ctrl; > > > > + > > > > + atomic_set(&dev->flush_status, 1); > > > > > > Note that atomic_read() and atomic_set() do no imply any memory barrier > > > on most architectures, as far as I can tell. They essentially become > > > READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler > > > will not merge or split loads or stores, or reorder them with respect to > > > load and stores to the *same* memory location, but nothing else. I think > > > you need to add proper barriers, and you can then probably also drop > > > usage of atomic_t. > > You are absolutely right! Only a subset of atomic implies memory barriers. > > Found this doc particularly good: > https://www.kernel.org/doc/html/v5.10/core-api/atomic_ops.html > > > > > > > > > + if (cancel_work_sync(&w->work)) > > > > + uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > > > usb_kill_urb(dev->int_urb); > > > > > > This should get rid of the main race (possibly save the barrier issue), > > > but it's not the most efficient option, and can still be problematic. > > > Consider the following case: > > > > > > CPU0 CPU1 > > > ---- ---- > > > > > > void uvc_status_stop(struct uvc_device *dev) uvc_ctrl_status_event_async() > > > { { > > > ... > > > atomic_set(&dev->flush_status, 1); ... > > > if (cancel_work_sync()) > > > ... > > > schedule_work(); > > > usb_kill_urb(); } > > > } > > > > > > The usb_kill_urb() call ensures that uvc_ctrl_status_event_async() > > > completes before uvc_status_stop() returns, but there will still be work > > > scheduled in that case. uvc_ctrl_status_event_work() will be run later, > > > and as flush_status is set to 1, the function will not resubmit the URB. > > > That fixes the main race, but leaves the asynchronous control status > > > event handling for after uvc_status_stop() returns, which isn't great. > > > > > > Now, if we consider that uvc_status_start() could be called shortly > > > after uvc_status_stop(), we may get in a situation where > > > uvc_status_start() will reset flush_status to 0 before > > > uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async() > > > and uvc_status_start() will thus submit the same URB. > > > > > > You can't fix this by first killing the URB and then cancelling the > > > work, as there would then be a risk that uvc_ctrl_status_event_work() > > > would be running in parallel, going past the flush_status check before > > > flush_status gets set to 1 in uvc_status_stop(), and submitting the URB > > > after usb_kill_urb() is called. > > > > > > I think a good fix would be to check flush_status in > > > uvc_ctrl_status_event_async() and avoid the schedule_work() call in that > > > case. > > If we do that, then we will be losing an event. I would rather call > cancel_work_sync() again after usb_kill_urb(). I've thought about that, but I don't think losing the event is an issue. We're stopping event handling in the first place, there's no synchronization guarantee with the camera. For all we know the camera could have generate the event right after we stop instead of right before. There's of course no reason to drop the event for the sake of it, but if it leads to simpler code, there's no reason to process it either. > > > You could then also move the atomic_set(..., 0) call from > > > uvc_status_start() to the end of uvc_status_stop() (again with proper > > > barriers). > > Will do that, it is more "elegant". > > > Also, as all of this is tricky, comments in appropriate places in the > > code would be very helpful to avoid breaking things later. > > > > > Could you please check the memory barriers and test the above proposal > > > (after double-checking it of course, I may have missed something) ? > > > > > > > } > > > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h > > > > index df93db259312..1274691f157f 100644 > > > > --- a/drivers/media/usb/uvc/uvcvideo.h > > > > +++ b/drivers/media/usb/uvc/uvcvideo.h > > > > @@ -560,6 +560,7 @@ struct uvc_device { > > > > struct usb_host_endpoint *int_ep; > > > > struct urb *int_urb; > > > > u8 *status; > > > > + atomic_t flush_status; > > > > struct input_dev *input; > > > > char input_phys[64]; > > > >
On Mon, 2 Jan 2023 at 13:57, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > Hi Ricardo, > > On Mon, Jan 02, 2023 at 01:45:06PM +0100, Ricardo Ribalda wrote: > > On Tue, 27 Dec 2022 at 15:37, Laurent Pinchart wrote: > > > On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote: > > > > On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote: > > > > > usb_kill_urb warranties that all the handlers are finished when it > > > > > returns, but does not protect against threads that might be handling > > > > > asynchronously the urb. > > > > > > > > > > For UVC, the function uvc_ctrl_status_event_async() takes care of > > > > > control changes asynchronously. > > > > > > > > > > If the code is executed in the following order: > > > > > > > > > > CPU 0 CPU 1 > > > > > ===== ===== > > > > > uvc_status_complete() > > > > > uvc_status_stop() > > > > > uvc_ctrl_status_event_work() > > > > > uvc_status_start() -> FAIL > > > > > > > > > > Then uvc_status_start will keep failing and this error will be shown: > > > > > > > > > > <4>[ 5.540139] URB 0000000000000000 submitted while active > > > > > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528 > > > > > > > > > > Let's improve the current situation, by not re-submiting the urb if > > > > > we are stopping the status event. Also process the queued work > > > > > (if any) during stop. > > > > > > > > > > CPU 0 CPU 1 > > > > > ===== ===== > > > > > uvc_status_complete() > > > > > uvc_status_stop() > > > > > uvc_status_start() > > > > > uvc_ctrl_status_event_work() -> FAIL > > > > > > > > > > Hopefully, with the usb layer protection this should be enough to cover > > > > > all the cases. > > > > > > > > > > Cc: stable@vger.kernel.org > > > > > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") > > > > > Reviewed-by: Yunke Cao <yunkec@chromium.org> > > > > > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> > > > > > --- > > > > > uvc: Fix race condition on uvc > > > > > > > > > > Make sure that all the async work is finished when we stop the status urb. > > > > > > > > > > To: Yunke Cao <yunkec@chromium.org> > > > > > To: Sergey Senozhatsky <senozhatsky@chromium.org> > > > > > To: Max Staudt <mstaudt@google.com> > > > > > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > > > > > To: Mauro Carvalho Chehab <mchehab@kernel.org> > > > > > Cc: linux-media@vger.kernel.org > > > > > Cc: linux-kernel@vger.kernel.org > > > > > --- > > > > > Changes in v4: > > > > > - Replace bool with atomic_t to avoid compiler reordering > > > > > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!) > > > > > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org > > > > > > > > > > Changes in v3: > > > > > - Remove the patch for dev->status, makes more sense in another series, and makes > > > > > the zero day less nervous. > > > > > - Update reviewed-by (thanks Yunke!). > > > > > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org > > > > > > > > > > Changes in v2: > > > > > - Add a patch for not kalloc dev->status > > > > > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!) > > > > > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org > > > > > --- > > > > > drivers/media/usb/uvc/uvc_ctrl.c | 3 +++ > > > > > drivers/media/usb/uvc/uvc_status.c | 6 ++++++ > > > > > drivers/media/usb/uvc/uvcvideo.h | 1 + > > > > > 3 files changed, 10 insertions(+) > > > > > > > > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c > > > > > index c95a2229f4fa..1be6897a7d6d 100644 > > > > > --- a/drivers/media/usb/uvc/uvc_ctrl.c > > > > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c > > > > > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work) > > > > > > > > > > uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > > > > > > > > > + if (atomic_read(&dev->flush_status)) > > > > > + return; > > > > > + > > > > > /* Resubmit the URB. */ > > > > > w->urb->interval = dev->int_ep->desc.bInterval; > > > > > ret = usb_submit_urb(w->urb, GFP_KERNEL); > > > > > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c > > > > > index 7518ffce22ed..4a95850cdc1b 100644 > > > > > --- a/drivers/media/usb/uvc/uvc_status.c > > > > > +++ b/drivers/media/usb/uvc/uvc_status.c > > > > > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags) > > > > > if (dev->int_urb == NULL) > > > > > return 0; > > > > > > > > > > + atomic_set(&dev->flush_status, 0); > > > > > return usb_submit_urb(dev->int_urb, flags); > > > > > } > > > > > > > > > > void uvc_status_stop(struct uvc_device *dev) > > > > > { > > > > > + struct uvc_ctrl_work *w = &dev->async_ctrl; > > > > > + > > > > > + atomic_set(&dev->flush_status, 1); > > > > > > > > Note that atomic_read() and atomic_set() do no imply any memory barrier > > > > on most architectures, as far as I can tell. They essentially become > > > > READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler > > > > will not merge or split loads or stores, or reorder them with respect to > > > > load and stores to the *same* memory location, but nothing else. I think > > > > you need to add proper barriers, and you can then probably also drop > > > > usage of atomic_t. > > > > You are absolutely right! Only a subset of atomic implies memory barriers. > > > > Found this doc particularly good: > > https://www.kernel.org/doc/html/v5.10/core-api/atomic_ops.html > > > > > > > > > > > > > + if (cancel_work_sync(&w->work)) > > > > > + uvc_ctrl_status_event(w->chain, w->ctrl, w->data); > > > > > usb_kill_urb(dev->int_urb); > > > > > > > > This should get rid of the main race (possibly save the barrier issue), > > > > but it's not the most efficient option, and can still be problematic. > > > > Consider the following case: > > > > > > > > CPU0 CPU1 > > > > ---- ---- > > > > > > > > void uvc_status_stop(struct uvc_device *dev) uvc_ctrl_status_event_async() > > > > { { > > > > ... > > > > atomic_set(&dev->flush_status, 1); ... > > > > if (cancel_work_sync()) > > > > ... > > > > schedule_work(); > > > > usb_kill_urb(); } > > > > } > > > > > > > > The usb_kill_urb() call ensures that uvc_ctrl_status_event_async() > > > > completes before uvc_status_stop() returns, but there will still be work > > > > scheduled in that case. uvc_ctrl_status_event_work() will be run later, > > > > and as flush_status is set to 1, the function will not resubmit the URB. > > > > That fixes the main race, but leaves the asynchronous control status > > > > event handling for after uvc_status_stop() returns, which isn't great. > > > > > > > > Now, if we consider that uvc_status_start() could be called shortly > > > > after uvc_status_stop(), we may get in a situation where > > > > uvc_status_start() will reset flush_status to 0 before > > > > uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async() > > > > and uvc_status_start() will thus submit the same URB. > > > > > > > > You can't fix this by first killing the URB and then cancelling the > > > > work, as there would then be a risk that uvc_ctrl_status_event_work() > > > > would be running in parallel, going past the flush_status check before > > > > flush_status gets set to 1 in uvc_status_stop(), and submitting the URB > > > > after usb_kill_urb() is called. > > > > > > > > I think a good fix would be to check flush_status in > > > > uvc_ctrl_status_event_async() and avoid the schedule_work() call in that > > > > case. > > > > If we do that, then we will be losing an event. I would rather call > > cancel_work_sync() again after usb_kill_urb(). > > I've thought about that, but I don't think losing the event is an issue. > We're stopping event handling in the first place, there's no > synchronization guarantee with the camera. For all we know the camera > could have generate the event right after we stop instead of right > before. There's of course no reason to drop the event for the sake of > it, but if it leads to simpler code, there's no reason to process it > either. Please take a look to v5, it does not look complicated, and we have all the synchronization code in two functions: uvc_status_stop() and uvc_ctrl_status_event_work(), instead of 3. Plus we do not lose any received event. Regards! > > > > > You could then also move the atomic_set(..., 0) call from > > > > uvc_status_start() to the end of uvc_status_stop() (again with proper > > > > barriers). > > > > Will do that, it is more "elegant". > > > > > Also, as all of this is tricky, comments in appropriate places in the > > > code would be very helpful to avoid breaking things later. > > > > > > > Could you please check the memory barriers and test the above proposal > > > > (after double-checking it of course, I may have missed something) ? > > > > > > > > > } > > > > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h > > > > > index df93db259312..1274691f157f 100644 > > > > > --- a/drivers/media/usb/uvc/uvcvideo.h > > > > > +++ b/drivers/media/usb/uvc/uvcvideo.h > > > > > @@ -560,6 +560,7 @@ struct uvc_device { > > > > > struct usb_host_endpoint *int_ep; > > > > > struct urb *int_urb; > > > > > u8 *status; > > > > > + atomic_t flush_status; > > > > > struct input_dev *input; > > > > > char input_phys[64]; > > > > > > > -- > Regards, > > Laurent Pinchart
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c index c95a2229f4fa..1be6897a7d6d 100644 --- a/drivers/media/usb/uvc/uvc_ctrl.c +++ b/drivers/media/usb/uvc/uvc_ctrl.c @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work) uvc_ctrl_status_event(w->chain, w->ctrl, w->data); + if (atomic_read(&dev->flush_status)) + return; + /* Resubmit the URB. */ w->urb->interval = dev->int_ep->desc.bInterval; ret = usb_submit_urb(w->urb, GFP_KERNEL); diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c index 7518ffce22ed..4a95850cdc1b 100644 --- a/drivers/media/usb/uvc/uvc_status.c +++ b/drivers/media/usb/uvc/uvc_status.c @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags) if (dev->int_urb == NULL) return 0; + atomic_set(&dev->flush_status, 0); return usb_submit_urb(dev->int_urb, flags); } void uvc_status_stop(struct uvc_device *dev) { + struct uvc_ctrl_work *w = &dev->async_ctrl; + + atomic_set(&dev->flush_status, 1); + if (cancel_work_sync(&w->work)) + uvc_ctrl_status_event(w->chain, w->ctrl, w->data); usb_kill_urb(dev->int_urb); } diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h index df93db259312..1274691f157f 100644 --- a/drivers/media/usb/uvc/uvcvideo.h +++ b/drivers/media/usb/uvc/uvcvideo.h @@ -560,6 +560,7 @@ struct uvc_device { struct usb_host_endpoint *int_ep; struct urb *int_urb; u8 *status; + atomic_t flush_status; struct input_dev *input; char input_phys[64];