Message ID | 1670851464-8106-1-git-send-email-quic_prashk@quicinc.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp2240357wrr; Mon, 12 Dec 2022 05:25:14 -0800 (PST) X-Google-Smtp-Source: AA0mqf7Z4w12OyemsGGdIGalrMPTlW62vYAWL6nGjbb044mkC9Kyam51AK1kvI5IclXT5MD+mQgQ X-Received: by 2002:a05:6a20:9589:b0:a3:878d:c126 with SMTP id iu9-20020a056a20958900b000a3878dc126mr23521020pzb.42.1670851514294; Mon, 12 Dec 2022 05:25:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670851514; cv=none; d=google.com; s=arc-20160816; b=ia2/7L9rX+yvGHXGgdNLAf8U9v/94RLMxK7fnV89ePLYeAHsUhV2BdrUoMesoGAjyY KMjxmRs+8+qU2R4OKzLhRTEbh/BgCGpALbW7agi8iElgc/6N2omQBZhPypjSPe7fBGzv GOdpST8vMjLyoaOzEsksYJiQL0GHtj/xWIwh2wAuc+1Iq6qloE4YygtjVm/7hAiFJuXJ NIeiLEnn8uOZL4VBeMz3Rcb0JTPFd0RyijO3TDXK7b6Vubi+VGmq0Ui8qdfVvB9uO7Vt i6uO6famNIyR8q11pmE6fftMpWYcLFwdnTQDLgqd0Sb+OWdKVZ+jTakhESRk4E556Skj oRwA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:subject:cc:to:from :dkim-signature; bh=91cpds4v69UzBBGb1l3cLOexQaCW6Fg6qXLj+1lY1nQ=; b=POUbffngo+ew4ByvUCJoyFoYU46Nh3UCr8dKOWYNEDNKdwJGPVUGFpgatt9/aMTOAH mEOVj2jFnHq/YQO1g2idk3SOh1jQhYAMKexJn/8909Pf6tPDV8ZgiMtHvw6gDOZX1wGI tgCsht30QG6EPaUgdzXKUJ/C+Pt7Ak+rmUC4euFmmvb0IAn3KzHHao7NRGdSeeSF1jPE fYAh5FcOp0rYmkbDzl+EDga6WU4fmJswbwcYnr5333b69BbEEvHsZZA5OuXjb/cTVX+O Csg6gf/I6bVvo2po8MzVUePsjRFur+3++mpXKnXVXtNVRjjjE2v2lf0SB5ll21V9p+6d Ej8g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@quicinc.com header.s=qcppdkim1 header.b=mrkSTDNy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=quicinc.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y14-20020a17090aca8e00b001f06a6fdb2fsi9341253pjt.27.2022.12.12.05.25.01; Mon, 12 Dec 2022 05:25:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@quicinc.com header.s=qcppdkim1 header.b=mrkSTDNy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=quicinc.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232707AbiLLNY7 (ORCPT <rfc822;jeantsuru.cumc.mandola@gmail.com> + 99 others); Mon, 12 Dec 2022 08:24:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232700AbiLLNYv (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 12 Dec 2022 08:24:51 -0500 Received: from mx0b-0031df01.pphosted.com (mx0b-0031df01.pphosted.com [205.220.180.131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C0615F90; Mon, 12 Dec 2022 05:24:49 -0800 (PST) Received: from pps.filterd (m0279868.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BCCx1iV002490; Mon, 12 Dec 2022 13:24:34 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; h=from : to : cc : subject : date : message-id : mime-version : content-type; s=qcppdkim1; bh=91cpds4v69UzBBGb1l3cLOexQaCW6Fg6qXLj+1lY1nQ=; b=mrkSTDNyN69Vam0+Uc+EwDMRxr341Js4lwPeE4dmDMypicevmXAseEvfDtmrC7RKb8ps ltRqeoJxfKRZUJS4FxW1vTS48WVPwHDxetyHwrBWtG1xrOmIikRq1c7XrE7eQqUZGW83 3+CWVOxziw5wJl7sKhSBC/WPighui6XktoHu59lZKCdq02rmM5oBWAL3YfKs7xF5tUAx XAwevaKzmMlo38uuuoxm7XItr4ypDURQANwoR7E9swDhuNBiypaexFyxCGMvh2vczMP/ 0GiPQEJPfvq3y/OG9Ii1klGDAP9ACrlYLuI6DmH9YVOUXYnJi1SC2N6lFKrKjo2b/rz6 Ag== Received: from nalasppmta05.qualcomm.com (Global_NAT1.qualcomm.com [129.46.96.20]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 3mchesc7dj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 12 Dec 2022 13:24:34 +0000 Received: from nalasex01a.na.qualcomm.com (nalasex01a.na.qualcomm.com [10.47.209.196]) by NALASPPMTA05.qualcomm.com (8.17.1.5/8.17.1.5) with ESMTPS id 2BCDOX4q004025 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 12 Dec 2022 13:24:33 GMT Received: from hu-prashk-hyd.qualcomm.com (10.80.80.8) by nalasex01a.na.qualcomm.com (10.47.209.196) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Mon, 12 Dec 2022 05:24:29 -0800 From: Prashanth K <quic_prashk@quicinc.com> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, "Gustavo A . R . Silva" <gustavoars@kernel.org>, Shuah Khan <skhan@linuxfoundation.org>, John Keeping <john@metanate.com>, Linyu Yuan <quic_linyyuan@quicinc.com>, Pratham Pratap <quic_ppratap@quicinc.com>, Vincent Pelletier <plr.vincent@gmail.com>, "Dan Carpenter" <error27@gmail.com>, Udipto Goswami <quic_ugoswami@quicinc.com> CC: <linux-usb@vger.kernel.org>, <linux-kernel@vger.kernel.org>, "# 5 . 15" <stable@vger.kernel.org>, Prashanth K <quic_prashk@quicinc.com> Subject: usb: f_fs: Fix CFI failure in ki_complete Date: Mon, 12 Dec 2022 18:54:24 +0530 Message-ID: <1670851464-8106-1-git-send-email-quic_prashk@quicinc.com> X-Mailer: git-send-email 2.7.4 MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.80.80.8] X-ClientProxiedBy: nasanex01a.na.qualcomm.com (10.52.223.231) To nalasex01a.na.qualcomm.com (10.47.209.196) X-QCInternal: smtphost X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-ORIG-GUID: z2Ie-FTzCK62ZzaEgSBQhZSQ3M8w3Z81 X-Proofpoint-GUID: z2Ie-FTzCK62ZzaEgSBQhZSQ3M8w3Z81 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-12_02,2022-12-12_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 adultscore=0 priorityscore=1501 lowpriorityscore=0 impostorscore=0 mlxscore=0 mlxlogscore=607 suspectscore=0 clxscore=1011 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2212120123 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1752014797327577945?= X-GMAIL-MSGID: =?utf-8?q?1752014797327577945?= |
Series |
usb: f_fs: Fix CFI failure in ki_complete
|
|
Commit Message
Prashanth K
Dec. 12, 2022, 1:24 p.m. UTC
Function pointer ki_complete() expects 'long' as its second
argument, but we pass integer from ffs_user_copy_worker. This
might cause a CFI failure, as ki_complete is an indirect call
with mismatched prototype. Fix this by typecasting the second
argument to long.
Cc: <stable@vger.kernel.org> # 5.15
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
---
drivers/usb/gadget/function/f_fs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > Function pointer ki_complete() expects 'long' as its second > argument, but we pass integer from ffs_user_copy_worker. This > might cause a CFI failure, as ki_complete is an indirect call > with mismatched prototype. Fix this by typecasting the second > argument to long. "might"? Does it or not? If it does, why hasn't this been reported before? > Cc: <stable@vger.kernel.org> # 5.15 CFI first showed up in 6.1, not 5.15, right? > Signed-off-by: Prashanth K <quic_prashk@quicinc.com> > > --- > drivers/usb/gadget/function/f_fs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c > index 73dc10a7..9c26561 100644 > --- a/drivers/usb/gadget/function/f_fs.c > +++ b/drivers/usb/gadget/function/f_fs.c > @@ -835,7 +835,7 @@ static void ffs_user_copy_worker(struct work_struct *work) > kthread_unuse_mm(io_data->mm); > } > > - io_data->kiocb->ki_complete(io_data->kiocb, ret); > + io_data->kiocb->ki_complete(io_data->kiocb, (long)ret); Why just fix up this one instance? What about ep_user_copy_worker()? And what about all other calls to ki_complete that are not using a (long) cast? This feels wrong, what exactly is the reported error and how come other kernel calls to this function pointer have not had a problem with CFI? ceph_aio_complete() would be another example, right? thanks, greg k-h
On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > Function pointer ki_complete() expects 'long' as its second > argument, but we pass integer from ffs_user_copy_worker. This > might cause a CFI failure, as ki_complete is an indirect call > with mismatched prototype. Fix this by typecasting the second > argument to long. > > Cc: <stable@vger.kernel.org> # 5.15 > Signed-off-by: Prashanth K <quic_prashk@quicinc.com> > > --- > drivers/usb/gadget/function/f_fs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c > index 73dc10a7..9c26561 100644 > --- a/drivers/usb/gadget/function/f_fs.c > +++ b/drivers/usb/gadget/function/f_fs.c > @@ -835,7 +835,7 @@ static void ffs_user_copy_worker(struct work_struct *work) > kthread_unuse_mm(io_data->mm); > } > > - io_data->kiocb->ki_complete(io_data->kiocb, ret); > + io_data->kiocb->ki_complete(io_data->kiocb, (long)ret); I don't understand the problem here, ki_complete() is declared with a long parameter, so won't integer promotion kick in so that the function is called with a long? Why is the explicit cast necessary? > > if (io_data->ffs->ffs_eventfd && !kiocb_has_eventfd) > eventfd_signal(io_data->ffs->ffs_eventfd, 1); > -- > 2.7.4 >
On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > Function pointer ki_complete() expects 'long' as its second > argument, but we pass integer from ffs_user_copy_worker. This > might cause a CFI failure, as ki_complete is an indirect call > with mismatched prototype. Fix this by typecasting the second > argument to long. > > Cc: <stable@vger.kernel.org> # 5.15 > Signed-off-by: Prashanth K <quic_prashk@quicinc.com> > > --- > drivers/usb/gadget/function/f_fs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c > index 73dc10a7..9c26561 100644 > --- a/drivers/usb/gadget/function/f_fs.c > +++ b/drivers/usb/gadget/function/f_fs.c > @@ -835,7 +835,7 @@ static void ffs_user_copy_worker(struct work_struct *work) > kthread_unuse_mm(io_data->mm); > } > > - io_data->kiocb->ki_complete(io_data->kiocb, ret); > + io_data->kiocb->ki_complete(io_data->kiocb, (long)ret); I don't understand this commit at all. CFI is Control Flow Integrity or Common Flash Interface depending on which subsystem we're talking about. I really think that Clang needs to be fixed if this really causes an issue for Clang. How on earth are we going to know where to add all the casts? The commit message says "this might cause a CFI" failure. Either it does or it doesn't. Please someone test this so we can know what's going on. Why is it backported to 5.15? I thought CFI was not going to backported that far and I has seen people complaining about CFI backports. regards, dan carpenter
On 12-12-22 07:05 pm, Greg Kroah-Hartman wrote: > On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: >> Function pointer ki_complete() expects 'long' as its second >> argument, but we pass integer from ffs_user_copy_worker. This >> might cause a CFI failure, as ki_complete is an indirect call >> with mismatched prototype. Fix this by typecasting the second >> argument to long. > > "might"? Does it or not? If it does, why hasn't this been reported > before? Sorry for the confusion in commit text, We caught a CFI (Control Flow Integrity) failure internally on 5.15, hence pushed this patch. But later I came to know that CFI was implemented on 5.4 kernel for Android. Will push the same on ACK and share the related details there. Thanks. > >> Cc: <stable@vger.kernel.org> # 5.15 > > CFI first showed up in 6.1, not 5.15, right? > >> Signed-off-by: Prashanth K <quic_prashk@quicinc.com> >> >> --- >> drivers/usb/gadget/function/f_fs.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c >> index 73dc10a7..9c26561 100644 >> --- a/drivers/usb/gadget/function/f_fs.c >> +++ b/drivers/usb/gadget/function/f_fs.c >> @@ -835,7 +835,7 @@ static void ffs_user_copy_worker(struct work_struct *work) >> kthread_unuse_mm(io_data->mm); >> } >> >> - io_data->kiocb->ki_complete(io_data->kiocb, ret); >> + io_data->kiocb->ki_complete(io_data->kiocb, (long)ret); > > Why just fix up this one instance? What about ep_user_copy_worker()? > And what about all other calls to ki_complete that are not using a > (long) cast? > > This feels wrong, what exactly is the reported error and how come other > kernel calls to this function pointer have not had a problem with CFI? > ceph_aio_complete() would be another example, right? > > thanks, > > greg k-h
On Wed, Dec 14, 2022 at 06:38:17PM +0530, Prashanth K wrote: > > > On 12-12-22 07:05 pm, Greg Kroah-Hartman wrote: > > On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > > > Function pointer ki_complete() expects 'long' as its second > > > argument, but we pass integer from ffs_user_copy_worker. This > > > might cause a CFI failure, as ki_complete is an indirect call > > > with mismatched prototype. Fix this by typecasting the second > > > argument to long. > > > > "might"? Does it or not? If it does, why hasn't this been reported > > before? > Sorry for the confusion in commit text, We caught a CFI (Control Flow > Integrity) failure internally on 5.15, hence pushed this patch. But later I > came to know that CFI was implemented on 5.4 kernel for Android. Will push > the same on ACK and share the related details there. I will have the same questions there, namely, "why just this one instance and why is it trigging anything"? So please, work this out here, in public, don't bury stuff in random vendor kernel trees. That's not the way to solve anything properly, you know this :) thanks, greg k-h
From: Greg Kroah-Hartman > Sent: 12 December 2022 13:35 > > On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > > Function pointer ki_complete() expects 'long' as its second > > argument, but we pass integer from ffs_user_copy_worker. This > > might cause a CFI failure, as ki_complete is an indirect call > > with mismatched prototype. Fix this by typecasting the second > > argument to long. > > "might"? Does it or not? If it does, why hasn't this been reported > before? Does the cast even help at all. ... > > - io_data->kiocb->ki_complete(io_data->kiocb, ret); > > + io_data->kiocb->ki_complete(io_data->kiocb, (long)ret); ... If definition of the parameter in the structure member ki_complete() definition is 'long' then the compiler has to promote 'ret' to long anyway. CFI has nothing to do with it. OTOH if you've used a cast to assign a function with a different prototype to ki_complete then 'all bets are off' and you get all the run time errors you deserve. CFI just converts some of them to compile time errors. For instance if you assign xx_complete(long) to (*ki_complete)(int) then it is very likely that xx_complete() will an argument with some of the high bits set. But adding a cast to the call - ki_complete((long)int_var) will make absolutely no difference. The compiler wont zero/sign extend int_var to 64bits for you, that will just get optimised away and the high bits will be unchanged. You're description seems to be the other way around (which might be safe, but CFI probably still barfs). But you need to fix the indirect calls so the function types match. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 14-12-22 11:05 pm, David Laight wrote: > From: Greg Kroah-Hartman >> Sent: 12 December 2022 13:35 >> >> On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: >>> Function pointer ki_complete() expects 'long' as its second >>> argument, but we pass integer from ffs_user_copy_worker. This >>> might cause a CFI failure, as ki_complete is an indirect call >>> with mismatched prototype. Fix this by typecasting the second >>> argument to long. >> >> "might"? Does it or not? If it does, why hasn't this been reported >> before? > > Does the cast even help at all. Actually I also have these same questions - why we haven't seen any instances other than this one? - why its not seen on other indirect function calls? Here is the the call stack of the failure that we got. [ 323.288681][ T7] Kernel panic - not syncing: CFI failure (target: 0xffffffe5fc811f98) [ 323.288710][ T7] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G S W OE 5.15.41-android13-8-g5ffc5644bd20 #1 [ 323.288730][ T7] Workqueue: adb ffs_user_copy_worker.cfi_jt [ 323.288752][ T7] Call trace: [ 323.288755][ T7] dump_backtrace.cfi_jt+0x0/0x8 [ 323.288772][ T7] dump_stack_lvl+0x80/0xb8 [ 323.288785][ T7] panic+0x180/0x444 [ 323.288797][ T7] find_check_fn+0x0/0x218 [ 323.288810][ T7] ffs_user_copy_worker+0x1dc/0x204 [ 323.288822][ T7] kretprobe_trampoline.cfi_jt+0x0/0x8 [ 323.288837][ T7] worker_thread+0x3ec/0x920 [ 323.288850][ T7] kthread+0x168/0x1dc [ 323.288859][ T7] ret_from_fork+0x10/0x20 [ 323.288866][ T7] SMP: stopping secondary CPUs And from address to line translation, we got know the issue is from ffs_user_copy_worker+0x1dc/0x204 || io_data->kiocb->ki_complete(io_data->kiocb, ret); And "find_check_fn" was getting invoked from ki_complete. Only thing that I found suspicious about ki_complete() is its argument types. That's why I pushed this patch here, so that we can discuss this out here. Thanks in advance > > ... >>> - io_data->kiocb->ki_complete(io_data->kiocb, ret); >>> + io_data->kiocb->ki_complete(io_data->kiocb, (long)ret); > ... > > If definition of the parameter in the structure member ki_complete() > definition is 'long' then the compiler has to promote 'ret' to long > anyway. CFI has nothing to do with it. > > OTOH if you've used a cast to assign a function with a > different prototype to ki_complete then 'all bets are off' > and you get all the run time errors you deserve. > CFI just converts some of them to compile time errors. > > For instance if you assign xx_complete(long) to (*ki_complete)(int) > then it is very likely that xx_complete() will an argument > with some of the high bits set. > But adding a cast to the call - ki_complete((long)int_var) > will make absolutely no difference. > The compiler wont zero/sign extend int_var to 64bits for you, > that will just get optimised away and the high bits will > be unchanged. > > You're description seems to be the other way around (which might > be safe, but CFI probably still barfs). > But you need to fix the indirect calls so the function types > match. So does that mean, we need to add casts in al indirect calls to match the function signature? > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) >
On Thu, Dec 22, 2022 at 06:21:03PM +0530, Prashanth K wrote: > > > On 14-12-22 11:05 pm, David Laight wrote: > > From: Greg Kroah-Hartman > > > Sent: 12 December 2022 13:35 > > > > > > On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > > > > Function pointer ki_complete() expects 'long' as its second > > > > argument, but we pass integer from ffs_user_copy_worker. This > > > > might cause a CFI failure, as ki_complete is an indirect call > > > > with mismatched prototype. Fix this by typecasting the second > > > > argument to long. > > > > > > "might"? Does it or not? If it does, why hasn't this been reported > > > before? > > > > Does the cast even help at all. > Actually I also have these same questions > - why we haven't seen any instances other than this one? > - why its not seen on other indirect function calls? > > Here is the the call stack of the failure that we got. > > [ 323.288681][ T7] Kernel panic - not syncing: CFI failure (target: > 0xffffffe5fc811f98) > [ 323.288710][ T7] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G S W > OE 5.15.41-android13-8-g5ffc5644bd20 #1 > [ 323.288730][ T7] Workqueue: adb ffs_user_copy_worker.cfi_jt > [ 323.288752][ T7] Call trace: > [ 323.288755][ T7] dump_backtrace.cfi_jt+0x0/0x8 > [ 323.288772][ T7] dump_stack_lvl+0x80/0xb8 > [ 323.288785][ T7] panic+0x180/0x444 > [ 323.288797][ T7] find_check_fn+0x0/0x218 > [ 323.288810][ T7] ffs_user_copy_worker+0x1dc/0x204 > [ 323.288822][ T7] kretprobe_trampoline.cfi_jt+0x0/0x8 > [ 323.288837][ T7] worker_thread+0x3ec/0x920 > [ 323.288850][ T7] kthread+0x168/0x1dc > [ 323.288859][ T7] ret_from_fork+0x10/0x20 > [ 323.288866][ T7] SMP: stopping secondary CPUs > > And from address to line translation, we got know the issue is from > ffs_user_copy_worker+0x1dc/0x204 > || > io_data->kiocb->ki_complete(io_data->kiocb, ret); > > And "find_check_fn" was getting invoked from ki_complete. Only thing that I > found suspicious about ki_complete() is its argument types. That's why I > pushed this patch here, so that we can discuss this out here. I think the problem is more likely whatever ->ki_complete() points to but I have no idea what that is on your system. You're using an Android kernel so it could be something out of tree as well... regards, dan carpenter
From: Dan Carpenter <error27@gmail.com> Date: Fri, 23 Dec 2022 12:04:18 +0300 > On Thu, Dec 22, 2022 at 06:21:03PM +0530, Prashanth K wrote: > > > > > > On 14-12-22 11:05 pm, David Laight wrote: > > > From: Greg Kroah-Hartman > > > > Sent: 12 December 2022 13:35 > > > > > > > > On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > > > > > Function pointer ki_complete() expects 'long' as its second > > > > > argument, but we pass integer from ffs_user_copy_worker. This > > > > > might cause a CFI failure, as ki_complete is an indirect call > > > > > with mismatched prototype. Fix this by typecasting the second > > > > > argument to long. > > > > > > > > "might"? Does it or not? If it does, why hasn't this been reported > > > > before? > > > > > > Does the cast even help at all. > > Actually I also have these same questions > > - why we haven't seen any instances other than this one? > > - why its not seen on other indirect function calls? > > > > Here is the the call stack of the failure that we got. > > > > [ 323.288681][ T7] Kernel panic - not syncing: CFI failure (target: > > 0xffffffe5fc811f98) > > [ 323.288710][ T7] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G S W > > OE 5.15.41-android13-8-g5ffc5644bd20 #1 > > [ 323.288730][ T7] Workqueue: adb ffs_user_copy_worker.cfi_jt > > [ 323.288752][ T7] Call trace: > > [ 323.288755][ T7] dump_backtrace.cfi_jt+0x0/0x8 > > [ 323.288772][ T7] dump_stack_lvl+0x80/0xb8 > > [ 323.288785][ T7] panic+0x180/0x444 > > [ 323.288797][ T7] find_check_fn+0x0/0x218 > > [ 323.288810][ T7] ffs_user_copy_worker+0x1dc/0x204 > > [ 323.288822][ T7] kretprobe_trampoline.cfi_jt+0x0/0x8 > > [ 323.288837][ T7] worker_thread+0x3ec/0x920 > > [ 323.288850][ T7] kthread+0x168/0x1dc > > [ 323.288859][ T7] ret_from_fork+0x10/0x20 > > [ 323.288866][ T7] SMP: stopping secondary CPUs > > > > And from address to line translation, we got know the issue is from > > ffs_user_copy_worker+0x1dc/0x204 > > || > > io_data->kiocb->ki_complete(io_data->kiocb, ret); > > > > And "find_check_fn" was getting invoked from ki_complete. Only thing that I > > found suspicious about ki_complete() is its argument types. That's why I > > pushed this patch here, so that we can discuss this out here. > > I think the problem is more likely whatever ->ki_complete() points to > but I have no idea what that is on your system. You're using an Android > kernel so it could be something out of tree as well... Correct, CFI would *never* trigger a failure due to passing int as long. It triggers only on prototype-implementation mismatches. The author should go and check carefully whether there are any places where some implementation differs and then ::ki_complete() gets passed with a function typecast. Also, there can be places where a proto has an argument as enum and an implementation has it as int. Compilers don't warn on such mismatches, CFI does. The latest LLVM Git snapshot with `-Wcast-function-type-strict` enabled could help hunt such. > > regards, > dan carpenter Thanks, Olek
On Thu, Dec 22, 2022 at 06:21:03PM +0530, Prashanth K wrote: > > > On 14-12-22 11:05 pm, David Laight wrote: > > From: Greg Kroah-Hartman > > > Sent: 12 December 2022 13:35 > > > > > > On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > > > > Function pointer ki_complete() expects 'long' as its second > > > > argument, but we pass integer from ffs_user_copy_worker. This > > > > might cause a CFI failure, as ki_complete is an indirect call > > > > with mismatched prototype. Fix this by typecasting the second > > > > argument to long. > > > > > > "might"? Does it or not? If it does, why hasn't this been reported > > > before? > > > > Does the cast even help at all. > Actually I also have these same questions > - why we haven't seen any instances other than this one? > - why its not seen on other indirect function calls? Great, please work on figuring these out before you resubmit this again as obviously we can't take this change without knowing the answers here. good luck! greg k-h
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index 73dc10a7..9c26561 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -835,7 +835,7 @@ static void ffs_user_copy_worker(struct work_struct *work) kthread_unuse_mm(io_data->mm); } - io_data->kiocb->ki_complete(io_data->kiocb, ret); + io_data->kiocb->ki_complete(io_data->kiocb, (long)ret); if (io_data->ffs->ffs_eventfd && !kiocb_has_eventfd) eventfd_signal(io_data->ffs->ffs_eventfd, 1);