Message ID | 20221017191611.2577466-1-jane.chu@oracle.com |
---|---|
State | New |
Headers |
From: Jane Chu <jane.chu@oracle.com> To: pmladek@suse.com, rostedt@goodmis.org, senozhatsky@chromium.org, andriy.shevchenko@linux.intel.com, linux@rasmusvillemoes.dk, linux-kernel@vger.kernel.org Cc: jane.chu@oracle.com Subject: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference Date: Mon, 17 Oct 2022 13:16:11 -0600 Message-Id: <20221017191611.2577466-1-jane.chu@oracle.com> X-Mailer: git-send-email 2.18.4 Content-Type: text/plain MIME-Version: 1.0 Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
vsprintf: protect kernel from panic due to non-canonical pointer dereference
|
|
Commit Message
Jane Chu
Oct. 17, 2022, 7:16 p.m. UTC
While debugging a separate issue, it was found that an invalid string
pointer could very well contain a non-canonical address, such as
0x7665645f63616465. In that case, this line of defense isn't enough
to protect the kernel from crashing due to a general protection fault:

	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
		return "(efault)";

So instead, use kern_addr_valid() to validate the string pointer.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
lib/vsprintf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
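
For readers following along, here is a minimal userspace sketch of why the quoted check lets the example address through. It is not the patch itself and not the kernel's kern_addr_valid(); the helper names and the SKETCH_* constants are made up for illustration, and it assumes x86-64 with 4 KiB pages and 48-bit virtual addresses (4-level paging).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, 48-bit virtual
 * addresses, and MAX_ERRNO == 4095 as in include/linux/err.h. */
#define SKETCH_PAGE_SIZE  UINT64_C(4096)
#define SKETCH_MAX_ERRNO  UINT64_C(4095)
#define SKETCH_VADDR_BITS 48

/* Mirrors the quoted check: only the first page and ERR_PTR-style
 * values (the top MAX_ERRNO addresses) are rejected. */
static bool passes_basic_check(uint64_t addr)
{
	bool low_page = addr < SKETCH_PAGE_SIZE;
	bool err_value = addr >= (UINT64_C(0) - SKETCH_MAX_ERRNO);

	return !(low_page || err_value);
}

/* An address is canonical when bits 63..47 are all copies of bit 47,
 * i.e. the 17 topmost bits are either all zero or all one. */
static bool is_canonical(uint64_t addr)
{
	uint64_t top = addr >> (SKETCH_VADDR_BITS - 1);

	return top == 0 || top == UINT64_C(0x1ffff);
}
```

With these helpers, passes_basic_check(0x7665645f63616465) is true while is_canonical() of the same value is false, which is the gap the commit message describes: dereferencing such an address on x86-64 raises a general protection fault rather than a page fault.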
Comments
On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote:
> While debugging a separate issue, it was found that an invalid string
> pointer could very well contain a non-canonical address, such as
> 0x7665645f63616465. In that case, this line of defense isn't enough
> to protect the kernel from crashing due to general protection fault
>
>	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>		return "(efault)";
>
> So instead, use kern_addr_valid() to validate the string pointer.

How did you check that value of the (invalid string) pointer?

On 10/17/2022 12:25 PM, Andy Shevchenko wrote:
> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote:
>> [..]
>> So instead, use kern_addr_valid() to validate the string pointer.
>
> How did you check that value of the (invalid string) pointer?

In the bug scenario, the invalid string pointer was an out-of-bound
string pointer. While the OOB referencing is fixed, the lingering issue
is that the kernel ought to be able to protect itself, as the pointer
contains a non-canonical address.

That said, I realized that not all architectures implement a meaningful
kern_addr_valid(), so this line
	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
is still needed. I'll send v2.

thanks,
-jane

On Mon 2022-10-17 19:31:53, Jane Chu wrote:
> On 10/17/2022 12:25 PM, Andy Shevchenko wrote:
> > [..]
> > How did you check that value of the (invalid string) pointer?
>
> In the bug scenario, the invalid string pointer was an out-of-bound
> string pointer. While the OOB referencing is fixed,

Could you please provide more details about the fixed OOB?
What exact vsprintf()/printk() call was broken and eventually
how it was fixed, please?

> the lingering issue
> is that the kernel ought to be able to protect itself, as the pointer
> contains a non-canonical address.

Was the pointer used only by the vsprintf()?
Or was it accessed also by another code, please?

I wonder if this patch would prevent the crash or if the broken
kernel would crash later anyway.

> That said, I realized that not all
> architectures implement a meaningful kern_addr_valid(), so this line
>	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> is still needed. I'll send v2.

Please, add linux-mm@kvack.org into CC.

I wonder if kern_addr_valid() is safe to use anywhere, especially during
early boot. I wonder if it would make sense to implement it on all
architectures.

Best Regards,
Petr

On 10/18/2022 5:45 AM, Petr Mladek wrote:
> On Mon 2022-10-17 19:31:53, Jane Chu wrote:
>> [..]
>> In the bug scenario, the invalid string pointer was an out-of-bound
>> string pointer. While the OOB referencing is fixed,
>
> Could you please provide more details about the fixed OOB?
> What exact vsprintf()/printk() call was broken and eventually
> how it was fixed, please?

For sensitivity reasons, I'd like to avoid mentioning the specific name of
the sysfs attribute in the bug; instead, I'll just call it "devX_attrY[]"
and describe the precise nature of the issue.

devX_attrY[] is a string array, declared and filled at compile time,
like
	const char * const devX_attrY[] = {
		[ATTRY_A] = "Dev X AttributeY A",
		[ATTRY_B] = "Dev X AttributeY B",
		...
		[ATTRY_G] = "Dev X AttributeY G",
	};
such that, when the user runs "cat /sys/devices/systems/.../attry_1",
"Dev X AttributeY B" will show up in the terminal.
That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that.

The bug was that the index to the array was wrongfully produced,
leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the
calculation and that is not an upstream fix.

>> the lingering issue
>> is that the kernel ought to be able to protect itself, as the pointer
>> contains a non-canonical address.
>
> Was the pointer used only by the vsprintf()?
> Or was it accessed also by another code, please?

The OOB pointer was used only by vsprintf() for the "cat" sysfs case.
No other code uses the OOB pointer, verified both by code examination
and test.

Here is a snippet of the crash backtrace from an instrumented kernel,
with one line scratched out for sensitivity reasons:

crash> bt
PID: 3250 TASK: ffff9cb50fe23d80 CPU: 18 COMMAND: "cat"
 #0 [ffffc0bacf377998] machine_kexec at ffffffff9b06c7c1
 #1 [ffffc0bacf3779f8] __crash_kexec at ffffffff9b13bb52
 #2 [ffffc0bacf377ac8] crash_kexec at ffffffff9b13cdac
 #3 [ffffc0bacf377ae8] oops_end at ffffffff9b03357a
 #4 [ffffc0bacf377b10] die at ffffffff9b033c32
 #5 [ffffc0bacf377b40] do_general_protection at ffffffff9b030c52
 #6 [ffffc0bacf377b70] general_protection at ffffffff9ba03db4
    [exception RIP: string_nocheck+19]
    RIP: ffffffff9b87cc73 RSP: ffffc0bacf377c20 RFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff9da13fc17fff RCX: ffff0a00ffffff04
    RDX: 726f635f63616465 RSI: ffff9da13fc17fff RDI: ffffffffffffffff
    RBP: ffffc0bacf377c20 R8: ffff9da0bfd2f010 R9: ffff9da0bfc18000
    R10: 0000000000001000 R11: 0000000000000000 R12: 726f635f63616465
    R13: ffff0a00ffffff04 R14: ffffffff9c1a6a4f R15: ffffffff9c1a6a4f
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #7 [ffffc0bacf377c28] string at ffffffff9b87ce98
 #8 [ffffc0bacf377c58] vsnprintf at ffffffff9b87efe3
 #9 [ffffc0bacf377cb8] sprintf at ffffffff9b87f506
#10 [ffffc0bacf377d18] <------------------------------>
#11 [ffffc0bacf377d28] dev_attr_show at ffffffff9b56d183
#12 [ffffc0bacf377d48] sysfs_kf_seq_show at ffffffff9b3272dc
#13 [ffffc0bacf377d68] kernfs_seq_show at ffffffff9b32576c
#14 [ffffc0bacf377d78] seq_read at ffffffff9b2be407
#15 [ffffc0bacf377de8] kernfs_fop_read at ffffffff9b325ffe
#16 [ffffc0bacf377e28] __vfs_read at ffffffff9b2940ea
#17 [ffffc0bacf377eb0] vfs_read at ffffffff9b2942ac
#18 [ffffc0bacf377ee0] sys_read at ffffffff9b29485c
#19 [ffffc0bacf377f28] do_syscall_64 at ffffffff9b003ca9
#20 [ffffc0bacf377f50] entry_SYSCALL_64_after_hwframe at ffffffff9ba001b1

crash> dis ffffffff9b87cc73
0xffffffff9b87cc73 <string_nocheck+19>: movzbl (%rdx),%r8d

and RDX: 726f635f63616465 was a non-canonical address.

After applying this patch to the instrumented kernel, instead of a panic,
the "cat" command produced "(efault)".

> I wonder if this patch would prevent the crash or if the broken
> kernel would crash later anyway.

A broken kernel has a different issue to be fixed. The upstream kernel
isn't broken; it could just offer better protection in case a bug is
introduced in the future.

>> That said, I realized that not all
>> architectures implement a meaningful kern_addr_valid(), so this line
>>	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>> is still needed. I'll send v2.
>
> Please, add linux-mm@kvack.org into CC.

Will do.

> I wonder if kern_addr_valid()
> is safe to use anywhere, especially during early boot. I wonder if
> it would make sense to implement it on all architectures.

On the x86 architecture, kern_addr_valid() looks safe to me, though on
several other architectures it's simply defined as (1).

> Best Regards,
> Petr

Thanks!
-jane

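
The out-of-bounds lookup described in this message can be illustrated with a small sketch. Everything here is hypothetical: the real attribute table and the actual (non-upstream) fix are not shown in the thread, so the enum, the array, and devx_attry_name() are invented stand-ins. The sketch only shows the general pattern of validating an index against a designated-initializer string table before handing the pointer to a "%s" formatter.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the unnamed attribute table. */
enum attry_index { ATTRY_A, ATTRY_B, ATTRY_C, ATTRY_COUNT };

static const char * const devx_attry[ATTRY_COUNT] = {
	[ATTRY_A] = "Dev X AttributeY A",
	[ATTRY_B] = "Dev X AttributeY B",
	[ATTRY_C] = "Dev X AttributeY C",
};

/* Same idea as the kernel's ARRAY_SIZE(), redefined so the sketch
 * builds in userspace. */
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the attribute string, or NULL when idx is out of range.
 * Without the range check, devx_attry[11] would read whatever bytes
 * follow the array and treat them as a pointer. */
static const char *devx_attry_name(size_t idx)
{
	if (idx >= SKETCH_ARRAY_SIZE(devx_attry))
		return NULL;
	return devx_attry[idx];
}
```

A caller would then print a fallback when NULL comes back, instead of passing an arbitrary pointer value to sprintf().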
Hi--

On 10/18/22 11:56, Jane Chu wrote:
> On 10/18/2022 5:45 AM, Petr Mladek wrote:
>> [..]
>> I wonder if kern_addr_valid()
>> is safe to use anywhere, especially during early boot. I wonder if
>> it would make sense to implement it on all architectures.
>
> On x86 architecture, kern_addr_valid() looks safe to me though, on
> several other architectures, it's defined (1).

You might want to compare this patch, which seems to have some support:

https://lore.kernel.org/lkml/20221018074014.185687-1-wangkefeng.wang@huawei.com/

On 10/18/2022 12:28 PM, Randy Dunlap wrote:
> Hi--
>
[..]
>>>> That said, I realized that not all
>>>> architecture implement meaningful kern_addr_valid(), so this line
>>>>	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>>>> is still need. I'll send v2.
>>>
>>> Please, add linux-mm@kvack.org into CC.
>>
>> Will do.
>>
>>> I wonder if kern_addr_valid()
>>> is safe to use anywhere, especially during early boot. I wonder if
>>> it would make sense to implement it on all architectures.
>>
>> On x86 architecture, kern_addr_valid() looks safe to me though, on
>> several other architectures, it's defined (1).
>
> You might want to compare this patch, which seems to have some support:
>
> https://lore.kernel.org/lkml/20221018074014.185687-1-wangkefeng.wang@huawei.com/

Thank you for alerting me, appreciated!

The patch comment says "copy_from_kernel_nofault() which could check
whether the address is a valid kernel address, so no need
kern_addr_valid()". I'm afraid copy_from_kernel_nofault() is more of a
heavy hammer, and less appropriate for this patch.

I'll take a closer look before responding to the submitter.

thanks!
-jane

On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote: > On 10/18/2022 5:45 AM, Petr Mladek wrote: > > On Mon 2022-10-17 19:31:53, Jane Chu wrote: > >> On 10/17/2022 12:25 PM, Andy Shevchenko wrote: > >>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote: > >>>> While debugging a separate issue, it was found that an invalid string > >>>> pointer could very well contain a non-canical address, such as > >>>> 0x7665645f63616465. In that case, this line of defense isn't enough > >>>> to protect the kernel from crashing due to general protection fault > >>>> > >>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > >>>> return "(efault)"; > >>>> > >>>> So instead, use kern_addr_valid() to validate the string pointer. > >>> > >>> How did you check that value of the (invalid string) pointer? > >>> > >> > >> In the bug scenario, the invalid string pointer was an out-of-bound > >> string pointer. While the OOB referencing is fixed, > > > > Could you please provide more details about the fixed OOB? > > What exact vsprintf()/printk() call was broken and eventually > > how it was fixed, please? > > For sensitive reason, I'd like to avoid mentioning the specific name of > the sysfs attribute in the bug, instead, just call it "devX_attrY[]", > and describe the precise nature of the issue. > > devX_attrY[] is a string array, declared and filled at compile time, > like > const char const devX_attrY[] = { > [ATTRY_A] = "Dev X AttributeY A", > [ATTRY_B] = "Dev X AttributeY B", > ... > [ATTRY_G] = "Dev X AttributeY G", > } > such that, when user "cat /sys/devices/systems/.../attry_1", > "Dev X AttributeY B" will show up in the terminal. > That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that. > > The bug was that the index to the array was wrongfully produced, > leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the > calculation and that is not an upstream fix. 
>
> >
> >> the lingering issue
> >> is that the kernel ought to be able to protect itself, as the pointer
> >> contains a non-canonical address.
> >
> > Was the pointer used only by the vsprintf()?
> > Or was it accessed also by another code, please?
>
> The OOB pointer was used only by vsprintf() for the "cat" sysfs case.
> No other code uses the OOB pointer, verified both by code examination
> and test.

So, then the vsprintf() is _the_ point to crash and why should we hide that?
Because of the crash you found the culprit, right? The efault will hide very
important details.

So to me it sounds like I like this change less and less...

> Here is a snippet of the crash backtrace from an instrumented kernel,
> scratched one line for sensitive reason -
>
> crash> bt
> PID: 3250 TASK: ffff9cb50fe23d80 CPU: 18 COMMAND: "cat"
> #0 [ffffc0bacf377998] machine_kexec at ffffffff9b06c7c1
> #1 [ffffc0bacf3779f8] __crash_kexec at ffffffff9b13bb52
> #2 [ffffc0bacf377ac8] crash_kexec at ffffffff9b13cdac
> #3 [ffffc0bacf377ae8] oops_end at ffffffff9b03357a
> #4 [ffffc0bacf377b10] die at ffffffff9b033c32
> #5 [ffffc0bacf377b40] do_general_protection at ffffffff9b030c52
> #6 [ffffc0bacf377b70] general_protection at ffffffff9ba03db4
> [exception RIP: string_nocheck+19]
> RIP: ffffffff9b87cc73 RSP: ffffc0bacf377c20 RFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff9da13fc17fff RCX: ffff0a00ffffff04
> RDX: 726f635f63616465 RSI: ffff9da13fc17fff RDI: ffffffffffffffff
> RBP: ffffc0bacf377c20 R8: ffff9da0bfd2f010 R9: ffff9da0bfc18000
> R10: 0000000000001000 R11: 0000000000000000 R12: 726f635f63616465
> R13: ffff0a00ffffff04 R14: ffffffff9c1a6a4f R15: ffffffff9c1a6a4f
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #7 [ffffc0bacf377c28] string at ffffffff9b87ce98
> #8 [ffffc0bacf377c58] vsnprintf at ffffffff9b87efe3
> #9 [ffffc0bacf377cb8] sprintf at ffffffff9b87f506
> #10 [ffffc0bacf377d18] <------------------------------>
> #11 [ffffc0bacf377d28] dev_attr_show at ffffffff9b56d183
> #12 [ffffc0bacf377d48] sysfs_kf_seq_show at ffffffff9b3272dc
> #13 [ffffc0bacf377d68] kernfs_seq_show at ffffffff9b32576c
> #14 [ffffc0bacf377d78] seq_read at ffffffff9b2be407
> #15 [ffffc0bacf377de8] kernfs_fop_read at ffffffff9b325ffe
> #16 [ffffc0bacf377e28] __vfs_read at ffffffff9b2940ea
> #17 [ffffc0bacf377eb0] vfs_read at ffffffff9b2942ac
> #18 [ffffc0bacf377ee0] sys_read at ffffffff9b29485c
> #19 [ffffc0bacf377f28] do_syscall_64 at ffffffff9b003ca9
> #20 [ffffc0bacf377f50] entry_SYSCALL_64_after_hwframe at ffffffff9ba001b1
>
> crash> dis ffffffff9b87cc73
> 0xffffffff9b87cc73 <string_nocheck+19>: movzbl (%rdx),%r8d
>
> and RDX: 726f635f63616465 was a non-canonical address.
>
> After applying this patch to the instrumented kernel, instead of panic,
> the "cat" command produced "(efault)"
>
> >
> > I wonder if this patch would prevent the crash or if the broken
> > kernel would crash later anyway.
>
> A broken kernel has a different issue to be fixed, the upstream kernel
> isn't broken, it could just offer better protect in case a bug was
> introduced in future.
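For readers following along: the root cause described above is an array index that was computed incorrectly before a lookup in a compile-time string table. A minimal sketch of a bounds-checked lookup is shown below. The enum values and the helper name `devX_attrY_name()` are hypothetical (the thread deliberately withholds the real attribute names), and this is not the fix that was actually applied, which the thread says was a correction to the index calculation.

```c
#include <stddef.h>

/* Hypothetical indices; the real attribute names are not given in the thread. */
enum attry_index { ATTRY_A, ATTRY_B, ATTRY_C };

/* An array of strings needs a pointer type: "const char *const name[]". */
static const char *const devX_attrY[] = {
	[ATTRY_A] = "Dev X AttributeY A",
	[ATTRY_B] = "Dev X AttributeY B",
	[ATTRY_C] = "Dev X AttributeY C",
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Return the attribute string, or NULL when the index is out of bounds. */
static const char *devX_attrY_name(size_t idx)
{
	if (idx >= ARRAY_SIZE(devX_attrY))
		return NULL;
	return devX_attrY[idx];
}
```

With a guard like this, an index such as 11 yields NULL instead of whatever bytes happen to follow the table in memory, so the caller decides how to report the error rather than passing a stray value to sprintf().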
On 10/18/2022 1:07 PM, Andy Shevchenko wrote: > On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote: >> On 10/18/2022 5:45 AM, Petr Mladek wrote: >>> On Mon 2022-10-17 19:31:53, Jane Chu wrote: >>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote: >>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote: >>>>>> While debugging a separate issue, it was found that an invalid string >>>>>> pointer could very well contain a non-canical address, such as >>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough >>>>>> to protect the kernel from crashing due to general protection fault >>>>>> >>>>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) >>>>>> return "(efault)"; >>>>>> >>>>>> So instead, use kern_addr_valid() to validate the string pointer. >>>>> >>>>> How did you check that value of the (invalid string) pointer? >>>>> >>>> >>>> In the bug scenario, the invalid string pointer was an out-of-bound >>>> string pointer. While the OOB referencing is fixed, >>> >>> Could you please provide more details about the fixed OOB? >>> What exact vsprintf()/printk() call was broken and eventually >>> how it was fixed, please? >> >> For sensitive reason, I'd like to avoid mentioning the specific name of >> the sysfs attribute in the bug, instead, just call it "devX_attrY[]", >> and describe the precise nature of the issue. >> >> devX_attrY[] is a string array, declared and filled at compile time, >> like >> const char const devX_attrY[] = { >> [ATTRY_A] = "Dev X AttributeY A", >> [ATTRY_B] = "Dev X AttributeY B", >> ... >> [ATTRY_G] = "Dev X AttributeY G", >> } >> such that, when user "cat /sys/devices/systems/.../attry_1", >> "Dev X AttributeY B" will show up in the terminal. >> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that. >> >> The bug was that the index to the array was wrongfully produced, >> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the >> calculation and that is not an upstream fix. 
>>
>>>
>>>> the lingering issue
>>>> is that the kernel ought to be able to protect itself, as the pointer
>>>> contains a non-canonical address.
>>>
>>> Was the pointer used only by the vsprintf()?
>>> Or was it accessed also by another code, please?
>>
>> The OOB pointer was used only by vsprintf() for the "cat" sysfs case.
>> No other code uses the OOB pointer, verified both by code examination
>> and test.
>
> So, then the vsprintf() is _the_ point to crash and why should we hide that?
> Because of the crash you found the culprit, right? The efault will hide very
> important details.
>
> So to me it sounds like I like this change less and less...

What about the existing check
	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
		return "(efault)";
?

In an experiment just to print the raw OOB pointer values, I saw below
(the devX attrY stuff are substitutes of the real attributes, other
values and strings are verbatim copy from "dmesg"):

[ 3002.772329] devX_attrY[26]: (ffffffff84d60ad3) Dev X AttributeY E
[ 3002.772346] devX_attrY[27]: (ffffffff84d60ae4) Dev X AttributeY F
[ 3002.772347] devX_attrY[28]: (ffffffff84d60aee) Dev X AttributeY G
[ 3002.772349] devX_attrY[29]: (0) (null)
[ 3002.772350] devX_attrY[30]: (0) (null)
[ 3002.772351] devX_attrY[31]: (0) (null)
[ 3002.772352] devX_attrY[32]: (7665645f63616465) (einval)
[ 3002.772354] devX_attrY[33]: (646e61685f656369) (einval)
[ 3002.772355] devX_attrY[34]: (6f635f65755f656c) (einval)
[ 3002.772355] devX_attrY[35]: (746e75) (einval)

where starting from index 29 are all OOB pointers.

As you can see, if the OOBs are NULL, "(null)" was printed due to the
existing checking, but when the OOBs are turned to non-canonical which
is detectable, the fact the pointer value deviates from
	(ffffffff84d60aee + 4 * sizeof(void *))
evidently shown that the OOBs are detectable.

The question then is why should the non-canonical OOBs be treated
differently from NULL and ERR_VALUE?

Thanks,
-jane
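As background to the "non-canonical" values in the log above, here is a minimal userspace sketch of what canonical means. It assumes x86-64 with 48-bit virtual addresses (4-level paging); the function name `is_canonical_48()` is hypothetical, and this is not the `kern_addr_valid()` check proposed in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: x86-64 with 48-bit virtual addresses (4-level paging).
 * An address is canonical when bits 63..47 are all zeros or all ones.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits that must agree */

	return top == 0 || top == 0x1ffff;
}
```

Under this assumption, 0xffffffff84d60aee is canonical while 0x7665645f63616465 and 0x726f635f63616465 are not, so dereferencing either of the latter raises a general protection fault rather than a page fault. Note that 0x746e75 is canonical by this test, being a low address, so a pure canonical check would not be enough to reject every entry in the log.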
On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote: > On 10/18/2022 1:07 PM, Andy Shevchenko wrote: > > On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote: > >> On 10/18/2022 5:45 AM, Petr Mladek wrote: > >>> On Mon 2022-10-17 19:31:53, Jane Chu wrote: > >>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote: > >>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote: > >>>>>> While debugging a separate issue, it was found that an invalid string > >>>>>> pointer could very well contain a non-canical address, such as > >>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough > >>>>>> to protect the kernel from crashing due to general protection fault > >>>>>> > >>>>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > >>>>>> return "(efault)"; > >>>>>> > >>>>>> So instead, use kern_addr_valid() to validate the string pointer. > >>>>> > >>>>> How did you check that value of the (invalid string) pointer? > >>>>> > >>>> > >>>> In the bug scenario, the invalid string pointer was an out-of-bound > >>>> string pointer. While the OOB referencing is fixed, > >>> > >>> Could you please provide more details about the fixed OOB? > >>> What exact vsprintf()/printk() call was broken and eventually > >>> how it was fixed, please? > >> > >> For sensitive reason, I'd like to avoid mentioning the specific name of > >> the sysfs attribute in the bug, instead, just call it "devX_attrY[]", > >> and describe the precise nature of the issue. > >> > >> devX_attrY[] is a string array, declared and filled at compile time, > >> like > >> const char const devX_attrY[] = { > >> [ATTRY_A] = "Dev X AttributeY A", > >> [ATTRY_B] = "Dev X AttributeY B", > >> ... > >> [ATTRY_G] = "Dev X AttributeY G", > >> } > >> such that, when user "cat /sys/devices/systems/.../attry_1", > >> "Dev X AttributeY B" will show up in the terminal. > >> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that. 
> >> > >> The bug was that the index to the array was wrongfully produced, > >> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the > >> calculation and that is not an upstream fix. > >> > >>> > >>>> the lingering issue > >>>> is that the kernel ought to be able to protect itself, as the pointer > >>>> contains a non-canonical address. > >>> > >>> Was the pointer used only by the vsprintf()? > >>> Or was it accessed also by another code, please? > >> > >> The OOB pointer was used only by vsprintf() for the "cat" sysfs case. > >> No other code uses the OOB pointer, verified both by code examination > >> and test. > > > > So, then the vsprintf() is _the_ point to crash and why should we hide that? > > Because of the crash you found the culprit, right? The efault will hide very > > important details. > > > > So to me it sounds like I like this change less and less... > > What about the existing check > if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > return "(efault)"; > ? Because it's _special_. We know that First page is equivalent to a NULL pointer and the last one is dedicated for so called error pointers. There are no more special exceptions to the addresses in the Linux kernel (I don't talk about alignment requirements by the certain architectures). 
> In an experiment just to print the raw OOB pointer values, I saw below > (the devX attrY stuff are substitutes of the real attributes, other > values and strings are verbatim copy from "dmesg"): > > [ 3002.772329] devX_attrY[26]: (ffffffff84d60ad3) Dev X AttributeY E > [ 3002.772346] devX_attrY[27]: (ffffffff84d60ae4) Dev X AttributeY F > [ 3002.772347] devX_attrY[28]: (ffffffff84d60aee) Dev X AttributeY G > [ 3002.772349] devX_attrY[29]: (0) (null) > [ 3002.772350] devX_attrY[30]: (0) (null) > [ 3002.772351] devX_attrY[31]: (0) (null) > [ 3002.772352] devX_attrY[32]: (7665645f63616465) (einval) > [ 3002.772354] devX_attrY[33]: (646e61685f656369) (einval) > [ 3002.772355] devX_attrY[34]: (6f635f65755f656c) (einval) > [ 3002.772355] devX_attrY[35]: (746e75) (einval) > > where starting from index 29 are all OOB pointers. > > As you can see, if the OOBs are NULL, "(null)" was printed due to the > existing checking, but when the OOBs are turned to non-canonical which > is detectable, the fact the pointer value deviates from > (ffffffff84d60aee + 4 * sizeof(void *)) > evidently shown that the OOBs are detectable. > > The question then is why should the non-canonical OOBs be treated > differently from NULL and ERR_VALUE? Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need to see a bug as early as possible?
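To make the "special" ranges concrete, the following userspace sketch mirrors the existing check quoted in the thread: NULL, the rest of the first page, and the last page reserved for error pointers. The constants assume 4 KiB pages and a maximum errno of 4095, and `pointer_msg_sketch()` is a hypothetical name for illustration, not the kernel's function.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */
#define SKETCH_MAX_ERRNO 4095ULL	/* assumption: largest errno encoded in a pointer */

/* Return a placeholder string for a "special" value, or NULL if the value passes. */
static const char *pointer_msg_sketch(uint64_t addr)
{
	if (addr == 0)
		return "(null)";
	/* Rest of the first page, or the last page where error pointers live. */
	if (addr < SKETCH_PAGE_SIZE ||
	    addr >= UINT64_MAX - SKETCH_MAX_ERRNO + 1)
		return "(efault)";
	return NULL;
}
```

A value such as 0x7665645f63616465 falls in neither range, so it passes this check and is dereferenced, which is the behaviour the two sides of the thread disagree about.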
> On 18 Oct 2022, at 22:49, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote: >> On 10/18/2022 1:07 PM, Andy Shevchenko wrote: >>> On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote: >>>> On 10/18/2022 5:45 AM, Petr Mladek wrote: >>>>> On Mon 2022-10-17 19:31:53, Jane Chu wrote: >>>>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote: >>>>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote: >>>>>>>> While debugging a separate issue, it was found that an invalid string >>>>>>>> pointer could very well contain a non-canical address, such as >>>>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough >>>>>>>> to protect the kernel from crashing due to general protection fault >>>>>>>> >>>>>>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) >>>>>>>> return "(efault)"; >>>>>>>> >>>>>>>> So instead, use kern_addr_valid() to validate the string pointer. >>>>>>> >>>>>>> How did you check that value of the (invalid string) pointer? >>>>>>> >>>>>> >>>>>> In the bug scenario, the invalid string pointer was an out-of-bound >>>>>> string pointer. While the OOB referencing is fixed, >>>>> >>>>> Could you please provide more details about the fixed OOB? >>>>> What exact vsprintf()/printk() call was broken and eventually >>>>> how it was fixed, please? >>>> >>>> For sensitive reason, I'd like to avoid mentioning the specific name of >>>> the sysfs attribute in the bug, instead, just call it "devX_attrY[]", >>>> and describe the precise nature of the issue. >>>> >>>> devX_attrY[] is a string array, declared and filled at compile time, >>>> like >>>> const char const devX_attrY[] = { >>>> [ATTRY_A] = "Dev X AttributeY A", >>>> [ATTRY_B] = "Dev X AttributeY B", >>>> ... >>>> [ATTRY_G] = "Dev X AttributeY G", >>>> } >>>> such that, when user "cat /sys/devices/systems/.../attry_1", >>>> "Dev X AttributeY B" will show up in the terminal. 
>>>> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that. >>>> >>>> The bug was that the index to the array was wrongfully produced, >>>> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the >>>> calculation and that is not an upstream fix. >>>> >>>>> >>>>>> the lingering issue >>>>>> is that the kernel ought to be able to protect itself, as the pointer >>>>>> contains a non-canonical address. >>>>> >>>>> Was the pointer used only by the vsprintf()? >>>>> Or was it accessed also by another code, please? >>>> >>>> The OOB pointer was used only by vsprintf() for the "cat" sysfs case. >>>> No other code uses the OOB pointer, verified both by code examination >>>> and test. >>> >>> So, then the vsprintf() is _the_ point to crash and why should we hide that? >>> Because of the crash you found the culprit, right? The efault will hide very >>> important details. >>> >>> So to me it sounds like I like this change less and less... >> >> What about the existing check >> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) >> return "(efault)"; >> ? > > Because it's _special_. We know that First page is equivalent to a NULL pointer > and the last one is dedicated for so called error pointers. There are no more > special exceptions to the addresses in the Linux kernel (I don't talk about > alignment requirements by the certain architectures). 
>
>> In an experiment just to print the raw OOB pointer values, I saw below
>> (the devX attrY stuff are substitutes of the real attributes, other
>> values and strings are verbatim copy from "dmesg"):
>>
>> [ 3002.772329] devX_attrY[26]: (ffffffff84d60ad3) Dev X AttributeY E
>> [ 3002.772346] devX_attrY[27]: (ffffffff84d60ae4) Dev X AttributeY F
>> [ 3002.772347] devX_attrY[28]: (ffffffff84d60aee) Dev X AttributeY G
>> [ 3002.772349] devX_attrY[29]: (0) (null)
>> [ 3002.772350] devX_attrY[30]: (0) (null)
>> [ 3002.772351] devX_attrY[31]: (0) (null)
>> [ 3002.772352] devX_attrY[32]: (7665645f63616465) (einval)
>> [ 3002.772354] devX_attrY[33]: (646e61685f656369) (einval)
>> [ 3002.772355] devX_attrY[34]: (6f635f65755f656c) (einval)
>> [ 3002.772355] devX_attrY[35]: (746e75) (einval)
>>
>> where starting from index 29 are all OOB pointers.
>>
>> As you can see, if the OOBs are NULL, "(null)" was printed due to the
>> existing checking, but when the OOBs are turned to non-canonical which
>> is detectable, the fact the pointer value deviates from
>> (ffffffff84d60aee + 4 * sizeof(void *))
>> evidently shown that the OOBs are detectable.
>>
>> The question then is why should the non-canonical OOBs be treated
>> differently from NULL and ERR_VALUE?
>
> Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need
> to see a bug as early as possible?

If you follow that argument, why doesn't the kernel crash when the pointer
is, e.g., a NULL pointer? According to you, shouldn't it crash as early as
possible in that case also?

Thxs, Håkon
On Wed, Oct 19, 2022 at 10:43:07AM +0000, Haakon Bugge wrote:
> > On 18 Oct 2022, at 22:49, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:

...

> > Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need
> > to see a bug as early as possible?
>
> If you follow that argument, why doesn't the kernel crash when the pointer
> is, e.g., a NULL pointer? According to you, shouldn't it crash as early as
> possible in that case also?

Because it is _special_. It's not just an invalid pointer. There may very well
be good cases where we supply (valid!) NULL pointers to the printf().
On 10/18/2022 1:49 PM, Andy Shevchenko wrote: > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote: >> On 10/18/2022 1:07 PM, Andy Shevchenko wrote: >>> On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote: >>>> On 10/18/2022 5:45 AM, Petr Mladek wrote: >>>>> On Mon 2022-10-17 19:31:53, Jane Chu wrote: >>>>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote: >>>>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote: >>>>>>>> While debugging a separate issue, it was found that an invalid string >>>>>>>> pointer could very well contain a non-canical address, such as >>>>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough >>>>>>>> to protect the kernel from crashing due to general protection fault >>>>>>>> >>>>>>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) >>>>>>>> return "(efault)"; >>>>>>>> >>>>>>>> So instead, use kern_addr_valid() to validate the string pointer. >>>>>>> >>>>>>> How did you check that value of the (invalid string) pointer? >>>>>>> >>>>>> >>>>>> In the bug scenario, the invalid string pointer was an out-of-bound >>>>>> string pointer. While the OOB referencing is fixed, >>>>> >>>>> Could you please provide more details about the fixed OOB? >>>>> What exact vsprintf()/printk() call was broken and eventually >>>>> how it was fixed, please? >>>> >>>> For sensitive reason, I'd like to avoid mentioning the specific name of >>>> the sysfs attribute in the bug, instead, just call it "devX_attrY[]", >>>> and describe the precise nature of the issue. >>>> >>>> devX_attrY[] is a string array, declared and filled at compile time, >>>> like >>>> const char const devX_attrY[] = { >>>> [ATTRY_A] = "Dev X AttributeY A", >>>> [ATTRY_B] = "Dev X AttributeY B", >>>> ... >>>> [ATTRY_G] = "Dev X AttributeY G", >>>> } >>>> such that, when user "cat /sys/devices/systems/.../attry_1", >>>> "Dev X AttributeY B" will show up in the terminal. >>>> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that. 
>>>> >>>> The bug was that the index to the array was wrongfully produced, >>>> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the >>>> calculation and that is not an upstream fix. >>>> >>>>> >>>>>> the lingering issue >>>>>> is that the kernel ought to be able to protect itself, as the pointer >>>>>> contains a non-canonical address. >>>>> >>>>> Was the pointer used only by the vsprintf()? >>>>> Or was it accessed also by another code, please? >>>> >>>> The OOB pointer was used only by vsprintf() for the "cat" sysfs case. >>>> No other code uses the OOB pointer, verified both by code examination >>>> and test. >>> >>> So, then the vsprintf() is _the_ point to crash and why should we hide that? >>> Because of the crash you found the culprit, right? The efault will hide very >>> important details. >>> >>> So to me it sounds like I like this change less and less... >> >> What about the existing check >> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) >> return "(efault)"; >> ? > > Because it's _special_. We know that First page is equivalent to a NULL pointer > and the last one is dedicated for so called error pointers. There are no more > special exceptions to the addresses in the Linux kernel (I don't talk about > alignment requirements by the certain architectures). 
> >> In an experiment just to print the raw OOB pointer values, I saw below >> (the devX attrY stuff are substitutes of the real attributes, other >> values and strings are verbatim copy from "dmesg"): >> >> [ 3002.772329] devX_attrY[26]: (ffffffff84d60ad3) Dev X AttributeY E >> [ 3002.772346] devX_attrY[27]: (ffffffff84d60ae4) Dev X AttributeY F >> [ 3002.772347] devX_attrY[28]: (ffffffff84d60aee) Dev X AttributeY G >> [ 3002.772349] devX_attrY[29]: (0) (null) >> [ 3002.772350] devX_attrY[30]: (0) (null) >> [ 3002.772351] devX_attrY[31]: (0) (null) >> [ 3002.772352] devX_attrY[32]: (7665645f63616465) (einval) >> [ 3002.772354] devX_attrY[33]: (646e61685f656369) (einval) >> [ 3002.772355] devX_attrY[34]: (6f635f65755f656c) (einval) >> [ 3002.772355] devX_attrY[35]: (746e75) (einval) >> >> where starting from index 29 are all OOB pointers. >> >> As you can see, if the OOBs are NULL, "(null)" was printed due to the >> existing checking, but when the OOBs are turned to non-canonical which >> is detectable, the fact the pointer value deviates from >> (ffffffff84d60aee + 4 * sizeof(void *)) >> evidently shown that the OOBs are detectable. >> >> The question then is why should the non-canonical OOBs be treated >> differently from NULL and ERR_VALUE? > > Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need > to see a bug as early as possible? > If the purpose is to see the bug as early as possible, then getting "(efault)" from reading sysfs attribute would serve the purpose, right? The fact an OOB pointer has already being turned into either NULL or non-canonical value implies that *if* kernel code other than vsprintf() references the pointer, it'll crash else where; but *if* no other code referencing the pointer, why crash? thanks, -jane
On Wed, Oct 19, 2022 at 06:36:07PM +0000, Jane Chu wrote: > On 10/18/2022 1:49 PM, Andy Shevchenko wrote: > > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote: > >> On 10/18/2022 1:07 PM, Andy Shevchenko wrote: > >>> On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote: > >>>> On 10/18/2022 5:45 AM, Petr Mladek wrote: > >>>>> On Mon 2022-10-17 19:31:53, Jane Chu wrote: > >>>>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote: > >>>>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote: > >>>>>>>> While debugging a separate issue, it was found that an invalid string > >>>>>>>> pointer could very well contain a non-canical address, such as > >>>>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough > >>>>>>>> to protect the kernel from crashing due to general protection fault > >>>>>>>> > >>>>>>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > >>>>>>>> return "(efault)"; > >>>>>>>> > >>>>>>>> So instead, use kern_addr_valid() to validate the string pointer. > >>>>>>> > >>>>>>> How did you check that value of the (invalid string) pointer? > >>>>>>> > >>>>>> > >>>>>> In the bug scenario, the invalid string pointer was an out-of-bound > >>>>>> string pointer. While the OOB referencing is fixed, > >>>>> > >>>>> Could you please provide more details about the fixed OOB? > >>>>> What exact vsprintf()/printk() call was broken and eventually > >>>>> how it was fixed, please? > >>>> > >>>> For sensitive reason, I'd like to avoid mentioning the specific name of > >>>> the sysfs attribute in the bug, instead, just call it "devX_attrY[]", > >>>> and describe the precise nature of the issue. > >>>> > >>>> devX_attrY[] is a string array, declared and filled at compile time, > >>>> like > >>>> const char const devX_attrY[] = { > >>>> [ATTRY_A] = "Dev X AttributeY A", > >>>> [ATTRY_B] = "Dev X AttributeY B", > >>>> ... 
> >>>> [ATTRY_G] = "Dev X AttributeY G", > >>>> } > >>>> such that, when user "cat /sys/devices/systems/.../attry_1", > >>>> "Dev X AttributeY B" will show up in the terminal. > >>>> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that. > >>>> > >>>> The bug was that the index to the array was wrongfully produced, > >>>> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the > >>>> calculation and that is not an upstream fix. > >>>> > >>>>> > >>>>>> the lingering issue > >>>>>> is that the kernel ought to be able to protect itself, as the pointer > >>>>>> contains a non-canonical address. > >>>>> > >>>>> Was the pointer used only by the vsprintf()? > >>>>> Or was it accessed also by another code, please? > >>>> > >>>> The OOB pointer was used only by vsprintf() for the "cat" sysfs case. > >>>> No other code uses the OOB pointer, verified both by code examination > >>>> and test. > >>> > >>> So, then the vsprintf() is _the_ point to crash and why should we hide that? > >>> Because of the crash you found the culprit, right? The efault will hide very > >>> important details. > >>> > >>> So to me it sounds like I like this change less and less... > >> > >> What about the existing check > >> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > >> return "(efault)"; > >> ? > > > > Because it's _special_. We know that First page is equivalent to a NULL pointer > > and the last one is dedicated for so called error pointers. There are no more > > special exceptions to the addresses in the Linux kernel (I don't talk about > > alignment requirements by the certain architectures). 
> > > >> In an experiment just to print the raw OOB pointer values, I saw below > >> (the devX attrY stuff are substitutes of the real attributes, other > >> values and strings are verbatim copy from "dmesg"): > >> > >> [ 3002.772329] devX_attrY[26]: (ffffffff84d60ad3) Dev X AttributeY E > >> [ 3002.772346] devX_attrY[27]: (ffffffff84d60ae4) Dev X AttributeY F > >> [ 3002.772347] devX_attrY[28]: (ffffffff84d60aee) Dev X AttributeY G > >> [ 3002.772349] devX_attrY[29]: (0) (null) > >> [ 3002.772350] devX_attrY[30]: (0) (null) > >> [ 3002.772351] devX_attrY[31]: (0) (null) > >> [ 3002.772352] devX_attrY[32]: (7665645f63616465) (einval) > >> [ 3002.772354] devX_attrY[33]: (646e61685f656369) (einval) > >> [ 3002.772355] devX_attrY[34]: (6f635f65755f656c) (einval) > >> [ 3002.772355] devX_attrY[35]: (746e75) (einval) > >> > >> where starting from index 29 are all OOB pointers. > >> > >> As you can see, if the OOBs are NULL, "(null)" was printed due to the > >> existing checking, but when the OOBs are turned to non-canonical which > >> is detectable, the fact the pointer value deviates from > >> (ffffffff84d60aee + 4 * sizeof(void *)) > >> evidently shown that the OOBs are detectable. > >> > >> The question then is why should the non-canonical OOBs be treated > >> differently from NULL and ERR_VALUE? > > > > Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need > > to see a bug as early as possible? > > > > If the purpose is to see the bug as early as possible, then getting > "(efault)" from reading sysfs attribute would serve the purpose, right? > > The fact an OOB pointer has already being turned into either NULL or > non-canonical value implies that *if* kernel code other than > vsprintf() references the pointer, it'll crash else where; No, not the case for error pointers and NULL. > but *if* no > other code referencing the pointer, why crash? Because how else you can see the bug?! 
The trace will give you essential information about registers, etc., that
gives you a hint about the _cause_ of the crash. And we need that cause.
The "(efault)" is not even a bit close to what the crash gives us.

So, this is my last message in the discussion.

Here is a formal NAK. Up to maintainers to decide what to do with this.
On 10/19/2022 12:26 PM, Andy Shevchenko wrote: > On Wed, Oct 19, 2022 at 06:36:07PM +0000, Jane Chu wrote: >> On 10/18/2022 1:49 PM, Andy Shevchenko wrote: >>> On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote: >>>> On 10/18/2022 1:07 PM, Andy Shevchenko wrote: >>>>> On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote: >>>>>> On 10/18/2022 5:45 AM, Petr Mladek wrote: >>>>>>> On Mon 2022-10-17 19:31:53, Jane Chu wrote: >>>>>>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote: >>>>>>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote: >>>>>>>>>> While debugging a separate issue, it was found that an invalid string >>>>>>>>>> pointer could very well contain a non-canical address, such as >>>>>>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough >>>>>>>>>> to protect the kernel from crashing due to general protection fault >>>>>>>>>> >>>>>>>>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) >>>>>>>>>> return "(efault)"; >>>>>>>>>> >>>>>>>>>> So instead, use kern_addr_valid() to validate the string pointer. >>>>>>>>> >>>>>>>>> How did you check that value of the (invalid string) pointer? >>>>>>>>> >>>>>>>> >>>>>>>> In the bug scenario, the invalid string pointer was an out-of-bound >>>>>>>> string pointer. While the OOB referencing is fixed, >>>>>>> >>>>>>> Could you please provide more details about the fixed OOB? >>>>>>> What exact vsprintf()/printk() call was broken and eventually >>>>>>> how it was fixed, please? >>>>>> >>>>>> For sensitive reason, I'd like to avoid mentioning the specific name of >>>>>> the sysfs attribute in the bug, instead, just call it "devX_attrY[]", >>>>>> and describe the precise nature of the issue. >>>>>> >>>>>> devX_attrY[] is a string array, declared and filled at compile time, >>>>>> like >>>>>> const char const devX_attrY[] = { >>>>>> [ATTRY_A] = "Dev X AttributeY A", >>>>>> [ATTRY_B] = "Dev X AttributeY B", >>>>>> ... 
>>>>>> [ATTRY_G] = "Dev X AttributeY G", >>>>>> } >>>>>> such that, when user "cat /sys/devices/systems/.../attry_1", >>>>>> "Dev X AttributeY B" will show up in the terminal. >>>>>> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that. >>>>>> >>>>>> The bug was that the index to the array was wrongfully produced, >>>>>> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the >>>>>> calculation and that is not an upstream fix. >>>>>> >>>>>>> >>>>>>>> the lingering issue >>>>>>>> is that the kernel ought to be able to protect itself, as the pointer >>>>>>>> contains a non-canonical address. >>>>>>> >>>>>>> Was the pointer used only by the vsprintf()? >>>>>>> Or was it accessed also by another code, please? >>>>>> >>>>>> The OOB pointer was used only by vsprintf() for the "cat" sysfs case. >>>>>> No other code uses the OOB pointer, verified both by code examination >>>>>> and test. >>>>> >>>>> So, then the vsprintf() is _the_ point to crash and why should we hide that? >>>>> Because of the crash you found the culprit, right? The efault will hide very >>>>> important details. >>>>> >>>>> So to me it sounds like I like this change less and less... >>>> >>>> What about the existing check >>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) >>>> return "(efault)"; >>>> ? >>> >>> Because it's _special_. We know that First page is equivalent to a NULL pointer >>> and the last one is dedicated for so called error pointers. There are no more >>> special exceptions to the addresses in the Linux kernel (I don't talk about >>> alignment requirements by the certain architectures). 
>>> >>>> In an experiment just to print the raw OOB pointer values, I saw below >>>> (the devX attrY stuff are substitutes of the real attributes, other >>>> values and strings are verbatim copy from "dmesg"): >>>> >>>> [ 3002.772329] devX_attrY[26]: (ffffffff84d60ad3) Dev X AttributeY E >>>> [ 3002.772346] devX_attrY[27]: (ffffffff84d60ae4) Dev X AttributeY F >>>> [ 3002.772347] devX_attrY[28]: (ffffffff84d60aee) Dev X AttributeY G >>>> [ 3002.772349] devX_attrY[29]: (0) (null) >>>> [ 3002.772350] devX_attrY[30]: (0) (null) >>>> [ 3002.772351] devX_attrY[31]: (0) (null) >>>> [ 3002.772352] devX_attrY[32]: (7665645f63616465) (einval) >>>> [ 3002.772354] devX_attrY[33]: (646e61685f656369) (einval) >>>> [ 3002.772355] devX_attrY[34]: (6f635f65755f656c) (einval) >>>> [ 3002.772355] devX_attrY[35]: (746e75) (einval) >>>> >>>> where starting from index 29 are all OOB pointers. >>>> >>>> As you can see, if the OOBs are NULL, "(null)" was printed due to the >>>> existing checking, but when the OOBs are turned to non-canonical which >>>> is detectable, the fact the pointer value deviates from >>>> (ffffffff84d60aee + 4 * sizeof(void *)) >>>> evidently shown that the OOBs are detectable. >>>> >>>> The question then is why should the non-canonical OOBs be treated >>>> differently from NULL and ERR_VALUE? >>> >>> Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need >>> to see a bug as early as possible? >>> >> >> If the purpose is to see the bug as early as possible, then getting >> "(efault)" from reading sysfs attribute would serve the purpose, right? >> >> The fact an OOB pointer has already being turned into either NULL or >> non-canonical value implies that *if* kernel code other than >> vsprintf() references the pointer, it'll crash else where; > > No, not the case for error pointers and NULL. Sorry, I don't understand, what about Oops from NUll pointer dereference? > >> but *if* no >> other code referencing the pointer, why crash? 
>
> Because how else you can see the bug?! The trace will give you essential
> information about registers, etc that gives you a hint what the _cause_ of the
> crash. And we need that cause. The "(efault)" has not even a bit close to what
> crash gives us.
>
> So, this is my last message in the discussion.
>
> Here is a formal NAK. Up to maintainers to decide what to do with this.
>

Sigh, but thanks for taking the time articulating your point of view.

-jane
On Tue 2022-10-18 23:49:27, Andy Shevchenko wrote:
> On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
> > On 10/18/2022 1:07 PM, Andy Shevchenko wrote:
> > > On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote:
> > >> On 10/18/2022 5:45 AM, Petr Mladek wrote:
> > >>> On Mon 2022-10-17 19:31:53, Jane Chu wrote:
> > >>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote:
> > >>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote:
> > >>>>>> While debugging a separate issue, it was found that an invalid string
> > >>>>>> pointer could very well contain a non-canonical address, such as
> > >>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough
> > >>>>>> to protect the kernel from crashing due to general protection fault
> > >>>>>>
> > >>>>>>	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > >>>>>>		return "(efault)";
> > >>>>>>
> > >>>>>> So instead, use kern_addr_valid() to validate the string pointer.
> > >>>>>
> > >>>>> How did you check that value of the (invalid string) pointer?
> > >>>>>
> > >>>>
> > >>>> In the bug scenario, the invalid string pointer was an out-of-bound
> > >>>> string pointer. While the OOB referencing is fixed,
> > >>>
> > >>> Could you please provide more details about the fixed OOB?
> > >>> What exact vsprintf()/printk() call was broken and eventually
> > >>> how it was fixed, please?
> > >>
> > >> For sensitive reason, I'd like to avoid mentioning the specific name of
> > >> the sysfs attribute in the bug, instead, just call it "devX_attrY[]",
> > >> and describe the precise nature of the issue.
> > >>
> > >> devX_attrY[] is a string array, declared and filled at compile time,
> > >> like
> > >>	const char * const devX_attrY[] = {
> > >>		[ATTRY_A] = "Dev X AttributeY A",
> > >>		[ATTRY_B] = "Dev X AttributeY B",
> > >>		...
> > >>		[ATTRY_G] = "Dev X AttributeY G",
> > >>	};
> > >> such that, when user "cat /sys/devices/systems/.../attry_1",
> > >> "Dev X AttributeY B" will show up in the terminal.
> > >> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that.
> > >>
> > >> The bug was that the index to the array was wrongfully produced,
> > >> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the
> > >> calculation and that is not an upstream fix.

I see. printk()/vsprintf() is the only code that accesses this pointer.
If vsprintf() survives then the system survives.

> > As you can see, if the OOBs are NULL, "(null)" was printed due to the
> > existing checking, but when the OOBs are turned to non-canonical which
> > is detectable, the fact the pointer value deviates from
> > (ffffffff84d60aee + 4 * sizeof(void *))
> > evidently shown that the OOBs are detectable.
> >
> > The question then is why should the non-canonical OOBs be treated
> > differently from NULL and ERR_VALUE?
>
> Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need
> to see a bug as early as possible?

I do not agree here. Kernel tries to survive many situations when
things do not work as expected. It prints a warning so that
users/developers are aware of the problem and could fix it.

In our case, the crash happened when reading a sysfs file.
IMHO, it is much better to show (-EINVAL) than crash. The bug
when accessing devX_attrY[] does not affect the stability of
the system at all.

And the broken string might be passed in a very rare case,
e.g. in an error path. So that it might be hard to catch
when testing.

Best Regards,
Petr
On Thu 2022-10-20 09:44:06, Petr Mladek wrote:
> On Tue 2022-10-18 23:49:27, Andy Shevchenko wrote:
> > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
> > > On 10/18/2022 1:07 PM, Andy Shevchenko wrote:
> > > > On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote:
> > > >> On 10/18/2022 5:45 AM, Petr Mladek wrote:
> > > >>> On Mon 2022-10-17 19:31:53, Jane Chu wrote:
> > > >>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote:
> > > >>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote:
> > > >>>>>> While debugging a separate issue, it was found that an invalid string
> > > >>>>>> pointer could very well contain a non-canonical address, such as
> > > >>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough
> > > >>>>>> to protect the kernel from crashing due to general protection fault
> > > >>>>>>
> > > >>>>>>	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > > >>>>>>		return "(efault)";
> > > >>>>>>
> > > >>>>>> So instead, use kern_addr_valid() to validate the string pointer.
> > > >>>>>
> > > >>>>> How did you check that value of the (invalid string) pointer?
> > > >>>>>
> > > >>>>
> > > >>>> In the bug scenario, the invalid string pointer was an out-of-bound
> > > >>>> string pointer. While the OOB referencing is fixed,
> > > >>>
> > > >>> Could you please provide more details about the fixed OOB?
> > > >>> What exact vsprintf()/printk() call was broken and eventually
> > > >>> how it was fixed, please?
> > > >>
> > > >> For sensitive reason, I'd like to avoid mentioning the specific name of
> > > >> the sysfs attribute in the bug, instead, just call it "devX_attrY[]",
> > > >> and describe the precise nature of the issue.
> > > >>
> > > >> devX_attrY[] is a string array, declared and filled at compile time,
> > > >> like
> > > >>	const char * const devX_attrY[] = {
> > > >>		[ATTRY_A] = "Dev X AttributeY A",
> > > >>		[ATTRY_B] = "Dev X AttributeY B",
> > > >>		...
> > > >>		[ATTRY_G] = "Dev X AttributeY G",
> > > >>	};
> > > >> such that, when user "cat /sys/devices/systems/.../attry_1",
> > > >> "Dev X AttributeY B" will show up in the terminal.
> > > >> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that.
> > > >>
> > > >> The bug was that the index to the array was wrongfully produced,
> > > >> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the
> > > >> calculation and that is not an upstream fix.
>
> I see. printk()/vsprintf() is the only code that accesses this pointer.
> If vsprintf() survives then the system survives.
>
> > > As you can see, if the OOBs are NULL, "(null)" was printed due to the
> > > existing checking, but when the OOBs are turned to non-canonical which
> > > is detectable, the fact the pointer value deviates from
> > > (ffffffff84d60aee + 4 * sizeof(void *))
> > > evidently shown that the OOBs are detectable.
> > >
> > > The question then is why should the non-canonical OOBs be treated
> > > differently from NULL and ERR_VALUE?
> >
> > Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need
> > to see a bug as early as possible?
>
> I do not agree here. Kernel tries to survive many situations when
> things do not work as expected. It prints a warning so that
> users/developers are aware of the problem and could fix it.
>
> In our case, the crash happened when reading a sysfs file.
> IMHO, it is much better to show (-EINVAL) than crash. The bug
> when accessing devX_attrY[] does not affect the stability of
> the system at all.
>
> And the broken string might be passed in a very rare case,
> e.g. in an error path. So that it might be hard to catch
> when testing.

That said, there is definitely a difference between NULL or an error
code and a random pointer address.

The pointers in ERR_RANGE are likely to stay in this range. It means
that this pointer is hardly usable as a security attack.
On the other hand, a "random" pointer has a bigger chance to be used
for a security attack. From this POV, it is more important to catch
and fix random pointer issues. And showing just -EINVAL might not be
enough to catch attention.

I guess that this was what Andy wanted to explain. And a kernel crash
would definitely catch attention. Showing some warning with
KERN_WARNING or even WARN() might be an alternative.

Anyway, I think that this patch is not worth it:

  + kern_addr_valid() always succeeds on all architectures except
    on x86_64. It means that the check would help only on x86_64.

  + kern_addr_valid() always fails on x86 when built with SPARSEMEM.
    This is not acceptable for vsprintf().

  + the situation when only vsprintf() would access the wrong
    pointer is rare. In most cases, the pointer is later used
    and the kernel crashes anyway.

Best Regards,
Petr
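For readers following the "non-canonical" argument, here is a small user-space sketch of the canonical-address property on x86_64. It assumes 48-bit virtual addresses (4-level paging). The name `is_canonical_va()` and the macro `ASSUMED_VA_BITS` are hypothetical. This is not how kern_addr_valid() is implemented; it only illustrates which of the values from the thread could be rejected on canonical form alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (x86_64 with 4-level paging). */
#define ASSUMED_VA_BITS 48

static_assert(ASSUMED_VA_BITS > 1 && ASSUMED_VA_BITS <= 64,
	      "virtual address width must fit in 64 bits");

/*
 * An address is canonical when bits 63..(ASSUMED_VA_BITS - 1) are either
 * all zero (lower half) or all one (upper half).
 * is_canonical_va() is a hypothetical helper, not a kernel function.
 */
static bool is_canonical_va(uint64_t addr)
{
	uint64_t upper = addr >> (ASSUMED_VA_BITS - 1);
	uint64_t all_ones = UINT64_MAX >> (ASSUMED_VA_BITS - 1);

	return upper == 0 || upper == all_ones;
}
```

Under these assumptions, 0xffffffff84d60ad3 is canonical and 0x7665645f63616465 is not. The last OOB value from the excerpt, 0x746e75, is canonical (a low user-space address), so a purely canonical-form test would not flag it.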
On Thu, Oct 20, 2022 at 09:44:05AM +0200, Petr Mladek wrote:
> On Tue 2022-10-18 23:49:27, Andy Shevchenko wrote:
> > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:

...

> > Obviously, to see the crash. And let kernel _to crash_. Isn't it what we need
> > to see a bug as early as possible?
>
> I do not agree here. Kernel tries to survive many situations when
> things do not work as expected. It prints a warning so that
> users/developers are aware of the problem and could fix it.

How will the user know what the root cause is and how to fix it? The crash
report will give all needed information, the "(eXXXXXX)" will hide it all,
which I consider an inappropriate approach.

I.o.w. consider "(eXXXXXX)" vs. something like "your stuff crashed kernel
because of misaligned / etc pointer which has value of 0xXXXXXXXX and other
registers have these values" and so on, so on...

> In our case, the crash happened when reading a sysfs file.
> IMHO, it is much better to show (-EINVAL) than crash. The bug
> when accessing devX_attrY[] does not affect the stability of
> the system at all.

When I get "eXXXXX" from cat /sys/... I think "OK, something went wrong,
I shouldn't really take it seriously". And completely different feelings
when you get a crash, right?

> And the broken string might be passed in a very rare case,
> e.g. in an error path. So that it might be hard to catch
> when testing.
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c414a8d9f1ea..2e8a9efc7c12 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -695,7 +695,7 @@ static const char *check_pointer_msg(const void *ptr)
 	if (!ptr)
 		return "(null)";
 
-	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
+	if (!kern_addr_valid((unsigned long)ptr))
 		return "(efault)";
 
 	return NULL;
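To make concrete what the removed line does and does not catch, here is a user-space model of the pre-patch check. It is a sketch only: the function `model_check_pointer_msg()` and the `MODEL_*` constants are hypothetical names, and it assumes a 4096-byte page and the kernel's MAX_ERRNO of 4095, with pointers passed as 64-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages and MAX_ERRNO == 4095, as on x86_64. */
#define MODEL_PAGE_SIZE 4096ULL
#define MODEL_MAX_ERRNO 4095ULL

static_assert(MODEL_MAX_ERRNO < MODEL_PAGE_SIZE,
	      "error range is expected to fit in one page");

/*
 * User-space model of the pre-patch check_pointer_msg() logic.
 * Returns a message for pointers the check rejects, or NULL when the
 * pointer would be passed on and dereferenced by the caller.
 */
static const char *model_check_pointer_msg(uint64_t ptr)
{
	if (!ptr)
		return "(null)";

	/* First page, or the top MODEL_MAX_ERRNO values (error pointers). */
	if (ptr < MODEL_PAGE_SIZE || ptr >= UINT64_MAX - MODEL_MAX_ERRNO + 1)
		return "(efault)";

	return NULL;
}
```

In this model, 0 and error values such as (uint64_t)-22 are rejected, while 0x7665645f63616465 returns NULL, meaning the string code would go on to dereference it. That is the general protection fault described in the patch description.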