Message ID | 20230929102726.2985188-18-john.g.garry@oracle.com |
---|---|
State | New |
Headers |
From: John Garry <john.g.garry@oracle.com>
To: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me, jejb@linux.ibm.com, martin.petersen@oracle.com, djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, chandan.babu@oracle.com, dchinner@redhat.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com, linux-api@vger.kernel.org, John Garry <john.g.garry@oracle.com>
Subject: [PATCH 17/21] fs: xfs: iomap atomic write support
Date: Fri, 29 Sep 2023 10:27:22 +0000
Message-Id: <20230929102726.2985188-18-john.g.garry@oracle.com>
In-Reply-To: <20230929102726.2985188-1-john.g.garry@oracle.com>
References: <20230929102726.2985188-1-john.g.garry@oracle.com>
Series |
block atomic writes
Commit Message
John Garry
Sept. 29, 2023, 10:27 a.m. UTC
Ensure that when creating a mapping we adhere to all the atomic
write rules.

We check that the mapping covers the complete range of the write to
ensure that we'll be creating just a single mapping.

Currently the minimum granularity is the FS block size, but it should
be possible to support a smaller granularity in future.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_iomap.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
Comments
On Fri, Sep 29, 2023 at 10:27:22AM +0000, John Garry wrote:
> Ensure that when creating a mapping that we adhere to all the atomic
> write rules.
>
> We check that the mapping covers the complete range of the write to ensure
> that we'll be just creating a single mapping.
>
> Currently minimum granularity is the FS block size, but it should be
> possibly to support lower in future.

I really dislike how this forces aligned allocations. Aligned
allocations are a nice optimization to offload some of the work
to the storage hard/firmware, but we need to support it in general.
And I think with out of place writes into the COW fork, and atomic
transactions to swap it in we can do that pretty easily.

That should also allow to get rid of the horrible forcealign mode,
as we can still try align if possible and just fall back to the
out of place writes.
On 09/11/2023 15:26, Christoph Hellwig wrote:
> On Fri, Sep 29, 2023 at 10:27:22AM +0000, John Garry wrote:
>> Ensure that when creating a mapping that we adhere to all the atomic
>> write rules.
>>
>> We check that the mapping covers the complete range of the write to ensure
>> that we'll be just creating a single mapping.
>>
>> Currently minimum granularity is the FS block size, but it should be
>> possibly to support lower in future.
> I really dislike how this forces aligned allocations. Aligned
> allocations are a nice optimization to offload some of the work
> to the storage hard/firmware, but we need to support it in general.
> And I think with out of place writes into the COW fork, and atomic
> transactions to swap it in we can do that pretty easily.
>
> That should also allow to get rid of the horrible forcealign mode,
> as we can still try align if possible and just fall back to the
> out of place writes.

How could we try to align? Do you mean that we try to align up to some
stage in the block allocator search? That seems like some middle ground
between no alignment and forcealign. And what would we be aligning to?

Thanks,
John
Hi Christoph,

>>> Currently minimum granularity is the FS block size, but it should be
>>> possibly to support lower in future.
>> I really dislike how this forces aligned allocations. Aligned
>> allocations are a nice optimization to offload some of the work
>> to the storage hard/firmware, but we need to support it in general.
>> And I think with out of place writes into the COW fork, and atomic
>> transactions to swap it in we can do that pretty easily.
>>
>> That should also allow to get rid of the horrible forcealign mode,
>> as we can still try align if possible and just fall back to the
>> out of place writes.

Can you try to explain your idea a bit more? This is blocking us.

Are you suggesting some sort of hybrid between the atomic write series
you had a few years ago and this solution?

To me that would be continuing with the following:
- per-IO RWF_ATOMIC (and not O_ATOMIC semantics of nothing is written
  until some data sync)
- writes must be a power-of-two and at a naturally-aligned offset
- relying on atomic write HW support always

But for extents which are misaligned, we CoW to a new extent? I suppose
we would align that extent to the alignment of the write (i.e. length
of write).

BTW, we also have rtvol support which does not use forcealign as it
already can guarantee alignment, but still does rely on the same
principle of requiring alignment - would you want CoW support there
also?

Thanks,
John
On Tue, Nov 28, 2023 at 08:56:37AM +0000, John Garry wrote:
> Are you suggesting some sort of hybrid between the atomic write series you
> had a few years ago and this solution?

Very roughly, yes.

> To me that would be continuing with the following:
> - per-IO RWF_ATOMIC (and not O_ATOMIC semantics of nothing is written until
>   some data sync)

Yes.

> - writes must be a power-of-two and at a naturally-aligned offset

Where offset is offset in the file? It would not require it. You
probably want to do it for optimal performance, but requiring it
feels rather limited.

> - relying on atomic write HW support always

And I think that's where we have different opinions. I think the hw
offload is a nice optimization and we should use it wherever we can.
But building the entire userspace API around it feels like a mistake.

> BTW, we also have rtvol support which does not use forcealign as it already
> can guarantee alignment, but still does rely on the same principle of
> requiring alignment - would you want CoW support there also?

Upstream doesn't have out of place write support for the RT subvolume
yet. But Darrick has a series for it and we're actively working on
upstreaming it.
On 28/11/2023 13:56, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 08:56:37AM +0000, John Garry wrote:
>> Are you suggesting some sort of hybrid between the atomic write series you
>> had a few years ago and this solution?
> Very roughly, yes.
>
>> To me that would be continuing with the following:
>> - per-IO RWF_ATOMIC (and not O_ATOMIC semantics of nothing is written until
>>   some data sync)
> Yes.
>
>> - writes must be a power-of-two and at a naturally-aligned offset
> Where offset is offset in the file?

ok, fine, it would not be required for XFS with CoW. Some concerns still:
a. device atomic write boundary, if any
b. other FSes which do not have CoW support. ext4 is already being used
   for "atomic writes" in the field - see dubious amazon torn-write
   prevention.

About b., we could add the pow-of-2 and file offset alignment
requirement for other FSes, but then need to add some method to
advertise that restriction.

> It would not require it. You
> probably want to do it for optimal performance, but requiring it
> feeels rather limited.
>
>> - relying on atomic write HW support always
> And I think that's where we have different opinions.

I'm just trying to understand your idea and that is not necessarily my
final opinion.

> I think the hw
> offload is a nice optimization and we should use it wherever we can.

Sure, but to me it is a concern that we have 2x paths to make robust:
a. offload via hw, which may involve CoW
b. no HW support, i.e. CoW always

And for no HW support, if we don't follow the O_ATOMIC model of
committing nothing until a SYNC is issued, we would allocate, write,
and later free a new extent for each write, right?

> But building the entire userspace API around it feels like a mistake.

ok, but FWIW it works for the usecases which we know.

>> BTW, we also have rtvol support which does not use forcealign as it already
>> can guarantee alignment, but still does rely on the same principle of
>> requiring alignment - would you want CoW support there also?
> Upstream doesn't have out of place write support for the RT subvolume
> yet. But Darrick has a series for it and we're actively working on
> upstreaming it.

Yeah, I thought that I heard this.

Thanks,
John
> b. other FSes which do not have CoW support. ext4 is already being
> used for "atomic writes" in the field

We also need raw block device access to work within the constraints
required by the hardware.

>> probably want to do it for optimal performance, but requiring it
>> feeels rather limited.

The application developers we are working with generally prefer an
error when things are not aligned properly. Predictable performance is
key. Removing the performance variability of doing double writes is
the reason for supporting atomics in the first place.

I think there is value in providing a more generic (file-centric)
atomic user API. And I think the I/O stack plumbing we provide would
be useful in supporting such an endeavor. But I am not convinced that
atomic operations in general should be limited to the couple of
filesystems that can do CoW.
On Tue, Nov 28, 2023 at 05:42:10PM +0000, John Garry wrote:
> ok, fine, it would not be required for XFS with CoW. Some concerns still:
> a. device atomic write boundary, if any
> b. other FSes which do not have CoW support. ext4 is already being used for
> "atomic writes" in the field - see dubious amazon torn-write prevention.

What is the 'dubious amazon torn-write prevention'?

> About b., we could add the pow-of-2 and file offset alignment requirement
> for other FSes, but then need to add some method to advertise that
> restriction.

We really need a better way to communicate I/O limitations anyway.
Something like XFS_IOC_DIOINFO on steroids.

> Sure, but to me it is a concern that we have 2x paths to make robust a.
> offload via hw, which may involve CoW b. no HW support, i.e. CoW always

Relying just on the hardware seems very limited, especially as there is
plenty of hardware that won't guarantee anything larger than 4k, and
plenty of NVMe hardware that has some other small limit like 32k
because it doesn't support multiple atomicity modes.

> And for no HW support, if we don't follow the O_ATOMIC model of committing
> nothing until a SYNC is issued, would we allocate, write, and later free a
> new extent for each write, right?

Yes. Then again if you do data journalling you do that anyway, and as
one little project I'm doing right now shows, data journalling is
often the fastest thing we can do for very small writes.
On 04/12/2023 13:45, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 05:42:10PM +0000, John Garry wrote:
>> ok, fine, it would not be required for XFS with CoW. Some concerns still:
>> a. device atomic write boundary, if any
>> b. other FSes which do not have CoW support. ext4 is already being used for
>> "atomic writes" in the field - see dubious amazon torn-write prevention.
>
> What is the 'dubious amazon torn-write prevention'?

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-twp.html

AFAICS, this is without any kernel changes, so no guarantee against
unwanted splitting or merging of bios.

Anyway, there will still be !CoW FSes which people want to support.

>> About b., we could add the pow-of-2 and file offset alignment requirement
>> for other FSes, but then need to add some method to advertise that
>> restriction.
>
> We really need a better way to communicate I/O limitations anyway.
> Something like XFS_IOC_DIOINFO on steroids.
>
>> Sure, but to me it is a concern that we have 2x paths to make robust a.
>> offload via hw, which may involve CoW b. no HW support, i.e. CoW always
>
> Relying just on the hardware seems very limited, especially as there is
> plenty of hardware that won't guarantee anything larger than 4k, and
> plenty of NVMe hardware without has some other small limit like 32k
> because it doesn't support multiple atomicy mode.

So what would you propose as the next step? Would it be to first
achieve atomic write support for XFS with HW support + CoW to ensure
contiguous extents (and without XFS forcealign)?

>> And for no HW support, if we don't follow the O_ATOMIC model of committing
>> nothing until a SYNC is issued, would we allocate, write, and later free a
>> new extent for each write, right?
>
> Yes. Then again if you do data journalling you do that anyway, and as
> one little project I'm doing right now shows that data journling is
> often the fastest thing we can do for very small writes.

Ignoring FSes, then how is this supposed to work for block devices? We
just always need HW support, right?

Thanks,
John
On Mon, Dec 04, 2023 at 03:19:15PM +0000, John Garry wrote:
> On 04/12/2023 13:45, Christoph Hellwig wrote:
>> On Tue, Nov 28, 2023 at 05:42:10PM +0000, John Garry wrote:
>>> ok, fine, it would not be required for XFS with CoW. Some concerns still:
>>> a. device atomic write boundary, if any
>>> b. other FSes which do not have CoW support. ext4 is already being used for
>>> "atomic writes" in the field - see dubious amazon torn-write prevention.
>>
>> What is the 'dubious amazon torn-write prevention'?
>
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-twp.html
>
> AFAICS, this is without any kernel changes, so no guarantee of unwanted
> splitting or merging of bios.
>
> Anyway, there will still be !CoW FSes which people want to support.

Ugg, so they badly reimplement NVMe atomic write support and use it
without software stack enablement. Calling it dubious is way too
gentle..

>> Relying just on the hardware seems very limited, especially as there is
>> plenty of hardware that won't guarantee anything larger than 4k, and
>> plenty of NVMe hardware without has some other small limit like 32k
>> because it doesn't support multiple atomicy mode.
>
> So what would you propose as the next step? Would it to be first achieve
> atomic write support for XFS with HW support + CoW to ensure contiguous
> extents (and without XFS forcealign)?

I think the very first priority is just block device support without
any fs enablement. We just need to make sure the API isn't too limited
for additional use cases.

> Ignoring FSes, then how is this supposed to work for block devices? We just
> always need HW support, right?

Yes.
On 04/12/2023 15:39, Christoph Hellwig wrote:
>> So what would you propose as the next step? Would it to be first achieve
>> atomic write support for XFS with HW support + CoW to ensure contiguous
>> extents (and without XFS forcealign)?
> I think the very first priority is just block device support without
> any fs enablement. We just need to make sure the API isn't too limited
> for additional use cases.

Sounds ok
On Mon, Dec 04, 2023 at 03:19:15PM +0000, John Garry wrote:
>>> What is the 'dubious amazon torn-write prevention'?
>
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-twp.html
>
> AFAICS, this is without any kernel changes, so no guarantee of unwanted
> splitting or merging of bios.

Well, more than one company has audited the kernel paths, and it turns
out that for selected kernel versions, after doing desk-check
verification of the relevant kernel paths, as well as experimental
verification via testing to try to find torn writes in the kernel, we
can make it safe for specific kernel versions which might be used in
hosted MySQL instances where we control the kernel, the mysql server,
and the emulated block device (and we know the database is doing
Direct I/O writes --- this won't work for PostgreSQL). I gave a talk
about this at Google I/O Next '18, five years ago[1].

[1] https://www.youtube.com/watch?v=gIeuiGg-_iw

Given the performance gains (see the comparison at time 19:31 and at
29:57 of the talk) --- it's quite compelling.

Of course, I wouldn't recommend this approach for a naive sysadmin,
since most database administrators won't know how to audit kernel code
(see the discussion at time 35:10 of the video), and reverify the
entire software stack before every kernel upgrade. The challenge is
how to do this safely.

The fact remains that both Amazon's EBS and Google's Persistent Disk
products are implemented in such a way that writes will not be torn
below the virtual machine, and the guarantees are in fact quite a bit
stronger than what we will probably end up advertising via NVMe and/or
SCSI. It wouldn't surprise me if this is the case (or could be made to
be the case) for Oracle Cloud as well.

The question is how to make this guarantee so that the kernel knows
when various cloud-provided block devices do provide these greater
guarantees, and then how to make it be an architected feature, as
opposed to a happy implementation detail that has to be verified at
every kernel upgrade.

Cheers,

- Ted
On 05/12/2023 04:55, Theodore Ts'o wrote: >> AFAICS, this is without any kernel changes, so no guarantee of unwanted >> splitting or merging of bios. > Well, more than one company has audited the kernel paths, and it turns > out that for selected Kernel versions, after doing desk-check > verification of the relevant kernel baths, as well as experimental > verification via testing to try to find torn writes in the kernel, we > can make it safe for specific kernel versions which might be used in > hosted MySQL instances where we control the kernel, the mysql server, > and the emulated block device (and we know the database is doing > Direct I/O writes --- this won't work for PostgreSQL). I gave a talk > about this at Google I/O Next '18, five years ago[1]. > > [1]https://urldefense.com/v3/__https://www.youtube.com/watch?v=gIeuiGg-_iw__;!!ACWV5N9M2RV99hQ!I4iRp4xUyzAT0UwuEcnUBBCPKLXFKfk5FNmysFbKcQYfl0marAll5xEEVyB5mMFDqeckCWLmjU1aCR2Z$ > > Given the performance gains (see the talk (see the comparison of the > at time 19:31 and at 29:57) --- it's quite compelling. > > Of course, I wouldn't recommend this approach for a naive sysadmin, > since most database adminsitrators won't know how to audit kernel code > (see the discussion at time 35:10 of the video), and reverify the > entire software stack before every kernel upgrade. Sure > The challenge is > how to do this safely. Right, and that is why I would be concerned about advertising torn-write protection support, but someone has not gone through the effort of auditing and verification phase to ensure that this does not happen in their software stack ever. > > The fact remains that both Amazon's EBS and Google's Persistent Disk > products are implemented in such a way that writes will not be torn > below the virtual machine, and the guarantees are in fact quite a bit > stronger than what we will probably end up advertising via NVMe and/or > SCSI. 
> It wouldn't surprise me if this is the case (or could be made
> to be the case) for Oracle Cloud as well.
>
> The question is how to make this guarantee so that the kernel knows
> when various cloud-provided block devices do provide these greater
> guarantees, and then how to make it an architected feature, as
> opposed to a happy implementation detail that has to be verified at
> every kernel upgrade.

The kernel can only judge atomic write support from what the HW product data tells us, so cloud-provided block devices need to provide that information as best they can if emulating some storage technology.

Thanks,
John
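John's point about judging support from "HW product data" can be made concrete for NVMe: the Identify Controller data reports AWUN and AWUPF (with a per-namespace NAWUPF override), which are 0's-based counts of logical blocks, so an emulated cloud device advertises its guarantee simply by reporting them honestly. A minimal sketch of the conversion follows; the function name and sample values are illustrative, not from the thread:

```python
def atomic_write_unit_bytes(awupf: int, lba_size: int) -> int:
    """Convert an NVMe AWUPF/NAWUPF field to bytes.

    These fields are 0's-based: a reported value of N means N+1
    logical blocks can be written atomically even across a power
    failure.
    """
    return (awupf + 1) * lba_size

# A device reporting AWUPF=7 with 512-byte LBAs guarantees
# 4096-byte power-fail atomic writes.
print(atomic_write_unit_bytes(7, 512))
```

A cloud block device emulating NVMe that wanted to advertise, say, 16 KiB torn-write protection on 4 KiB LBAs would report NAWUPF=3 for that namespace.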
On Mon, Dec 04, 2023 at 03:19:15PM +0000, John Garry wrote:
> On 04/12/2023 13:45, Christoph Hellwig wrote:
> > On Tue, Nov 28, 2023 at 05:42:10PM +0000, John Garry wrote:
> > > ok, fine, it would not be required for XFS with CoW. Some concerns still:
> > > a. device atomic write boundary, if any
> > > b. other FSes which do not have CoW support. ext4 is already being used for
> > > "atomic writes" in the field - see dubious amazon torn-write prevention.
> >
> > What is the 'dubious amazon torn-write prevention'?
>
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-twp.html
>
> AFAICS, this is without any kernel changes, so no guarantee of unwanted
> splitting or merging of bios.
>
> Anyway, there will still be !CoW FSes which people want to support.
>
> > > About b., we could add the pow-of-2 and file offset alignment requirement
> > > for other FSes, but then need to add some method to advertise that
> > > restriction.
> >
> > We really need a better way to communicate I/O limitations anyway.
> > Something like XFS_IOC_DIOINFO on steroids.
>
> > > Sure, but to me it is a concern that we have 2x paths to make robust a.
> > > offload via hw, which may involve CoW b. no HW support, i.e. CoW always
> >
> > Relying just on the hardware seems very limited, especially as there is
> > plenty of hardware that won't guarantee anything larger than 4k, and
> > plenty of NVMe hardware that has some other small limit like 32k
> > because it doesn't support multiple atomicity modes.
>
> So what would you propose as the next step? Would it be to first achieve
> atomic write support for XFS with HW support + CoW to ensure contiguous
> extents (and without XFS forcealign)?
>
> > > And for no HW support, if we don't follow the O_ATOMIC model of committing
> > > nothing until a SYNC is issued, would we allocate, write, and later free a
> > > new extent for each write, right?
> >
> > Yes. Then again if you do data journalling you do that anyway, and as
> > one little project I'm doing right now shows, data journalling is
> > often the fastest thing we can do for very small writes.
>
> Ignoring FSes, then how is this supposed to work for block devices? We just
> always need HW support, right?

It looks like the HW support could be minimized, just like what Google and Amazon did: 16KB physical block size with proper queue limit settings.

Now it seems easy to make such a device with ublk-loop by:

- using one backing disk with 16KB/32KB/.. physical block size
- exposing proper physical bs & chunk_sectors & max sectors queue limits

Then any 16KB-aligned direct WRITE with N*16KB length (N in [1, 8] with 256 chunk_sectors) can be an atomic write.

Thanks,
Ming
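Ming's criteria can be sketched as a small userspace model: a write qualifies when it is aligned to the physical block size, sized in whole multiples of it, and does not straddle a chunk_sectors boundary (so the block layer never splits the bio). The function name and the boundary interpretation are assumptions for illustration, not kernel code:

```python
SECTOR = 512  # chunk_sectors is expressed in 512-byte sectors

def can_write_atomically(offset: int, length: int,
                         phys_bs: int = 16384,
                         chunk_sectors: int = 256) -> bool:
    """Model of the criteria above for a 16KB-physical-block device.

    With chunk_sectors=256 the chunk is 128KB, so an aligned write of
    N*16KB (N in [1, 8]) always fits inside one chunk.
    """
    chunk_bytes = chunk_sectors * SECTOR
    if length == 0 or offset % phys_bs or length % phys_bs:
        return False
    # The write must not cross a chunk boundary, so it is issued as
    # a single bio to the backing device.
    return offset // chunk_bytes == (offset + length - 1) // chunk_bytes
```

For example, a 48KB write at offset 16KB qualifies, while a 16KB write at offset 8KB fails the alignment check and a 32KB write at offset 120KB straddles the 128KB chunk boundary.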
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 70fe873951f3..3424fcfc04f5 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -783,6 +783,7 @@ xfs_direct_write_iomap_begin(
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_sb		*m_sb = &mp->m_sb;
 	struct xfs_bmbt_irec	imap, cmap;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
@@ -814,6 +815,41 @@ xfs_direct_write_iomap_begin(
 	if (error)
 		goto out_unlock;
 
+	if (flags & IOMAP_ATOMIC_WRITE) {
+		xfs_filblks_t	unit_min_fsb, unit_max_fsb;
+
+		xfs_ip_atomic_write_attr(ip, &unit_min_fsb, &unit_max_fsb);
+
+		if (!imap_spans_range(&imap, offset_fsb, end_fsb)) {
+			error = -EIO;
+			goto out_unlock;
+		}
+
+		if (offset % m_sb->sb_blocksize ||
+		    length % m_sb->sb_blocksize) {
+			error = -EIO;
+			goto out_unlock;
+		}
+
+		if (imap.br_blockcount == unit_min_fsb ||
+		    imap.br_blockcount == unit_max_fsb) {
+			/* min and max must be a power-of-2 */
+		} else if (imap.br_blockcount < unit_min_fsb ||
+			   imap.br_blockcount > unit_max_fsb) {
+			error = -EIO;
+			goto out_unlock;
+		} else if (!is_power_of_2(imap.br_blockcount)) {
+			error = -EIO;
+			goto out_unlock;
+		}
+
+		if (imap.br_startoff &&
+		    imap.br_startoff % imap.br_blockcount) {
+			error = -EIO;
+			goto out_unlock;
+		}
+	}
+
 	if (imap_needs_cow(ip, flags, &imap, nimaps)) {
 		error = -EAGAIN;
 		if (flags & IOMAP_NOWAIT)
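For readers following along, the validation logic this hunk adds can be modelled in userspace roughly as follows. This is a sketch only: the parameter names stand in for in-kernel values (e.g. unit_min_fsb/unit_max_fsb for what xfs_ip_atomic_write_attr() would return), and each False corresponds to one of the -EIO returns above:

```python
def is_power_of_2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

def atomic_mapping_ok(offset: int, length: int, blocksize: int,
                      br_startoff: int, br_blockcount: int,
                      unit_min_fsb: int, unit_max_fsb: int,
                      spans_range: bool) -> bool:
    """Mirror of the IOMAP_ATOMIC_WRITE checks in the hunk."""
    if not spans_range:
        # One extent must cover the whole I/O range.
        return False
    if offset % blocksize or length % blocksize:
        # I/O must be FS-block aligned in offset and length.
        return False
    if br_blockcount in (unit_min_fsb, unit_max_fsb):
        pass  # min and max are themselves powers of 2
    elif br_blockcount < unit_min_fsb or br_blockcount > unit_max_fsb:
        return False
    elif not is_power_of_2(br_blockcount):
        return False
    if br_startoff and br_startoff % br_blockcount:
        # Extent start must be aligned to the extent size.
        return False
    return True
```

So a 4-block extent covering a block-aligned write with unit limits [1, 8] passes, while a 3-block extent fails the power-of-2 check even though it lies inside the limits.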