[RFC] fix csum_and_copy_..._user() idiocy. Re: AW: [PATCH] amd64: Fix csum_partial_copy_generic()

Message ID 20231022194020.GA972254@ZenIV
State New
Series [RFC] fix csum_and_copy_..._user() idiocy. Re: AW: [PATCH] amd64: Fix csum_partial_copy_generic()

Commit Message

Al Viro Oct. 22, 2023, 7:40 p.m. UTC
  On Sat, Oct 21, 2023 at 11:22:03PM +0100, Al Viro wrote:
> On Sat, Oct 21, 2023 at 08:15:25AM +0100, Al Viro wrote:
> 
> > I don't think -rc7 is a good time for that, though.  At the
> > very least it needs a review on linux-arch - I think I hadn't
> > fucked the ABI for returning u64 up, but...
> > 
> > Anyway, completely untested patch follows:
> 
> ... and now something that at least builds (with some brainos fixed); it's still
> slightly suboptimal representation on big-endian 32bit - there it would be better to
> have the csum in upper half of the 64bit getting returned and use the lower
> half as fault indicator, but dealing with that cleanly takes some massage of
> includes in several places, so I'd left that alone for now.  In any case, the
> overhead of that is pretty much noise.

OK, here's what I have in mind for the next cycle.  It's still untested (builds,
but that's it).  Conversion to including <net/checksum.h> rather than
<asm/checksum.h> is going to get carved out into a separate patch.
I _think_ I've got the asm parts (and ABI for returning 64bit) right,
but I would really appreciate review and testing.


Fix the csum_and_copy_..._user() idiocy

We need a way for csum_and_copy_{from,to}_user() to report faults.
The approach taken back in 2020 (avoid 0 as a return value by seeding
the sum with ~0U and using 0 to report faults) is broken; it does
yield the right value modulo 2^16-1, but the case when the data is
entirely zero-filled is not handled correctly.  It almost works, since
for most of the codepaths we have a non-zero value added in, and there
0 is no different from anything divisible by 0xffff.  However, there
are cases (ICMPv4 replies, for example) where we are not guaranteed
that.
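
To make the failure mode concrete, here is a minimal userspace sketch
(not kernel code; toy stand-ins for csum_partial() and the 16bit fold,
arbitrary buffer size) showing that seeding the sum with ~0U keeps the
result correct modulo 0xffff but turns the checksum of an all-zero
buffer from 0 into 0xffff:

	#include <stdint.h>
	#include <stdio.h>

	/* toy equivalent of csum_partial(), summing 16bit words */
	static uint32_t toy_csum_partial(const uint16_t *p, int n, uint32_t sum)
	{
		while (n--)
			sum += *p++;	/* carries pile up in the high bits */
		return sum;
	}

	/* fold a 32bit partial sum down to 16 bits with end-around carry */
	static uint16_t fold(uint32_t sum)
	{
		sum = (sum & 0xffff) + (sum >> 16);
		sum = (sum & 0xffff) + (sum >> 16);
		return sum;
	}

	int main(void)
	{
		uint16_t zeroes[4] = { 0, 0, 0, 0 };

		/* seed 0 gives 0x0000, seed ~0U gives 0xffff; the two are
		   congruent modulo 0xffff, but only one of them is 0 */
		printf("seed 0: 0x%04x  seed ~0U: 0x%04x\n",
		       fold(toy_csum_partial(zeroes, 4, 0)),
		       fold(toy_csum_partial(zeroes, 4, ~0U)));
		return 0;
	}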

In other words, we really need to have those primitives return 0
on filled-with-zeroes input.  So let's make them return a 64bit
value instead; we can do that cheaply (all supported architectures
return a 64bit value in a register or a pair of registers) and we
can use the extra bits to report faults without disturbing the
32bit csum.

New type: __wsum_fault.  64bit, returned by csum_and_copy_..._user().
Primitives:
	* CSUM_FAULT representing a fault
	* to_wsum_fault() folding a __wsum value into a __wsum_fault
	* from_wsum_fault() extracting the __wsum value
	* wsum_fault_check() checking whether the value represents a fault

Representation depends upon the target.
	CSUM_FAULT: ~0ULL
	to_wsum_fault(v32): (u64)v32 for 64bit and 32bit l-e,
			    (u64)v32 << 32 for 32bit b-e.

Rationale: the relationship between the calling conventions for returning
64bit values and those for returning 32bit ones.  On 64bit architectures
the same register is used; on 32bit l-e the lower half of the value goes
in the same register that is used for returning 32bit values and the
upper half goes into an additional register.  On 32bit b-e the opposite
happens - the upper 32 bits go into the register used for returning
32bit values and the lower 32 bits get stuffed into an additional
register.

So with this choice of representation we need minimal changes on the
asm side (zero an extra register in the 32bit case, nothing in the
64bit case), and from_wsum_fault() is as cheap as it gets.
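
For illustration, a caller of the new primitives would look roughly like
this (a sketch only - example_copy_and_csum() is a made-up name, but the
pattern matches what the lib/iov_iter.c hunk below does):

	#include <net/checksum.h>

	/* copy from userland, accumulating the checksum; -EFAULT on fault */
	static int example_copy_and_csum(const void __user *src, void *dst,
					 int len, __wsum *csum)
	{
		__wsum_fault res = csum_and_copy_from_user(src, dst, len);

		if (wsum_fault_check(res))
			return -EFAULT;

		*csum = csum_add(*csum, from_wsum_fault(res));
		return 0;
	}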

Sum calculation is back to "start from 0".

X-paperbag: brown
Fixes: c693cc4676a0 ("saner calling conventions for csum_and_copy_..._user()")
Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: gus Gusenleitner Klaus <gus@keba.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
  

Comments

Al Viro Oct. 22, 2023, 7:46 p.m. UTC | #1
On Sun, Oct 22, 2023 at 08:40:20PM +0100, Al Viro wrote:
> On Sat, Oct 21, 2023 at 11:22:03PM +0100, Al Viro wrote:
> > On Sat, Oct 21, 2023 at 08:15:25AM +0100, Al Viro wrote:
> > 
> > > I don't think -rc7 is a good time for that, though.  At the
> > > very least it needs a review on linux-arch - I think I hadn't
> > > fucked the ABI for returning u64 up, but...
> > > 
> > > Anyway, completely untested patch follows:
> > 
> > ... and now something that at least builds (with some brainos fixed); it's still
> > slightly suboptimal representation on big-endian 32bit - there it would be better to
> > have the csum in upper half of the 64bit getting returned and use the lower
> > half as fault indicator, but dealing with that cleanly takes some massage of
> > includes in several places, so I'd left that alone for now.  In any case, the
> > overhead of that is pretty much noise.
> 
> OK, here's what I have in mind for the next cycle.  It's still untested (builds,
> but that's it).  Conversion to including <net/checksum.h> rather than
> <asm/checksum.h> is going to get carved out into a separate patch.
> I _think_ I've got the asm parts (and ABI for returning 64bit) right,
> but I would really appreciate review and testing.

Gyah....  What I got wrong is the lib/iov_iter.c part - check for faults is
backwards ;-/  Hopefully fixed variant follows:

diff --git a/arch/alpha/include/asm/asm-prototypes.h b/arch/alpha/include/asm/asm-prototypes.h
index c8ae46fc2e74..0cf90f406d00 100644
--- a/arch/alpha/include/asm/asm-prototypes.h
+++ b/arch/alpha/include/asm/asm-prototypes.h
@@ -1,10 +1,10 @@
 #include <linux/spinlock.h>
 
-#include <asm/checksum.h>
 #include <asm/console.h>
 #include <asm/page.h>
 #include <asm/string.h>
 #include <linux/uaccess.h>
+#include <net/checksum.h> // for csum_ipv6_magic()
 
 #include <asm-generic/asm-prototypes.h>
 
diff --git a/arch/alpha/include/asm/checksum.h b/arch/alpha/include/asm/checksum.h
index 99d631e146b2..d3abe290ae4e 100644
--- a/arch/alpha/include/asm/checksum.h
+++ b/arch/alpha/include/asm/checksum.h
@@ -43,7 +43,7 @@ extern __wsum csum_partial(const void *buff, int len, __wsum sum);
  */
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 #define _HAVE_ARCH_CSUM_AND_COPY
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len);
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len);
 
 __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len);
 
diff --git a/arch/alpha/lib/csum_partial_copy.c b/arch/alpha/lib/csum_partial_copy.c
index 4d180d96f09e..5f75fcd19287 100644
--- a/arch/alpha/lib/csum_partial_copy.c
+++ b/arch/alpha/lib/csum_partial_copy.c
@@ -52,7 +52,7 @@ __asm__ __volatile__("insqh %1,%2,%0":"=r" (z):"r" (x),"r" (y))
 	__guu_err;					\
 })
 
-static inline unsigned short from64to16(unsigned long x)
+static inline __wsum_fault from64to16(unsigned long x)
 {
 	/* Using extract instructions is a bit more efficient
 	   than the original shift/bitmask version.  */
@@ -72,7 +72,7 @@ static inline unsigned short from64to16(unsigned long x)
 			+ (unsigned long) tmp_v.us[2];
 
 	/* Similarly, out_v.us[2] is always zero for the final add.  */
-	return out_v.us[0] + out_v.us[1];
+	return to_wsum_fault((__force __wsum)(out_v.us[0] + out_v.us[1]));
 }
 
 
@@ -80,17 +80,17 @@ static inline unsigned short from64to16(unsigned long x)
 /*
  * Ok. This isn't fun, but this is the EASY case.
  */
-static inline unsigned long
+static inline __wsum_fault
 csum_partial_cfu_aligned(const unsigned long __user *src, unsigned long *dst,
 			 long len)
 {
-	unsigned long checksum = ~0U;
+	unsigned long checksum = 0;
 	unsigned long carry = 0;
 
 	while (len >= 0) {
 		unsigned long word;
 		if (__get_word(ldq, word, src))
-			return 0;
+			return CSUM_FAULT;
 		checksum += carry;
 		src++;
 		checksum += word;
@@ -104,7 +104,7 @@ csum_partial_cfu_aligned(const unsigned long __user *src, unsigned long *dst,
 	if (len) {
 		unsigned long word, tmp;
 		if (__get_word(ldq, word, src))
-			return 0;
+			return CSUM_FAULT;
 		tmp = *dst;
 		mskql(word, len, word);
 		checksum += word;
@@ -113,14 +113,14 @@ csum_partial_cfu_aligned(const unsigned long __user *src, unsigned long *dst,
 		*dst = word | tmp;
 		checksum += carry;
 	}
-	return checksum;
+	return from64to16 (checksum);
 }
 
 /*
  * This is even less fun, but this is still reasonably
  * easy.
  */
-static inline unsigned long
+static inline __wsum_fault
 csum_partial_cfu_dest_aligned(const unsigned long __user *src,
 			      unsigned long *dst,
 			      unsigned long soff,
@@ -129,16 +129,16 @@ csum_partial_cfu_dest_aligned(const unsigned long __user *src,
 	unsigned long first;
 	unsigned long word, carry;
 	unsigned long lastsrc = 7+len+(unsigned long)src;
-	unsigned long checksum = ~0U;
+	unsigned long checksum = 0;
 
 	if (__get_word(ldq_u, first,src))
-		return 0;
+		return CSUM_FAULT;
 	carry = 0;
 	while (len >= 0) {
 		unsigned long second;
 
 		if (__get_word(ldq_u, second, src+1))
-			return 0;
+			return CSUM_FAULT;
 		extql(first, soff, word);
 		len -= 8;
 		src++;
@@ -157,7 +157,7 @@ csum_partial_cfu_dest_aligned(const unsigned long __user *src,
 		unsigned long tmp;
 		unsigned long second;
 		if (__get_word(ldq_u, second, lastsrc))
-			return 0;
+			return CSUM_FAULT;
 		tmp = *dst;
 		extql(first, soff, word);
 		extqh(second, soff, first);
@@ -169,13 +169,13 @@ csum_partial_cfu_dest_aligned(const unsigned long __user *src,
 		*dst = word | tmp;
 		checksum += carry;
 	}
-	return checksum;
+	return from64to16 (checksum);
 }
 
 /*
  * This is slightly less fun than the above..
  */
-static inline unsigned long
+static inline __wsum_fault
 csum_partial_cfu_src_aligned(const unsigned long __user *src,
 			     unsigned long *dst,
 			     unsigned long doff,
@@ -185,12 +185,12 @@ csum_partial_cfu_src_aligned(const unsigned long __user *src,
 	unsigned long carry = 0;
 	unsigned long word;
 	unsigned long second_dest;
-	unsigned long checksum = ~0U;
+	unsigned long checksum = 0;
 
 	mskql(partial_dest, doff, partial_dest);
 	while (len >= 0) {
 		if (__get_word(ldq, word, src))
-			return 0;
+			return CSUM_FAULT;
 		len -= 8;
 		insql(word, doff, second_dest);
 		checksum += carry;
@@ -205,7 +205,7 @@ csum_partial_cfu_src_aligned(const unsigned long __user *src,
 	if (len) {
 		checksum += carry;
 		if (__get_word(ldq, word, src))
-			return 0;
+			return CSUM_FAULT;
 		mskql(word, len, word);
 		len -= 8;
 		checksum += word;
@@ -226,14 +226,14 @@ csum_partial_cfu_src_aligned(const unsigned long __user *src,
 	stq_u(partial_dest | second_dest, dst);
 out:
 	checksum += carry;
-	return checksum;
+	return from64to16 (checksum);
 }
 
 /*
  * This is so totally un-fun that it's frightening. Don't
  * look at this too closely, you'll go blind.
  */
-static inline unsigned long
+static inline __wsum_fault
 csum_partial_cfu_unaligned(const unsigned long __user * src,
 			   unsigned long * dst,
 			   unsigned long soff, unsigned long doff,
@@ -242,10 +242,10 @@ csum_partial_cfu_unaligned(const unsigned long __user * src,
 	unsigned long carry = 0;
 	unsigned long first;
 	unsigned long lastsrc;
-	unsigned long checksum = ~0U;
+	unsigned long checksum = 0;
 
 	if (__get_word(ldq_u, first, src))
-		return 0;
+		return CSUM_FAULT;
 	lastsrc = 7+len+(unsigned long)src;
 	mskql(partial_dest, doff, partial_dest);
 	while (len >= 0) {
@@ -253,7 +253,7 @@ csum_partial_cfu_unaligned(const unsigned long __user * src,
 		unsigned long second_dest;
 
 		if (__get_word(ldq_u, second, src+1))
-			return 0;
+			return CSUM_FAULT;
 		extql(first, soff, word);
 		checksum += carry;
 		len -= 8;
@@ -275,7 +275,7 @@ csum_partial_cfu_unaligned(const unsigned long __user * src,
 		unsigned long second_dest;
 
 		if (__get_word(ldq_u, second, lastsrc))
-			return 0;
+			return CSUM_FAULT;
 		extql(first, soff, word);
 		extqh(second, soff, first);
 		word |= first;
@@ -297,7 +297,7 @@ csum_partial_cfu_unaligned(const unsigned long __user * src,
 		unsigned long second_dest;
 
 		if (__get_word(ldq_u, second, lastsrc))
-			return 0;
+			return CSUM_FAULT;
 		extql(first, soff, word);
 		extqh(second, soff, first);
 		word |= first;
@@ -310,22 +310,21 @@ csum_partial_cfu_unaligned(const unsigned long __user * src,
 		stq_u(partial_dest | word | second_dest, dst);
 		checksum += carry;
 	}
-	return checksum;
+	return from64to16 (checksum);
 }
 
-static __wsum __csum_and_copy(const void __user *src, void *dst, int len)
+static __wsum_fault __csum_and_copy(const void __user *src, void *dst, int len)
 {
 	unsigned long soff = 7 & (unsigned long) src;
 	unsigned long doff = 7 & (unsigned long) dst;
-	unsigned long checksum;
 
 	if (!doff) {
 		if (!soff)
-			checksum = csum_partial_cfu_aligned(
+			return csum_partial_cfu_aligned(
 				(const unsigned long __user *) src,
 				(unsigned long *) dst, len-8);
 		else
-			checksum = csum_partial_cfu_dest_aligned(
+			return csum_partial_cfu_dest_aligned(
 				(const unsigned long __user *) src,
 				(unsigned long *) dst,
 				soff, len-8);
@@ -333,31 +332,28 @@ static __wsum __csum_and_copy(const void __user *src, void *dst, int len)
 		unsigned long partial_dest;
 		ldq_u(partial_dest, dst);
 		if (!soff)
-			checksum = csum_partial_cfu_src_aligned(
+			return csum_partial_cfu_src_aligned(
 				(const unsigned long __user *) src,
 				(unsigned long *) dst,
 				doff, len-8, partial_dest);
 		else
-			checksum = csum_partial_cfu_unaligned(
+			return csum_partial_cfu_unaligned(
 				(const unsigned long __user *) src,
 				(unsigned long *) dst,
 				soff, doff, len-8, partial_dest);
 	}
-	return (__force __wsum)from64to16 (checksum);
 }
 
-__wsum
-csum_and_copy_from_user(const void __user *src, void *dst, int len)
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	if (!access_ok(src, len))
-		return 0;
+		return CSUM_FAULT;
 	return __csum_and_copy(src, dst, len);
 }
 
-__wsum
-csum_partial_copy_nocheck(const void *src, void *dst, int len)
+__wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return __csum_and_copy((__force const void __user *)src,
-						dst, len);
+	return from_wsum_fault(__csum_and_copy((__force const void __user *)src,
+						dst, len));
 }
 EXPORT_SYMBOL(csum_partial_copy_nocheck);
diff --git a/arch/arm/include/asm/checksum.h b/arch/arm/include/asm/checksum.h
index d8a13959bff0..2b9bddb50967 100644
--- a/arch/arm/include/asm/checksum.h
+++ b/arch/arm/include/asm/checksum.h
@@ -38,16 +38,16 @@ __wsum csum_partial(const void *buff, int len, __wsum sum);
 __wsum
 csum_partial_copy_nocheck(const void *src, void *dst, int len);
 
-__wsum
+__wsum_fault
 csum_partial_copy_from_user(const void __user *src, void *dst, int len);
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 #define _HAVE_ARCH_CSUM_AND_COPY
 static inline
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	if (!access_ok(src, len))
-		return 0;
+		return -1;
 
 	return csum_partial_copy_from_user(src, dst, len);
 }
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 82e96ac83684..d076a5c8556f 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -14,7 +14,7 @@
 #include <linux/io.h>
 #include <linux/arm-smccc.h>
 
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/ftrace.h>
 
 /*
diff --git a/arch/arm/lib/csumpartialcopygeneric.S b/arch/arm/lib/csumpartialcopygeneric.S
index 0fd5c10e90a7..5db935eaf165 100644
--- a/arch/arm/lib/csumpartialcopygeneric.S
+++ b/arch/arm/lib/csumpartialcopygeneric.S
@@ -86,7 +86,7 @@ sum	.req	r3
 
 FN_ENTRY
 		save_regs
-		mov	sum, #-1
+		mov	sum, #0
 
 		cmp	len, #8			@ Ensure that we have at least
 		blo	.Lless8			@ 8 bytes to copy.
@@ -160,6 +160,7 @@ FN_ENTRY
 		ldr	sum, [sp, #0]		@ dst
 		tst	sum, #1
 		movne	r0, r0, ror #8
+		mov	r1, #0
 		load_regs
 
 .Lsrc_not_aligned:
diff --git a/arch/arm/lib/csumpartialcopyuser.S b/arch/arm/lib/csumpartialcopyuser.S
index 6928781e6bee..f273c9667914 100644
--- a/arch/arm/lib/csumpartialcopyuser.S
+++ b/arch/arm/lib/csumpartialcopyuser.S
@@ -73,11 +73,11 @@
 #include "csumpartialcopygeneric.S"
 
 /*
- * We report fault by returning 0 csum - impossible in normal case, since
- * we start with 0xffffffff for initial sum.
+ * We report fault by returning ~0ULL csum
  */
 		.pushsection .text.fixup,"ax"
 		.align	4
-9001:		mov	r0, #0
+9001:		mov	r0, #-1
+		mov	r1, #-1
 		load_regs
 		.popsection
diff --git a/arch/ia64/include/asm/asm-prototypes.h b/arch/ia64/include/asm/asm-prototypes.h
index a96689447a74..41d23ab53ff4 100644
--- a/arch/ia64/include/asm/asm-prototypes.h
+++ b/arch/ia64/include/asm/asm-prototypes.h
@@ -3,7 +3,7 @@
 #define _ASM_IA64_ASM_PROTOTYPES_H
 
 #include <asm/cacheflush.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/esi.h>
 #include <asm/ftrace.h>
 #include <asm/page.h>
diff --git a/arch/m68k/include/asm/checksum.h b/arch/m68k/include/asm/checksum.h
index 692e7b6cc042..2adef06feeb3 100644
--- a/arch/m68k/include/asm/checksum.h
+++ b/arch/m68k/include/asm/checksum.h
@@ -32,7 +32,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum);
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 #define _HAVE_ARCH_CSUM_AND_COPY
-extern __wsum csum_and_copy_from_user(const void __user *src,
+extern __wsum_fault csum_and_copy_from_user(const void __user *src,
 						void *dst,
 						int len);
 
diff --git a/arch/m68k/lib/checksum.c b/arch/m68k/lib/checksum.c
index 5acb821849d3..4fed9070e976 100644
--- a/arch/m68k/lib/checksum.c
+++ b/arch/m68k/lib/checksum.c
@@ -128,7 +128,7 @@ EXPORT_SYMBOL(csum_partial);
  * copy from user space while checksumming, with exception handling.
  */
 
-__wsum
+__wsum_fault
 csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	/*
@@ -137,7 +137,7 @@ csum_and_copy_from_user(const void __user *src, void *dst, int len)
 	 * code.
 	 */
 	unsigned long tmp1, tmp2;
-	__wsum sum = ~0U;
+	__wsum sum = 0;
 
 	__asm__("movel %2,%4\n\t"
 		"btst #1,%4\n\t"	/* Check alignment */
@@ -240,7 +240,7 @@ csum_and_copy_from_user(const void __user *src, void *dst, int len)
 		".even\n"
 		/* If any exception occurs, return 0 */
 	     "90:\t"
-		"clrl %0\n"
+		"moveq #1,%5\n"
 		"jra 7b\n"
 		".previous\n"
 		".section __ex_table,\"a\"\n"
@@ -262,7 +262,7 @@ csum_and_copy_from_user(const void __user *src, void *dst, int len)
 		: "0" (sum), "1" (len), "2" (src), "3" (dst)
 	    );
 
-	return sum;
+	return tmp2 ? CSUM_FAULT : to_wsum_fault(sum);
 }
 
 
diff --git a/arch/microblaze/kernel/microblaze_ksyms.c b/arch/microblaze/kernel/microblaze_ksyms.c
index c892e173ec99..e5858b15cd37 100644
--- a/arch/microblaze/kernel/microblaze_ksyms.c
+++ b/arch/microblaze/kernel/microblaze_ksyms.c
@@ -10,7 +10,7 @@
 #include <linux/in6.h>
 #include <linux/syscalls.h>
 
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/cacheflush.h>
 #include <linux/io.h>
 #include <asm/page.h>
diff --git a/arch/mips/include/asm/asm-prototypes.h b/arch/mips/include/asm/asm-prototypes.h
index 8e8fc38b0941..44fd0c30fd73 100644
--- a/arch/mips/include/asm/asm-prototypes.h
+++ b/arch/mips/include/asm/asm-prototypes.h
@@ -1,11 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#include <asm/checksum.h>
 #include <asm/page.h>
 #include <asm/fpu.h>
 #include <asm-generic/asm-prototypes.h>
 #include <linux/uaccess.h>
 #include <asm/ftrace.h>
 #include <asm/mmu_context.h>
+#include <net/checksum.h>
 
 extern void clear_page_cpu(void *page);
 extern void copy_page_cpu(void *to, void *from);
diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h
index 4044eaf989ac..3dfe9eca4adc 100644
--- a/arch/mips/include/asm/checksum.h
+++ b/arch/mips/include/asm/checksum.h
@@ -34,16 +34,16 @@
  */
 __wsum csum_partial(const void *buff, int len, __wsum sum);
 
-__wsum __csum_partial_copy_from_user(const void __user *src, void *dst, int len);
-__wsum __csum_partial_copy_to_user(const void *src, void __user *dst, int len);
+__wsum_fault __csum_partial_copy_from_user(const void __user *src, void *dst, int len);
+__wsum_fault __csum_partial_copy_to_user(const void *src, void __user *dst, int len);
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 static inline
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	might_fault();
 	if (!access_ok(src, len))
-		return 0;
+		return CSUM_FAULT;
 	return __csum_partial_copy_from_user(src, dst, len);
 }
 
@@ -52,11 +52,11 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
  */
 #define HAVE_CSUM_COPY_USER
 static inline
-__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len)
+__wsum_fault csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
 	might_fault();
 	if (!access_ok(dst, len))
-		return 0;
+		return CSUM_FAULT;
 	return __csum_partial_copy_to_user(src, dst, len);
 }
 
@@ -65,10 +65,10 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len)
  * we have just one address space, so this is identical to the above)
  */
 #define _HAVE_ARCH_CSUM_AND_COPY
-__wsum __csum_partial_copy_nocheck(const void *src, void *dst, int len);
+__wsum_fault __csum_partial_copy_nocheck(const void *src, void *dst, int len);
 static inline __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return __csum_partial_copy_nocheck(src, dst, len);
+	return from_wsum_fault(__csum_partial_copy_nocheck(src, dst, len));
 }
 
 /*
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 3d2ff4118d79..b0cda2950f4e 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -437,7 +437,7 @@ EXPORT_SYMBOL(csum_partial)
 
 	.macro __BUILD_CSUM_PARTIAL_COPY_USER mode, from, to
 
-	li	sum, -1
+	move	sum, zero
 	move	odd, zero
 	/*
 	 * Note: dst & src may be unaligned, len may be 0
@@ -723,6 +723,9 @@ EXPORT_SYMBOL(csum_partial)
 1:
 #endif
 	.set	pop
+#ifndef CONFIG_64BIT
+	move	v1, zero
+#endif
 	.set reorder
 	jr	ra
 	.set noreorder
@@ -730,8 +733,11 @@ EXPORT_SYMBOL(csum_partial)
 
 	.set noreorder
 .L_exc:
+#ifndef CONFIG_64BIT
+	li	v1, -1
+#endif
 	jr	ra
-	 li	v0, 0
+	 li	v0, -1
 
 FEXPORT(__csum_partial_copy_nocheck)
 EXPORT_SYMBOL(__csum_partial_copy_nocheck)
diff --git a/arch/openrisc/kernel/or32_ksyms.c b/arch/openrisc/kernel/or32_ksyms.c
index 212e5f85004c..a56dea4411ab 100644
--- a/arch/openrisc/kernel/or32_ksyms.c
+++ b/arch/openrisc/kernel/or32_ksyms.c
@@ -22,7 +22,7 @@
 
 #include <asm/processor.h>
 #include <linux/uaccess.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/io.h>
 #include <asm/hardirq.h>
 #include <asm/delay.h>
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 274bce76f5da..c283183e5a81 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -11,7 +11,7 @@
 
 #include <linux/threads.h>
 #include <asm/cacheflush.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <linux/uaccess.h>
 #include <asm/epapr_hcalls.h>
 #include <asm/dcr.h>
diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 4b573a3b7e17..b68184dfac00 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -18,18 +18,18 @@
  * Like csum_partial, this must be called with even lengths,
  * except for the last fragment.
  */
-extern __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+extern __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
-extern __wsum csum_and_copy_from_user(const void __user *src, void *dst,
+extern __wsum_fault csum_and_copy_from_user(const void __user *src, void *dst,
 				      int len);
 #define HAVE_CSUM_COPY_USER
-extern __wsum csum_and_copy_to_user(const void *src, void __user *dst,
+extern __wsum_fault csum_and_copy_to_user(const void *src, void __user *dst,
 				    int len);
 
 #define _HAVE_ARCH_CSUM_AND_COPY
 #define csum_partial_copy_nocheck(src, dst, len)   \
-        csum_partial_copy_generic((src), (dst), (len))
+        from_wsum_fault(csum_partial_copy_generic((src), (dst), (len)))
 
 
 /*
diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index cd00b9bdd772..03f63f36aeba 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -122,7 +122,7 @@ LG_CACHELINE_BYTES = L1_CACHE_SHIFT
 CACHELINE_MASK = (L1_CACHE_BYTES-1)
 
 _GLOBAL(csum_partial_copy_generic)
-	li	r12,-1
+	li	r12,0
 	addic	r0,r0,0			/* clear carry */
 	addi	r6,r4,-4
 	neg	r0,r4
@@ -233,12 +233,14 @@ _GLOBAL(csum_partial_copy_generic)
 	slwi	r0,r0,8
 	adde	r12,r12,r0
 66:	addze	r3,r12
+	li	r4,0
 	beqlr+	cr7
 	rlwinm	r3,r3,8,0,31	/* odd destination address: rotate one byte */
 	blr
 
 fault:
-	li	r3,0
+	li	r3,-1
+	li	r4,-1
 	blr
 
 	EX_TABLE(70b, fault);
diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S
index d53d8f09a2c2..3bbfeb98d256 100644
--- a/arch/powerpc/lib/checksum_64.S
+++ b/arch/powerpc/lib/checksum_64.S
@@ -208,7 +208,7 @@ EXPORT_SYMBOL(__csum_partial)
  * csum_partial_copy_generic(r3=src, r4=dst, r5=len)
  */
 _GLOBAL(csum_partial_copy_generic)
-	li	r6,-1
+	li	r6,0
 	addic	r0,r6,0			/* clear carry */
 
 	srdi.	r6,r5,3			/* less than 8 bytes? */
@@ -406,7 +406,7 @@ dstnr;	stb	r6,0(r4)
 	ld	r16,STK_REG(R16)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 .Lerror_nr:
-	li	r3,0
+	li	r3,-1
 	blr
 
 EXPORT_SYMBOL(csum_partial_copy_generic)
diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index 1a14c8780278..a23482b9f3f0 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -8,16 +8,16 @@
 #include <linux/export.h>
 #include <linux/compiler.h>
 #include <linux/types.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <linux/uaccess.h>
 
-__wsum csum_and_copy_from_user(const void __user *src, void *dst,
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst,
 			       int len)
 {
-	__wsum csum;
+	__wsum_fault csum;
 
 	if (unlikely(!user_read_access_begin(src, len)))
-		return 0;
+		return CSUM_FAULT;
 
 	csum = csum_partial_copy_generic((void __force *)src, dst, len);
 
@@ -25,12 +25,12 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
 	return csum;
 }
 
-__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len)
+__wsum_fault csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
-	__wsum csum;
+	__wsum_fault csum;
 
 	if (unlikely(!user_write_access_begin(dst, len)))
-		return 0;
+		return CSUM_FAULT;
 
 	csum = csum_partial_copy_generic(src, (void __force *)dst, len);
 
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 05e51666db03..9bfd434d31e6 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -28,7 +28,7 @@
 #include <asm/cpcmd.h>
 #include <asm/ebcdic.h>
 #include <asm/sclp.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/debug.h>
 #include <asm/abs_lowcore.h>
 #include <asm/os_info.h>
diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c
index 6e1824141b29..44447e3fef84 100644
--- a/arch/s390/kernel/os_info.c
+++ b/arch/s390/kernel/os_info.c
@@ -12,7 +12,7 @@
 #include <linux/crash_dump.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/abs_lowcore.h>
 #include <asm/os_info.h>
 #include <asm/maccess.h>
diff --git a/arch/sh/include/asm/checksum_32.h b/arch/sh/include/asm/checksum_32.h
index 2b5fa75b4651..94464451fd08 100644
--- a/arch/sh/include/asm/checksum_32.h
+++ b/arch/sh/include/asm/checksum_32.h
@@ -31,7 +31,7 @@ asmlinkage __wsum csum_partial(const void *buff, int len, __wsum sum);
  * better 64-bit) boundary
  */
 
-asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+asmlinkage __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
 #define _HAVE_ARCH_CSUM_AND_COPY
 /*
@@ -44,15 +44,15 @@ asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len)
 static inline
 __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return csum_partial_copy_generic(src, dst, len);
+	return from_wsum_fault(csum_partial_copy_generic(src, dst, len));
 }
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 static inline
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	if (!access_ok(src, len))
-		return 0;
+		return CSUM_FAULT;
 	return csum_partial_copy_generic((__force const void *)src, dst, len);
 }
 
@@ -193,12 +193,12 @@ static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
  *	Copy and checksum to user
  */
 #define HAVE_CSUM_COPY_USER
-static inline __wsum csum_and_copy_to_user(const void *src,
+static inline __wsum_fault csum_and_copy_to_user(const void *src,
 					   void __user *dst,
 					   int len)
 {
 	if (!access_ok(dst, len))
-		return 0;
+		return CSUM_FAULT;
 	return csum_partial_copy_generic(src, (__force void *)dst, len);
 }
 #endif /* __ASM_SH_CHECKSUM_H */
diff --git a/arch/sh/kernel/sh_ksyms_32.c b/arch/sh/kernel/sh_ksyms_32.c
index 5858936cb431..ce9d5547ac74 100644
--- a/arch/sh/kernel/sh_ksyms_32.c
+++ b/arch/sh/kernel/sh_ksyms_32.c
@@ -4,7 +4,7 @@
 #include <linux/uaccess.h>
 #include <linux/delay.h>
 #include <linux/mm.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/sections.h>
 
 EXPORT_SYMBOL(memchr);
diff --git a/arch/sh/lib/checksum.S b/arch/sh/lib/checksum.S
index 3e07074e0098..2d624efc4c2d 100644
--- a/arch/sh/lib/checksum.S
+++ b/arch/sh/lib/checksum.S
@@ -193,7 +193,7 @@ unsigned int csum_partial_copy_generic (const char *src, char *dst, int len)
 ! r6:	int LEN
 !
 ENTRY(csum_partial_copy_generic)
-	mov	#-1,r7
+	mov	#0,r7
 	mov	#3,r0		! Check src and dest are equally aligned
 	mov	r4,r1
 	and	r0,r1
@@ -358,8 +358,10 @@ EXC(	mov.b	r0,@r5	)
 .section .fixup, "ax"							
 
 6001:
+	mov	#-1,r1
 	rts
-	 mov	#0,r0
+	 mov	#-1,r0
 .previous
+	mov	#0,r1
 	rts
 	 mov	r7,r0
diff --git a/arch/sparc/include/asm/asm-prototypes.h b/arch/sparc/include/asm/asm-prototypes.h
index 4987c735ff56..e15661bf8b36 100644
--- a/arch/sparc/include/asm/asm-prototypes.h
+++ b/arch/sparc/include/asm/asm-prototypes.h
@@ -4,7 +4,7 @@
  */
 
 #include <asm/xor.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/trap_block.h>
 #include <linux/uaccess.h>
 #include <asm/atomic.h>
diff --git a/arch/sparc/include/asm/checksum_32.h b/arch/sparc/include/asm/checksum_32.h
index ce11e0ad80c7..751b89d827aa 100644
--- a/arch/sparc/include/asm/checksum_32.h
+++ b/arch/sparc/include/asm/checksum_32.h
@@ -50,7 +50,7 @@ csum_partial_copy_nocheck(const void *src, void *dst, int len)
 
 	__asm__ __volatile__ (
 		"call __csum_partial_copy_sparc_generic\n\t"
-		" mov -1, %%g7\n"
+		" clr %%g7\n"
 	: "=&r" (ret), "=&r" (d), "=&r" (l)
 	: "0" (ret), "1" (d), "2" (l)
 	: "o2", "o3", "o4", "o5", "o7",
@@ -59,20 +59,50 @@ csum_partial_copy_nocheck(const void *src, void *dst, int len)
 	return (__force __wsum)ret;
 }
 
-static inline __wsum
+static inline __wsum_fault
 csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
+	register unsigned int ret asm("o0") = (unsigned int)src;
+	register char *d asm("o1") = dst;
+	register int l asm("g1") = len; // used to return an error
+
 	if (unlikely(!access_ok(src, len)))
-		return 0;
-	return csum_partial_copy_nocheck((__force void *)src, dst, len);
+		return CSUM_FAULT;
+
+	__asm__ __volatile__ (
+		"call __csum_partial_copy_sparc_generic\n\t"
+		" clr %%g7\n"
+	: "=&r" (ret), "=&r" (d), "=&r" (l)
+	: "0" (ret), "1" (d), "2" (l)
+	: "o2", "o3", "o4", "o5", "o7",
+	  "g2", "g3", "g4", "g5", "g7",
+	  "memory", "cc");
+	if (unlikely(l < 0))
+		return CSUM_FAULT;
+	return to_wsum_fault((__force __wsum)ret);
 }
 
-static inline __wsum
+static inline __u64
 csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
-	if (!access_ok(dst, len))
-		return 0;
-	return csum_partial_copy_nocheck(src, (__force void *)dst, len);
+	register unsigned int ret asm("o0") = (unsigned int)src;
+	register char *d asm("o1") = (__force void *)dst;
+	register int l asm("g1") = len; // used to return an error
+
+	if (unlikely(!access_ok(dst, len)))
+		return CSUM_FAULT;
+
+	__asm__ __volatile__ (
+		"call __csum_partial_copy_sparc_generic\n\t"
+		" clr %%g7\n"
+	: "=&r" (ret), "=&r" (d), "=&r" (l)
+	: "0" (ret), "1" (d), "2" (l)
+	: "o2", "o3", "o4", "o5", "o7",
+	  "g2", "g3", "g4", "g5", "g7",
+	  "memory", "cc");
+	if (unlikely(l < 0))
+		return CSUM_FAULT;
+	return to_wsum_fault((__force __wsum)ret);
 }
 
 /* ihl is always 5 or greater, almost always is 5, and iph is word aligned
diff --git a/arch/sparc/include/asm/checksum_64.h b/arch/sparc/include/asm/checksum_64.h
index d6b59461e064..0e3041ca384b 100644
--- a/arch/sparc/include/asm/checksum_64.h
+++ b/arch/sparc/include/asm/checksum_64.h
@@ -39,8 +39,8 @@ __wsum csum_partial(const void * buff, int len, __wsum sum);
  * better 64-bit) boundary
  */
 __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len);
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len);
-__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len);
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len);
+__wsum_fault csum_and_copy_to_user(const void *src, void __user *dst, int len);
 
 /* ihl is always 5 or greater, almost always is 5, and iph is word aligned
  * the majority of the time.
diff --git a/arch/sparc/lib/checksum_32.S b/arch/sparc/lib/checksum_32.S
index 84ad709cbecb..546968db199d 100644
--- a/arch/sparc/lib/checksum_32.S
+++ b/arch/sparc/lib/checksum_32.S
@@ -453,5 +453,5 @@ ccslow:	cmp	%g1, 0
  * we only bother with faults on loads... */
 
 cc_fault:
-	ret
-	 clr	%o0
+	retl
+	 mov -1, %g1
diff --git a/arch/sparc/lib/csum_copy.S b/arch/sparc/lib/csum_copy.S
index f968e83bc93b..9312d51367d3 100644
--- a/arch/sparc/lib/csum_copy.S
+++ b/arch/sparc/lib/csum_copy.S
@@ -71,7 +71,7 @@
 FUNC_NAME:		/* %o0=src, %o1=dst, %o2=len */
 	LOAD(prefetch, %o0 + 0x000, #n_reads)
 	xor		%o0, %o1, %g1
-	mov		-1, %o3
+	clr		%o3
 	clr		%o4
 	andcc		%g1, 0x3, %g0
 	bne,pn		%icc, 95f
diff --git a/arch/sparc/lib/csum_copy_from_user.S b/arch/sparc/lib/csum_copy_from_user.S
index b0ba8d4dd439..d74241692f0f 100644
--- a/arch/sparc/lib/csum_copy_from_user.S
+++ b/arch/sparc/lib/csum_copy_from_user.S
@@ -9,7 +9,7 @@
 	.section .fixup, "ax";	\
 	.align 4;		\
 99:	retl;			\
-	 mov	0, %o0;		\
+	 mov	-1, %o0;	\
 	.section __ex_table,"a";\
 	.align 4;		\
 	.word 98b, 99b;		\
diff --git a/arch/sparc/lib/csum_copy_to_user.S b/arch/sparc/lib/csum_copy_to_user.S
index 91ba36dbf7d2..2878a933d7ab 100644
--- a/arch/sparc/lib/csum_copy_to_user.S
+++ b/arch/sparc/lib/csum_copy_to_user.S
@@ -9,7 +9,7 @@
 	.section .fixup,"ax";	\
 	.align 4;		\
 99:	retl;			\
-	 mov	0, %o0;		\
+	 mov	-1, %o0;	\
 	.section __ex_table,"a";\
 	.align 4;		\
 	.word 98b, 99b;		\
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index b1a98fa38828..655e25745349 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -4,7 +4,7 @@
 #include <linux/pgtable.h>
 #include <asm/string.h>
 #include <asm/page.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/mce.h>
 
 #include <asm-generic/asm-prototypes.h>
diff --git a/arch/x86/include/asm/checksum_32.h b/arch/x86/include/asm/checksum_32.h
index 17da95387997..65ca3448e83d 100644
--- a/arch/x86/include/asm/checksum_32.h
+++ b/arch/x86/include/asm/checksum_32.h
@@ -27,7 +27,7 @@ asmlinkage __wsum csum_partial(const void *buff, int len, __wsum sum);
  * better 64-bit) boundary
  */
 
-asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+asmlinkage __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
 /*
  *	Note: when you get a NULL pointer exception here this means someone
@@ -38,17 +38,17 @@ asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len)
  */
 static inline __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return csum_partial_copy_generic(src, dst, len);
+	return from_wsum_fault(csum_partial_copy_generic(src, dst, len));
 }
 
-static inline __wsum csum_and_copy_from_user(const void __user *src,
+static inline __wsum_fault csum_and_copy_from_user(const void __user *src,
 					     void *dst, int len)
 {
-	__wsum ret;
+	__wsum_fault ret;
 
 	might_sleep();
 	if (!user_access_begin(src, len))
-		return 0;
+		return CSUM_FAULT;
 	ret = csum_partial_copy_generic((__force void *)src, dst, len);
 	user_access_end();
 
@@ -168,15 +168,15 @@ static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
 /*
  *	Copy and checksum to user
  */
-static inline __wsum csum_and_copy_to_user(const void *src,
+static inline __wsum_fault csum_and_copy_to_user(const void *src,
 					   void __user *dst,
 					   int len)
 {
-	__wsum ret;
+	__wsum_fault ret;
 
 	might_sleep();
 	if (!user_access_begin(dst, len))
-		return 0;
+		return CSUM_FAULT;
 
 	ret = csum_partial_copy_generic(src, (__force void *)dst, len);
 	user_access_end();
diff --git a/arch/x86/include/asm/checksum_64.h b/arch/x86/include/asm/checksum_64.h
index 4d4a47a3a8ab..23c56eef8e47 100644
--- a/arch/x86/include/asm/checksum_64.h
+++ b/arch/x86/include/asm/checksum_64.h
@@ -129,10 +129,10 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
 extern __wsum csum_partial(const void *buff, int len, __wsum sum);
 
 /* Do not call this directly. Use the wrappers below */
-extern __visible __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+extern __visible __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
-extern __wsum csum_and_copy_from_user(const void __user *src, void *dst, int len);
-extern __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len);
+extern __wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len);
+extern __wsum_fault csum_and_copy_to_user(const void *src, void __user *dst, int len);
 extern __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len);
 
 /**
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 23318c338db0..ab58a528d846 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -262,7 +262,7 @@ unsigned int csum_partial_copy_generic (const char *src, char *dst,
 
 #define EXC(y...)						\
 	9999: y;						\
-	_ASM_EXTABLE_TYPE(9999b, 7f, EX_TYPE_UACCESS | EX_FLAG_CLEAR_AX)
+	_ASM_EXTABLE_TYPE(9999b, 9f, EX_TYPE_UACCESS)
 
 #ifndef CONFIG_X86_USE_PPRO_CHECKSUM
 
@@ -278,7 +278,7 @@ SYM_FUNC_START(csum_partial_copy_generic)
 	movl ARGBASE+4(%esp),%esi	# src
 	movl ARGBASE+8(%esp),%edi	# dst
 
-	movl $-1, %eax			# sum
+	xorl %eax,%eax			# sum
 	testl $2, %edi			# Check alignment. 
 	jz 2f				# Jump if alignment is ok.
 	subl $2, %ecx			# Alignment uses up two bytes.
@@ -357,12 +357,17 @@ EXC(	movb %cl, (%edi)	)
 6:	addl %ecx, %eax
 	adcl $0, %eax
 7:
-
+	xorl %edx, %edx
+8:
 	popl %ebx
 	popl %esi
 	popl %edi
 	popl %ecx			# equivalent to addl $4,%esp
 	RET
+9:
+	movl $-1,%eax
+	movl $-1,%edx
+	jmp 8b
 SYM_FUNC_END(csum_partial_copy_generic)
 
 #else
@@ -388,7 +393,7 @@ SYM_FUNC_START(csum_partial_copy_generic)
 	movl ARGBASE+4(%esp),%esi	#src
 	movl ARGBASE+8(%esp),%edi	#dst	
 	movl ARGBASE+12(%esp),%ecx	#len
-	movl $-1, %eax			#sum
+	xorl %eax, %eax			#sum
 #	movl %ecx, %edx  
 	movl %ecx, %ebx  
 	movl %esi, %edx
@@ -430,11 +435,16 @@ EXC(	movb %dl, (%edi)         )
 6:	addl %edx, %eax
 	adcl $0, %eax
 7:
-
+	xorl %edx, %edx
+8:
 	popl %esi
 	popl %edi
 	popl %ebx
 	RET
+9:
+	movl $-1,%eax
+	movl $-1,%edx
+	jmp 8b
 SYM_FUNC_END(csum_partial_copy_generic)
 				
 #undef ROUND
diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index d9e16a2cf285..084181030dd3 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -44,7 +44,7 @@ SYM_FUNC_START(csum_partial_copy_generic)
 	movq  %r13, 3*8(%rsp)
 	movq  %r15, 4*8(%rsp)
 
-	movl  $-1, %eax
+	xorl  %eax, %eax
 	xorl  %r9d, %r9d
 	movl  %edx, %ecx
 	cmpl  $8, %ecx
@@ -249,8 +249,8 @@ SYM_FUNC_START(csum_partial_copy_generic)
 	roll $8, %eax
 	jmp .Lout
 
-	/* Exception: just return 0 */
+	/* Exception: just return -1 */
 .Lfault:
-	xorl %eax, %eax
+	movq $-1, %rax
 	jmp  .Lout
 SYM_FUNC_END(csum_partial_copy_generic)
diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index cea25ca8b8cf..5e877592a7b3 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -8,7 +8,7 @@
 
 #include <linux/compiler.h>
 #include <linux/export.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/word-at-a-time.h>
 
 static inline unsigned short from32to16(unsigned a)
diff --git a/arch/x86/lib/csum-wrappers_64.c b/arch/x86/lib/csum-wrappers_64.c
index 145f9a0bde29..e90ac389a013 100644
--- a/arch/x86/lib/csum-wrappers_64.c
+++ b/arch/x86/lib/csum-wrappers_64.c
@@ -4,7 +4,7 @@
  *
  * Wrappers of assembly checksum functions for x86-64.
  */
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <linux/export.h>
 #include <linux/uaccess.h>
 #include <asm/smap.h>
@@ -14,20 +14,18 @@
  * @src: source address (user space)
  * @dst: destination address
  * @len: number of bytes to be copied.
- * @isum: initial sum that is added into the result (32bit unfolded)
- * @errp: set to -EFAULT for an bad source address.
  *
- * Returns an 32bit unfolded checksum of the buffer.
+ * Returns an 32bit unfolded checksum of the buffer or -1ULL on error
  * src and dst are best aligned to 64bits.
  */
-__wsum
+__wsum_fault
 csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
-	__wsum sum;
+	__wsum_fault sum;
 
 	might_sleep();
 	if (!user_access_begin(src, len))
-		return 0;
+		return CSUM_FAULT;
 	sum = csum_partial_copy_generic((__force const void *)src, dst, len);
 	user_access_end();
 	return sum;
@@ -38,20 +36,18 @@ csum_and_copy_from_user(const void __user *src, void *dst, int len)
  * @src: source address
  * @dst: destination address (user space)
  * @len: number of bytes to be copied.
- * @isum: initial sum that is added into the result (32bit unfolded)
- * @errp: set to -EFAULT for an bad destination address.
  *
- * Returns an 32bit unfolded checksum of the buffer.
+ * Returns an 32bit unfolded checksum of the buffer or -1ULL on error
  * src and dst are best aligned to 64bits.
  */
-__wsum
+__wsum_fault
 csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
-	__wsum sum;
+	__wsum_fault sum;
 
 	might_sleep();
 	if (!user_access_begin(dst, len))
-		return 0;
+		return CSUM_FAULT;
 	sum = csum_partial_copy_generic(src, (void __force *)dst, len);
 	user_access_end();
 	return sum;
@@ -62,14 +58,13 @@ csum_and_copy_to_user(const void *src, void __user *dst, int len)
  * @src: source address
  * @dst: destination address
  * @len: number of bytes to be copied.
- * @sum: initial sum that is added into the result (32bit unfolded)
  *
  * Returns an 32bit unfolded checksum of the buffer.
  */
 __wsum
 csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return csum_partial_copy_generic(src, dst, len);
+	return from_wsum_fault(csum_partial_copy_generic(src, dst, len));
 }
 EXPORT_SYMBOL(csum_partial_copy_nocheck);
 
diff --git a/arch/xtensa/include/asm/asm-prototypes.h b/arch/xtensa/include/asm/asm-prototypes.h
index b0da61812b85..b01b8170fafb 100644
--- a/arch/xtensa/include/asm/asm-prototypes.h
+++ b/arch/xtensa/include/asm/asm-prototypes.h
@@ -3,7 +3,7 @@
 #define __ASM_PROTOTYPES_H
 
 #include <asm/cacheflush.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/ftrace.h>
 #include <asm/page.h>
 #include <asm/string.h>
diff --git a/arch/xtensa/include/asm/checksum.h b/arch/xtensa/include/asm/checksum.h
index 44ec1d0b2a35..bf4ee4fd8f57 100644
--- a/arch/xtensa/include/asm/checksum.h
+++ b/arch/xtensa/include/asm/checksum.h
@@ -37,7 +37,7 @@ asmlinkage __wsum csum_partial(const void *buff, int len, __wsum sum);
  * better 64-bit) boundary
  */
 
-asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+asmlinkage __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
 #define _HAVE_ARCH_CSUM_AND_COPY
 /*
@@ -47,16 +47,16 @@ asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len)
 static inline
 __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return csum_partial_copy_generic(src, dst, len);
+	return from_wsum_fault(csum_partial_copy_generic(src, dst, len));
 }
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 static inline
-__wsum csum_and_copy_from_user(const void __user *src, void *dst,
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst,
 				   int len)
 {
 	if (!access_ok(src, len))
-		return 0;
+		return CSUM_FAULT;
 	return csum_partial_copy_generic((__force const void *)src, dst, len);
 }
 
@@ -237,11 +237,11 @@ static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
  *	Copy and checksum to user
  */
 #define HAVE_CSUM_COPY_USER
-static __inline__ __wsum csum_and_copy_to_user(const void *src,
+static __inline__ __wsum_fault csum_and_copy_to_user(const void *src,
 					       void __user *dst, int len)
 {
 	if (!access_ok(dst, len))
-		return 0;
+		return CSUM_FAULT;
 	return csum_partial_copy_generic(src, (__force void *)dst, len);
 }
 #endif
diff --git a/arch/xtensa/lib/checksum.S b/arch/xtensa/lib/checksum.S
index ffee6f94c8f8..71a70bed4618 100644
--- a/arch/xtensa/lib/checksum.S
+++ b/arch/xtensa/lib/checksum.S
@@ -192,7 +192,7 @@ unsigned int csum_partial_copy_generic (const char *src, char *dst, int len)
 ENTRY(csum_partial_copy_generic)
 
 	abi_entry_default
-	movi	a5, -1
+	movi	a5, 0
 	or	a10, a2, a3
 
 	/* We optimize the following alignment tests for the 4-byte
@@ -311,6 +311,7 @@ EX(10f)	s8i	a9, a3, 0
 	ONES_ADD(a5, a9)
 8:
 	mov	a2, a5
+	movi	a3, 0
 	abi_ret_default
 
 5:
@@ -353,7 +354,8 @@ EXPORT_SYMBOL(csum_partial_copy_generic)
 # Exception handler:
 .section .fixup, "ax"
 10:
-	movi	a2, 0
+	movi	a2, -1
+	movi	a3, -1
 	abi_ret_default
 
 .previous
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index 627a93ce38ab..a3334ad8ecc8 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -19,8 +19,6 @@
 #include <linux/firmware.h>
 #include <linux/if_vlan.h>
 
-/* Fix for IA64 */
-#include <asm/checksum.h>
 #include <net/ip6_checksum.h>
 
 #include <net/ip.h>
diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c
index f5961bdcc480..a6e88d673878 100644
--- a/drivers/net/ethernet/lantiq_etop.c
+++ b/drivers/net/ethernet/lantiq_etop.c
@@ -27,8 +27,6 @@
 #include <linux/module.h>
 #include <linux/property.h>
 
-#include <asm/checksum.h>
-
 #include <lantiq_soc.h>
 #include <xway_dma.h>
 #include <lantiq_platform.h>
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 915aaf18c409..bef4cd73f06d 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -51,7 +51,6 @@
 #include <linux/ipv6.h>
 #include <linux/in.h>
 #include <linux/etherdevice.h>
-#include <asm/checksum.h>
 #include <linux/if_vlan.h>
 #include <linux/if_arp.h>
 #include <linux/inetdevice.h>
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index bc3be0330f1d..bcbff216fa5f 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -27,7 +27,7 @@
 #include <asm/debug.h>
 #include <asm/processor.h>
 #include <asm/irqflags.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/os_info.h>
 #include <asm/switch_to.h>
 #include <asm/maccess.h>
diff --git a/include/net/checksum.h b/include/net/checksum.h
index 1338cb92c8e7..22c97902845d 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -16,8 +16,40 @@
 #define _CHECKSUM_H
 
 #include <linux/errno.h>
-#include <asm/types.h>
+#include <linux/bitops.h>
 #include <asm/byteorder.h>
+
+typedef u64 __bitwise __wsum_fault;
+static inline __wsum_fault to_wsum_fault(__wsum v)
+{
+#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
+	return (__force __wsum_fault)v;
+#else
+	return (__force __wsum_fault)((__force u64)v << 32);
+#endif
+}
+
+static inline __wsum_fault from_wsum_fault(__wsum v)
+{
+#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
+	return (__force __wsum)v;
+#else
+	return (__force __wsum)((__force u64)v >> 32);
+#endif
+}
+
+static inline bool wsum_fault_check(__wsum_fault v)
+{
+#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
+	return (__force s64)v < 0;
+#else
+	return (int)(__force u32)v < 0;
+#endif
+}
+
+#define CSUM_FAULT ((__force __wsum_fault)-1)
+
+
 #include <asm/checksum.h>
 #if !defined(_HAVE_ARCH_COPY_AND_CSUM_FROM_USER) || !defined(HAVE_CSUM_COPY_USER)
 #include <linux/uaccess.h>
@@ -25,24 +57,24 @@
 
 #ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 static __always_inline
-__wsum csum_and_copy_from_user (const void __user *src, void *dst,
+__wsum_fault csum_and_copy_from_user (const void __user *src, void *dst,
 				      int len)
 {
 	if (copy_from_user(dst, src, len))
-		return 0;
-	return csum_partial(dst, len, ~0U);
+		return CSUM_FAULT;
+	return to_wsum_fault(csum_partial(dst, len, 0));
 }
 #endif
 
 #ifndef HAVE_CSUM_COPY_USER
-static __always_inline __wsum csum_and_copy_to_user
+static __always_inline __wsum_fault csum_and_copy_to_user
 (const void *src, void __user *dst, int len)
 {
-	__wsum sum = csum_partial(src, len, ~0U);
+	__wsum sum = csum_partial(src, len, 0);
 
 	if (copy_to_user(dst, src, len) == 0)
-		return sum;
-	return 0;
+		return to_wsum_fault(sum);
+	return CSUM_FAULT;
 }
 #endif
 
diff --git a/include/net/ip6_checksum.h b/include/net/ip6_checksum.h
index c8a96b888277..4666b9855263 100644
--- a/include/net/ip6_checksum.h
+++ b/include/net/ip6_checksum.h
@@ -25,7 +25,7 @@
 #include <asm/types.h>
 #include <asm/byteorder.h>
 #include <net/ip.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <linux/in6.h>
 #include <linux/tcp.h>
 #include <linux/ipv6.h>
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 0eed92b77ba3..971e7824893b 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -4,7 +4,7 @@
  */
 
 #include <kunit/test.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 
 #define MAX_LEN 512
 #define MAX_ALIGN 64
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 27234a820eeb..0da8f36ce341 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1184,15 +1184,16 @@ EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 			       struct iov_iter *i)
 {
-	__wsum sum, next;
+	__wsum sum;
+	__wsum_fault next;
 	sum = *csum;
 	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
 
 	iterate_and_advance(i, bytes, base, len, off, ({
 		next = csum_and_copy_from_user(base, addr + off, len);
-		sum = csum_block_add(sum, next, off);
-		next ? 0 : len;
+		sum = csum_block_add(sum, from_wsum_fault(next), off);
+		likely(!wsum_fault_check(next)) ? 0 : len;
 	}), ({
 		sum = csum_and_memcpy(addr + off, base, len, sum, off);
 	})
@@ -1206,7 +1207,8 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 			     struct iov_iter *i)
 {
 	struct csum_state *csstate = _csstate;
-	__wsum sum, next;
+	__wsum sum;
+	__wsum_fault next;
 
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
@@ -1222,8 +1224,8 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 	sum = csum_shift(csstate->csum, csstate->off);
 	iterate_and_advance(i, bytes, base, len, off, ({
 		next = csum_and_copy_to_user(addr + off, base, len);
-		sum = csum_block_add(sum, next, off);
-		next ? 0 : len;
+		sum = csum_block_add(sum, from_wsum_fault(next), off);
+		likely(!wsum_fault_check(next)) ? 0 : len;
 	}), ({
 		sum = csum_and_memcpy(base, addr + off, len, sum, off);
 	})
diff --git a/net/ipv6/ip6_checksum.c b/net/ipv6/ip6_checksum.c
index 377717045f8f..fce94af322ae 100644
--- a/net/ipv6/ip6_checksum.c
+++ b/net/ipv6/ip6_checksum.c
@@ -2,7 +2,7 @@
 #include <net/ip.h>
 #include <net/udp.h>
 #include <net/udplite.h>
-#include <asm/checksum.h>
+#include <net/ip6_checksum.h>
 
 #ifndef _HAVE_ARCH_IPV6_CSUM
 __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
  
Thomas Gleixner Oct. 23, 2023, 10:37 a.m. UTC | #2
On Sun, Oct 22 2023 at 20:46, Al Viro wrote:
> @@ -113,14 +113,14 @@ csum_partial_cfu_aligned(const unsigned long __user *src, unsigned long *dst,
>  		*dst = word | tmp;
>  		checksum += carry;
>  	}
> -	return checksum;
> +	return from64to16 (checksum);

  from64to16(checksum); all over the place

>  
>  #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
>  #define _HAVE_ARCH_CSUM_AND_COPY
>  static inline
> -__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
> +__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
>  {
>  	if (!access_ok(src, len))
> -		return 0;
> +		return -1;

  return CSUM_FAULT; 
  
>  /*
> diff --git a/arch/arm/lib/csumpartialcopygeneric.S b/arch/arm/lib/csumpartialcopygeneric.S
> index 0fd5c10e90a7..5db935eaf165 100644
> --- a/arch/arm/lib/csumpartialcopygeneric.S
> +++ b/arch/arm/lib/csumpartialcopygeneric.S
> @@ -86,7 +86,7 @@ sum	.req	r3
>  
>  FN_ENTRY
>  		save_regs
> -		mov	sum, #-1
> +		mov	sum, #0
>  
>  		cmp	len, #8			@ Ensure that we have at least
>  		blo	.Lless8			@ 8 bytes to copy.
> @@ -160,6 +160,7 @@ FN_ENTRY
>  		ldr	sum, [sp, #0]		@ dst
>  		tst	sum, #1
>  		movne	r0, r0, ror #8
> +		mov	r1, #0
>  		load_regs
>  
>  .Lsrc_not_aligned:
> diff --git a/arch/arm/lib/csumpartialcopyuser.S b/arch/arm/lib/csumpartialcopyuser.S
> index 6928781e6bee..f273c9667914 100644
> --- a/arch/arm/lib/csumpartialcopyuser.S
> +++ b/arch/arm/lib/csumpartialcopyuser.S
> @@ -73,11 +73,11 @@
>  #include "csumpartialcopygeneric.S"
>  
>  /*
> - * We report fault by returning 0 csum - impossible in normal case, since
> - * we start with 0xffffffff for initial sum.
> + * We report fault by returning ~0ULL csum
>   */

There is also a stale comment a few lines further up.

>  		.pushsection .text.fixup,"ax"
>  		.align	4
> -9001:		mov	r0, #0
> +9001:		mov	r0, #-1
> +		mov	r1, #-1
>  		load_regs
>  		.popsection
>  #include <linux/errno.h>
> -#include <asm/types.h>
> +#include <linux/bitops.h>
>  #include <asm/byteorder.h>
> +
> +typedef u64 __bitwise __wsum_fault;

newline please.

> +static inline __wsum_fault to_wsum_fault(__wsum v)
> +{
> +#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
> +	return (__force __wsum_fault)v;
> +#else
> +	return (__force __wsum_fault)((__force u64)v << 32);
> +#endif
> +}
> +
> +static inline __wsum_fault from_wsum_fault(__wsum v)
> +{
> +#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
> +	return (__force __wsum)v;
> +#else
> +	return (__force __wsum)((__force u64)v >> 32);
> +#endif
> +}
> +
> +static inline bool wsum_fault_check(__wsum_fault v)
> +{
> +#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
> +	return (__force s64)v < 0;
> +#else
> +	return (int)(__force u32)v < 0;

Why not __force s32 right away?

>  #include <asm/checksum.h>
>  #if !defined(_HAVE_ARCH_COPY_AND_CSUM_FROM_USER) || !defined(HAVE_CSUM_COPY_USER)
>  #include <linux/uaccess.h>
> @@ -25,24 +57,24 @@
>  
>  #ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
>  static __always_inline
> -__wsum csum_and_copy_from_user (const void __user *src, void *dst,
> +__wsum_fault csum_and_copy_from_user (const void __user *src, void *dst,
>  				      int len)
>  {
>  	if (copy_from_user(dst, src, len))
> -		return 0;
> -	return csum_partial(dst, len, ~0U);
> +		return CSUM_FAULT;
> +	return to_wsum_fault(csum_partial(dst, len, 0));
>  }
>  #endif
>  #ifndef HAVE_CSUM_COPY_USER
> -static __always_inline __wsum csum_and_copy_to_user
> +static __always_inline __wsum_fault csum_and_copy_to_user
>  (const void *src, void __user *dst, int len)
>  {
> -	__wsum sum = csum_partial(src, len, ~0U);
> +	__wsum sum = csum_partial(src, len, 0);
>  
>  	if (copy_to_user(dst, src, len) == 0)
> -		return sum;
> -	return 0;
> +		return to_wsum_fault(sum);
> +	return CSUM_FAULT;

  	if (copy_to_user(dst, src, len))
		return CSUM_FAULT;
	return to_wsum_fault(sum);

So it follows the pattern of csum_and_copy_from_user() above?

>  size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
>  			       struct iov_iter *i)
>  {
> -	__wsum sum, next;
> +	__wsum sum;
> +	__wsum_fault next;
>  	sum = *csum;
>  	if (WARN_ON_ONCE(!i->data_source))
>  		return 0;
>  
>  	iterate_and_advance(i, bytes, base, len, off, ({
>  		next = csum_and_copy_from_user(base, addr + off, len);
> -		sum = csum_block_add(sum, next, off);
> -		next ? 0 : len;
> +		sum = csum_block_add(sum, from_wsum_fault(next), off);
> +		likely(!wsum_fault_check(next)) ? 0 : len;

This macro maze is confusing as hell.

Looking at iterate_buf() which is the least convoluted. That resolves to
the following:

     len = bytes;
     ...
     next = csum_and_copy_from_user(...);
     ...
     len -= !wsum_fault_check(next) ? 0 : len;
     ...
     bytes = len;
     ...
     return bytes;

So it correctly returns 'bytes' for the success case and 0 for the fault
case.

Now decrypting iterate_iovec():

    off = 0;
    
    do {
       ....
       len -= !wsum_fault_check(next) ? 0 : len;
       off += len;
       skip += len;
       bytes -= len;
       if (skip < __p->iov_len)        <- breaks on fault
          break;
       ...
    } while(bytes);

    bytes = off;
    ...
    return bytes;

Which means that if the first vector is successfully copied, then 'off'
is greater than 0. A fault on the second one will correctly break out of the
loop, but the function will incorrectly return a value > 0, i.e. the
length of the first iteration.

As the callers just check for != 0 such a partial copy is considered
success, no?

So instead of 

	likely(!wsum_fault_check(next)) ? 0 : len;

shouldn't this just do:

	if (unlikely(wsum_fault_check(next)))
		return 0;
        len;

for simplicity and mental sanity's sake?

Thanks,

        tglx
  
David Laight Oct. 23, 2023, 2:44 p.m. UTC | #3
From: Al Viro
> Sent: 22 October 2023 20:40
....
> We need a way for csum_and_copy_{from,to}_user() to report faults.
> The approach taken back in 2020 (avoid 0 as return value by starting
> summing from ~0U, use 0 to report faults) had been broken; it does
> yield the right value modulo 2^16-1, but the case when data is
> entirely zero-filled is not handled right.  It almost works, since
> for most of the codepaths we have a non-zero value added in
> and there 0 is not different from anything divisible by 0xffff.
> However, there are cases (ICMPv4 replies, for example) where we
> are not guaranteed that.
> 
> In other words, we really need to have those primitives return 0
> on filled-with-zeroes input.  So let's make them return a 64bit
> value instead; we can do that cheaply (all supported architectures
> do that via a couple of registers) and we can use that to report
> faults without disturbing the 32bit csum.

Does the ICMPv4 sum need to be zero if the data is all zeros, but 0xffff
if there are non-zero bytes in there?

IIRC the original buggy case was fixed by returning 0xffff
for the all-zero buffer.

Even if it does, it would seem more sensible to have the
checksum function never return zero, have csum_and_copy() return
zero on fault, and add extra code to the (unusual) ICMP reply
code to detect 0xffff and convert it to zero if the buffer is
all zeros.

	David

  
Al Viro Oct. 24, 2023, 3:53 a.m. UTC | #4
On Mon, Oct 23, 2023 at 02:44:13PM +0000, David Laight wrote:
> From: Al Viro
> > Sent: 22 October 2023 20:40
> ....
> > We need a way for csum_and_copy_{from,to}_user() to report faults.
> > The approach taken back in 2020 (avoid 0 as return value by starting
> > summing from ~0U, use 0 to report faults) had been broken; it does
> > yield the right value modulo 2^16-1, but the case when data is
> > entirely zero-filled is not handled right.  It almost works, since
> > for most of the codepaths we have a non-zero value added in
> > and there 0 is not different from anything divisible by 0xffff.
> > However, there are cases (ICMPv4 replies, for example) where we
> > are not guaranteed that.
> > 
> > In other words, we really need to have those primitives return 0
> > on filled-with-zeroes input.  So let's make them return a 64bit
> > value instead; we can do that cheaply (all supported architectures
> > do that via a couple of registers) and we can use that to report
> > faults without disturbing the 32bit csum.
> 
> Does the ICMPv4 sum need to be zero if all zeros but 0xffff
> if there are non-zero bytes in there?

No.  RTFRFC, please.  Or the discussion of the bug upthread, for that
matter.

> IIRC the original buggy case was fixed by returning 0xffff
> for the all-zero buffer.

YRIC.  For the benefit of those who can pass the Turing test better than
ChatGPT would:

Define a binary operation on [0..0xffff] by

	X PLUS Y = X + Y - ((X + Y > 0xffff) ? 0xffff : 0)

Properties:
	X PLUS Y \in [0..0xffff]
	X PLUS Y = 0 iff X = Y = 0
	X PLUS Y is congruent to X + Y modulo 0xffff
	X PLUS Y = Y PLUS X
	(X PLUS Y) PLUS Z = X PLUS (Y PLUS Z)
	X PLUS 0 = X
	For any non-zero X, X PLUS 0xffff = X
	X PLUS (0xffff ^ X) = 0xffff
	byteswap(X) PLUS byteswap(Y) = byteswap(X PLUS Y)
(hint: if X \in [0..0xffff], byteswap(X) is congruent to 256*X modulo 0xffff)

	If A0,...,An are all in range 0..0xffff, \sum Ak * 0x10000^k is
congruent to A0 PLUS A1 PLUS ... PLUS An modulo 0xffff.  That's pretty
much the same thing as the usual rule for checking if a decimal number
is divisible by 9.
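
	To make the arithmetic concrete, here's a small standalone
illustration of PLUS and of that congruence (plain userspace C, not
kernel code; plus16() is just a name made up for this sketch):

#include <stdint.h>
#include <stdio.h>

/* X PLUS Y = X + Y - ((X + Y > 0xffff) ? 0xffff : 0), as defined above */
static uint16_t plus16(uint16_t x, uint16_t y)
{
	uint32_t s = (uint32_t)x + y;

	return s > 0xffff ? s - 0xffff : s;
}

int main(void)
{
	/* X PLUS Y = 0 iff X = Y = 0; in particular 0xfffe PLUS 1 is 0xffff */
	printf("%#x %#x\n", plus16(0, 0), plus16(0xfffe, 0x0001));

	/* A0 + A1 * 0x10000 is congruent to A0 PLUS A1 modulo 0xffff */
	uint32_t a0 = 0x1234, a1 = 0xabcd;
	printf("%#x %#x\n", (a0 + a1 * 0x10000u) % 0xffff,
	       plus16(a0, a1) % 0xffff);	/* both print 0xbe01 */
	return 0;
}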

	That's the math behind the checksum calculations.  You look at the
data, append 0 if the length had been odd and sum the 16bit values up
using PLUS as addition.  Result will be a 16bit value that will be
	* congruent to that data taken as long integer modulo 0xffff
	* 0 if and only if the data consists entirely of zeroes.
Endianness does not matter - byteswap the entire thing and result will
be byteswapped.

	Note that since 0xffffffff is a multiple of 0xffff, we can
calculate the remainder modulo 0xffffffff (by doing similar addition
of 4-byte groups), then calculate the remainder of that modulo 0xffff;
that's almost always cheaper - N/4 32bit operations vs N/2 16bit ones.
For 64bit architecture we can do the same with 64bit operations, reducing
to 32bit value in the end.  That's what csum_and_copy_...() stuff
is doing - it's memcpy() (or copy_..._user()) combined with calculation
of some 32bit value that would have the right remainder modulo 0xffff.
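
	For illustration, a naive version of that reduction (fold64to16()
is a made-up name for this sketch; the alpha from64to16() in the patch
below does the same reduction with fewer operations):

#include <stdint.h>

/*
 * Fold a 64bit accumulator down to 16 bits while keeping the value
 * congruent modulo 0xffff.  Each step adds the discarded high part back
 * into the low part, so the result is 0 only if the input was 0.
 */
uint16_t fold64to16(uint64_t sum)
{
	sum = (sum & 0xffffffffULL) + (sum >> 32);	/* <= 0x1fffffffe */
	sum = (sum & 0xffffffffULL) + (sum >> 32);	/* fits in 32 bits */
	sum = (sum & 0xffffULL) + (sum >> 16);		/* <= 0x1fffe */
	sum = (sum & 0xffffULL) + (sum >> 16);		/* fits in 16 bits */
	return sum;
}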

	Requirement for ICMP is that this function of the payload should
be 0xffff.  So we zero the "checksum" field, calculate the sum and then
adjust that field so that the sum would become 0xffff.  I.e. if the
value of the function with that field zeroed is N, we want to store
0xffff ^ N into the damn field.

	That's where it had hit the fan - getting 0xffff instead of 0
on the "all zeroes" data ended up with "OK, store the 0xffff ^ 0xffff
in there, everything will be fine".  Which yields the payload with
actual sum being 0.

	Special treatment of icmp_reply() is doable, but nobody is
suggesting to check for "all zeroes" case - that would be insane and
pointless.  The minimal workaround papering over that particular case
would be to check WTF we have stored in that field and always
replace 0 with 0xffff.  Note that if our data is e.g.
	<40 zeroes> 0x7f 0xff 0x80 0x00
the old rule would (correctly) suggest storing two zeroes into the
checksum field, while that modification would yield two 0xff in
there.  Which is still fine - 0xffff PLUS 0x7fff PLUS 0x8000 =
0 PLUS 0x7fff PLUS 0x8000 = 0xffff, so both variants are acceptable.
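
	In code, that workaround would amount to something like the
following (icmp_csum_field() is a hypothetical helper, shown only to
illustrate the adjustment described above - the series fixes the
primitives instead of doing this):

#include <stdint.h>

/*
 * Given the sum over the ICMP message with the checksum field zeroed,
 * compute what to store in the field so that the sum over the whole
 * message becomes 0xffff, never storing 0 in the field.
 */
uint16_t icmp_csum_field(uint16_t sum_with_field_zeroed)
{
	uint16_t v = 0xffff ^ sum_with_field_zeroed;

	return v ? v : 0xffff;	/* replace 0 with 0xffff, as described above */
}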

	However, I'm not at all sure that icmp_reply() is really
the only place where all-zero data can be encountered.
Most of the places where we want to calculate checksums are not going
to see all-zeroes data, but it would be a lot better not to have to
rely upon that.
  
Al Viro Oct. 24, 2023, 4:26 a.m. UTC | #5
On Mon, Oct 23, 2023 at 12:37:58PM +0200, Thomas Gleixner wrote:
> On Sun, Oct 22 2023 at 20:46, Al Viro wrote:
> > @@ -113,14 +113,14 @@ csum_partial_cfu_aligned(const unsigned long __user *src, unsigned long *dst,
> >  		*dst = word | tmp;
> >  		checksum += carry;
> >  	}
> > -	return checksum;
> > +	return from64to16 (checksum);
> 
>   from64to16(checksum); all over the place

Umm...  Is that about whitespace?
 
> >  #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
> >  #define _HAVE_ARCH_CSUM_AND_COPY
> >  static inline
> > -__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
> > +__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
> >  {
> >  	if (!access_ok(src, len))
> > -		return 0;
> > +		return -1;
> 
>   return CSUM_FAULT; 

Already caught and fixed...

> > --- a/arch/arm/lib/csumpartialcopyuser.S
> > +++ b/arch/arm/lib/csumpartialcopyuser.S
> > @@ -73,11 +73,11 @@
> >  #include "csumpartialcopygeneric.S"
> >  
> >  /*
> > - * We report fault by returning 0 csum - impossible in normal case, since
> > - * we start with 0xffffffff for initial sum.
> > + * We report fault by returning ~0ULL csum
> >   */
> 
> There is also a stale comment a few lines further up.

Umm...
 *  Returns : r0:r1 = checksum:0 on success or -1:-1 on fault  
perhaps?

> > +typedef u64 __bitwise __wsum_fault;
> 
> newline please.

Done...

> > +static inline __wsum_fault to_wsum_fault(__wsum v)
> > +{
> > +#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
> > +	return (__force __wsum_fault)v;
> > +#else
> > +	return (__force __wsum_fault)((__force u64)v << 32);
> > +#endif
> > +}
> > +
> > +static inline __wsum_fault from_wsum_fault(__wsum v)
> > +{
> > +#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
> > +	return (__force __wsum)v;
> > +#else
> > +	return (__force __wsum)((__force u64)v >> 32);
> > +#endif
> > +}
> > +
> > +static inline bool wsum_fault_check(__wsum_fault v)
> > +{
> > +#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
> > +	return (__force s64)v < 0;
> > +#else
> > +	return (int)(__force u32)v < 0;
> 
> Why not __force s32 right away?

Mostly to keep the reader within more familiar cases
of conversion - u64 to u32 is "throw the upper 32 bits
away", u32 to s32 - "treat MSB as sign".

It's still a nasal demon country, of course - the proper
solution is

static inline bool wsum_fault_check(__wsum_fault v)
{
#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
	return (__force u64)v & (1ULL << 63);
#else
	return (__force u32)v & (1ULL << 31);
#endif
}

Incidentally, in this case we really want a cast to u32
rather than u64 - gcc is smart enough to figure out that
checking MSB in 32bit can be done as signed 32bit comparison
with 0, but bit 31 in 64bit is not special as far as it's
concerned, even though it's bit 31 of a 32bit register...
 
>   	if (copy_to_user(dst, src, len))
> 		return CSUM_FAULT;
> 	return to_wsum_fault(sum);
> 
> So it follows the pattern of csum_and_copy_from_user() above?

Let's not mix that with the rest of changes...

> Which means that if the first vector is successfully copied, then 'off'
> is greater 0. A fault on the second one will correctly break out of the
> loop, but the function will incorrectly return a value > 0, i.e. the
> length of the first iteration.
> 
> As the callers just check for != 0 such a partial copy is considered
> success, no?

Check the callers...

static __always_inline __must_check
bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
                                  __wsum *csum, struct iov_iter *i)
{
        size_t copied = csum_and_copy_from_iter(addr, bytes, csum, i);
        if (likely(copied == bytes))
                return true;
        iov_iter_revert(i, copied);
        return false;
}

and
static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
                                      struct iov_iter *to, int len,
                                      __wsum *csump)
{
        struct csum_state csdata = { .csum = *csump };
        int ret;

        ret = __skb_datagram_iter(skb, offset, to, len, true,
                                  csum_and_copy_to_iter, &csdata);
        if (ret)
                return ret;

        *csump = csdata.csum;
        return 0;
}

with __skb_datagram_iter() treating short copies as "revert everything
and return -EFAULT".

> So instead of 
> 
> 	likely(!wsum_fault_check(next)) ? 0 : len;
> 
> shouldn't this just do:
> 
> 	if (unlikely(wsum_fault_check(next))
> 		return 0;
>         len;
> 
> for simplicity and mental sanity sake?

Let's not.  The macros are still too convoluted, but I really hate the
idea of having to cope with non-trivial control flow in the thunks.
It's really *NOT* helping mental state of anyone having to touch those...
  
Thomas Gleixner Oct. 24, 2023, 12:31 p.m. UTC | #6
On Tue, Oct 24 2023 at 05:26, Al Viro wrote:
> On Mon, Oct 23, 2023 at 12:37:58PM +0200, Thomas Gleixner wrote:
>> On Sun, Oct 22 2023 at 20:46, Al Viro wrote:
>> > -	return checksum;
>> > +	return from64to16 (checksum);
>> 
>>   from64to16(checksum); all over the place
>
> Umm...  Is that about whitespace?

Yes, my parser choked on that :)
  
>> >  /*
>> > - * We report fault by returning 0 csum - impossible in normal case, since
>> > - * we start with 0xffffffff for initial sum.
>> > + * We report fault by returning ~0ULL csum
>> >   */
>> 
>> There is also a stale comment a few lines further up.
>
> Umm...
>  *  Returns : r0:r1 = checksum:0 on success or -1:-1 on fault  
> perhaps?

Looks good.

>> > +static inline bool wsum_fault_check(__wsum_fault v)
>> > +{
>> > +#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
>> > +	return (__force s64)v < 0;
>> > +#else
>> > +	return (int)(__force u32)v < 0;
>> 
>> Why not __force s32 right away?
>
> Mostly to keep the reader within more familiar cases
> of conversion - u64 to u32 is "throw the upper 32 bits
> away", u32 to s32 - "treat MSB as sign".

Fair enough.

> It's still a nasal demon country, of course - the proper
> solution is
>
> static inline bool wsum_fault_check(__wsum_fault v)
> {
> #if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
> 	return (__force u64)v & (1ULL << 63);
> #else
> 	return (__force u32)v & (1ULL << 31);
> #endif
> }
>
> Incidentally, in this case we really want a cast to u32
> rather than u64 - gcc is smart enough to figure out that
> checking MSB in 32bit can be done as signed 32bit comparison
> with 0, but bit 31 in 64bit is not special as far as it's
> concerned, even though it's a bit 31 of 32bit register...

Indeed.
  
>> As the callers just check for != 0 such a partial copy is considered
>> success, no?
>
> Check the callers...
>
> static __always_inline __must_check
> bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
>                                   __wsum *csum, struct iov_iter *i)
> {
>         size_t copied = csum_and_copy_from_iter(addr, bytes, csum, i);
>         if (likely(copied == bytes))
>                 return true;

Duh. I think I stared at a caller of csum_and_copy_from_iter_full()
instead...

Thanks,

        tglx
  
Al Viro Dec. 5, 2023, 2:21 a.m. UTC | #7
We need a way for csum_and_copy_{from,to}_user() to report faults.
The approach taken back in 2020 (avoid 0 as return value by starting
summing from ~0U, use 0 to report faults) had been broken; it does
yield the right value modulo 2^16-1, but the case when data is
entirely zero-filled is not handled right.  It almost works, since
for most of the codepaths we have a non-zero value added in
and there 0 is not different from anything divisible by 0xffff.
However, there are cases (ICMPv4 replies, for example) where we
are not guaranteed that.

In other words, we really need to have those primitives return 0
on filled-with-zeroes input.  So let's make them return a 64bit
value instead; we can do that cheaply (all supported architectures
do that via a couple of registers) and we can use that to report
faults without disturbing the 32bit csum.

New type: __wsum_fault.  64bit, returned by csum_and_copy_..._user().
Primitives:
        * CSUM_FAULT representing the fault
        * to_wsum_fault() folding __wsum value into that
        * from_wsum_fault() extracting __wsum value
        * wsum_is_fault() checking if it's a fault value

Representation depends upon the target.
        CSUM_FAULT: ~0ULL
        to_wsum_fault(v32): (u64)v32 for 64bit and 32bit l-e,
(u64)v32 << 32 for 32bit b-e.

Rationale: relationship between the calling conventions for returning 64bit
and those for returning 32bit values.  On 64bit architectures the same
register is used; on 32bit l-e the lower half of the value goes in the
same register that is used for returning 32bit values and the upper half
goes into additional register.  On 32bit b-e the opposite happens -
upper 32 bits go into the register used for returning 32bit values and
the lower 32 bits get stuffed into additional register.

So with this choice of representation we need minimal changes on the
asm side (zero an extra register in 32bit case, nothing in 64bit case),
and from_wsum_fault() is as cheap as it gets.

Sum calculation is back to "start from 0".
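
A minimal sketch of those primitives, using plain C types instead of the
kernel's __bitwise annotations (wsum_is_fault() is the name used in this
series for what the earlier draft called wsum_fault_check(); CONFIG_64BIT
and __LITTLE_ENDIAN__ are assumed to come from the kernel configuration,
as in the actual patch):

#include <stdint.h>
#include <stdbool.h>

typedef uint64_t wsum_fault_t;			/* stand-in for __wsum_fault */

#define CSUM_FAULT ((wsum_fault_t)~0ULL)

#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN__)
static inline wsum_fault_t to_wsum_fault(uint32_t csum)
{
	return csum;				/* upper 32 bits stay zero */
}
static inline uint32_t from_wsum_fault(wsum_fault_t v)
{
	return v;				/* csum lives in the lower half */
}
static inline bool wsum_is_fault(wsum_fault_t v)
{
	return (int64_t)v < 0;			/* bit 63 set only for CSUM_FAULT */
}
#else	/* 32bit big-endian: csum lives in the upper half */
static inline wsum_fault_t to_wsum_fault(uint32_t csum)
{
	return (uint64_t)csum << 32;
}
static inline uint32_t from_wsum_fault(wsum_fault_t v)
{
	return v >> 32;
}
static inline bool wsum_is_fault(wsum_fault_t v)
{
	return (int32_t)(uint32_t)v < 0;	/* MSB of the lower half */
}
#endif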

The rest of the series consists of cleaning up assorted asm/checksum.h.

Branch lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.csum
Individual patches in followups.  Help with review and testing would
be very welcome.

Al Viro (18):
      make net/checksum.h self-contained
      get rid of asm/checksum.h includes outside of include/net/checksum.h and arch
      make net/checksum.h the sole user of asm/checksum.h
      Fix the csum_and_copy_..._user() idiocy
      bits missing from csum_and_copy_{from,to}_user() unexporting.
      consolidate csum_tcpudp_magic(), take default variant into net/checksum.h
      consolidate default ip_compute_csum()
      alpha: pull asm-generic/checksum.h
      mips: pull include of asm-generic/checksum.h out of #if
      nios2: pull asm-generic/checksum.h
      x86: merge csum_fold() for 32bit and 64bit
      x86: merge ip_fast_csum() for 32bit and 64bit
      x86: merge csum_tcpudp_nofold() for 32bit and 64bit
      amd64: saner handling of odd address in csum_partial()
      x86: optimized csum_add() is the same for 32bit and 64bit
      x86: lift the extern for csum_partial() into checksum.h
      x86_64: move csum_ipv6_magic() from csum-wrappers_64.c to csum-partial_64.c
      uml/x86: use normal x86 checksum.h

 arch/alpha/include/asm/asm-prototypes.h   |   2 +-
 arch/alpha/include/asm/checksum.h         |  68 ++----------
 arch/alpha/lib/csum_partial_copy.c        |  74 ++++++-------
 arch/arm/include/asm/checksum.h           |  27 +----
 arch/arm/kernel/armksyms.c                |   3 +-
 arch/arm/lib/csumpartialcopygeneric.S     |   3 +-
 arch/arm/lib/csumpartialcopyuser.S        |   8 +-
 arch/hexagon/include/asm/checksum.h       |   4 +-
 arch/hexagon/kernel/hexagon_ksyms.c       |   1 -
 arch/hexagon/lib/checksum.c               |   1 +
 arch/m68k/include/asm/checksum.h          |  24 +---
 arch/m68k/lib/checksum.c                  |   8 +-
 arch/microblaze/kernel/microblaze_ksyms.c |   2 +-
 arch/mips/include/asm/asm-prototypes.h    |   2 +-
 arch/mips/include/asm/checksum.h          |  32 ++----
 arch/mips/lib/csum_partial.S              |  12 +-
 arch/nios2/include/asm/checksum.h         |  13 +--
 arch/openrisc/kernel/or32_ksyms.c         |   2 +-
 arch/parisc/include/asm/checksum.h        |  21 ----
 arch/powerpc/include/asm/asm-prototypes.h |   2 +-
 arch/powerpc/include/asm/checksum.h       |  27 +----
 arch/powerpc/lib/checksum_32.S            |   6 +-
 arch/powerpc/lib/checksum_64.S            |   4 +-
 arch/powerpc/lib/checksum_wrappers.c      |  14 +--
 arch/s390/include/asm/checksum.h          |  18 ---
 arch/s390/kernel/ipl.c                    |   2 +-
 arch/s390/kernel/os_info.c                |   2 +-
 arch/sh/include/asm/checksum_32.h         |  32 +-----
 arch/sh/kernel/sh_ksyms_32.c              |   2 +-
 arch/sh/lib/checksum.S                    |   6 +-
 arch/sparc/include/asm/asm-prototypes.h   |   2 +-
 arch/sparc/include/asm/checksum_32.h      |  63 ++++++-----
 arch/sparc/include/asm/checksum_64.h      |  21 +---
 arch/sparc/lib/checksum_32.S              |   2 +-
 arch/sparc/lib/csum_copy.S                |   4 +-
 arch/sparc/lib/csum_copy_from_user.S      |   2 +-
 arch/sparc/lib/csum_copy_to_user.S        |   2 +-
 arch/x86/include/asm/asm-prototypes.h     |   2 +-
 arch/x86/include/asm/checksum.h           | 177 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/checksum_32.h        | 141 ++----------------------
 arch/x86/include/asm/checksum_64.h        | 172 +----------------------------
 arch/x86/lib/checksum_32.S                |  20 +++-
 arch/x86/lib/csum-copy_64.S               |   6 +-
 arch/x86/lib/csum-partial_64.c            |  41 ++++---
 arch/x86/lib/csum-wrappers_64.c           |  43 ++------
 arch/x86/um/asm/checksum.h                | 119 --------------------
 arch/x86/um/asm/checksum_32.h             |  38 -------
 arch/x86/um/asm/checksum_64.h             |  19 ----
 arch/xtensa/include/asm/asm-prototypes.h  |   2 +-
 arch/xtensa/include/asm/checksum.h        |  33 +-----
 arch/xtensa/lib/checksum.S                |   6 +-
 drivers/net/ethernet/brocade/bna/bnad.h   |   2 -
 drivers/net/ethernet/lantiq_etop.c        |   2 -
 drivers/net/vmxnet3/vmxnet3_int.h         |   1 -
 drivers/s390/char/zcore.c                 |   2 +-
 include/asm-generic/checksum.h            |  15 +--
 include/net/checksum.h                    |  81 ++++++++++++--
 include/net/ip6_checksum.h                |   1 -
 lib/checksum_kunit.c                      |   2 +-
 net/core/datagram.c                       |   8 +-
 net/core/skbuff.c                         |   8 +-
 net/ipv6/ip6_checksum.c                   |   1 -
 62 files changed, 501 insertions(+), 959 deletions(-)

Part 1: sorting out the includes.
	We have asm/checksum.h and net/checksum.h; the latter pulls
the former.  A lot of things would become easier if we could move
the things from asm/checksum.h to net/checksum.h; for that we need to
make net/checksum.h the only file that pulls asm/checksum.h.

1/18) make net/checksum.h self-contained
right now it has an implicit dependency upon linux/bitops.h (for the
sake of ror32()).

2/18) get rid of asm/checksum.h includes outside of include/net/checksum.h and arch
In almost all cases include is redundant; zcore.c and checksum_kunit.c
are the sole exceptions and those got switched to net/checksum.h

3/18) make net/checksum.h the sole user of asm/checksum.h
All other users (all in arch/* by now) can pull net/checksum.h.

Part 2: fix the fault reporting.

4/18) Fix the csum_and_copy_..._user() idiocy

Fix the breakage introduced back in 2020 - see above for details.

Part 3: trimming related crap

5/18) bits missing from csum_and_copy_{from,to}_user() unexporting.
6/18) consolidate csum_tcpudp_magic(), take default variant into net/checksum.h
7/18) consolidate default ip_compute_csum()
... and take it into include/net/checksum.h
8/18) alpha: pull asm-generic/checksum.h
9/18) mips: pull include of asm-generic/checksum.h out of #if
10/18) nios2: pull asm-generic/checksum.h

Part 4: trimming x86 crap
11/18) x86: merge csum_fold() for 32bit and 64bit
identical...
12/18) x86: merge ip_fast_csum() for 32bit and 64bit
Identical, except that 32bit version uses asm volatile where 64bit
one uses plain asm.  The former had become pointless when memory
clobber got added to both versions...
13/18) x86: merge csum_tcpudp_nofold() for 32bit and 64bit
identical...
14/18) amd64: saner handling of odd address in csum_partial()
all we want there is to have return value congruent to result * 256
modulo 0xffff; no need to convert from 32bit to 16bit (i.e. take it
modulo 0xffff) first - cyclic shift of 32bit value by 8 bits (in either
direction) will work.
Kills the from32to16() helper and yields better code... (a quick check of
that congruence is sketched after this list of patches)
15/18) x86: optimized csum_add() is the same for 32bit and 64bit
16/18) x86: lift the extern for csum_partial() into checksum.h
17/18) x86_64: move csum_ipv6_magic() from csum-wrappers_64.c to csum-partial_64.c
... and make uml/amd64 use it.
18/18) uml/x86: use normal x86 checksum.h
The only difference left is that UML really does *NOT* want the
csum-and-uaccess combinations; leave those in 
arch/x86/include/asm/checksum_{32,64}, move the rest into
arch/x86/include/asm/checksum.h (under ifdefs) and that's 
pretty much it.
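
As promised under 14/18 above, a quick standalone check of the "rotate by
8 bits" congruence (ror32() here is a local re-implementation for the
demo, not the kernel helper):

#include <stdint.h>
#include <stdio.h>

static uint32_t ror32(uint32_t v, unsigned int n)
{
	return (v >> n) | (v << (32 - n));	/* n = 8 here, no UB */
}

int main(void)
{
	uint32_t s = 0x12345678;	/* some 32bit partial sum */

	/* both print 0xac68 - congruent modulo 0xffff */
	printf("%#llx\n", (unsigned long long)(256ULL * s % 0xffff));
	printf("%#x\n", ror32(s, 8) % 0xffff);
	return 0;
}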
  
Al Viro Dec. 5, 2023, 2:27 a.m. UTC | #8
On Tue, Dec 05, 2023 at 02:21:00AM +0000, Al Viro wrote:

Arrrgghhh...

Sorry about the crap mixed into that - git send-email .... v2-* without
checking whether old files matching the same pattern remained.

Shouldn't have skipped --dry-run; mea culpa.
  
David Laight Dec. 6, 2023, 11:10 a.m. UTC | #9
From: Al Viro
> Sent: 05 December 2023 02:21
> 
> We need a way for csum_and_copy_{from,to}_user() to report faults.
> The approach taken back in 2020 (avoid 0 as return value by starting
> summing from ~0U, use 0 to report faults) had been broken; it does
> yield the right value modulo 2^16-1, but the case when data is
> entirely zero-filled is not handled right.  It almost works, since
> for most of the codepaths we have a non-zero value added in
> and there 0 is not different from anything divisible by 0xffff.
> However, there are cases (ICMPv4 replies, for example) where we
> are not guaranteed that.
> 
> In other words, we really need to have those primitives return 0
> on filled-with-zeroes input.

Do we?
I've not seen any justification for this at all.
IIRC the ICMPv4 reply code needs the checksum function to return 0xffff
for all-zero input.

So the correct and simple fix is to initialise the sum to 0xffff
in the checksum function.

	David

  
Al Viro Dec. 6, 2023, 10:43 p.m. UTC | #10
On Wed, Dec 06, 2023 at 11:10:45AM +0000, David Laight wrote:

> Do we?
> I've not seen any justification for this at all.
> IIRC the ICMPv4 reply code needs the checksum function return 0xffff
> for all-zero input.
> 
> So the correct and simple fix is to initialise the sum to 0xffff
> in the checksum function.

You do realize that ICMPv4 reply code is not the only user of those,
right?  Sure, we can special-case it there.  And audit the entire
call tree, proving that no other call chains need the same.

Care to post the analysis?  I have the beginnings of that and it's already
long and convoluted and touches far too many places, all of which will
have to be watched indefinitely, so that changes in there don't introduce
new breakage.

I could be wrong.  About many things, including the depth of your
aversion to RTFS.  But frankly, until that analysis shows up somewhere,
I'm going to ignore your usual handwaving.
  
David Laight Dec. 7, 2023, 9:58 a.m. UTC | #11
From: Al Viro
> Sent: 06 December 2023 22:44
> 
> On Wed, Dec 06, 2023 at 11:10:45AM +0000, David Laight wrote:
> 
> > Do we?
> > I've not seen any justification for this at all.
> > IIRC the ICMPv4 reply code needs the checksum function return 0xffff
> > for all-zero input.
> >
> > So the correct and simple fix is to initialise the sum to 0xffff
> > in the checksum function.
> 
> You do realize that ICMPv4 reply code is not the only user of those,
> right?  Sure, we can special-case it there.  And audit the entire
> call tree, proving that no other call chains need the same.
> 
> Care to post the analysis?  I have the beginnings of that and it's already
> long and convoluted and touches far too many places, all of which will
> have to be watched indefinitely, so that changes in there don't introduce
> new breakage.
> 
> I could be wrong.  About many things, including the depth of your
> aversion to RTFS.  But frankly, until that analysis shows up somewhere,
> I'm going to ignore your usual handwaving.

This code is calculating the ip-checksum of a buffer.
That is subtly different from the 16bit 1's complement sum of
the buffer.
Now 0x0000 and 0xffff are mathematically equivalent but various specs
treat them differently.

Consider the UDP header checksum.
For IPv4 the checksum field can be zero - but that means 'not
calculated' and should be treated as a valid checksum.
But IPv6 treats zero as an error.
If the 1's complement sum is 0xffff then the checksum field
need to contain 0xffff not 0.
This means you really need to calculate 1 + ~sum16(1, buff, len)
(ie initialise the sum to 1 rather than 0 or 0xffff.)

The issue that showed this was zero being put into an ICMP message
when all the bytes were zero instead of the required 0xffff.
The reporter had changed the initialiser and got the required 0xffff
and everything then worked.

That wasn't the copy+checksum path either - since the packet got
sent rather than EFAULT being generated.

If I read the code/specs I'll only find places where the buffer
is guaranteed to be non-zero (pretty much all of IP, TCP and UDP)
or 0xffff is the required value (ICMP).

Since you are proposing this patch I think you need to show
a concrete example of where an all zero buffer is required to
generate a zero checksum value but a non-zero buffer must
generate 0xffff.

	David

  
David Laight Dec. 8, 2023, 12:04 p.m. UTC | #12
From: Al Viro
> Sent: 06 December 2023 22:44
> 
> On Wed, Dec 06, 2023 at 11:10:45AM +0000, David Laight wrote:
> 
> > Do we?
> > I've not seen any justification for this at all.
> > IIRC the ICMPv4 reply code needs the checksum function return 0xffff
> > for all-zero input.
> >
> > So the correct and simple fix is to initialise the sum to 0xffff
> > in the checksum function.
> 
> You do realize that ICMPv4 reply code is not the only user of those,
> right?  Sure, we can special-case it there.  And audit the entire
> call tree, proving that no other call chains need the same.
> 
> Care to post the analysis?  I have the beginnings of that and it's already
> long and convoluted and touches far too many places, all of which will
> have to be watched indefinitely, so that changes in there don't introduce
> new breakage.
> 
> I could be wrong.  About many things, including the depth of your
> aversion to RTFS.  But frankly, until that analysis shows up somewhere,
> I'm going to ignore your usual handwaving.

I've just read RFC 792 and done some experiments.
The kernel ICMP checksum code is just plain broken.

RFC 792 is quite clear that the checksum is the 16-bit ones's
complement of the one's complement sum of the ICMP message
starting with the ICMP Type.

The one's complement sum of 0xfffe and 0x0001 is zero (not the 0xffff
that the adc loop generates).
So its one's complement is 0xffff, not 0x0000.
So the checksum field in an ICMP message should never contain zero.

Now for the experiments...

Run: 'tcpdump -n -x icmp' in one window to trace the icmp messages.
Elsewhere run 'ping -c1 -s2 -p 0000 host'
The last 6 bytes of the ICMP echo request are the ICMP data eg:
	0800 c381 347d 0001 0000
	cmd   sum  id   seq data
If you repeat the request the 'id' will increase by one or two.
Replace data with the checksum - so ping -c1 -s2 -p c381
You won't be quick enough, the checksum will probably be fffd.
But nearly there...
Repeat but subtract (say) 8 from the old checksum.
Then issue the command a few times and the packet checksum
should go from 0004 down to 0001 then jump to ffff.
But you'll see a 0000 instead of the 0xffff.

Note that the buffer being summed isn't all zeros but the
checksum field is still wrong.

I suspect that the real problem is that the value returned
by csum_fold() should never be put into a packet that is being
transmitted because the domain is [0..0xfffe] not [1..0xffff].

I bet ICMP isn't the only place where a transmitted checksum
ends up being zero.
As I've said before, UDP4 treats it as no-checksum (so assume valid)
and UDP6 states the field must not be zero.

	David

  
Al Viro Dec. 8, 2023, 2:17 p.m. UTC | #13
On Fri, Dec 08, 2023 at 12:04:24PM +0000, David Laight wrote:
> I've just read RFC 792 and done some experiments.
> The kernel ICMP checksum code is just plain broken.
> 
> RFC 792 is quite clear that the checksum is the 16-bit ones's
> complement of the one's complement sum of the ICMP message
> starting with the ICMP Type.
> 
> The one's complement sum of 0xfffe and 0x0001 is zero (not the 0xffff

It is not.  FYI, N-bit one's complement sum is defined as

X + Y <= MAX_N_BIT ? X + Y : X + Y - MAX_N_BIT,

where MAX_N_BIT is 2^N - 1.

You add them as natural numbers.  If there is no carry and result
fits into N bits, that's it.  If there is carry, you add it to
the lower N bits of sum.

Discussion of properties of that operation is present e.g. in
RFC1071, titled "Computing the Internet Checksum".

May I politely suggest that some basic understanding of the
arithmetics involved might be useful for this discussion?
  
Al Viro Dec. 8, 2023, 3:29 p.m. UTC | #14
On Fri, Dec 08, 2023 at 02:17:12PM +0000, Al Viro wrote:
> On Fri, Dec 08, 2023 at 12:04:24PM +0000, David Laight wrote:
> > I've just read RFC 792 and done some experiments.
> > The kernel ICMP checksum code is just plain broken.
> > 
> > RFC 792 is quite clear that the checksum is the 16-bit ones's
> > complement of the one's complement sum of the ICMP message
> > starting with the ICMP Type.
> > 
> > The one's complement sum of 0xfffe and 0x0001 is zero (not the 0xffff
> 
> It is not.  FYI, N-bit one's complement sum is defined as
> 
> X + Y <= MAX_N_BIT ? X + Y : X + Y - MAX_N_BIT,
> 
> where MAX_N_BIT is 2^N - 1.
> 
> You add them as natural numbers.  If there is no carry and result
> fits into N bits, that's it.  If there is carry, you add it to
> the lower N bits of sum.
> 
> Discussion of properties of that operation is present e.g. in
> RFC1071, titled "Computing the Internet Checksum".
> 
> May I politely suggest that some basic understanding of the
> arithmetics involved might be useful for this discussion?

FWIW, "one's complement" in the name is due to the fact that this
operation is how one normally implements addition of integers in
one's complement representation.

The operation itself is on bit vectors - you take two N-bit vectors,
pad them with 0 on the left, add them as unsigned N+1-bit numbers,
then add the leftmost bit (carry) to the value in the remaining N bits.
Since the sum on the first step is no greater than 2^{N+1} - 2, the
result of the second addition will always fit into N bits.

If bit vectors <A> and <B> represent integers x and y with representable
sum (i.e. if 2^{N-1} > x + y > -2^{N-1}), then applying this operation will
produce a vector representing x + y.  In case when the sum allows
more than one representation (i.e. when x + y is 0), it is biased towards
negative zero - the only way to get positive zero is (+0) + (+0); in
particular, your example is (+1) + (-1) yielding (-0).

Your bit vectors are 1111111111111110 and 0000000000000001; padding gives
01111111111111110 and 00000000000000001; the first addition - 01111111111111111,
so the carry bit is 0 and result is the lower 16 bits, i.e. 1111111111111111.

Had the second argument been 0000000000000011 (+3), you would get
10000000000000001 from adding padded vectors, with carry bit being
1.  So the result would be 1 + 0000000000000001, i.e. 0000000000000010
((+2), from adding (-1) and (+3)).

References to 1's complement integers aside, the operation above is
what is used as basis for checksum calculations.  Reasons and
properties are dealt with in IEN 45 (older than RFC 791/792 - TCP
design is older than IP, and that's where the choice of checksum had
been originally dealt with) and in RFC 1071 (which includes
IEN 45 as appendix, noting that it has not been widely available).
  
David Laight Dec. 8, 2023, 3:56 p.m. UTC | #15
From: Al Viro
> Sent: 08 December 2023 14:17
> 
> On Fri, Dec 08, 2023 at 12:04:24PM +0000, David Laight wrote:
> > I've just read RFC 792 and done some experiments.
> > The kernel ICMP checksum code is just plain broken.
> >
> > RFC 792 is quite clear that the checksum is the 16-bit ones's
> > complement of the one's complement sum of the ICMP message
> > starting with the ICMP Type.
> >
> > The one's complement sum of 0xfffe and 0x0001 is zero (not the 0xffff
> 
> It is not.  FYI, N-bit one's complement sum is defined as
> 
> X + Y <= MAX_N_BIT ? X + Y : X + Y - MAX_N_BIT,
> 
> where MAX_N_BIT is 2^N - 1.

If that is true (I did bump into that RFC earlier) I can't help
feeling that someone has decided to call an 'adc sum' 1's complement!
I've never used a computer with native 1's complement integers
(only sign over-punch) so I'm not really sure what might be expected
to happen when you wrap from +MAXINT (+32767) to -MAXINT (-32767).

> You add them as natural numbers.  If there is no carry and result
> fits into N bits, that's it.  If there is carry, you add it to
> the lower N bits of sum.
> 
> Discussion of properties of that operation is present e.g. in
> RFC1071, titled "Computing the Internet Checksum".
> 
> May I politely suggest that some basic understanding of the
> arithmetics involved might be useful for this discussion?

Well 0x0000 is +0 and 0xffff is -0, mathematically they are (mostly)
equal.

Most code validating checksums just sums the buffer and expects 0xffff.

RFC 768 (UDP) says:
If the computed  checksum  is zero,  it is transmitted  as all ones (the
equivalent  in one's complement  arithmetic).   An all zero  transmitted
checksum  value means that the transmitter  generated  no checksum  (for
debugging or for higher level protocols that don't care).

RFC 8200 (IPv6) makes the zero checksum illegal.

So those paths (at least) really need to initialise the checksum to 1
and then increment after the invert.

I bet that ICMP response (with id == 0 and seq == 0) is the only
place it is possible to get an ip-checksum of a zero buffer.
So it will be pretty moot for copy+checksum which can return 0xffff
(or lots of other values) for an all-zero buffer.

In terms of copy+checksum returning an error, why not reduce the
32bit wcsum once (to 17 bits) and return -1 (or ~0u) on error?
Much simpler than your patch and it won't have the lurking problem
of the result being assigned to a 32bit variable.

	David

  
Al Viro Dec. 8, 2023, 6:35 p.m. UTC | #16
On Fri, Dec 08, 2023 at 03:56:27PM +0000, David Laight wrote:

> > You add them as natural numbers.  If there is no carry and result
> > fits into N bits, that's it.  If there is carry, you add it to
> > the lower N bits of sum.
> > 
> > Discussion of properties of that operation is present e.g. in
> > RFC1071, titled "Computing the Internet Checksum".
> > 
> > May I politely suggest that some basic understanding of the
> > arithmetics involved might be useful for this discussion?
> 
> Well 0x0000 is +0 and 0xffff is -0, mathematically they are (mostly)
> equal.

As representations of signed integers they are.  However, the origin
of the operation in question is irrelevant.  Again, read the fucking RFC.

> I bet that ICMP response (with id == 0 and seq == 0) is the only
> place it is possible to get an ip-checksum of a zero buffer.
> So it will be pretty moot for copy+checksum with can return 0xffff
> (or lots of other values) for an all-zero buffer.

Egads...  Your bets are your business; you *still* have not bothered
to look at the callers of these primitives.  Given the accuracy of
your guesses so far, pardon me for treating any "I bet"/"it stands to
reason"/etc. coming from you as empty handwaving.

> In terms of copy+checksum returning an error, why not reduce the
> 32bit wcsum once (to 17 bits) and return -1 (or ~0u) on error?
> Much simpler than your patch and it won't have the lurking problem
> of the result being assigned to a 32bit variable.

In case you have somehow missed it, quite a few of the affected places
are in assembler.  The same would apply to added code that would do
this reduction.  Which is going to be considerably less trivial than
the changes done in that patch.
  

Patch

diff --git a/arch/alpha/include/asm/asm-prototypes.h b/arch/alpha/include/asm/asm-prototypes.h
index c8ae46fc2e74..0cf90f406d00 100644
--- a/arch/alpha/include/asm/asm-prototypes.h
+++ b/arch/alpha/include/asm/asm-prototypes.h
@@ -1,10 +1,10 @@ 
 #include <linux/spinlock.h>
 
-#include <asm/checksum.h>
 #include <asm/console.h>
 #include <asm/page.h>
 #include <asm/string.h>
 #include <linux/uaccess.h>
+#include <net/checksum.h> // for csum_ipv6_magic()
 
 #include <asm-generic/asm-prototypes.h>
 
diff --git a/arch/alpha/include/asm/checksum.h b/arch/alpha/include/asm/checksum.h
index 99d631e146b2..d3abe290ae4e 100644
--- a/arch/alpha/include/asm/checksum.h
+++ b/arch/alpha/include/asm/checksum.h
@@ -43,7 +43,7 @@  extern __wsum csum_partial(const void *buff, int len, __wsum sum);
  */
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 #define _HAVE_ARCH_CSUM_AND_COPY
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len);
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len);
 
 __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len);
 
diff --git a/arch/alpha/lib/csum_partial_copy.c b/arch/alpha/lib/csum_partial_copy.c
index 4d180d96f09e..5f75fcd19287 100644
--- a/arch/alpha/lib/csum_partial_copy.c
+++ b/arch/alpha/lib/csum_partial_copy.c
@@ -52,7 +52,7 @@  __asm__ __volatile__("insqh %1,%2,%0":"=r" (z):"r" (x),"r" (y))
 	__guu_err;					\
 })
 
-static inline unsigned short from64to16(unsigned long x)
+static inline __wsum_fault from64to16(unsigned long x)
 {
 	/* Using extract instructions is a bit more efficient
 	   than the original shift/bitmask version.  */
@@ -72,7 +72,7 @@  static inline unsigned short from64to16(unsigned long x)
 			+ (unsigned long) tmp_v.us[2];
 
 	/* Similarly, out_v.us[2] is always zero for the final add.  */
-	return out_v.us[0] + out_v.us[1];
+	return to_wsum_fault((__force __wsum)(out_v.us[0] + out_v.us[1]));
 }
 
 
@@ -80,17 +80,17 @@  static inline unsigned short from64to16(unsigned long x)
 /*
  * Ok. This isn't fun, but this is the EASY case.
  */
-static inline unsigned long
+static inline __wsum_fault
 csum_partial_cfu_aligned(const unsigned long __user *src, unsigned long *dst,
 			 long len)
 {
-	unsigned long checksum = ~0U;
+	unsigned long checksum = 0;
 	unsigned long carry = 0;
 
 	while (len >= 0) {
 		unsigned long word;
 		if (__get_word(ldq, word, src))
-			return 0;
+			return CSUM_FAULT;
 		checksum += carry;
 		src++;
 		checksum += word;
@@ -104,7 +104,7 @@  csum_partial_cfu_aligned(const unsigned long __user *src, unsigned long *dst,
 	if (len) {
 		unsigned long word, tmp;
 		if (__get_word(ldq, word, src))
-			return 0;
+			return CSUM_FAULT;
 		tmp = *dst;
 		mskql(word, len, word);
 		checksum += word;
@@ -113,14 +113,14 @@  csum_partial_cfu_aligned(const unsigned long __user *src, unsigned long *dst,
 		*dst = word | tmp;
 		checksum += carry;
 	}
-	return checksum;
+	return from64to16 (checksum);
 }
 
 /*
  * This is even less fun, but this is still reasonably
  * easy.
  */
-static inline unsigned long
+static inline __wsum_fault
 csum_partial_cfu_dest_aligned(const unsigned long __user *src,
 			      unsigned long *dst,
 			      unsigned long soff,
@@ -129,16 +129,16 @@  csum_partial_cfu_dest_aligned(const unsigned long __user *src,
 	unsigned long first;
 	unsigned long word, carry;
 	unsigned long lastsrc = 7+len+(unsigned long)src;
-	unsigned long checksum = ~0U;
+	unsigned long checksum = 0;
 
 	if (__get_word(ldq_u, first,src))
-		return 0;
+		return CSUM_FAULT;
 	carry = 0;
 	while (len >= 0) {
 		unsigned long second;
 
 		if (__get_word(ldq_u, second, src+1))
-			return 0;
+			return CSUM_FAULT;
 		extql(first, soff, word);
 		len -= 8;
 		src++;
@@ -157,7 +157,7 @@  csum_partial_cfu_dest_aligned(const unsigned long __user *src,
 		unsigned long tmp;
 		unsigned long second;
 		if (__get_word(ldq_u, second, lastsrc))
-			return 0;
+			return CSUM_FAULT;
 		tmp = *dst;
 		extql(first, soff, word);
 		extqh(second, soff, first);
@@ -169,13 +169,13 @@  csum_partial_cfu_dest_aligned(const unsigned long __user *src,
 		*dst = word | tmp;
 		checksum += carry;
 	}
-	return checksum;
+	return from64to16 (checksum);
 }
 
 /*
  * This is slightly less fun than the above..
  */
-static inline unsigned long
+static inline __wsum_fault
 csum_partial_cfu_src_aligned(const unsigned long __user *src,
 			     unsigned long *dst,
 			     unsigned long doff,
@@ -185,12 +185,12 @@  csum_partial_cfu_src_aligned(const unsigned long __user *src,
 	unsigned long carry = 0;
 	unsigned long word;
 	unsigned long second_dest;
-	unsigned long checksum = ~0U;
+	unsigned long checksum = 0;
 
 	mskql(partial_dest, doff, partial_dest);
 	while (len >= 0) {
 		if (__get_word(ldq, word, src))
-			return 0;
+			return CSUM_FAULT;
 		len -= 8;
 		insql(word, doff, second_dest);
 		checksum += carry;
@@ -205,7 +205,7 @@  csum_partial_cfu_src_aligned(const unsigned long __user *src,
 	if (len) {
 		checksum += carry;
 		if (__get_word(ldq, word, src))
-			return 0;
+			return CSUM_FAULT;
 		mskql(word, len, word);
 		len -= 8;
 		checksum += word;
@@ -226,14 +226,14 @@  csum_partial_cfu_src_aligned(const unsigned long __user *src,
 	stq_u(partial_dest | second_dest, dst);
 out:
 	checksum += carry;
-	return checksum;
+	return from64to16 (checksum);
 }
 
 /*
  * This is so totally un-fun that it's frightening. Don't
  * look at this too closely, you'll go blind.
  */
-static inline unsigned long
+static inline __wsum_fault
 csum_partial_cfu_unaligned(const unsigned long __user * src,
 			   unsigned long * dst,
 			   unsigned long soff, unsigned long doff,
@@ -242,10 +242,10 @@  csum_partial_cfu_unaligned(const unsigned long __user * src,
 	unsigned long carry = 0;
 	unsigned long first;
 	unsigned long lastsrc;
-	unsigned long checksum = ~0U;
+	unsigned long checksum = 0;
 
 	if (__get_word(ldq_u, first, src))
-		return 0;
+		return CSUM_FAULT;
 	lastsrc = 7+len+(unsigned long)src;
 	mskql(partial_dest, doff, partial_dest);
 	while (len >= 0) {
@@ -253,7 +253,7 @@  csum_partial_cfu_unaligned(const unsigned long __user * src,
 		unsigned long second_dest;
 
 		if (__get_word(ldq_u, second, src+1))
-			return 0;
+			return CSUM_FAULT;
 		extql(first, soff, word);
 		checksum += carry;
 		len -= 8;
@@ -275,7 +275,7 @@  csum_partial_cfu_unaligned(const unsigned long __user * src,
 		unsigned long second_dest;
 
 		if (__get_word(ldq_u, second, lastsrc))
-			return 0;
+			return CSUM_FAULT;
 		extql(first, soff, word);
 		extqh(second, soff, first);
 		word |= first;
@@ -297,7 +297,7 @@  csum_partial_cfu_unaligned(const unsigned long __user * src,
 		unsigned long second_dest;
 
 		if (__get_word(ldq_u, second, lastsrc))
-			return 0;
+			return CSUM_FAULT;
 		extql(first, soff, word);
 		extqh(second, soff, first);
 		word |= first;
@@ -310,22 +310,21 @@  csum_partial_cfu_unaligned(const unsigned long __user * src,
 		stq_u(partial_dest | word | second_dest, dst);
 		checksum += carry;
 	}
-	return checksum;
+	return from64to16 (checksum);
 }
 
-static __wsum __csum_and_copy(const void __user *src, void *dst, int len)
+static __wsum_fault __csum_and_copy(const void __user *src, void *dst, int len)
 {
 	unsigned long soff = 7 & (unsigned long) src;
 	unsigned long doff = 7 & (unsigned long) dst;
-	unsigned long checksum;
 
 	if (!doff) {
 		if (!soff)
-			checksum = csum_partial_cfu_aligned(
+			return csum_partial_cfu_aligned(
 				(const unsigned long __user *) src,
 				(unsigned long *) dst, len-8);
 		else
-			checksum = csum_partial_cfu_dest_aligned(
+			return csum_partial_cfu_dest_aligned(
 				(const unsigned long __user *) src,
 				(unsigned long *) dst,
 				soff, len-8);
@@ -333,31 +332,28 @@  static __wsum __csum_and_copy(const void __user *src, void *dst, int len)
 		unsigned long partial_dest;
 		ldq_u(partial_dest, dst);
 		if (!soff)
-			checksum = csum_partial_cfu_src_aligned(
+			return csum_partial_cfu_src_aligned(
 				(const unsigned long __user *) src,
 				(unsigned long *) dst,
 				doff, len-8, partial_dest);
 		else
-			checksum = csum_partial_cfu_unaligned(
+			return csum_partial_cfu_unaligned(
 				(const unsigned long __user *) src,
 				(unsigned long *) dst,
 				soff, doff, len-8, partial_dest);
 	}
-	return (__force __wsum)from64to16 (checksum);
 }
 
-__wsum
-csum_and_copy_from_user(const void __user *src, void *dst, int len)
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	if (!access_ok(src, len))
-		return 0;
+		return CSUM_FAULT;
 	return __csum_and_copy(src, dst, len);
 }
 
-__wsum
-csum_partial_copy_nocheck(const void *src, void *dst, int len)
+__wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return __csum_and_copy((__force const void __user *)src,
-						dst, len);
+	return from_wsum_fault(__csum_and_copy((__force const void __user *)src,
+						dst, len));
 }
 EXPORT_SYMBOL(csum_partial_copy_nocheck);
diff --git a/arch/arm/include/asm/checksum.h b/arch/arm/include/asm/checksum.h
index d8a13959bff0..2b9bddb50967 100644
--- a/arch/arm/include/asm/checksum.h
+++ b/arch/arm/include/asm/checksum.h
@@ -38,16 +38,16 @@  __wsum csum_partial(const void *buff, int len, __wsum sum);
 __wsum
 csum_partial_copy_nocheck(const void *src, void *dst, int len);
 
-__wsum
+__wsum_fault
 csum_partial_copy_from_user(const void __user *src, void *dst, int len);
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 #define _HAVE_ARCH_CSUM_AND_COPY
 static inline
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	if (!access_ok(src, len))
-		return 0;
+		return -1;
 
 	return csum_partial_copy_from_user(src, dst, len);
 }
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 82e96ac83684..d076a5c8556f 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -14,7 +14,7 @@ 
 #include <linux/io.h>
 #include <linux/arm-smccc.h>
 
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/ftrace.h>
 
 /*
diff --git a/arch/arm/lib/csumpartialcopygeneric.S b/arch/arm/lib/csumpartialcopygeneric.S
index 0fd5c10e90a7..5db935eaf165 100644
--- a/arch/arm/lib/csumpartialcopygeneric.S
+++ b/arch/arm/lib/csumpartialcopygeneric.S
@@ -86,7 +86,7 @@  sum	.req	r3
 
 FN_ENTRY
 		save_regs
-		mov	sum, #-1
+		mov	sum, #0
 
 		cmp	len, #8			@ Ensure that we have at least
 		blo	.Lless8			@ 8 bytes to copy.
@@ -160,6 +160,7 @@  FN_ENTRY
 		ldr	sum, [sp, #0]		@ dst
 		tst	sum, #1
 		movne	r0, r0, ror #8
+		mov	r1, #0
 		load_regs
 
 .Lsrc_not_aligned:
diff --git a/arch/arm/lib/csumpartialcopyuser.S b/arch/arm/lib/csumpartialcopyuser.S
index 6928781e6bee..f273c9667914 100644
--- a/arch/arm/lib/csumpartialcopyuser.S
+++ b/arch/arm/lib/csumpartialcopyuser.S
@@ -73,11 +73,11 @@ 
 #include "csumpartialcopygeneric.S"
 
 /*
- * We report fault by returning 0 csum - impossible in normal case, since
- * we start with 0xffffffff for initial sum.
+ * We report fault by returning ~0ULL csum
  */
 		.pushsection .text.fixup,"ax"
 		.align	4
-9001:		mov	r0, #0
+9001:		mov	r0, #-1
+		mov	r1, #-1
 		load_regs
 		.popsection
diff --git a/arch/ia64/include/asm/asm-prototypes.h b/arch/ia64/include/asm/asm-prototypes.h
index a96689447a74..41d23ab53ff4 100644
--- a/arch/ia64/include/asm/asm-prototypes.h
+++ b/arch/ia64/include/asm/asm-prototypes.h
@@ -3,7 +3,7 @@ 
 #define _ASM_IA64_ASM_PROTOTYPES_H
 
 #include <asm/cacheflush.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/esi.h>
 #include <asm/ftrace.h>
 #include <asm/page.h>
diff --git a/arch/m68k/include/asm/checksum.h b/arch/m68k/include/asm/checksum.h
index 692e7b6cc042..2adef06feeb3 100644
--- a/arch/m68k/include/asm/checksum.h
+++ b/arch/m68k/include/asm/checksum.h
@@ -32,7 +32,7 @@  __wsum csum_partial(const void *buff, int len, __wsum sum);
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 #define _HAVE_ARCH_CSUM_AND_COPY
-extern __wsum csum_and_copy_from_user(const void __user *src,
+extern __wsum_fault csum_and_copy_from_user(const void __user *src,
 						void *dst,
 						int len);
 
diff --git a/arch/m68k/lib/checksum.c b/arch/m68k/lib/checksum.c
index 5acb821849d3..4fed9070e976 100644
--- a/arch/m68k/lib/checksum.c
+++ b/arch/m68k/lib/checksum.c
@@ -128,7 +128,7 @@  EXPORT_SYMBOL(csum_partial);
  * copy from user space while checksumming, with exception handling.
  */
 
-__wsum
+__wsum_fault
 csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	/*
@@ -137,7 +137,7 @@  csum_and_copy_from_user(const void __user *src, void *dst, int len)
 	 * code.
 	 */
 	unsigned long tmp1, tmp2;
-	__wsum sum = ~0U;
+	__wsum sum = 0;
 
 	__asm__("movel %2,%4\n\t"
 		"btst #1,%4\n\t"	/* Check alignment */
@@ -240,7 +240,7 @@  csum_and_copy_from_user(const void __user *src, void *dst, int len)
 		".even\n"
 		/* If any exception occurs, return 0 */
 	     "90:\t"
-		"clrl %0\n"
+		"moveq #1,%5\n"
 		"jra 7b\n"
 		".previous\n"
 		".section __ex_table,\"a\"\n"
@@ -262,7 +262,7 @@  csum_and_copy_from_user(const void __user *src, void *dst, int len)
 		: "0" (sum), "1" (len), "2" (src), "3" (dst)
 	    );
 
-	return sum;
+	return tmp2 ? CSUM_FAULT : to_wsum_fault(sum);
 }
 
 
diff --git a/arch/microblaze/kernel/microblaze_ksyms.c b/arch/microblaze/kernel/microblaze_ksyms.c
index c892e173ec99..e5858b15cd37 100644
--- a/arch/microblaze/kernel/microblaze_ksyms.c
+++ b/arch/microblaze/kernel/microblaze_ksyms.c
@@ -10,7 +10,7 @@ 
 #include <linux/in6.h>
 #include <linux/syscalls.h>
 
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/cacheflush.h>
 #include <linux/io.h>
 #include <asm/page.h>
diff --git a/arch/mips/include/asm/asm-prototypes.h b/arch/mips/include/asm/asm-prototypes.h
index 8e8fc38b0941..44fd0c30fd73 100644
--- a/arch/mips/include/asm/asm-prototypes.h
+++ b/arch/mips/include/asm/asm-prototypes.h
@@ -1,11 +1,11 @@ 
 /* SPDX-License-Identifier: GPL-2.0 */
-#include <asm/checksum.h>
 #include <asm/page.h>
 #include <asm/fpu.h>
 #include <asm-generic/asm-prototypes.h>
 #include <linux/uaccess.h>
 #include <asm/ftrace.h>
 #include <asm/mmu_context.h>
+#include <net/checksum.h>
 
 extern void clear_page_cpu(void *page);
 extern void copy_page_cpu(void *to, void *from);
diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h
index 4044eaf989ac..3dfe9eca4adc 100644
--- a/arch/mips/include/asm/checksum.h
+++ b/arch/mips/include/asm/checksum.h
@@ -34,16 +34,16 @@ 
  */
 __wsum csum_partial(const void *buff, int len, __wsum sum);
 
-__wsum __csum_partial_copy_from_user(const void __user *src, void *dst, int len);
-__wsum __csum_partial_copy_to_user(const void *src, void __user *dst, int len);
+__wsum_fault __csum_partial_copy_from_user(const void __user *src, void *dst, int len);
+__wsum_fault __csum_partial_copy_to_user(const void *src, void __user *dst, int len);
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 static inline
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	might_fault();
 	if (!access_ok(src, len))
-		return 0;
+		return CSUM_FAULT;
 	return __csum_partial_copy_from_user(src, dst, len);
 }
 
@@ -52,11 +52,11 @@  __wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
  */
 #define HAVE_CSUM_COPY_USER
 static inline
-__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len)
+__wsum_fault csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
 	might_fault();
 	if (!access_ok(dst, len))
-		return 0;
+		return CSUM_FAULT;
 	return __csum_partial_copy_to_user(src, dst, len);
 }
 
@@ -65,10 +65,10 @@  __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len)
  * we have just one address space, so this is identical to the above)
  */
 #define _HAVE_ARCH_CSUM_AND_COPY
-__wsum __csum_partial_copy_nocheck(const void *src, void *dst, int len);
+__wsum_fault __csum_partial_copy_nocheck(const void *src, void *dst, int len);
 static inline __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return __csum_partial_copy_nocheck(src, dst, len);
+	return from_wsum_fault(__csum_partial_copy_nocheck(src, dst, len));
 }
 
 /*
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 3d2ff4118d79..b0cda2950f4e 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -437,7 +437,7 @@  EXPORT_SYMBOL(csum_partial)
 
 	.macro __BUILD_CSUM_PARTIAL_COPY_USER mode, from, to
 
-	li	sum, -1
+	move	sum, zero
 	move	odd, zero
 	/*
 	 * Note: dst & src may be unaligned, len may be 0
@@ -723,6 +723,9 @@  EXPORT_SYMBOL(csum_partial)
 1:
 #endif
 	.set	pop
+#ifndef CONFIG_64BIT
+	move	v1, zero
+#endif
 	.set reorder
 	jr	ra
 	.set noreorder
@@ -730,8 +733,11 @@  EXPORT_SYMBOL(csum_partial)
 
 	.set noreorder
 .L_exc:
+#ifndef CONFIG_64BIT
+	li	v1, -1
+#endif
 	jr	ra
-	 li	v0, 0
+	 li	v0, -1
 
 FEXPORT(__csum_partial_copy_nocheck)
 EXPORT_SYMBOL(__csum_partial_copy_nocheck)
diff --git a/arch/openrisc/kernel/or32_ksyms.c b/arch/openrisc/kernel/or32_ksyms.c
index 212e5f85004c..a56dea4411ab 100644
--- a/arch/openrisc/kernel/or32_ksyms.c
+++ b/arch/openrisc/kernel/or32_ksyms.c
@@ -22,7 +22,7 @@ 
 
 #include <asm/processor.h>
 #include <linux/uaccess.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/io.h>
 #include <asm/hardirq.h>
 #include <asm/delay.h>
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 274bce76f5da..c283183e5a81 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -11,7 +11,7 @@ 
 
 #include <linux/threads.h>
 #include <asm/cacheflush.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <linux/uaccess.h>
 #include <asm/epapr_hcalls.h>
 #include <asm/dcr.h>
diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 4b573a3b7e17..b68184dfac00 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -18,18 +18,18 @@ 
  * Like csum_partial, this must be called with even lengths,
  * except for the last fragment.
  */
-extern __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+extern __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
-extern __wsum csum_and_copy_from_user(const void __user *src, void *dst,
+extern __wsum_fault csum_and_copy_from_user(const void __user *src, void *dst,
 				      int len);
 #define HAVE_CSUM_COPY_USER
-extern __wsum csum_and_copy_to_user(const void *src, void __user *dst,
+extern __wsum_fault csum_and_copy_to_user(const void *src, void __user *dst,
 				    int len);
 
 #define _HAVE_ARCH_CSUM_AND_COPY
 #define csum_partial_copy_nocheck(src, dst, len)   \
-        csum_partial_copy_generic((src), (dst), (len))
+        from_wsum_fault(csum_partial_copy_generic((src), (dst), (len)))
 
 
 /*
diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index cd00b9bdd772..03f63f36aeba 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -122,7 +122,7 @@  LG_CACHELINE_BYTES = L1_CACHE_SHIFT
 CACHELINE_MASK = (L1_CACHE_BYTES-1)
 
 _GLOBAL(csum_partial_copy_generic)
-	li	r12,-1
+	li	r12,0
 	addic	r0,r0,0			/* clear carry */
 	addi	r6,r4,-4
 	neg	r0,r4
@@ -233,12 +233,14 @@  _GLOBAL(csum_partial_copy_generic)
 	slwi	r0,r0,8
 	adde	r12,r12,r0
 66:	addze	r3,r12
+	li	r4,0
 	beqlr+	cr7
 	rlwinm	r3,r3,8,0,31	/* odd destination address: rotate one byte */
 	blr
 
 fault:
-	li	r3,0
+	li	r3,-1
+	li	r4,-1
 	blr
 
 	EX_TABLE(70b, fault);
diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S
index d53d8f09a2c2..3bbfeb98d256 100644
--- a/arch/powerpc/lib/checksum_64.S
+++ b/arch/powerpc/lib/checksum_64.S
@@ -208,7 +208,7 @@  EXPORT_SYMBOL(__csum_partial)
  * csum_partial_copy_generic(r3=src, r4=dst, r5=len)
  */
 _GLOBAL(csum_partial_copy_generic)
-	li	r6,-1
+	li	r6,0
 	addic	r0,r6,0			/* clear carry */
 
 	srdi.	r6,r5,3			/* less than 8 bytes? */
@@ -406,7 +406,7 @@  dstnr;	stb	r6,0(r4)
 	ld	r16,STK_REG(R16)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 .Lerror_nr:
-	li	r3,0
+	li	r3,-1
 	blr
 
 EXPORT_SYMBOL(csum_partial_copy_generic)
diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index 1a14c8780278..a23482b9f3f0 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -8,16 +8,16 @@ 
 #include <linux/export.h>
 #include <linux/compiler.h>
 #include <linux/types.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <linux/uaccess.h>
 
-__wsum csum_and_copy_from_user(const void __user *src, void *dst,
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst,
 			       int len)
 {
-	__wsum csum;
+	__wsum_fault csum;
 
 	if (unlikely(!user_read_access_begin(src, len)))
-		return 0;
+		return CSUM_FAULT;
 
 	csum = csum_partial_copy_generic((void __force *)src, dst, len);
 
@@ -25,12 +25,12 @@  __wsum csum_and_copy_from_user(const void __user *src, void *dst,
 	return csum;
 }
 
-__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len)
+__wsum_fault csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
-	__wsum csum;
+	__wsum_fault csum;
 
 	if (unlikely(!user_write_access_begin(dst, len)))
-		return 0;
+		return CSUM_FAULT;
 
 	csum = csum_partial_copy_generic(src, (void __force *)dst, len);
 
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 05e51666db03..9bfd434d31e6 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -28,7 +28,7 @@ 
 #include <asm/cpcmd.h>
 #include <asm/ebcdic.h>
 #include <asm/sclp.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/debug.h>
 #include <asm/abs_lowcore.h>
 #include <asm/os_info.h>
diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c
index 6e1824141b29..44447e3fef84 100644
--- a/arch/s390/kernel/os_info.c
+++ b/arch/s390/kernel/os_info.c
@@ -12,7 +12,7 @@ 
 #include <linux/crash_dump.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/abs_lowcore.h>
 #include <asm/os_info.h>
 #include <asm/maccess.h>
diff --git a/arch/sh/include/asm/checksum_32.h b/arch/sh/include/asm/checksum_32.h
index 2b5fa75b4651..94464451fd08 100644
--- a/arch/sh/include/asm/checksum_32.h
+++ b/arch/sh/include/asm/checksum_32.h
@@ -31,7 +31,7 @@  asmlinkage __wsum csum_partial(const void *buff, int len, __wsum sum);
  * better 64-bit) boundary
  */
 
-asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+asmlinkage __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
 #define _HAVE_ARCH_CSUM_AND_COPY
 /*
@@ -44,15 +44,15 @@  asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len)
 static inline
 __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return csum_partial_copy_generic(src, dst, len);
+	return from_wsum_fault(csum_partial_copy_generic(src, dst, len));
 }
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 static inline
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
 	if (!access_ok(src, len))
-		return 0;
+		return CSUM_FAULT;
 	return csum_partial_copy_generic((__force const void *)src, dst, len);
 }
 
@@ -193,12 +193,12 @@  static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
  *	Copy and checksum to user
  */
 #define HAVE_CSUM_COPY_USER
-static inline __wsum csum_and_copy_to_user(const void *src,
+static inline __wsum_fault csum_and_copy_to_user(const void *src,
 					   void __user *dst,
 					   int len)
 {
 	if (!access_ok(dst, len))
-		return 0;
+		return CSUM_FAULT;
 	return csum_partial_copy_generic(src, (__force void *)dst, len);
 }
 #endif /* __ASM_SH_CHECKSUM_H */
diff --git a/arch/sh/kernel/sh_ksyms_32.c b/arch/sh/kernel/sh_ksyms_32.c
index 5858936cb431..ce9d5547ac74 100644
--- a/arch/sh/kernel/sh_ksyms_32.c
+++ b/arch/sh/kernel/sh_ksyms_32.c
@@ -4,7 +4,7 @@ 
 #include <linux/uaccess.h>
 #include <linux/delay.h>
 #include <linux/mm.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/sections.h>
 
 EXPORT_SYMBOL(memchr);
diff --git a/arch/sh/lib/checksum.S b/arch/sh/lib/checksum.S
index 3e07074e0098..2d624efc4c2d 100644
--- a/arch/sh/lib/checksum.S
+++ b/arch/sh/lib/checksum.S
@@ -193,7 +193,7 @@  unsigned int csum_partial_copy_generic (const char *src, char *dst, int len)
 ! r6:	int LEN
 !
 ENTRY(csum_partial_copy_generic)
-	mov	#-1,r7
+	mov	#0,r7
 	mov	#3,r0		! Check src and dest are equally aligned
 	mov	r4,r1
 	and	r0,r1
@@ -358,8 +358,10 @@  EXC(	mov.b	r0,@r5	)
 .section .fixup, "ax"							
 
 6001:
+	mov	#-1,r1
 	rts
-	 mov	#0,r0
+	 mov	#-1,r0
 .previous
+	mov	#0,r1
 	rts
 	 mov	r7,r0
diff --git a/arch/sparc/include/asm/asm-prototypes.h b/arch/sparc/include/asm/asm-prototypes.h
index 4987c735ff56..e15661bf8b36 100644
--- a/arch/sparc/include/asm/asm-prototypes.h
+++ b/arch/sparc/include/asm/asm-prototypes.h
@@ -4,7 +4,7 @@ 
  */
 
 #include <asm/xor.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/trap_block.h>
 #include <linux/uaccess.h>
 #include <asm/atomic.h>
diff --git a/arch/sparc/include/asm/checksum_32.h b/arch/sparc/include/asm/checksum_32.h
index ce11e0ad80c7..751b89d827aa 100644
--- a/arch/sparc/include/asm/checksum_32.h
+++ b/arch/sparc/include/asm/checksum_32.h
@@ -50,7 +50,7 @@  csum_partial_copy_nocheck(const void *src, void *dst, int len)
 
 	__asm__ __volatile__ (
 		"call __csum_partial_copy_sparc_generic\n\t"
-		" mov -1, %%g7\n"
+		" clr %%g7\n"
 	: "=&r" (ret), "=&r" (d), "=&r" (l)
 	: "0" (ret), "1" (d), "2" (l)
 	: "o2", "o3", "o4", "o5", "o7",
@@ -59,20 +59,50 @@  csum_partial_copy_nocheck(const void *src, void *dst, int len)
 	return (__force __wsum)ret;
 }
 
-static inline __wsum
+static inline __wsum_fault
 csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
+	register unsigned int ret asm("o0") = (unsigned int)src;
+	register char *d asm("o1") = dst;
+	register int l asm("g1") = len; // used to return an error
+
 	if (unlikely(!access_ok(src, len)))
-		return 0;
-	return csum_partial_copy_nocheck((__force void *)src, dst, len);
+		return CSUM_FAULT;
+
+	__asm__ __volatile__ (
+		"call __csum_partial_copy_sparc_generic\n\t"
+		" clr %%g7\n"
+	: "=&r" (ret), "=&r" (d), "=&r" (l)
+	: "0" (ret), "1" (d), "2" (l)
+	: "o2", "o3", "o4", "o5", "o7",
+	  "g2", "g3", "g4", "g5", "g7",
+	  "memory", "cc");
+	if (unlikely(l < 0))
+		return CSUM_FAULT;
+	return to_wsum_fault((__force __wsum)ret);
 }
 
-static inline __wsum
+static inline __wsum_fault
 csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
-	if (!access_ok(dst, len))
-		return 0;
-	return csum_partial_copy_nocheck(src, (__force void *)dst, len);
+	register unsigned int ret asm("o0") = (unsigned int)src;
+	register char *d asm("o1") = (__force void *)dst;
+	register int l asm("g1") = len; // used to return an error
+
+	if (unlikely(!access_ok(dst, len)))
+		return CSUM_FAULT;
+
+	__asm__ __volatile__ (
+		"call __csum_partial_copy_sparc_generic\n\t"
+		" clr %%g7\n"
+	: "=&r" (ret), "=&r" (d), "=&r" (l)
+	: "0" (ret), "1" (d), "2" (l)
+	: "o2", "o3", "o4", "o5", "o7",
+	  "g2", "g3", "g4", "g5", "g7",
+	  "memory", "cc");
+	if (unlikely(l < 0))
+		return CSUM_FAULT;
+	return to_wsum_fault((__force __wsum)ret);
 }
 
 /* ihl is always 5 or greater, almost always is 5, and iph is word aligned
diff --git a/arch/sparc/include/asm/checksum_64.h b/arch/sparc/include/asm/checksum_64.h
index d6b59461e064..0e3041ca384b 100644
--- a/arch/sparc/include/asm/checksum_64.h
+++ b/arch/sparc/include/asm/checksum_64.h
@@ -39,8 +39,8 @@  __wsum csum_partial(const void * buff, int len, __wsum sum);
  * better 64-bit) boundary
  */
 __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len);
-__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len);
-__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len);
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len);
+__wsum_fault csum_and_copy_to_user(const void *src, void __user *dst, int len);
 
 /* ihl is always 5 or greater, almost always is 5, and iph is word aligned
  * the majority of the time.
diff --git a/arch/sparc/lib/checksum_32.S b/arch/sparc/lib/checksum_32.S
index 84ad709cbecb..546968db199d 100644
--- a/arch/sparc/lib/checksum_32.S
+++ b/arch/sparc/lib/checksum_32.S
@@ -453,5 +453,5 @@  ccslow:	cmp	%g1, 0
  * we only bother with faults on loads... */
 
 cc_fault:
-	ret
-	 clr	%o0
+	retl
+	 mov	-1, %g1
diff --git a/arch/sparc/lib/csum_copy.S b/arch/sparc/lib/csum_copy.S
index f968e83bc93b..9312d51367d3 100644
--- a/arch/sparc/lib/csum_copy.S
+++ b/arch/sparc/lib/csum_copy.S
@@ -71,7 +71,7 @@ 
 FUNC_NAME:		/* %o0=src, %o1=dst, %o2=len */
 	LOAD(prefetch, %o0 + 0x000, #n_reads)
 	xor		%o0, %o1, %g1
-	mov		-1, %o3
+	clr		%o3
 	clr		%o4
 	andcc		%g1, 0x3, %g0
 	bne,pn		%icc, 95f
diff --git a/arch/sparc/lib/csum_copy_from_user.S b/arch/sparc/lib/csum_copy_from_user.S
index b0ba8d4dd439..d74241692f0f 100644
--- a/arch/sparc/lib/csum_copy_from_user.S
+++ b/arch/sparc/lib/csum_copy_from_user.S
@@ -9,7 +9,7 @@ 
 	.section .fixup, "ax";	\
 	.align 4;		\
 99:	retl;			\
-	 mov	0, %o0;		\
+	 mov	-1, %o0;	\
 	.section __ex_table,"a";\
 	.align 4;		\
 	.word 98b, 99b;		\
diff --git a/arch/sparc/lib/csum_copy_to_user.S b/arch/sparc/lib/csum_copy_to_user.S
index 91ba36dbf7d2..2878a933d7ab 100644
--- a/arch/sparc/lib/csum_copy_to_user.S
+++ b/arch/sparc/lib/csum_copy_to_user.S
@@ -9,7 +9,7 @@ 
 	.section .fixup,"ax";	\
 	.align 4;		\
 99:	retl;			\
-	 mov	0, %o0;		\
+	 mov	-1, %o0;	\
 	.section __ex_table,"a";\
 	.align 4;		\
 	.word 98b, 99b;		\
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index b1a98fa38828..655e25745349 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -4,7 +4,7 @@ 
 #include <linux/pgtable.h>
 #include <asm/string.h>
 #include <asm/page.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/mce.h>
 
 #include <asm-generic/asm-prototypes.h>
diff --git a/arch/x86/include/asm/checksum_32.h b/arch/x86/include/asm/checksum_32.h
index 17da95387997..65ca3448e83d 100644
--- a/arch/x86/include/asm/checksum_32.h
+++ b/arch/x86/include/asm/checksum_32.h
@@ -27,7 +27,7 @@  asmlinkage __wsum csum_partial(const void *buff, int len, __wsum sum);
  * better 64-bit) boundary
  */
 
-asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+asmlinkage __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
 /*
  *	Note: when you get a NULL pointer exception here this means someone
@@ -38,17 +38,17 @@  asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len)
  */
 static inline __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return csum_partial_copy_generic(src, dst, len);
+	return from_wsum_fault(csum_partial_copy_generic(src, dst, len));
 }
 
-static inline __wsum csum_and_copy_from_user(const void __user *src,
+static inline __wsum_fault csum_and_copy_from_user(const void __user *src,
 					     void *dst, int len)
 {
-	__wsum ret;
+	__wsum_fault ret;
 
 	might_sleep();
 	if (!user_access_begin(src, len))
-		return 0;
+		return CSUM_FAULT;
 	ret = csum_partial_copy_generic((__force void *)src, dst, len);
 	user_access_end();
 
@@ -168,15 +168,15 @@  static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
 /*
  *	Copy and checksum to user
  */
-static inline __wsum csum_and_copy_to_user(const void *src,
+static inline __wsum_fault csum_and_copy_to_user(const void *src,
 					   void __user *dst,
 					   int len)
 {
-	__wsum ret;
+	__wsum_fault ret;
 
 	might_sleep();
 	if (!user_access_begin(dst, len))
-		return 0;
+		return CSUM_FAULT;
 
 	ret = csum_partial_copy_generic(src, (__force void *)dst, len);
 	user_access_end();
diff --git a/arch/x86/include/asm/checksum_64.h b/arch/x86/include/asm/checksum_64.h
index 4d4a47a3a8ab..23c56eef8e47 100644
--- a/arch/x86/include/asm/checksum_64.h
+++ b/arch/x86/include/asm/checksum_64.h
@@ -129,10 +129,10 @@  static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
 extern __wsum csum_partial(const void *buff, int len, __wsum sum);
 
 /* Do not call this directly. Use the wrappers below */
-extern __visible __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+extern __visible __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
-extern __wsum csum_and_copy_from_user(const void __user *src, void *dst, int len);
-extern __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len);
+extern __wsum_fault csum_and_copy_from_user(const void __user *src, void *dst, int len);
+extern __wsum_fault csum_and_copy_to_user(const void *src, void __user *dst, int len);
 extern __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len);
 
 /**
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 23318c338db0..ab58a528d846 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -262,7 +262,7 @@  unsigned int csum_partial_copy_generic (const char *src, char *dst,
 
 #define EXC(y...)						\
 	9999: y;						\
-	_ASM_EXTABLE_TYPE(9999b, 7f, EX_TYPE_UACCESS | EX_FLAG_CLEAR_AX)
+	_ASM_EXTABLE_TYPE(9999b, 9f, EX_TYPE_UACCESS)
 
 #ifndef CONFIG_X86_USE_PPRO_CHECKSUM
 
@@ -278,7 +278,7 @@  SYM_FUNC_START(csum_partial_copy_generic)
 	movl ARGBASE+4(%esp),%esi	# src
 	movl ARGBASE+8(%esp),%edi	# dst
 
-	movl $-1, %eax			# sum
+	xorl %eax,%eax			# sum
 	testl $2, %edi			# Check alignment. 
 	jz 2f				# Jump if alignment is ok.
 	subl $2, %ecx			# Alignment uses up two bytes.
@@ -357,12 +357,17 @@  EXC(	movb %cl, (%edi)	)
 6:	addl %ecx, %eax
 	adcl $0, %eax
 7:
-
+	xorl %edx, %edx
+8:
 	popl %ebx
 	popl %esi
 	popl %edi
 	popl %ecx			# equivalent to addl $4,%esp
 	RET
+9:
+	movl $-1,%eax
+	movl $-1,%edx
+	jmp 8b
 SYM_FUNC_END(csum_partial_copy_generic)
 
 #else
@@ -388,7 +393,7 @@  SYM_FUNC_START(csum_partial_copy_generic)
 	movl ARGBASE+4(%esp),%esi	#src
 	movl ARGBASE+8(%esp),%edi	#dst	
 	movl ARGBASE+12(%esp),%ecx	#len
-	movl $-1, %eax			#sum
+	xorl %eax, %eax			#sum
 #	movl %ecx, %edx  
 	movl %ecx, %ebx  
 	movl %esi, %edx
@@ -430,11 +435,16 @@  EXC(	movb %dl, (%edi)         )
 6:	addl %edx, %eax
 	adcl $0, %eax
 7:
-
+	xorl %edx, %edx
+8:
 	popl %esi
 	popl %edi
 	popl %ebx
 	RET
+9:
+	movl $-1,%eax
+	movl $-1,%edx
+	jmp 8b
 SYM_FUNC_END(csum_partial_copy_generic)
 				
 #undef ROUND
diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index d9e16a2cf285..084181030dd3 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -44,7 +44,7 @@  SYM_FUNC_START(csum_partial_copy_generic)
 	movq  %r13, 3*8(%rsp)
 	movq  %r15, 4*8(%rsp)
 
-	movl  $-1, %eax
+	xorl  %eax, %eax
 	xorl  %r9d, %r9d
 	movl  %edx, %ecx
 	cmpl  $8, %ecx
@@ -249,8 +249,8 @@  SYM_FUNC_START(csum_partial_copy_generic)
 	roll $8, %eax
 	jmp .Lout
 
-	/* Exception: just return 0 */
+	/* Exception: just return -1 */
 .Lfault:
-	xorl %eax, %eax
+	movq $-1, %rax
 	jmp  .Lout
 SYM_FUNC_END(csum_partial_copy_generic)
diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index cea25ca8b8cf..5e877592a7b3 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -8,7 +8,7 @@ 
 
 #include <linux/compiler.h>
 #include <linux/export.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/word-at-a-time.h>
 
 static inline unsigned short from32to16(unsigned a)
diff --git a/arch/x86/lib/csum-wrappers_64.c b/arch/x86/lib/csum-wrappers_64.c
index 145f9a0bde29..e90ac389a013 100644
--- a/arch/x86/lib/csum-wrappers_64.c
+++ b/arch/x86/lib/csum-wrappers_64.c
@@ -4,7 +4,7 @@ 
  *
  * Wrappers of assembly checksum functions for x86-64.
  */
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <linux/export.h>
 #include <linux/uaccess.h>
 #include <asm/smap.h>
@@ -14,20 +14,18 @@ 
  * @src: source address (user space)
  * @dst: destination address
  * @len: number of bytes to be copied.
- * @isum: initial sum that is added into the result (32bit unfolded)
- * @errp: set to -EFAULT for an bad source address.
  *
- * Returns an 32bit unfolded checksum of the buffer.
+ * Returns a 32bit unfolded checksum of the buffer, or -1ULL on error
  * src and dst are best aligned to 64bits.
  */
-__wsum
+__wsum_fault
 csum_and_copy_from_user(const void __user *src, void *dst, int len)
 {
-	__wsum sum;
+	__wsum_fault sum;
 
 	might_sleep();
 	if (!user_access_begin(src, len))
-		return 0;
+		return CSUM_FAULT;
 	sum = csum_partial_copy_generic((__force const void *)src, dst, len);
 	user_access_end();
 	return sum;
@@ -38,20 +36,18 @@  csum_and_copy_from_user(const void __user *src, void *dst, int len)
  * @src: source address
  * @dst: destination address (user space)
  * @len: number of bytes to be copied.
- * @isum: initial sum that is added into the result (32bit unfolded)
- * @errp: set to -EFAULT for an bad destination address.
  *
- * Returns an 32bit unfolded checksum of the buffer.
+ * Returns a 32bit unfolded checksum of the buffer, or -1ULL on error
  * src and dst are best aligned to 64bits.
  */
-__wsum
+__wsum_fault
 csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
-	__wsum sum;
+	__wsum_fault sum;
 
 	might_sleep();
 	if (!user_access_begin(dst, len))
-		return 0;
+		return CSUM_FAULT;
 	sum = csum_partial_copy_generic(src, (void __force *)dst, len);
 	user_access_end();
 	return sum;
@@ -62,14 +58,13 @@  csum_and_copy_to_user(const void *src, void __user *dst, int len)
  * @src: source address
  * @dst: destination address
  * @len: number of bytes to be copied.
- * @sum: initial sum that is added into the result (32bit unfolded)
  *
  * Returns an 32bit unfolded checksum of the buffer.
  */
 __wsum
 csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return csum_partial_copy_generic(src, dst, len);
+	return from_wsum_fault(csum_partial_copy_generic(src, dst, len));
 }
 EXPORT_SYMBOL(csum_partial_copy_nocheck);
 
diff --git a/arch/xtensa/include/asm/asm-prototypes.h b/arch/xtensa/include/asm/asm-prototypes.h
index b0da61812b85..b01b8170fafb 100644
--- a/arch/xtensa/include/asm/asm-prototypes.h
+++ b/arch/xtensa/include/asm/asm-prototypes.h
@@ -3,7 +3,7 @@ 
 #define __ASM_PROTOTYPES_H
 
 #include <asm/cacheflush.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/ftrace.h>
 #include <asm/page.h>
 #include <asm/string.h>
diff --git a/arch/xtensa/include/asm/checksum.h b/arch/xtensa/include/asm/checksum.h
index 44ec1d0b2a35..bf4ee4fd8f57 100644
--- a/arch/xtensa/include/asm/checksum.h
+++ b/arch/xtensa/include/asm/checksum.h
@@ -37,7 +37,7 @@  asmlinkage __wsum csum_partial(const void *buff, int len, __wsum sum);
  * better 64-bit) boundary
  */
 
-asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len);
+asmlinkage __wsum_fault csum_partial_copy_generic(const void *src, void *dst, int len);
 
 #define _HAVE_ARCH_CSUM_AND_COPY
 /*
@@ -47,16 +47,16 @@  asmlinkage __wsum csum_partial_copy_generic(const void *src, void *dst, int len)
 static inline
 __wsum csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
-	return csum_partial_copy_generic(src, dst, len);
+	return from_wsum_fault(csum_partial_copy_generic(src, dst, len));
 }
 
 #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 static inline
-__wsum csum_and_copy_from_user(const void __user *src, void *dst,
+__wsum_fault csum_and_copy_from_user(const void __user *src, void *dst,
 				   int len)
 {
 	if (!access_ok(src, len))
-		return 0;
+		return CSUM_FAULT;
 	return csum_partial_copy_generic((__force const void *)src, dst, len);
 }
 
@@ -237,11 +237,11 @@  static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
  *	Copy and checksum to user
  */
 #define HAVE_CSUM_COPY_USER
-static __inline__ __wsum csum_and_copy_to_user(const void *src,
+static __inline__ __wsum_fault csum_and_copy_to_user(const void *src,
 					       void __user *dst, int len)
 {
 	if (!access_ok(dst, len))
-		return 0;
+		return CSUM_FAULT;
 	return csum_partial_copy_generic(src, (__force void *)dst, len);
 }
 #endif
diff --git a/arch/xtensa/lib/checksum.S b/arch/xtensa/lib/checksum.S
index ffee6f94c8f8..71a70bed4618 100644
--- a/arch/xtensa/lib/checksum.S
+++ b/arch/xtensa/lib/checksum.S
@@ -192,7 +192,7 @@  unsigned int csum_partial_copy_generic (const char *src, char *dst, int len)
 ENTRY(csum_partial_copy_generic)
 
 	abi_entry_default
-	movi	a5, -1
+	movi	a5, 0
 	or	a10, a2, a3
 
 	/* We optimize the following alignment tests for the 4-byte
@@ -311,6 +311,7 @@  EX(10f)	s8i	a9, a3, 0
 	ONES_ADD(a5, a9)
 8:
 	mov	a2, a5
+	movi	a3, 0
 	abi_ret_default
 
 5:
@@ -353,7 +354,8 @@  EXPORT_SYMBOL(csum_partial_copy_generic)
 # Exception handler:
 .section .fixup, "ax"
 10:
-	movi	a2, 0
+	movi	a2, -1
+	movi	a3, -1
 	abi_ret_default
 
 .previous
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index 627a93ce38ab..a3334ad8ecc8 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -19,8 +19,6 @@ 
 #include <linux/firmware.h>
 #include <linux/if_vlan.h>
 
-/* Fix for IA64 */
-#include <asm/checksum.h>
 #include <net/ip6_checksum.h>
 
 #include <net/ip.h>
diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c
index f5961bdcc480..a6e88d673878 100644
--- a/drivers/net/ethernet/lantiq_etop.c
+++ b/drivers/net/ethernet/lantiq_etop.c
@@ -27,8 +27,6 @@ 
 #include <linux/module.h>
 #include <linux/property.h>
 
-#include <asm/checksum.h>
-
 #include <lantiq_soc.h>
 #include <xway_dma.h>
 #include <lantiq_platform.h>
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 915aaf18c409..bef4cd73f06d 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -51,7 +51,6 @@ 
 #include <linux/ipv6.h>
 #include <linux/in.h>
 #include <linux/etherdevice.h>
-#include <asm/checksum.h>
 #include <linux/if_vlan.h>
 #include <linux/if_arp.h>
 #include <linux/inetdevice.h>
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index bc3be0330f1d..bcbff216fa5f 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -27,7 +27,7 @@ 
 #include <asm/debug.h>
 #include <asm/processor.h>
 #include <asm/irqflags.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <asm/os_info.h>
 #include <asm/switch_to.h>
 #include <asm/maccess.h>
diff --git a/include/net/checksum.h b/include/net/checksum.h
index 1338cb92c8e7..22c97902845d 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -16,8 +16,40 @@ 
 #define _CHECKSUM_H
 
 #include <linux/errno.h>
-#include <asm/types.h>
+#include <linux/bitops.h>
 #include <asm/byteorder.h>
+
+typedef u64 __bitwise __wsum_fault;
+static inline __wsum_fault to_wsum_fault(__wsum v)
+{
+#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN)
+	return (__force __wsum_fault)v;
+#else
+	return (__force __wsum_fault)((__force u64)v << 32);
+#endif
+}
+
+static inline __wsum from_wsum_fault(__wsum_fault v)
+{
+#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN)
+	return (__force __wsum)v;
+#else
+	return (__force __wsum)((__force u64)v >> 32);
+#endif
+}
+
+static inline bool wsum_fault_check(__wsum_fault v)
+{
+#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN)
+	return (__force s64)v < 0;
+#else
+	return (int)(__force u32)v < 0;
+#endif
+}
+
+#define CSUM_FAULT ((__force __wsum_fault)-1)
+
+
 #include <asm/checksum.h>
 #if !defined(_HAVE_ARCH_COPY_AND_CSUM_FROM_USER) || !defined(HAVE_CSUM_COPY_USER)
 #include <linux/uaccess.h>
@@ -25,24 +57,24 @@ 
 
 #ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
 static __always_inline
-__wsum csum_and_copy_from_user (const void __user *src, void *dst,
+__wsum_fault csum_and_copy_from_user (const void __user *src, void *dst,
 				      int len)
 {
 	if (copy_from_user(dst, src, len))
-		return 0;
-	return csum_partial(dst, len, ~0U);
+		return CSUM_FAULT;
+	return to_wsum_fault(csum_partial(dst, len, 0));
 }
 #endif
 
 #ifndef HAVE_CSUM_COPY_USER
-static __always_inline __wsum csum_and_copy_to_user
+static __always_inline __wsum_fault csum_and_copy_to_user
 (const void *src, void __user *dst, int len)
 {
-	__wsum sum = csum_partial(src, len, ~0U);
+	__wsum sum = csum_partial(src, len, 0);
 
 	if (copy_to_user(dst, src, len) == 0)
-		return sum;
-	return 0;
+		return to_wsum_fault(sum);
+	return CSUM_FAULT;
 }
 #endif
 
diff --git a/include/net/ip6_checksum.h b/include/net/ip6_checksum.h
index c8a96b888277..4666b9855263 100644
--- a/include/net/ip6_checksum.h
+++ b/include/net/ip6_checksum.h
@@ -25,7 +25,7 @@ 
 #include <asm/types.h>
 #include <asm/byteorder.h>
 #include <net/ip.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 #include <linux/in6.h>
 #include <linux/tcp.h>
 #include <linux/ipv6.h>
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 0eed92b77ba3..971e7824893b 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -4,7 +4,7 @@ 
  */
 
 #include <kunit/test.h>
-#include <asm/checksum.h>
+#include <net/checksum.h>
 
 #define MAX_LEN 512
 #define MAX_ALIGN 64
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 27234a820eeb..076ff222ee39 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1184,15 +1184,16 @@  EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 			       struct iov_iter *i)
 {
-	__wsum sum, next;
+	__wsum sum;
+	__wsum_fault next;
 	sum = *csum;
 	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
 
 	iterate_and_advance(i, bytes, base, len, off, ({
 		next = csum_and_copy_from_user(base, addr + off, len);
-		sum = csum_block_add(sum, next, off);
-		next ? 0 : len;
+		sum = csum_block_add(sum, from_wsum_fault(next), off);
+		unlikely(wsum_fault_check(next)) ? 0 : len;
 	}), ({
 		sum = csum_and_memcpy(addr + off, base, len, sum, off);
 	})
@@ -1206,7 +1207,8 @@  size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 			     struct iov_iter *i)
 {
 	struct csum_state *csstate = _csstate;
-	__wsum sum, next;
+	__wsum sum;
+	__wsum_fault next;
 
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
@@ -1222,8 +1224,8 @@  size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 	sum = csum_shift(csstate->csum, csstate->off);
 	iterate_and_advance(i, bytes, base, len, off, ({
 		next = csum_and_copy_to_user(addr + off, base, len);
-		sum = csum_block_add(sum, next, off);
-		next ? 0 : len;
+		sum = csum_block_add(sum, from_wsum_fault(next), off);
+		unlikely(wsum_fault_check(next)) ? 0 : len;
 	}), ({
 		sum = csum_and_memcpy(base, addr + off, len, sum, off);
 	})
diff --git a/net/ipv6/ip6_checksum.c b/net/ipv6/ip6_checksum.c
index 377717045f8f..fce94af322ae 100644
--- a/net/ipv6/ip6_checksum.c
+++ b/net/ipv6/ip6_checksum.c
@@ -2,7 +2,7 @@ 
 #include <net/ip.h>
 #include <net/udp.h>
 #include <net/udplite.h>
-#include <asm/checksum.h>
+#include <net/ip6_checksum.h>
 
 #ifndef _HAVE_ARCH_IPV6_CSUM
 __sum16 csum_ipv6_magic(const struct in6_addr *saddr,