commit 51f70ddc85e4c38efdc30e2cb4fa1a0b3878ea12 Author: Ben Hutchings Date: Tue Sep 25 23:47:37 2018 +0100 Linux 3.16.58 commit 536c4d174c0402c5fbf6f7a995f7c9539d124410 Author: Linus Torvalds Date: Wed Sep 12 23:57:48 2018 -1000 mm: get rid of vmacache_flush_all() entirely commit 7a9cdebdcc17e426fb5287e4a82db1dfe86339b2 upstream. Jann Horn points out that the vmacache_flush_all() function is not only potentially expensive, it's buggy too. It also happens to be entirely unnecessary, because the sequence number overflow case can be avoided by simply making the sequence number be 64-bit. That doesn't even grow the data structures in question, because the other adjacent fields are already 64-bit. So simplify the whole thing by just making the sequence number overflow case go away entirely, which gets rid of all the complications and makes the code faster too. Win-win. [ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics also just goes away entirely with this ] Reported-by: Jann Horn Suggested-by: Will Deacon Acked-by: Davidlohr Bueso Cc: Oleg Nesterov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: drop changes to mm debug code] Signed-off-by: Ben Hutchings commit e23ada68e9428ef5f711e84d58b1c76465990918 Author: Paolo Bonzini Date: Tue May 5 12:08:55 2015 +0200 KVM: x86: introduce num_emulated_msrs commit 62ef68bb4d00f1a662e487f3fc44ce8521c416aa upstream. We will want to filter away MSR_IA32_SMBASE from the emulated_msrs if the host CPU does not support SMM virtualization. Introduce the logic to do that, and also move paravirt MSRs to emulated_msrs for simplicity and to get rid of KVM_SAVE_MSRS_BEGIN. Reviewed-by: Radim Krčmář Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings commit 98d050e1d1688c3e971fa148ba9e5023cdae3345 Author: Piotr Luc Date: Wed Oct 12 20:05:20 2016 +0200 x86/cpu/intel: Add Knights Mill to Intel family commit 0047f59834e5947d45f34f5f12eb330d158f700b upstream. Add CPUID of Knights Mill (KNM) processor to Intel family list. Signed-off-by: Piotr Luc Reviewed-by: Dave Hansen Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20161012180520.30976-1-piotr.luc@intel.com Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings commit 5abd1583e06b3963e5c9d915760367de86808b78 Author: Borislav Petkov Date: Thu Sep 7 19:08:21 2017 +0200 x86/cpu/AMD: Fix erratum 1076 (CPB bit) commit f7f3dc00f61261cdc9ccd8b886f21bc4dffd6fd9 upstream. CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that. Signed-off-by: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Sherry Hurwitz Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de Signed-off-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: - Change the added case into an if-statement - s/x86_stepping/x86_mask/] Signed-off-by: Ben Hutchings commit a5e7e187aef0a25aaed6820e6664d1cc4f99c876 Author: Kyle Huey Date: Tue Feb 14 00:11:03 2017 -0800 x86/process: Correct and optimize TIF_BLOCKSTEP switch commit b9894a2f5bd18b1691cb6872c9afe32b148d0132 upstream. The debug control MSR is "highly magical" as the blockstep bit can be cleared by hardware under not well documented circumstances. So a task switch relying on the bit set by the previous task (according to the previous tasks thread flags) can trip over this and not update the flag for the next task. To fix this its required to handle DEBUGCTLMSR_BTF when either the previous or the next or both tasks have the TIF_BLOCKSTEP flag set. While at it avoid branching within the TIF_BLOCKSTEP case and evaluating boot_cpu_data twice in kernels without CONFIG_X86_DEBUGCTLMSR. x86_64: arch/x86/kernel/process.o text data bss dec hex 3024 8577 16 11617 2d61 Before 3008 8577 16 11601 2d51 After i386: No change [ tglx: Made the shift value explicit, use a local variable to make the code readable and massaged changelog] Originally-by: Thomas Gleixner Signed-off-by: Kyle Huey Cc: Peter Zijlstra Cc: Andy Lutomirski Link: http://lkml.kernel.org/r/20170214081104.9244-3-khuey@kylehuey.com Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings commit 3c459a5f4c7900ea2fd681c94cb14bf9173c0a2a Author: Kyle Huey Date: Tue Feb 14 00:11:02 2017 -0800 x86/process: Optimize TIF checks in __switch_to_xtra() commit af8b3cd3934ec60f4c2a420d19a9d416554f140b upstream. Help the compiler to avoid reevaluating the thread flags for each checked bit by reordering the bit checks and providing an explicit xor for evaluation. With default defconfigs for each arch, x86_64: arch/x86/kernel/process.o text data bss dec hex 3056 8577 16 11649 2d81 Before 3024 8577 16 11617 2d61 After i386: arch/x86/kernel/process.o text data bss dec hex 2957 8673 8 11638 2d76 Before 2925 8673 8 11606 2d56 After Originally-by: Thomas Gleixner Signed-off-by: Kyle Huey Cc: Peter Zijlstra Cc: Andy Lutomirski Link: http://lkml.kernel.org/r/20170214081104.9244-2-khuey@kylehuey.com Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: - We don't do refresh_tr_limit() here - Use ACCESS_ONCE() instead of READ_ONCE()] Signed-off-by: Ben Hutchings commit 9fd2b97aa4a94fd22febaf22c7ded343ba25afd4 Author: Kees Cook Date: Wed Jun 25 16:08:24 2014 -0700 seccomp: add "seccomp" syscall commit 48dc92b9fc3926844257316e75ba11eb5c742b2c upstream. This adds the new "seccomp" syscall with both an "operation" and "flags" parameter for future expansion. The third argument is a pointer value, used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...). In addition to the TSYNC flag later in this patch series, there is a non-zero chance that this syscall could be used for configuring a fixed argument area for seccomp-tracer-aware processes to pass syscall arguments in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter" for this syscall. Additionally, this syscall uses operation, flags, and user pointer for arguments because strictly passing arguments via a user pointer would mean seccomp itself would be unable to trivially filter the seccomp syscall itself. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski Signed-off-by: Ben Hutchings commit c8085221ed12ef1758b6a99a51975b9811a64460 Author: Kees Cook Date: Wed Jun 25 15:55:25 2014 -0700 seccomp: split mode setting routines commit 3b23dd12846215eff4afb073366b80c0c4d7543e upstream. Separates the two mode setting paths to make things more readable with fewer #ifdefs within function bodies. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski Signed-off-by: Ben Hutchings commit f4ab7e360995f53ab15763b989a1e8fe30d6b26e Author: Kees Cook Date: Wed Jun 25 15:38:02 2014 -0700 seccomp: extract check/assign mode helpers commit 1f41b450416e689b9b7c8bfb750a98604f687a9b upstream. To support splitting mode 1 from mode 2, extract the mode checking and assignment logic into common functions. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski Signed-off-by: Ben Hutchings commit 14d4b3d574c1ff6a8f830a198fb6b569468d7d6b Author: Kees Cook Date: Wed May 21 15:02:11 2014 -0700 seccomp: create internal mode-setting function commit d78ab02c2c194257a03355fbb79eb721b381d105 upstream. In preparation for having other callers of the seccomp mode setting logic, split the prctl entry point away from the core logic that performs seccomp mode setting. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski Signed-off-by: Ben Hutchings commit 991ec538e6683859b065467b8406c7e57526e212 Author: Eric Sandeen Date: Fri Jun 8 09:53:49 2018 -0700 xfs: don't call xfs_da_shrink_inode with NULL bp commit bb3d48dcf86a97dc25fe9fc2c11938e19cb4399a upstream. xfs_attr3_leaf_create may have errored out before instantiating a buffer, for example if the blkno is out of range. In that case there is no work to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops if we try. This also seems to fix a flaw where the original error from xfs_attr3_leaf_create gets overwritten in the cleanup case, and it removes a pointless assignment to bp which isn't used after this. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969 Reported-by: Xu, Wen Tested-by: Xu, Wen Signed-off-by: Eric Sandeen Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings commit 7744e6b42712dd27e2457e1eb03b1c73920364c2 Author: Dave Chinner Date: Tue Apr 17 17:17:34 2018 -0700 xfs: validate cached inodes are free when allocated commit afca6c5b2595fc44383919fba740c194b0b76aff upstream. A recent fuzzed filesystem image cached random dcache corruption when the reproducer was run. This often showed up as panics in lookup_slow() on a null inode->i_ops pointer when doing pathwalks. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 .... Call Trace: lookup_slow+0x44/0x60 walk_component+0x3dd/0x9f0 link_path_walk+0x4a7/0x830 path_lookupat+0xc1/0x470 filename_lookup+0x129/0x270 user_path_at_empty+0x36/0x40 path_listxattr+0x98/0x110 SyS_listxattr+0x13/0x20 do_syscall_64+0xf5/0x280 entry_SYSCALL_64_after_hwframe+0x42/0xb7 but had many different failure modes including deadlocks trying to lock the inode that was just allocated or KASAN reports of use-after-free violations. The cause of the problem was a corrupt INOBT on a v4 fs where the root inode was marked as free in the inobt record. Hence when we allocated an inode, it chose the root inode to allocate, found it in the cache and re-initialised it. We recently fixed a similar inode allocation issue caused by inobt record corruption problem in xfs_iget_cache_miss() in commit ee457001ed6c ("xfs: catch inode allocation state mismatch corruption"). This change adds similar checks to the cache-hit path to catch it, and turns the reproducer into a corruption shutdown situation. Reported-by: Wen Xu Signed-Off-By: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Reviewed-by: Darrick J. Wong [darrick: fix typos in comment] Signed-off-by: Darrick J. Wong [bwh: Backported to 3.16: - Look up mode in XFS inode, not VFS inode - Use positive error codes] Signed-off-by: Ben Hutchings commit 3b479fc0dff59955b8ff0c52850182a27b986690 Author: Dave Chinner Date: Fri Mar 23 10:22:53 2018 -0700 xfs: catch inode allocation state mismatch corruption commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream. We recently came across a V4 filesystem causing memory corruption due to a newly allocated inode being setup twice and being added to the superblock inode list twice. From code inspection, the only way this could happen is if a newly allocated inode was not marked as free on disk (i.e. di_mode wasn't zero). Running the metadump on an upstream debug kernel fails during inode allocation like so: XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inod= e.c, line: 838 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:114! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/0= 1/2014 RIP: 0010:assfail+0x28/0x30 RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202 RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30 R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10 FS: 00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000= 000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0 Call Trace: xfs_ialloc+0x383/0x570 xfs_dir_ialloc+0x6a/0x2a0 xfs_create+0x412/0x670 xfs_generic_create+0x1f7/0x2c0 ? capable_wrt_inode_uidgid+0x3f/0x50 vfs_mkdir+0xfb/0x1b0 SyS_mkdir+0xcf/0xf0 do_syscall_64+0x73/0x1a0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Extracting the inode number we crashed on from an event trace and looking at it with xfs_db: xfs_db> inode 184452204 xfs_db> p core.magic = 0x494e core.mode = 0100644 core.version = 2 core.format = 2 (extents) core.nlinkv2 = 1 core.onlink = 0 ..... Confirms that it is not a free inode on disk. xfs_repair also trips over this inode: ..... zero length extent (off = 0, fsbno = 0) in ino 184452204 correcting nextents for inode 184452204 bad attribute fork in inode 184452204, would clear attr fork bad nblocks 1 for inode 184452204, would reset to 0 bad anextents 1 for inode 184452204, would reset to 0 imap claims in-use inode 184452204 is free, would correct imap would have cleared inode 184452204 ..... disconnected inode 184452204, would move to lost+found And so we have a situation where the directory structure and the inobt thinks the inode is free, but the inode on disk thinks it is still in use. Where this corruption came from is not possible to diagnose, but we can detect it and prevent the kernel from oopsing on lookup. The reproducer now results in: $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5} mkdir: cannot create directory =E2=80=98/mnt/scratch/00=E2=80=99: File ex= ists mkdir: cannot create directory =E2=80=98/mnt/scratch/01=E2=80=99: File ex= ists mkdir: cannot create directory =E2=80=98/mnt/scratch/03=E2=80=99: Structu= re needs cleaning mkdir: cannot create directory =E2=80=98/mnt/scratch/04=E2=80=99: Input/o= utput error mkdir: cannot create directory =E2=80=98/mnt/scratch/05=E2=80=99: Input/o= utput error .... And this corruption shutdown: [ 54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not= marked free on disk [ 54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 = of file fs/xfs/xfs_trans.c. Caller xfs_create+0x425/0x670 [ 54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #= 443 [ 54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO= S 1.10.2-1 04/01/2014 [ 54.852859] Call Trace: [ 54.853531] dump_stack+0x85/0xc5 [ 54.854385] xfs_trans_cancel+0x197/0x1c0 [ 54.855421] xfs_create+0x425/0x670 [ 54.856314] xfs_generic_create+0x1f7/0x2c0 [ 54.857390] ? capable_wrt_inode_uidgid+0x3f/0x50 [ 54.858586] vfs_mkdir+0xfb/0x1b0 [ 54.859458] SyS_mkdir+0xcf/0xf0 [ 54.860254] do_syscall_64+0x73/0x1a0 [ 54.861193] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 54.862492] RIP: 0033:0x7fb73bddf547 [ 54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000= 000000000053 [ 54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73= bddf547 [ 54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffda= a55449a [ 54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a= 8670dd0 [ 54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 000000000= 00001ff [ 54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffda= a553500 [ 54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1= 024 of file fs/xfs/xfs_trans.c. Return address = ffffffff814cd050 [ 54.882790] XFS (loop0): Corruption of in-memory data detected. Shutt= ing down filesystem [ 54.884597] XFS (loop0): Please umount the filesystem and rectify the = problem(s) Note that this crash is only possible on v4 filesystemsi or v5 filesystems mounted with the ikeep mount option. For all other V5 filesystems, this problem cannot occur because we don't read inodes we are allocating from disk - we simply overwrite them with the new inode information. Signed-Off-By: Dave Chinner Reviewed-by: Carlos Maiolino Tested-by: Carlos Maiolino Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong [bwh: Backported to 3.16: - Look up mode in XFS inode, not VFS inode - Use positive error codes] Signed-off-by: Ben Hutchings commit 5c8e78e811123b61c8a194a28b48df984b540ec7 Author: Ernesto A. Fernández Date: Thu Aug 23 17:00:25 2018 -0700 hfsplus: fix NULL dereference in hfsplus_lookup() commit a7ec7a4193a2eb3b5341243fc0b621c1ac9e4ec4 upstream. An HFS+ filesystem can be mounted read-only without having a metadata directory, which is needed to support hardlinks. But if the catalog data is corrupted, a directory lookup may still find dentries claiming to be hardlinks. hfsplus_lookup() does check that ->hidden_dir is not NULL in such a situation, but mistakenly does so after dereferencing it for the first time. Reorder this check to prevent a crash. This happens when looking up corrupted catalog data (dentry) on a filesystem with no metadata directory (this could only ever happen on a read-only mount). Wen Xu sent the replication steps in detail to the fsdevel list: https://bugzilla.kernel.org/show_bug.cgi?id=200297 Link: http://lkml.kernel.org/r/20180712215344.q44dyrhymm4ajkao@eaf Signed-off-by: Ernesto A. Fernández Reported-by: Wen Xu Cc: Viacheslav Dubeyko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit 7cd49306b9e47333e097ea586feef596ba708771 Author: Qu Wenruo Date: Tue Jul 3 17:10:07 2018 +0800 btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized commit 389305b2aa68723c754f88d9dbd268a400e10664 upstream. Invalid reloc tree can cause kernel NULL pointer dereference when btrfs does some cleanup of the reloc roots. It turns out that fs_info::reloc_ctl can be NULL in btrfs_recover_relocation() as we allocate relocation control after all reloc roots have been verified. So when we hit: note, we haven't called set_reloc_control() thus fs_info::reloc_ctl is still NULL. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199833 Reported-by: Xu Wen Signed-off-by: Qu Wenruo Tested-by: Gu Jinxiang Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Ben Hutchings commit ab6d43c1c7c298cacaae180b2232fe8abc18075f Author: Kees Cook Date: Fri May 11 18:24:12 2018 +1000 video: uvesafb: Fix integer overflow in allocation commit 9f645bcc566a1e9f921bdae7528a01ced5bc3713 upstream. cmap->len can get close to INT_MAX/2, allowing for an integer overflow in allocation. This uses kmalloc_array() instead to catch the condition. Reported-by: Dr Silvio Cesare of InfoSect Fixes: 8bdb3a2d7df48 ("uvesafb: the driver core") Signed-off-by: Kees Cook Signed-off-by: Ben Hutchings commit ef16be8519f20e688cae41db04c4ad60b4fc9b22 Author: Sanjeev Sharma Date: Tue Aug 12 12:10:21 2014 +0530 uas: replace WARN_ON_ONCE() with lockdep_assert_held() commit ab945eff8396bc3329cc97274320e8d2c6585077 upstream. on some architecture spin_is_locked() always return false in uniprocessor configuration and therefore it would be advise to replace with lockdep_assert_held(). Signed-off-by: Sanjeev Sharma Acked-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings commit 585e054220cd820aeac2436c29ff9c06e483dc83 Author: Scott Bauer Date: Thu Apr 26 11:51:08 2018 -0600 cdrom: Fix info leak/OOB read in cdrom_ioctl_drive_status commit 8f3fafc9c2f0ece10832c25f7ffcb07c97a32ad4 upstream. Like d88b6d04: "cdrom: information leak in cdrom_ioctl_media_changed()" There is another cast from unsigned long to int which causes a bounds check to fail with specially crafted input. The value is then used as an index in the slot array in cdrom_slot_status(). Signed-off-by: Scott Bauer Signed-off-by: Scott Bauer Signed-off-by: Jens Axboe Signed-off-by: Ben Hutchings commit d7b6636fc7e90fa0944f25ee66e2ff334c7c2f19 Author: Peter Zijlstra Date: Fri Aug 3 16:41:39 2018 +0200 x86/paravirt: Fix spectre-v2 mitigations for paravirt guests commit 5800dc5c19f34e6e03b5adab1282535cb102fafd upstream. Nadav reported that on guests we're failing to rewrite the indirect calls to CALLEE_SAVE paravirt functions. In particular the pv_queued_spin_unlock() call is left unpatched and that is all over the place. This obviously wrecks Spectre-v2 mitigation (for paravirt guests) which relies on not actually having indirect calls around. The reason is an incorrect clobber test in paravirt_patch_call(); this function rewrites an indirect call with a direct call to the _SAME_ function, there is no possible way the clobbers can be different because of this. Therefore remove this clobber check. Also put WARNs on the other patch failure case (not enough room for the instruction) which I've not seen trigger in my (limited) testing. Three live kernel image disassemblies for lock_sock_nested (as a small function that illustrates the problem nicely). PRE is the current situation for guests, POST is with this patch applied and NATIVE is with or without the patch for !guests. PRE: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 End of assembler dump. POST: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock> 0xffffffff817be9a5 <+53>: xchg %ax,%ax 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 End of assembler dump. NATIVE: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: movb $0x0,(%rdi) 0xffffffff817be9a3 <+51>: nopl 0x0(%rax) 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 End of assembler dump. Fixes: 63f70270ccd9 ("[PATCH] i386: PARAVIRT: add common patching machinery") Fixes: 3010a0663fd9 ("x86/paravirt, objtool: Annotate indirect calls") Reported-by: Nadav Amit Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Reviewed-by: Juergen Gross Cc: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky Cc: David Woodhouse Signed-off-by: Ben Hutchings commit ba4a6140b84f5a86be14c2511431004bc4b9be69 Author: Jiri Kosina Date: Thu Jul 26 13:14:55 2018 +0200 x86/speculation: Protect against userspace-userspace spectreRSB commit fdf82a7856b32d905c39afc85e34364491e46346 upstream. The article "Spectre Returns! Speculation Attacks using the Return Stack Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks, making use solely of the RSB contents even on CPUs that don't fallback to BTB on RSB underflow (Skylake+). Mitigate userspace-userspace attacks by always unconditionally filling RSB on context switch when the generic spectrev2 mitigation has been enabled. [1] https://arxiv.org/pdf/1807.07940.pdf Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Acked-by: Tim Chen Cc: Konrad Rzeszutek Wilk Cc: Borislav Petkov Cc: David Woodhouse Cc: Peter Zijlstra Cc: Linus Torvalds Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1807261308190.997@cbobk.fhfr.pm Signed-off-by: Ben Hutchings commit af4386c97a325887796853bffaf9797f5c6b61f9 Author: Ingo Molnar Date: Tue Feb 13 09:03:08 2018 +0100 x86/speculation: Clean up various Spectre related details commit 21e433bdb95bdf3aa48226fd3d33af608437f293 upstream. Harmonize all the Spectre messages so that a: dmesg | grep -i spectre ... gives us most Spectre related kernel boot messages. Also fix a few other details: - clarify a comment about firmware speculation control - s/KPTI/PTI - remove various line-breaks that made the code uglier Acked-by: David Woodhouse Cc: Andy Lutomirski Cc: Arjan van de Ven Cc: Borislav Petkov Cc: Dan Williams Cc: Dave Hansen Cc: David Woodhouse Cc: Greg Kroah-Hartman Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit c95e0783eab0d1f31c7f8baa6e4ff8b0b8e7eb72 Author: Takashi Iwai Date: Tue Jul 17 17:26:43 2018 +0200 ALSA: rawmidi: Change resized buffers atomically commit 39675f7a7c7e7702f7d5341f1e0d01db746543a0 upstream. The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the current code is racy. For example, the sequencer client may write to buffer while it being resized. As a simple workaround, let's switch to the resized buffer inside the stream runtime lock. Reported-by: syzbot+52f83f0ea8df16932f7f@syzkaller.appspotmail.com Signed-off-by: Takashi Iwai Signed-off-by: Ben Hutchings commit 189254a6aa0cc823b55e624ba77ad3bd0637bbd9 Author: Jann Horn Date: Fri Jul 6 17:12:56 2018 +0200 USB: yurex: fix out-of-bounds uaccess in read handler commit f1e255d60ae66a9f672ff9a207ee6cd8e33d2679 upstream. In general, accessing userspace memory beyond the length of the supplied buffer in VFS read/write handlers can lead to both kernel memory corruption (via kernel_read()/kernel_write(), which can e.g. be triggered via sys_splice()) and privilege escalation inside userspace. Fix it by using simple_read_from_buffer() instead of custom logic. Fixes: 6bc235a2e24a ("USB: add driver for Meywa-Denki & Kayac YUREX") Signed-off-by: Jann Horn Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings commit c62b25af5f51f49e9f93f828cc38a82c23e8a0c5 Author: Cong Wang Date: Fri Jun 1 11:31:44 2018 -0700 infiniband: fix a possible use-after-free bug commit cb2595c1393b4a5211534e6f0a0fbad369e21ad8 upstream. ucma_process_join() will free the new allocated "mc" struct, if there is any error after that, especially the copy_to_user(). But in parallel, ucma_leave_multicast() could find this "mc" through idr_find() before ucma_process_join() frees it, since it is already published. So "mc" could be used in ucma_leave_multicast() after it is been allocated and freed in ucma_process_join(), since we don't refcnt it. Fix this by separating "publish" from ID allocation, so that we can get an ID first and publish it later after copy_to_user(). Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support") Reported-by: Noam Rathaus Signed-off-by: Cong Wang Signed-off-by: Jason Gunthorpe Signed-off-by: Ben Hutchings commit 78ff116fd34c3d86db2b15b9213591c8f929111f Author: Andy Lutomirski Date: Sun Jul 22 11:05:09 2018 -0700 x86/entry/64: Remove %ebx handling from error_entry/exit commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream. error_entry and error_exit communicate the user vs. kernel status of the frame using %ebx. This is unnecessary -- the information is in regs->cs. Just use regs->cs. This makes error_entry simpler and makes error_exit more robust. It also fixes a nasty bug. Before all the Spectre nonsense, the xen_failsafe_callback entry point returned like this: ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS ENCODE_FRAME_POINTER jmp error_exit And it did not go through error_entry. This was bogus: RBX contained garbage, and error_exit expected a flag in RBX. Fortunately, it generally contained *nonzero* garbage, so the correct code path was used. As part of the Spectre fixes, code was added to clear RBX to mitigate certain speculation attacks. Now, depending on kernel configuration, RBX got zeroed and, when running some Wine workloads, the kernel crashes. This was introduced by: commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") With this patch applied, RBX is no longer needed as a flag, and the problem goes away. I suspect that malicious userspace could use this bug to crash the kernel even without the offending patch applied, though. [ Historical note: I wrote this patch as a cleanup before I was aware of the bug it fixed. ] [ Note to stable maintainers: this should probably get applied to all kernels. If you're nervous about that, a more conservative fix to add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should also fix the problem. ] Reported-and-tested-by: M. Vefa Bicakci Signed-off-by: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: Dominik Brodowski Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: xen-devel@lists.xenproject.org Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") Link: http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.luto@kernel.org Signed-off-by: Ingo Molnar [bwh: Backported to 3.16: - error_exit moved EBX to EAX before testing it, so delete both instructions - error_exit does RESTORE_REST earlier, so adjust the offset to saved CS accordingly - Drop inapplicable comment changes - Adjust filename, context] Signed-off-by: Ben Hutchings commit 0b3369840cd61c23e2b9241093737b4c395cb406 Author: Linus Torvalds Date: Tue Jul 3 17:10:19 2018 -0700 Fix up non-directory creation in SGID directories commit 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7 upstream. sgid directories have special semantics, making newly created files in the directory belong to the group of the directory, and newly created subdirectories will also become sgid. This is historically used for group-shared directories. But group directories writable by non-group members should not imply that such non-group members can magically join the group, so make sure to clear the sgid bit on non-directories for non-members (but remember that sgid without group execute means "mandatory locking", just to confuse things even more). Reported-by: Jann Horn Cc: Andy Lutomirski Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings commit edb108a5510bfb55e2ceef9a9f3e21282158b462 Author: Theodore Ts'o Date: Sat Jun 16 23:41:59 2018 -0400 ext4: avoid running out of journal credits when appending to an inline file commit 8bc1379b82b8e809eef77a9fedbb75c6c297be19 upstream. Use a separate journal transaction if it turns out that we need to convert an inline file to use an data block. Otherwise we could end up failing due to not having journal credits. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit 005c9f88b625b204e5f80d0241cbf38963f263bf Author: Theodore Ts'o Date: Sat Jun 16 20:21:45 2018 -0400 jbd2: don't mark block as modified if the handle is out of credits commit e09463f220ca9a1a1ecfda84fcda658f99a1f12a upstream. Do not set the b_modified flag in block's journal head should not until after we're sure that jbd2_journal_dirty_metadat() will not abort with an error due to there not being enough space reserved in the jbd2 handle. Otherwise, future attempts to modify the buffer may lead a large number of spurious errors and warnings. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: Drop the added logging statement, as it's on a code path that doesn't exist here] Signed-off-by: Ben Hutchings commit f8d710be66f6f85084331734d7795a7fc80d99de Author: Theodore Ts'o Date: Sun Jun 17 00:41:14 2018 -0400 ext4: add more inode number paranoia checks commit c37e9e013469521d9adb932d17a1795c139b36db upstream. If there is a directory entry pointing to a system inode (such as a journal inode), complain and declare the file system to be corrupted. Also, if the superblock's first inode number field is too small, refuse to mount the file system. This addresses CVE-2018-10882. https://bugzilla.kernel.org/show_bug.cgi?id=200069 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit e2e3ff3ad042fba28c0b49e9534f3b281b105c48 Author: Theodore Ts'o Date: Fri Jun 15 12:28:16 2018 -0400 ext4: clear i_data in ext4_inode_info when removing inline data commit 6e8ab72a812396996035a37e5ca4b3b99b5d214b upstream. When converting from an inode from storing the data in-line to a data block, ext4_destroy_inline_data_nolock() was only clearing the on-disk copy of the i_blocks[] array. It was not clearing copy of the i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually used by ext4_map_blocks(). This didn't matter much if we are using extents, since the extents header would be invalid and thus the extents could would re-initialize the extents tree. But if we are using indirect blocks, the previous contents of the i_blocks array will be treated as block numbers, with potentially catastrophic results to the file system integrity and/or user data. This gets worse if the file system is using a 1k block size and s_first_data is zero, but even without this, the file system can get quite badly corrupted. This addresses CVE-2018-10881. https://bugzilla.kernel.org/show_bug.cgi?id=200015 Signed-off-by: Theodore Ts'o Signed-off-by: Ben Hutchings commit 42a6cd12f1f0728e7c09a0c1dde8f6d9e8a5fbd6 Author: Theodore Ts'o Date: Sat Jun 16 15:40:48 2018 -0400 ext4: never move the system.data xattr out of the inode body commit 8cdb5240ec5928b20490a2bb34cb87e9a5f40226 upstream. When expanding the extra isize space, we must never move the system.data xattr out of the inode body. For performance reasons, it doesn't make any sense, and the inline data implementation assumes that system.data xattr is never in the external xattr block. This addresses CVE-2018-10880 https://bugzilla.kernel.org/show_bug.cgi?id=200005 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit 177c48042c7120cdecb8ba782a2b82efdc373aa1 Author: Theodore Ts'o Date: Wed Jun 13 00:51:28 2018 -0400 ext4: always verify the magic number in xattr blocks commit 513f86d73855ce556ea9522b6bfd79f87356dc3a upstream. If there an inode points to a block which is also some other type of metadata block (such as a block allocation bitmap), the buffer_verified flag can be set when it was validated as that other metadata block type; however, it would make a really terrible external attribute block. The reason why we use the verified flag is to avoid constantly reverifying the block. However, it doesn't take much overhead to make sure the magic number of the xattr block is correct, and this will avoid potential crashes. This addresses CVE-2018-10879. https://bugzilla.kernel.org/show_bug.cgi?id=200001 Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit ed21fb0d17c27880835f61e122c00e1c3f748467 Author: Theodore Ts'o Date: Wed Jun 13 00:23:11 2018 -0400 ext4: add corruption check in ext4_xattr_set_entry() commit 5369a762c882c0b6e9599e4ebbb3a9ba9eee7e2d upstream. In theory this should have been caught earlier when the xattr list was verified, but in case it got missed, it's simple enough to add check to make sure we don't overrun the xattr buffer. This addresses CVE-2018-10879. https://bugzilla.kernel.org/show_bug.cgi?id=200001 Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger [bwh: Backported to 3.16: - Add inode parameter to ext4_xattr_set_entry() and update callers - Return -EIO instead of -EFSCORRUPTED on error - Adjust context] Signed-off-by: Ben Hutchings commit 9099e472af7619a380b5a02a2f607be98661281f Author: Theodore Ts'o Date: Sun Jul 8 19:35:02 2018 -0400 ext4: fix false negatives *and* false positives in ext4_check_descriptors() commit 44de022c4382541cebdd6de4465d1f4f465ff1dd upstream. Ext4_check_descriptors() was getting called before s_gdb_count was initialized. So for file systems w/o the meta_bg feature, allocation bitmaps could overlap the block group descriptors and ext4 wouldn't notice. For file systems with the meta_bg feature enabled, there was a fencepost error which would cause the ext4_check_descriptors() to incorrectly believe that the block allocation bitmap overlaps with the block group descriptor blocks, and it would reject the mount. Fix both of these problems. Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit 96e340bf132e16be02fdbd6d03c4946f824c085d Author: Theodore Ts'o Date: Wed Jun 13 23:08:26 2018 -0400 ext4: make sure bitmaps and the inode table don't overlap with bg descriptors commit 77260807d1170a8cf35dbb06e07461a655f67eee upstream. It's really bad when the allocation bitmaps and the inode table overlap with the block group descriptors, since it causes random corruption of the bg descriptors. So we really want to head those off at the pass. https://bugzilla.kernel.org/show_bug.cgi?id=199865 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: Open-code sb_rdonly()] Signed-off-by: Ben Hutchings commit 5a8a94bdd577c0264b103204136f57e899797a8a Author: Theodore Ts'o Date: Thu Mar 29 22:10:35 2018 -0400 ext4: don't allow r/w mounts if metadata blocks overlap the superblock commit 18db4b4e6fc31eda838dd1c1296d67dbcb3dc957 upstream. If some metadata block, such as an allocation bitmap, overlaps the superblock, it's very likely that if the file system is mounted read/write, the results will not be pretty. So disallow r/w mounts for file systems corrupted in this particular way. Backport notes: 3.18.y is missing bc98a42c1f7d ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)") and e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") so we simply use the sb MS_RDONLY check from pre bc98a42c1f7d in place of the sb_rdonly function used in the upstream variant of the patch. Signed-off-by: Theodore Ts'o Signed-off-by: Harsh Shandilya Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings commit e6eacb6555474a49b1aa29f4e98b38348d3c45fd Author: Theodore Ts'o Date: Wed Jun 13 23:00:48 2018 -0400 ext4: always check block group bounds in ext4_init_block_bitmap() commit 819b23f1c501b17b9694325471789e6b5cc2d0d2 upstream. Regardless of whether the flex_bg feature is set, we should always check to make sure the bits we are setting in the block bitmap are within the block group bounds. https://bugzilla.kernel.org/show_bug.cgi?id=199865 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit 09999807edd836f8d96ca5a5b8bf007856c5f268 Author: Theodore Ts'o Date: Thu Jun 14 12:55:10 2018 -0400 ext4: verify the depth of extent tree in ext4_find_extent() commit bc890a60247171294acc0bd67d211fa4b88d40ba upstream. If there is a corupted file system where the claimed depth of the extent tree is -1, this can cause a massive buffer overrun leading to sadness. This addresses CVE-2018-10877. https://bugzilla.kernel.org/show_bug.cgi?id=199417 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: return -EIO instead of -EFSCORRUPTED] Signed-off-by: Ben Hutchings commit 2ddf6de5d4af615b83c1a681386b25637290b44e Author: Theodore Ts'o Date: Sat Jul 28 08:12:04 2018 -0400 ext4: fix check to prevent initializing reserved inodes commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream. Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is valid" will complain if block group zero does not have the EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct, since a freshly created file system has this flag cleared. It gets almost immediately after the file system is mounted read-write --- but the following somewhat unlikely sequence will end up triggering a false positive report of a corrupted file system: mkfs.ext4 /dev/vdc mount -o ro /dev/vdc /vdc mount -o remount,rw /dev/vdc Instead, when initializing the inode table for block group zero, test to make sure that itable_unused count is not too large, since that is the case that will result in some or all of the reserved inodes getting cleared. This fixes the failures reported by Eric Whiteney when running generic/230 and generic/231 in the the nojournal test case. Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid") Reported-by: Eric Whitney Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit 9c2e1d0691bfc68ebc914043497330bd530c6ed6 Author: Theodore Ts'o Date: Thu Jun 14 00:58:00 2018 -0400 ext4: only look at the bg_flags field if it is valid commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream. The bg_flags field in the block group descripts is only valid if the uninit_bg or metadata_csum feature is enabled. We were not consistently looking at this field; fix this. Also block group #0 must never have uninitialized allocation bitmaps, or need to be zeroed, since that's where the root inode, and other special inodes are set up. Check for these conditions and mark the file system as corrupted if they are detected. This addresses CVE-2018-10876. https://bugzilla.kernel.org/show_bug.cgi?id=199403 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: - ext4_read_block_bitmap_nowait() and ext4_read_inode_bitmap() return a pointer (NULL on error) instead of an error code - Open-code sb_rdonly() - Adjust context] Signed-off-by: Ben Hutchings commit 00fe22e3f801fd5225aeecc6bf79630ec201f8e4 Author: Eric Sandeen Date: Mon Apr 16 23:07:27 2018 -0700 xfs: set format back to extents if xfs_bmap_extents_to_btree commit 2c4306f719b083d17df2963bc761777576b8ad1b upstream. If xfs_bmap_extents_to_btree fails in a mode where we call xfs_iroot_realloc(-1) to de-allocate the root, set the format back to extents. Otherwise we can assume we can dereference ifp->if_broot based on the XFS_DINODE_FMT_BTREE format, and crash. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199423 Signed-off-by: Eric Sandeen Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong [bwh: Backported to 3.16: - Only one failure path needs to be patched - Adjust filename] Signed-off-by: Ben Hutchings commit 0643adfa36b54ea5948e48383d8549ac5c2fb69e Author: Jason Yan Date: Thu Mar 8 10:34:53 2018 +0800 scsi: libsas: defer ata device eh commands to libata commit 318aaf34f1179b39fa9c30fa0f3288b645beee39 upstream. When ata device doing EH, some commands still attached with tasks are not passed to libata when abort failed or recover failed, so libata did not handle these commands. After these commands done, sas task is freed, but ata qc is not freed. This will cause ata qc leak and trigger a warning like below: WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037 ata_eh_finish+0xb4/0xcc CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1 ...... Call trace: [] ata_eh_finish+0xb4/0xcc [] ata_do_eh+0xc4/0xd8 [] ata_std_error_handler+0x44/0x8c [] ata_scsi_port_error_handler+0x480/0x694 [] async_sas_ata_eh+0x4c/0x80 [] async_run_entry_fn+0x4c/0x170 [] process_one_work+0x144/0x390 [] worker_thread+0x144/0x418 [] kthread+0x10c/0x138 [] ret_from_fork+0x10/0x18 If ata qc leaked too many, ata tag allocation will fail and io blocked for ever. As suggested by Dan Williams, defer ata device commands to libata and merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle ata qcs correctly after this. Signed-off-by: Jason Yan CC: Xiaofei Tan CC: John Garry CC: Dan Williams Reviewed-by: Dan Williams Signed-off-by: Martin K. Petersen Signed-off-by: Ben Hutchings commit 556fa3e5feba266ebfb14df4509ef0a69b0b1f24 Author: Mark Salyzyn Date: Tue Jul 31 15:02:13 2018 -0700 Bluetooth: hidp: buffer overflow in hidp_process_report commit 7992c18810e568b95c869b227137a2215702a805 upstream. CVE-2018-9363 The buffer length is unsigned at all layers, but gets cast to int and checked in hidp_process_report and can lead to a buffer overflow. Switch len parameter to unsigned int to resolve issue. This affects 3.18 and newer kernels. Signed-off-by: Mark Salyzyn Fixes: a4b1b5877b514b276f0f31efe02388a9c2836728 ("HID: Bluetooth: hidp: make sure input buffers are big enough") Cc: Marcel Holtmann Cc: Johan Hedberg Cc: "David S. Miller" Cc: Kees Cook Cc: Benjamin Tissoires Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: security@kernel.org Cc: kernel-team@android.com Acked-by: Kees Cook Signed-off-by: Marcel Holtmann Signed-off-by: Ben Hutchings commit 582802e7c617cfb07cc15f280c128e6decbc57b8 Author: Alexander Potapenko Date: Fri May 18 16:23:18 2018 +0200 scsi: sg: allocate with __GFP_ZERO in sg_build_indirect() commit a45b599ad808c3c982fdcdc12b0b8611c2f92824 upstream. This shall help avoid copying uninitialized memory to the userspace when calling ioctl(fd, SG_IO) with an empty command. Reported-by: syzbot+7d26fc1eea198488deab@syzkaller.appspotmail.com Signed-off-by: Alexander Potapenko Acked-by: Douglas Gilbert Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen Signed-off-by: Ben Hutchings commit 63bd05e42208647417f421504ea70db00f046d21 Author: Shankara Pailoor Date: Tue Jun 5 08:33:27 2018 -0500 jfs: Fix inconsistency between memory allocation and ea_buf->max_size commit 92d34134193e5b129dc24f8d79cb9196626e8d7a upstream. The code is assuming the buffer is max_size length, but we weren't allocating enough space for it. Signed-off-by: Shankara Pailoor Signed-off-by: Dave Kleikamp Signed-off-by: Ben Hutchings commit d98da66531a3b203dded83749d69dd07ca9e646a Author: Jens Axboe Date: Mon May 21 12:21:14 2018 -0600 sr: pass down correctly sized SCSI sense buffer commit f7068114d45ec55996b9040e98111afa56e010fe upstream. We're casting the CDROM layer request_sense to the SCSI sense buffer, but the former is 64 bytes and the latter is 96 bytes. As we generally allocate these on the stack, we end up blowing up the stack. Fix this by wrapping the scsi_execute() call with a properly sized sense buffer, and copying back the bits for the CDROM layer. Reported-by: Piotr Gabriel Kosinski Reported-by: Daniel Shapira Tested-by: Kees Cook Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Jens Axboe [bwh: Despite what the "Fixes" field says, a buffer overrun was already possible if the sense data was really > 64 bytes long. Backported to 3.16: - We always need to allocate a sense buffer in order to call scsi_normalize_sense() - Remove the existing conditional heap-allocation of the sense buffer] Signed-off-by: Ben Hutchings commit c1eef5daecfd48a4e85a0b4f37239b8dbfb9703a Author: Paolo Bonzini Date: Wed Jun 6 17:38:09 2018 +0200 kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access commit 3c9fa24ca7c9c47605672916491f79e8ccacb9e6 upstream. The functions that were used in the emulation of fxrstor, fxsave, sgdt and sidt were originally meant for task switching, and as such they did not check privilege levels. This is very bad when the same functions are used in the emulation of unprivileged instructions. This is CVE-2018-10853. The obvious fix is to add a new argument to ops->read_std and ops->write_std, which decides whether the access is a "system" access or should use the processor's CPL. Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12) Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: Drop change in handle_ud()] Signed-off-by: Ben Hutchings commit b1632afd23734e0d565ace124df0bc9c55a7575e Author: Paolo Bonzini Date: Wed Jun 6 17:37:49 2018 +0200 KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system commit ce14e868a54edeb2e30cb7a7b104a2fc4b9d76ca upstream. Int the next patch the emulator's .read_std and .write_std callbacks will grow another argument, which is not needed in kvm_read_guest_virt and kvm_write_guest_virt_system's callers. Since we have to make separate functions, let's give the currently existing names a nicer interface, too. Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12) Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: - Drop change to handle_invvpid() - Adjust context] Signed-off-by: Ben Hutchings commit 8b22be95e4e060515553551e295c48c57c9ad2c7 Author: Paolo Bonzini Date: Wed Jun 6 16:43:02 2018 +0200 KVM: x86: introduce linear_{read,write}_system commit 79367a65743975e5cac8d24d08eccc7fdae832b0 upstream. Wrap the common invocation of ctxt->ops->read_std and ctxt->ops->write_std, so as to have a smaller patch when the functions grow another argument. Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12) Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit 33fe75f9ef06cefb368460ecf5ff6f9594d72b33 Author: Nadav Amit Date: Mon Jun 2 18:34:04 2014 +0300 KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR commit e37a75a13cdae5deaa2ea2cbf8d55b5dd08638b6 upstream. The current implementation ignores the LDTR/TR base high 32-bits on long-mode. As a result the loaded segment descriptor may be incorrect. Signed-off-by: Nadav Amit Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings commit 3887a9acedb443a798cf9222f08c08792cdca404 Author: Mel Gorman Date: Wed Aug 9 08:27:11 2017 +0100 futex: Remove unnecessary warning from get_futex_key commit 48fb6f4db940e92cfb16cd878cddd59ea6120d06 upstream. Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in get_futex_key()") removed an unnecessary lock_page() with the side-effect that page->mapping needed to be treated very carefully. Two defensive warnings were added in case any assumption was missed and the first warning assumed a correct application would not alter a mapping backing a futex key. Since merging, it has not triggered for any unexpected case but Mark Rutland reported the following bug triggering due to the first warning. kernel BUG at kernel/futex.c:679! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3 Hardware name: linux,dummy-virt (DT) task: ffff80001e271780 task.stack: ffff000010908000 PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679 LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679 pc : [] lr : [] pstate: 80000145 The fact that it's a bug instead of a warning was due to an unrelated arm64 problem, but the warning itself triggered because the underlying mapping changed. This is an application issue but from a kernel perspective it's a recoverable situation and the warning is unnecessary so this patch removes the warning. The warning may potentially be triggered with the following test program from Mark although it may be necessary to adjust NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the system. #include #include #include #include #include #include #include #include #define NR_FUTEX_THREADS 16 pthread_t threads[NR_FUTEX_THREADS]; void *mem; #define MEM_PROT (PROT_READ | PROT_WRITE) #define MEM_SIZE 65536 static int futex_wrapper(int *uaddr, int op, int val, const struct timespec *timeout, int *uaddr2, int val3) { syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3); } void *poll_futex(void *unused) { for (;;) { futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1); } } int main(int argc, char *argv[]) { int i; mem = mmap(NULL, MEM_SIZE, MEM_PROT, MAP_SHARED | MAP_ANONYMOUS, -1, 0); printf("Mapping @ %p\n", mem); printf("Creating futex threads...\n"); for (i = 0; i < NR_FUTEX_THREADS; i++) pthread_create(&threads[i], NULL, poll_futex, NULL); printf("Flipping mapping...\n"); for (;;) { mmap(mem, MEM_SIZE, MEM_PROT, MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0); } return 0; } Reported-and-tested-by: Mark Rutland Signed-off-by: Mel Gorman Acked-by: Peter Zijlstra (Intel) Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings commit 862b19bc43313fadad14334760d447f715003500 Author: Mel Gorman Date: Tue Feb 9 11:15:14 2016 -0800 futex: Remove requirement for lock_page() in get_futex_key() commit 65d8fc777f6dcfee12785c057a6b57f679641c90 upstream. When dealing with key handling for shared futexes, we can drastically reduce the usage/need of the page lock. 1) For anonymous pages, the associated futex object is the mm_struct which does not require the page lock. 2) For inode based, keys, we can check under RCU read lock if the page mapping is still valid and take reference to the inode. This just leaves one rare race that requires the page lock in the slow path when examining the swapcache. Additionally realtime users currently have a problem with the page lock being contended for unbounded periods of time during futex operations. Task A get_futex_key() lock_page() ---> preempted Now any other task trying to lock that page will have to wait until task A gets scheduled back in, which is an unbound time. With this patch, we pretty much have a lockless futex_get_key(). Experiments show that this patch can boost/speedup the hashing of shared futexes with the perf futex benchmarks (which is good for measuring such change) by up to 45% when there are high (> 100) thread counts on a 60 core Westmere. Lower counts are pretty much in the noise range or less than 10%, but mid range can be seen at over 30% overall throughput (hash ops/sec). This makes anon-mem shared futexes much closer to its private counterpart. Signed-off-by: Mel Gorman [ Ported on top of thp refcount rework, changelog, comments, fixes. ] Signed-off-by: Davidlohr Bueso Reviewed-by: Thomas Gleixner Cc: Chris Mason Cc: Darren Hart Cc: Hugh Dickins Cc: Linus Torvalds Cc: Mel Gorman Cc: Peter Zijlstra Cc: Sebastian Andrzej Siewior Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar Signed-off-by: Chenbo Feng Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: s/READ_ONCE/ACCESS_ONCE/] Signed-off-by: Ben Hutchings commit 6a837c8c2937fba8f5c94821ca603913d72bc049 Author: Shuah Khan (Samsung OSG) Date: Tue May 15 17:57:23 2018 -0600 usbip: usbip_host: fix bad unlock balance during stub_probe() commit c171654caa875919be3c533d3518da8be5be966e upstream. stub_probe() calls put_busid_priv() in an error path when device isn't found in the busid_table. Fix it by making put_busid_priv() safe to be called with null struct bus_id_priv pointer. This problem happens when "usbip bind" is run without loading usbip_host driver and then running modprobe. The first failed bind attempt unbinds the device from the original driver and when usbip_host is modprobed, stub_probe() runs and doesn't find the device in its busid table and calls put_busid_priv(0 with null bus_id_priv pointer. usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip! [ 367.359679] ===================================== [ 367.359681] WARNING: bad unlock balance detected! [ 367.359683] 4.17.0-rc4+ #5 Not tainted [ 367.359685] ------------------------------------- [ 367.359688] modprobe/2768 is trying to release lock ( [ 367.359689] ================================================================== [ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110 [ 367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768 [ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5 Fixes: 22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings commit 896b00bd9c66cc72ae6800ff6dba65a9e83ea5fd Author: Shuah Khan (Samsung OSG) Date: Mon May 14 20:49:58 2018 -0600 usbip: usbip_host: fix NULL-ptr deref and use-after-free errors commit 22076557b07c12086eeb16b8ce2b0b735f7a27e7 upstream. usbip_host updates device status without holding lock from stub probe, disconnect and rebind code paths. When multiple requests to import a device are received, these unprotected code paths step all over each other and drive fails with NULL-ptr deref and use-after-free errors. The driver uses a table lock to protect the busid array for adding and deleting busids to the table. However, the probe, disconnect and rebind paths get the busid table entry and update the status without holding the busid table lock. Add a new finer grain lock to protect the busid entry. This new lock will be held to search and update the busid entry fields from get_busid_idx(), add_match_busid() and del_match_busid(). match_busid_show() does the same to access the busid entry fields. get_busid_priv() changed to return the pointer to the busid entry holding the busid lock. stub_probe(), stub_disconnect() and stub_device_rebind() call put_busid_priv() to release the busid lock before returning. This changes fixes the unprotected code paths eliminating the race conditions in updating the busid entries. Reported-by: Jakub Jirasek Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filenames, context] Signed-off-by: Ben Hutchings commit 9e45017ecab89b6dd4bd1d6bf05ce4d7b0e604fa Author: Shuah Khan (Samsung OSG) Date: Mon Apr 30 16:17:20 2018 -0600 usbip: usbip_host: run rebind from exit when module is removed commit 7510df3f29d44685bab7b1918b61a8ccd57126a9 upstream. After removing usbip_host module, devices it releases are left without a driver. For example, when a keyboard or a mass storage device are bound to usbip_host when it is removed, these devices are no longer bound to any driver. Fix it to run device_attach() from the module exit routine to restore the devices to their original drivers. This includes cleanup changes and moving device_attach() code to a common routine to be called from rebind_store() and usbip_host_exit(). Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filenames] Signed-off-by: Ben Hutchings commit 6a07695053815f39e331eb3689d52a7d0c27b0fe Author: Shuah Khan (Samsung OSG) Date: Mon Apr 30 16:17:19 2018 -0600 usbip: usbip_host: delete device from busid_table after rebind commit 1e180f167d4e413afccbbb4a421b48b2de832549 upstream. Device is left in the busid_table after unbind and rebind. Rebind initiates usb bus scan and the original driver claims the device. After rescan the device should be deleted from the busid_table as it no longer belongs to usbip_host. Fix it to delete the device after device_attach() succeeds. Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings commit cbb26268a26b53f3d0a92c39265ff90a43ffdaa2 Author: Shuah Khan Date: Wed Apr 11 18:13:30 2018 -0600 usbip: usbip_host: refine probe and disconnect debug msgs to be useful commit 28b68acc4a88dcf91fd1dcf2577371dc9bf574cc upstream. Refine probe and disconnect debug msgs to be useful and say what is in progress. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings commit 9e6ed4b8de9bc1c230fd12cb3af1d682962d35bb Author: Shuah Khan Date: Thu Apr 5 16:29:04 2018 -0600 usbip: usbip_host: fix to hold parent lock for device_attach() calls commit 4bfb141bc01312a817d36627cc47c93f801c216d upstream. usbip_host calls device_attach() without holding dev->parent lock. Fix it. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings commit 77e7f91231bf63e3c64eb59a2aaf5754eaea2e69 Author: Alexey Khoroshilov Date: Sat Nov 29 01:29:10 2014 +0300 usbip: fix error handling in stub_probe() commit 3ff67445750a84de67faaf52c6e1895cb09f2c56 upstream. If usb_hub_claim_port() fails, no resources are deallocated and if stub_add_files() fails, port is not released. The patch fixes these issues and rearranges error handling code. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Acked-by: Valentina Manea Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings commit 51556151d0c82515934a0feb7c61f3bcad0e73d8 Author: Christoph Paasch Date: Tue Sep 26 17:38:50 2017 -0700 net: Set sk_prot_creator when cloning sockets to the right proto commit 9d538fa60bad4f7b23193c89e843797a1cf71ef3 upstream. sk->sk_prot and sk->sk_prot_creator can differ when the app uses IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one). Which is why sk_prot_creator is there to make sure that sk_prot_free() does the kmem_cache_free() on the right kmem_cache slab. Now, if such a socket gets transformed back to a listening socket (using connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through sk_clone_lock() when a new connection comes in. But sk_prot_creator will still point to the IPv6 kmem_cache (as everything got copied in sk_clone_lock()). When freeing, we will thus put this memory back into the IPv6 kmem_cache although it was allocated in the IPv4 cache. I have seen memory corruption happening because of this. With slub-debugging and MEMCG_KMEM enabled this gives the warning "cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP" A C-program to trigger this: void main(void) { int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP); int new_fd, newest_fd, client_fd; struct sockaddr_in6 bind_addr; struct sockaddr_in bind_addr4, client_addr1, client_addr2; struct sockaddr unsp; int val; memset(&bind_addr, 0, sizeof(bind_addr)); bind_addr.sin6_family = AF_INET6; bind_addr.sin6_port = ntohs(42424); memset(&client_addr1, 0, sizeof(client_addr1)); client_addr1.sin_family = AF_INET; client_addr1.sin_port = ntohs(42424); client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1"); memset(&client_addr2, 0, sizeof(client_addr2)); client_addr2.sin_family = AF_INET; client_addr2.sin_port = ntohs(42421); client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1"); memset(&unsp, 0, sizeof(unsp)); unsp.sa_family = AF_UNSPEC; bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr)); listen(fd, 5); client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1)); new_fd = accept(fd, NULL, NULL); close(fd); val = AF_INET; setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val)); connect(new_fd, &unsp, sizeof(unsp)); memset(&bind_addr4, 0, sizeof(bind_addr4)); bind_addr4.sin_family = AF_INET; bind_addr4.sin_port = ntohs(42421); bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4)); listen(new_fd, 5); client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2)); newest_fd = accept(new_fd, NULL, NULL); close(new_fd); close(client_fd); close(new_fd); } As far as I can see, this bug has been there since the beginning of the git-days. Signed-off-by: Christoph Paasch Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Cc: Thomas King Signed-off-by: Ben Hutchings commit 90d6d096a007277ef5a87b2992bf138886946426 Author: Ben Hutchings Date: Tue Jul 10 20:07:08 2018 +0100 Revert "vti4: Don't override MTU passed on link creation via IFLA_MTU" This reverts commit 5a79e43ffa5014c020e0d0f4e383205f87b10111, which was commit 03080e5ec72740c1a62e6730f2a5f3f114f11b19 upstream, as it causes test failures. It should not have been backported to anything older than 4.16. Thanks to Alistair Strachan for debugging this. Cc: Alistair Strachan Signed-off-by: Ben Hutchings commit d4f06dfa574db2af1de3ade75fb04240a94f19dc Author: Ben Hutchings Date: Mon Jun 25 00:25:48 2018 +0100 x86/fpu: Default eagerfpu if FPU and FXSR are enabled This is a limited version of commit 58122bf1d856 "x86/fpu: Default eagerfpu=on on all CPUs". That commit revealed bugs in the use of eagerfpu together with math emulation or without the FXSR feature. Although those bugs have been fixed upstream, the fixes do not seem to be practical to backport to 3.16. The security issue that motivates using eagerfpu (CVE-2018-3665) is an information leak through speculative execution, and most CPUs lacking the FXSR feature also don't implement speculative execution. The exceptions I am aware of are the Intel Pentium Pro and AMD K6 family, which will remain vulnerable to this issue. Move the eagerfpu variable and associated initialisation into fpu_init(), since xstate_enable_boot_cpu() won't be called at all if XSAVE is disabled. Signed-off-by: Ben Hutchings Cc: Andy Lutomirski Cc: x86@kernel.org commit 44337996ef2ff5974b83baf8d04f195aacf67ce5 Author: Ingo Molnar Date: Tue May 5 11:43:02 2015 +0200 x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT commit d364a7656c1855c940dfa4baf4ebcc3c6a9e6fd2 upstream. I tried to simulate an ancient CPU via this option, and found that it still has fxsr_opt enabled, confusing the FPU code. Make the 'nofxsr' option also clear FXSR_OPT flag. Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Fenghua Yu Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings