commit 3cb027bd694d0366fadc522fbc431323658eebf5 Author: Greg Kroah-Hartman Date: Thu Feb 13 13:51:06 2014 -0800 Linux 3.12.11 commit b8786a04f4db6cb6dd92f28b2f81737d290ba30b Author: Adrian Hunter Date: Tue Jan 21 09:52:39 2014 +0200 mmc: sdhci-pci: Fix possibility of chip->fixes being null commit 945be38caa287b177b8c17ffaae7754cab6a658f upstream. It is possible for chip->fixes to be null. Check before dereferencing it. Signed-off-by: Adrian Hunter Signed-off-by: Chris Ball Signed-off-by: Greg Kroah-Hartman commit 1fb1e637bd971cdf1d5d4ca3a0ed6dcb657dbdc0 Author: Adrian Hunter Date: Mon Jan 13 09:49:16 2014 +0200 mmc: sdhci-pci: Fix BYT sd card getting stuck in runtime suspend commit 77a0122e0838663795651aa0beb2325156f98c09 upstream. A host controller for a SD card may need a GPIO for card detect in order to wake up from runtime suspend when a card is inserted. If that GPIO is not configured, then the host controller will not wake up. Fix that for the affected devices by not enabling runtime PM unless the GPIO is successfully set up. This affects BYT sd card host controller which had runtime PM enabled from v3.11. For completeness, the MFD sd card host controller is flagged also. The original patch before rebasing (see link below) was tested on v3.11.10 and v3.12.4 although the patch applied with some offsets and fuzz. The original patch is here: http://marc.info/?l=linux-mmc&m=138676702327057 Signed-off-by: Adrian Hunter Signed-off-by: Chris Ball Signed-off-by: Greg Kroah-Hartman commit 5c8150ef94837713601532e6db0f31d3de711f1d Author: Borislav Petkov Date: Sat Jul 20 19:00:23 2013 +0200 rtc-cmos: Add an alarm disable quirk commit d5a1c7e3fc38d9c7d629e1e47f32f863acbdec3d upstream. 41c7f7424259f ("rtc: Disable the alarm in the hardware (v2)") added the functionality to disable the RTC wake alarm when shutting down the box. However, there are at least two b0rked BIOSes we know about: https://bugzilla.novell.com/show_bug.cgi?id=812592 https://bugzilla.novell.com/show_bug.cgi?id=805740 where, when wakeup alarm is enabled in the BIOS, the machine reboots automatically right after shutdown, regardless of what wakeup time is programmed. Bisecting the issue lead to this patch so disable its functionality with a DMI quirk only for those boxes. Cc: Brecht Machiels Cc: Thomas Gleixner Cc: John Stultz Cc: Rabin Vincent Signed-off-by: Borislav Petkov [jstultz: Changed variable name for clarity, added extra dmi entry] Tested-by: Brecht Machiels Tested-by: Borislav Petkov Signed-off-by: John Stultz Signed-off-by: Greg Kroah-Hartman commit f7efe92de8f98563e1a711021f91d55f9a1de963 Author: John Stultz Date: Wed Dec 11 19:10:36 2013 -0800 timekeeping: Fix missing timekeeping_update in suspend path commit 330a1617b0a6268d427aa5922c94d082b1d3e96d upstream. Since 48cdc135d4840 (Implement a shadow timekeeper), we have to call timekeeping_update() after any adjustment to the timekeeping structure in order to make sure that any adjustments to the structure persist. In the timekeeping suspend path, we udpate the timekeeper structure, so we should be sure to update the shadow-timekeeper before releasing the timekeeping locks. Currently this isn't done. In most cases, the next time related code to run would be timekeeping_resume, which does update the shadow-timekeeper, but in an abundence of caution, this patch adds the call to timekeeping_update() in the suspend path. Cc: Sasha Levin Cc: Thomas Gleixner Cc: Prarit Bhargava Cc: Richard Cochran Cc: Ingo Molnar Signed-off-by: John Stultz Signed-off-by: Greg Kroah-Hartman commit aaa5ad29bc420c04cb0df5e4d6876aa32096b7d0 Author: John Stultz Date: Tue Dec 10 17:13:35 2013 -0800 timekeeping: Fix CLOCK_TAI timer/nanosleep delays commit 04005f6011e3b504cd4d791d9769f7cb9a3b2eae upstream. A think-o in the calculation of the monotonic -> tai time offset results in CLOCK_TAI timers and nanosleeps to expire late (the latency is ~2x the tai offset). Fix this by adding the tai offset from the realtime offset instead of subtracting. Cc: Sasha Levin Cc: Thomas Gleixner Cc: Prarit Bhargava Cc: Richard Cochran Cc: Ingo Molnar Signed-off-by: John Stultz Signed-off-by: Greg Kroah-Hartman commit e14bada985c4af6d192413ad72dc35758cffe286 Author: John Stultz Date: Mon Feb 10 13:07:21 2014 -0800 3.13.y: timekeeping: Fix clock_set/clock_was_set think-o In backporting 6fdda9a9c5db367130cf32df5d6618d08b89f46a (timekeeping: Avoid possible deadlock from clock_was_set_delayed), I ralized the patch had a think-o where instead of checking clock_set I accidentally typed clock_was_set (which is a function - so the conditional always is true). Upstream this was resolved in the immediately following patch 47a1b796306356f358e515149d86baf0cc6bf007 (tick/timekeeping: Call update_wall_time outside the jiffies lock). But since that patch really isn't -stable material, so this patch only pulls the name change. Cc: Thomas Gleixner Cc: Prarit Bhargava Cc: Richard Cochran Cc: Ingo Molnar Cc: Sasha Levin Signed-off-by: John Stultz Signed-off-by: Greg Kroah-Hartman commit b6f484eae189da008d6afa7b68414bdc5d460066 Author: John Stultz Date: Tue Dec 10 17:18:18 2013 -0800 timekeeping: Avoid possible deadlock from clock_was_set_delayed commit 6fdda9a9c5db367130cf32df5d6618d08b89f46a upstream. As part of normal operaions, the hrtimer subsystem frequently calls into the timekeeping code, creating a locking order of hrtimer locks -> timekeeping locks clock_was_set_delayed() was suppoed to allow us to avoid deadlocks between the timekeeping the hrtimer subsystem, so that we could notify the hrtimer subsytem the time had changed while holding the timekeeping locks. This was done by scheduling delayed work that would run later once we were out of the timekeeing code. But unfortunately the lock chains are complex enoguh that in scheduling delayed work, we end up eventually trying to grab an hrtimer lock. Sasha Levin noticed this in testing when the new seqlock lockdep enablement triggered the following (somewhat abrieviated) message: [ 251.100221] ====================================================== [ 251.100221] [ INFO: possible circular locking dependency detected ] [ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted [ 251.101967] ------------------------------------------------------- [ 251.101967] kworker/10:1/4506 is trying to acquire lock: [ 251.101967] (timekeeper_seq){----..}, at: [] retrigger_next_event+0x56/0x70 [ 251.101967] [ 251.101967] but task is already holding lock: [ 251.101967] (hrtimer_bases.lock#11){-.-...}, at: [] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] which lock already depends on the new lock. [ 251.101967] [ 251.101967] [ 251.101967] the existing dependency chain (in reverse order) is: [ 251.101967] -> #5 (hrtimer_bases.lock#11){-.-...}: [snipped] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [snipped] -> #3 (&rq->lock){-.-.-.}: [snipped] -> #2 (&p->pi_lock){-.-.-.}: [snipped] -> #1 (&(&pool->lock)->rlock){-.-...}: [ 251.101967] [] validate_chain+0x6c3/0x7b0 [ 251.101967] [] __lock_acquire+0x4ad/0x580 [ 251.101967] [] lock_acquire+0x182/0x1d0 [ 251.101967] [] _raw_spin_lock+0x40/0x80 [ 251.101967] [] __queue_work+0x1a9/0x3f0 [ 251.101967] [] queue_work_on+0x98/0x120 [ 251.101967] [] clock_was_set_delayed+0x21/0x30 [ 251.101967] [] do_adjtimex+0x111/0x160 [ 251.101967] [] compat_sys_adjtimex+0x41/0x70 [ 251.101967] [] ia32_sysret+0x0/0x5 [ 251.101967] -> #0 (timekeeper_seq){----..}: [snipped] [ 251.101967] other info that might help us debug this: [ 251.101967] [ 251.101967] Chain exists of: timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11 [ 251.101967] Possible unsafe locking scenario: [ 251.101967] [ 251.101967] CPU0 CPU1 [ 251.101967] ---- ---- [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(&rt_b->rt_runtime_lock); [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(timekeeper_seq); [ 251.101967] [ 251.101967] *** DEADLOCK *** [ 251.101967] [ 251.101967] 3 locks held by kworker/10:1/4506: [ 251.101967] #0: (events){.+.+.+}, at: [] process_one_work+0x200/0x530 [ 251.101967] #1: (hrtimer_work){+.+...}, at: [] process_one_work+0x200/0x530 [ 251.101967] #2: (hrtimer_bases.lock#11){-.-...}, at: [] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] stack backtrace: [ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 [ 251.101967] Workqueue: events clock_was_set_work So the best solution is to avoid calling clock_was_set_delayed() while holding the timekeeping lock, and instead using a flag variable to decide if we should call clock_was_set() once we've released the locks. This works for the case here, where the do_adjtimex() was the deadlock trigger point. Unfortuantely, in update_wall_time() we still hold the jiffies lock, which would deadlock with the ipi triggered by clock_was_set(), preventing us from calling it even after we drop the timekeeping lock. So instead call clock_was_set_delayed() at that point. Cc: Thomas Gleixner Cc: Prarit Bhargava Cc: Richard Cochran Cc: Ingo Molnar Cc: Sasha Levin Reported-by: Sasha Levin Tested-by: Sasha Levin Signed-off-by: John Stultz Signed-off-by: Greg Kroah-Hartman commit 638d188c2444f60c9bfa24efff3e1a6e3695873b Author: John Stultz Date: Wed Dec 11 20:07:49 2013 -0800 timekeeping: Fix potential lost pv notification of time change commit 5258d3f25c76f6ab86e9333abf97a55a877d3870 upstream. In 780427f0e11 (Indicate that clock was set in the pvclock gtod notifier), logic was added to pass a CLOCK_WAS_SET notification to the pvclock notifier chain. While that patch added a action flag returned from accumulate_nsecs_to_secs(), it only uses the returned value in one location, and not in the logarithmic accumulation. This means if a leap second triggered during the logarithmic accumulation (which is most likely where it would happen), the notification that the clock was set would not make it to the pv notifiers. This patch extends the logarithmic_accumulation pass down that action flag so proper notification will occur. This patch also changes the varialbe action -> clock_set per Ingo's suggestion. Cc: Sasha Levin Cc: Thomas Gleixner Cc: Ingo Molnar Cc: David Vrabel Cc: Konrad Rzeszutek Wilk Cc: Prarit Bhargava Cc: Richard Cochran Cc: Signed-off-by: John Stultz Signed-off-by: Greg Kroah-Hartman commit 89c23e583d4f29e026b6bb4e05b12e9e85b844cb Author: John Stultz Date: Wed Dec 11 18:50:25 2013 -0800 timekeeping: Fix lost updates to tai adjustment commit f55c07607a38f84b5c7e6066ee1cfe433fa5643c upstream. Since 48cdc135d4840 (Implement a shadow timekeeper), we have to call timekeeping_update() after any adjustment to the timekeeping structure in order to make sure that any adjustments to the structure persist. Unfortunately, the updates to the tai offset via adjtimex do not trigger this update, causing adjustments to the tai offset to be made and then over-written by the previous value at the next update_wall_time() call. This patch resovles the issue by calling timekeeping_update() right after setting the tai offset. Cc: Sasha Levin Cc: Thomas Gleixner Cc: Prarit Bhargava Cc: Richard Cochran Cc: Ingo Molnar Signed-off-by: John Stultz Signed-off-by: Greg Kroah-Hartman commit 90eecc4b9e40c13da7c04def499bfb103b966577 Author: Steven Rostedt (Red Hat) Date: Mon Jan 13 10:30:23 2014 -0500 ftrace: Have function graph only trace based on global_ops filters commit 23a8e8441a0a74dd612edf81dc89d1600bc0a3d1 upstream. Doing some different tests, I discovered that function graph tracing, when filtered via the set_ftrace_filter and set_ftrace_notrace files, does not always keep with them if another function ftrace_ops is registered to trace functions. The reason is that function graph just happens to trace all functions that the function tracer enables. When there was only one user of function tracing, the function graph tracer did not need to worry about being called by functions that it did not want to trace. But now that there are other users, this becomes a problem. For example, one just needs to do the following: # cd /sys/kernel/debug/tracing # echo schedule > set_ftrace_filter # echo function_graph > current_tracer # cat trace [..] 0) | schedule() { ------------------------------------------ 0) -0 => rcu_pre-7 ------------------------------------------ 0) ! 2980.314 us | } 0) | schedule() { ------------------------------------------ 0) rcu_pre-7 => -0 ------------------------------------------ 0) + 20.701 us | } # echo 1 > /proc/sys/kernel/stack_tracer_enabled # cat trace [..] 1) + 20.825 us | } 1) + 21.651 us | } 1) + 30.924 us | } /* SyS_ioctl */ 1) | do_page_fault() { 1) | __do_page_fault() { 1) 0.274 us | down_read_trylock(); 1) 0.098 us | find_vma(); 1) | handle_mm_fault() { 1) | _raw_spin_lock() { 1) 0.102 us | preempt_count_add(); 1) 0.097 us | do_raw_spin_lock(); 1) 2.173 us | } 1) | do_wp_page() { 1) 0.079 us | vm_normal_page(); 1) 0.086 us | reuse_swap_page(); 1) 0.076 us | page_move_anon_rmap(); 1) | unlock_page() { 1) 0.082 us | page_waitqueue(); 1) 0.086 us | __wake_up_bit(); 1) 1.801 us | } 1) 0.075 us | ptep_set_access_flags(); 1) | _raw_spin_unlock() { 1) 0.098 us | do_raw_spin_unlock(); 1) 0.105 us | preempt_count_sub(); 1) 1.884 us | } 1) 9.149 us | } 1) + 13.083 us | } 1) 0.146 us | up_read(); When the stack tracer was enabled, it enabled all functions to be traced, which now the function graph tracer also traces. This is a side effect that should not occur. To fix this a test is added when the function tracing is changed, as well as when the graph tracer is enabled, to see if anything other than the ftrace global_ops function tracer is enabled. If so, then the graph tracer calls a test trampoline that will look at the function that is being traced and compare it with the filters defined by the global_ops. As an optimization, if there's no other function tracers registered, or if the only registered function tracers also use the global ops, the function graph infrastructure will call the registered function graph callback directly and not go through the test trampoline. Fixes: d2d45c7a03a2 "tracing: Have stack_tracer use a separate list of functions" Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit b5434bb39330d699d1491c71d83728d4d8ba87ae Author: Steven Rostedt (Red Hat) Date: Mon Jan 13 12:56:21 2014 -0500 ftrace: Fix synchronization location disabling and freeing ftrace_ops commit a4c35ed241129dd142be4cadb1e5a474a56d5464 upstream. The synchronization needed after ftrace_ops are unregistered must happen after the callback is disabled from becing called by functions. The current location happens after the function is being removed from the internal lists, but not after the function callbacks were disabled, leaving the functions susceptible of being called after their callbacks are freed. This affects perf and any externel users of function tracing (LTTng and SystemTap). Fixes: cdbe61bfe704 "ftrace: Allow dynamically allocated function tracers" Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 2cd6ab0be09b3735ba2ffbd94e0d6cee04d4f7ca Author: Steven Rostedt (Red Hat) Date: Fri Nov 8 14:17:30 2013 -0500 ftrace: Synchronize setting function_trace_op with ftrace_trace_function commit 405e1d834807e51b2ebd3dea81cb51e53fb61504 upstream. ftrace_trace_function is a variable that holds what function will be called directly by the assembly code (mcount). If just a single function is registered and it handles recursion itself, then the assembly will call that function directly without any helper function. It also passes in the ftrace_op that was registered with the callback. The ftrace_op to send is stored in the function_trace_op variable. The ftrace_trace_function and function_trace_op needs to be coordinated such that the called callback wont be called with the wrong ftrace_op, otherwise bad things can happen if it expected a different op. Luckily, there's no callback that doesn't use the helper functions that requires this. But there soon will be and this needs to be fixed. Use a set_function_trace_op to store the ftrace_op to set the function_trace_op to when it is safe to do so (during the update function within the breakpoint or stop machine calls). Or if dynamic ftrace is not being used (static tracing) then we have to do a bit more synchronization when the ftrace_trace_function is set as that takes affect immediately (as oppose to dynamic ftrace doing it with the modification of the trampoline). Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 891e3d3ded86caf891e57d8e4f4993b97a96331f Author: Dave Airlie Date: Wed Feb 5 14:47:45 2014 +1000 drm/mgag200,ast,cirrus: fix regression with drm_can_sleep conversion commit 8b7ad1bb3d440da888f2a939dc870eba429b9192 upstream. I totally sign inverted my way out of this one. Reported-by: "Sabrina Dubroca" Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman commit 281ee70bf3e8db2ef6bb946cd45e9d8c82c2310e Author: Dave Airlie Date: Wed Feb 5 14:13:56 2014 +1000 drm/mgag200: fix typo causing bw limits to be ignored on some chips commit ec22b4aa993abbd18f5bbbcb20a1c56be3b1d38b upstream. mode->mdev otherwise the bw limits never kick in. Reported in RHEL testing. Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman commit d79d8ea83a4b4746e5b69c71921473127c3443eb Author: Dave Airlie Date: Thu Jan 16 14:28:22 2014 +1000 drm/mgag200: fix oops in cursor code. commit 53dac830537b51df555ba5e7ebb236705b7eaa7c upstream. In some cases we enter the cursor code with file_priv = NULL causing an oops, we also can try to unpin something that isn't pinned, and this is a good fix for it. Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman commit c3269a105bca3da6aee087742f4a348536151505 Author: Thomas Hellstrom Date: Thu Jan 30 10:58:19 2014 +0100 drm/vmwgfx: Fix regression caused by "drm/ttm: make ttm reservation calls behave like reservation calls" commit cf5e3413337309050c05e13dcebe85b7194a21e5 upstream. The call to ttm_eu_backoff_reservation() as part of an error path would cause a lock imbalance if the reservation ticket was not initialized. This error is easily triggered from user-space by submitting a bogus command stream. Signed-off-by: Thomas Hellstrom Reviewed-by: Jakob Bornecrantz Cc: Maarten Lankhorst Cc: Jerome Glisse Cc: Dave Airlie Signed-off-by: Greg Kroah-Hartman commit 3cc1638ee03d9a64948626c5e36da5cd36d92c70 Author: Dave Airlie Date: Fri Jan 24 09:50:18 2014 +1000 drm: ast,cirrus,mgag200: use drm_can_sleep commit f4b4718b61d1d5a7442a4fd6863ea80c3a10e508 upstream. these 3 were checking in_interrupt but we have situations where calling vunmap under this could cause a BUG to be hit in smp_call_function_many. Use the drm_can_sleep macro instead, which should stop this path from been taken in this case. Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman commit edf62c0841e48520b187099ac58a73484e8a1ae5 Author: Patrik Jakobsson Date: Wed Jan 8 19:30:40 2014 +0100 drm/gma500: Lock struct_mutex around cursor updates commit 631794b44bd3dbfba37074954d5c584c9e8725f0 upstream. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64361 Signed-off-by: Patrik Jakobsson Signed-off-by: Greg Kroah-Hartman commit 1ace73498ed6eba21c608351d6b4f35de1d5de7d Author: Laurent Pinchart Date: Wed Nov 13 14:26:01 2013 +0100 drm/rcar-du: Update plane pitch in .mode_set_base() operation commit eb86301f293da3c362db729a9f40ddb25755902b upstream. When setting a new frame buffer with the mode set base operation the pitch value might change. Set the hardware plane pitch register at the same time as the plane base address in the rcar_du_plane_update_base() function to make sure the pitch value always matches the frame buffer. Signed-off-by: Laurent Pinchart Signed-off-by: Greg Kroah-Hartman commit da31f98cc7384900b1cf429fb5997ff1dee0f0df Author: Daniel Vetter Date: Mon Jan 20 08:21:54 2014 +0100 drm/gem: Always initialize the gem object in object_init commit 6ab11a2635ce988ebc2e798947beb72cf7324119 upstream. At least drm/i915 expects that the obj->dev pointer is set even in failure paths. Specifically when the shmem initialization fails we call i915_gem_object_free which needs to deref obj->base.dev to get at the slab pointer in the device private structure. And the shmem allocation can easily fail when userspace is hitting open file limits. Doing the structure init even when the shmem file allocation fails prevents this Oops. This is a regression from commit 89c8233f82d9c8af5b20e72e4a185a38a7d3c50b Author: David Herrmann Date: Thu Jul 11 11:56:32 2013 +0200 drm/gem: simplify object initialization v2: Add regression note which Chris supplied. Testcase: igt/gem_fd_exhaustion Reported-and-Suggested-by: Linus Torvalds Cc: Linus Torvalds References: http://lists.freedesktop.org/archives/intel-gfx/2014-January/038433.html Reviewed-by: David Herrmann Cc: David Herrmann Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit f793650e5e88626acb8940f51dc3977043b18147 Author: Takashi Iwai Date: Tue Jan 21 14:34:51 2014 -0800 drm/cirrus: correct register values for 16bpp commit 2510538fa000dd13a3e57b79bf073ffb1748976c upstream. When the mode is set with 16bpp on QEMU, the output gets totally broken. The culprit is the bogus register values set for 16bpp, which was likely copied from from a wrong place. Addresses https://bugzilla.novell.com/show_bug.cgi?id=799216 Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby Cc: David Airlie Signed-off-by: Andrew Morton Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman commit da5f952cebc8fe41b430378b27c599f0dfdcf6ac Author: Chris Wilson Date: Mon Jan 27 13:52:34 2014 +0000 drm/i915: Decouple GPU error reporting from ring initialisation commit 372fbb8e3927fc76b0f842d8eb8a798a71d8960f upstream. Currently we report through our error state only the rings that have been initialised (as detected by ring->obj). This check is done after the GPU reset and ring re-initialisation, which means that the software state may not be the same as when we captured the hardware error and we may not print out any of the vital information for debugging the hang. This (and the implied object leak) is a regression from commit 3d57e5bd1284f44e325f3a52d966259ed42f9e05 Author: Ben Widawsky Date: Mon Oct 14 10:01:36 2013 -0700 drm/i915: Do a fuller init after reset Note that we are already starting to get bug reports with incomplete error states from 3.13, which also hampers debugging userspace driver issues. v2: Prevent a NULL dereference on 830gm/845g after a GPU reset where the scratch obj may be NULL. Signed-off-by: Chris Wilson Cc: Ben Widawsky Cc: Ville Syrjälä References: https://bugs.freedesktop.org/show_bug.cgi?id=74094 Reviewed-by: Ville Syrjälä [danvet: Add a bit of fluff to make it clear we need this expedited in stable.] Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit 8b839b5d48a000069a60c9c80b13a96caa62f27e Author: Stanislaw Gruszka Date: Sat Jan 25 10:13:37 2014 +0100 i915: remove pm_qos request on error commit 22accca01713b13dac386ca90b787aadf88f6551 upstream. Not removing pm qos request and free memory for it can cause crash, when some other driver use pm qos. For example, this oops: BUG: unable to handle kernel paging request at fffffffffffffff8 IP: [] plist_add+0x5b/0xd0 Call Trace: [] pm_qos_update_target+0x125/0x1e0 [] pm_qos_add_request+0x91/0x100 [] e1000_open+0xe4/0x5b0 [e1000e] was caused by earlier i915 probe failure: [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003004 tail 00000000 start 00003000 [drm:i915_driver_load] *ERROR* failed to init modeset i915: probe of 0000:00:02.0 failed with error -5 Bug report: http://bugzilla.redhat.com/show_bug.cgi?id=1057533 Reported-by: Giandomenico De Tullio Signed-off-by: Stanislaw Gruszka [danvet: Drop unnecessary code movement.] Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit 219b7536e2b46c2a3b547f62b511500b1b1cd3f3 Author: Todd Previte Date: Thu Jan 23 00:13:41 2014 -0700 drm/i915: VLV2 - Fix hotplug detect bits commit 232a6ee9af8adb185640f67fcaaa9014a9aa0573 upstream. Add new definitions for hotplug live status bits for VLV2 since they're in reverse order from the gen4x ones. Changelog: - Restored gen4 bit definitions - Added new definitions for VLV2 - Added platform check for IS_VALLEYVIEW() in dp_detect to use the correct bit defintions - Replaced a lost trailing brace for the added switch() Signed-off-by: Todd Previte Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73951 [danvet: Switch to _VLV postfix instead of prefix and regroupg comments again so that the g4x warning is right next to those defines. Also add a _G4X suffix for those special ones. Also cc stable.] Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit 4e2adfbb6e2a1a153ab461dfec1348d02ac68234 Author: Akash Goel Date: Mon Jan 13 16:24:45 2014 +0530 drm/i915: Fix the offset issue for the stolen GEM objects commit ec14ba47791965d2c08e0a681ff44eacbf3c4553 upstream. The 'offset' field of the 'scatterlist' structure was wrongly programmed with the offset value from the base of stolen area, whereas this field indicates the offset from where the interested data starts within the first PAGE pointed to by 'scattterlist' structure. As a result when a new GEM object allocated from stolen area is mapped to GTT, it could lead to an overwrite of GTT entries as the page count calculation will go wrong, refer the function 'sg_page_count'. v2: Modified the commit message. (Chris) Signed-off-by: Akash Goel Reviewed-by: Jesse Barnes Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71908 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69104 Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit 70982deb825ad050d8ef31f7c2401c13858a356a Author: Chris Wilson Date: Thu Jan 2 14:32:35 2014 +0000 drm/i915: Flush outstanding requests before allocating new seqno commit 304d695c3dc8eb65206b9eaf16f8d1a41510d1cf upstream. In very rare cases (such as a memory failure stress test) it is possible to fill the entire ring without emitting a request. Under this circumstance, the outstanding request is flushed and waited upon. After space on the ring is cleared, we return to emitting the new command - except that we just cleared the seqno allocated for this operation and trigger the sanity check that a request is only ever emitted with a valid seqno. The fix is to rearrange the code to make sure the allocation of the seqno for this operation is after any required flushes of outstanding operations. The bug exists since the preallocation was introduced in commit 9d7730914f4cd496e356acfab95b41075aa8eae8 Author: Chris Wilson Date: Tue Nov 27 16:22:52 2012 +0000 drm/i915: Preallocate next seqno before touching the ring Signed-off-by: Chris Wilson Cc: Mika Kuoppala Cc: Daniel Vetter Signed-off-by: Chris Wilson Reviewed-by: Jani Nikula Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit d148375ecc9b1d729c92c2faca300d01a63d8fe3 Author: Ilia Mirkin Date: Sat Dec 7 11:42:19 2013 -0500 drm/nouveau/falcon: use vmalloc to create firwmare copies commit 90d6db1635d5e225623af2e2e859feb607345287 upstream. Some firmware images may be large (64K), so using kmalloc memory is inappropriate for them. Use vmalloc instead, to avoid high-order allocation failures. Signed-off-by: Ilia Mirkin Signed-off-by: Greg Kroah-Hartman commit 5735d668dad54d3307b38560a254eb73c75f0350 Author: Maarten Lankhorst Date: Tue Nov 12 13:34:08 2013 +0100 drm/nouveau: fix m2mf copy to tiled gart commit ce8f7699f2b6ffe4aa8368b8d9d370875accaa5f upstream. Commit de7b7d59d54852c introduced tiled GART, but a linear copy is still performed. This may result in errors on eviction, fix it by checking tiling from memtype. Signed-off-by: Maarten Lankhorst Signed-off-by: Ben Skeggs Signed-off-by: Greg Kroah-Hartman commit 1a46dbe17b9771a38d702143d7e03b86e4377f7e Author: Mikulas Patocka Date: Mon Jan 13 19:37:54 2014 -0500 dm sysfs: fix a module unload race commit 2995fa78e423d7193f3b57835f6c1c75006a0315 upstream. This reverts commit be35f48610 ("dm: wait until embedded kobject is released before destroying a device") and provides an improved fix. The kobject release code that calls the completion must be placed in a non-module file, otherwise there is a module unload race (if the process calling dm_kobject_release is preempted and the DM module unloaded after the completion is triggered, but before dm_kobject_release returns). To fix this race, this patch moves the completion code to dm-builtin.c which is always compiled directly into the kernel if BLK_DEV_DM is selected. The patch introduces a new dm_kobject_holder structure, its purpose is to keep the completion and kobject in one place, so that it can be accessed from non-module code without the need to export the layout of struct mapped_device to that code. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit aecead3a1e5cf7698854f1c47e725e808212de61 Author: Alex Deucher Date: Tue Jan 28 23:49:37 2014 -0500 drm/radeon/dce8: workaround for atom BlankCrtc table commit 78fe9e545ce6d510b979dc2d8e14096a279fc519 upstream. Some DCE8 boards have a funky BlankCrtc table that results in a timeout when trying to blank the display. The timeout is harmless (all operations needed from the table are complete), but wastes time and is confusing to users so work around it. bug: https://bugs.freedesktop.org/show_bug.cgi?id=73420 Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 4a6df5f4f5a2108d05358e307a4033351276019c Author: Alex Deucher Date: Mon Jan 27 18:29:35 2014 -0500 drm/radeon/DCE4+: clear bios scratch dpms bit (v2) commit 6802d4bad83f50081b2788698570218aaff8d10e upstream. The BlankCrtc table in some DCE8 boards has some logic shortcuts for the vbios when this bit is set. Clear it for driver use. v2: fix typo Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73420 Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 7d63ec0a46c7c81cf97ba9b4855763e69a6ff16e Author: Alex Deucher Date: Mon Jan 27 11:54:44 2014 -0500 drm/radeon: fix DAC interrupt handling on DCE5+ commit e9a321c6b2ac954a7dbf235f419c255a424a1273 upstream. DCE5 and newer hardware only has 1 DAC. Use the correct offset. This may fix display problems on certain board configurations. Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 2fd1fcbd994ab394128c130824b27246986e531e Author: Alex Deucher Date: Mon Jan 20 11:25:35 2014 -0500 drm/radeon: add UVD support for OLAND commit 5d029339bb8ce69aeb68280c3de67d3cea456146 upstream. It seems this got dropped when we merged UVD support last year. Add this back now. Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 177489b636ca008b0aa74a880c7b409513deba3a Author: Alex Deucher Date: Thu Jan 16 18:11:47 2014 -0500 drm/radeon: set the full cache bit for fences on r7xx+ commit d45b964a22cad962d3ede1eba8d24f5cee7b2a92 upstream. Needed to properly flush the read caches for fences. Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit a544fc55ab389380433fdb2d799dbcf155661ae4 Author: Alex Deucher Date: Thu Jan 16 18:02:59 2014 -0500 drm/radeon: fix surface sync in fence on cayman (v2) commit 10e9ffae463396c5a25fdfe8a48d7c98a87f6b85 upstream. We need to set the engine bit to select the ME and also set the full cache bit. Should help stability on TN and cayman. V2: fix up surface sync in ib execute as well Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit dc4d6a8e0a1b8c89948b21820aa456e0d4bf6e4e Author: Alex Deucher Date: Mon Jan 13 16:47:05 2014 -0500 drm/radeon: disable ss on DP for DCE3.x commit d8e24525094200601236fa64a54cf73e3d682f2e upstream. Seems to cause problems with certain DP monitors. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=40699 Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 40df7953c4f0624d36f95091d1640364acc1d1e6 Author: Marek Olšák Date: Wed Jan 8 18:16:26 2014 +0100 drm/radeon: skip colorbuffer checking if COLOR_INFO.FORMAT is set to INVALID commit 56492e0fac2dbaf7735ffd66b206a90624917789 upstream. This fixes a bug which was causing rejections of valid GPU commands from userspace. Signed-off-by: Marek Olšák Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 72baebc65786fd32bbb75a2da7e116de8098e1ab Author: Malcolm Priestley Date: Tue Dec 24 13:18:46 2013 -0300 m88rs2000: set symbol rate accurately commit dd4491dfb9eb4fa3bfa7dc73ba989e69fbce2e10 upstream. Current setting of symbol rate is not very actuate causing loss of lock. Covert temp to u64 and use mclk to calculate from big number. Calculate symbol rate by dividing symbol rate by 1000 times 1 << 24 and dividing sum by mclk. Add other symbol rate settings to function registers 0xa0-0xa3. In set_frontend add changes to register 0xf1 this must be done prior call to fe_reset. Register 0x00 doesn't need a second write of 0x1 Applied after patch m88rs2000: add m88rs2000_set_carrieroffset Signed-off-by: Malcolm Priestley Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 5542fd73bb99704c555612b18afdf3aefa1c15f9 Author: Malcolm Priestley Date: Tue Dec 24 13:17:12 2013 -0300 m88rs2000: add m88rs2000_set_carrieroffset commit 06af15d1b6f45c60358feab88004472e5428f01c upstream. Set the carrier offset correctly using the default mclk values. Add function m88rs2000_get_mclk to calculate the mclk value against crystal frequency which will later be used for other functions. Add function m88rs2000_set_carrieroffset to calculate and set the offset value. variable offset becomes a signed value. Register 0x86 is set the appropriate value according to remainder value of frequency % 192857 calculation as shown. Signed-off-by: Malcolm Priestley Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit a844280b389a72f7436e5d8cb8ce22f5e686eff5 Author: Olivier Grenie Date: Thu Dec 12 09:26:22 2013 -0300 dib8000: fix regression with dib807x commit d67350f8c4e67f5eba627e1fd111f16257ca9c95 upstream. Commit 173a64cb3fcf broke support for some dib807x versions. Fix it by providing backward compatibility with the older versions. [mkrufky@linuxtv.org: conflict handling and CodingStyle fixes] Signed-off-by: Olivier Grenie Acked-by: Patrick Boettcher Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 8978b67d9929df9b28b104b64ba2bd43f7c765f8 Author: Mauro Carvalho Chehab Date: Mon Jan 13 05:59:30 2014 -0200 nxt200x: increase write buffer size commit fa1e1de6bb679f2c86da3311bbafee7eaf78f125 upstream. The buffer size on nxt200x is not enough: ... > Dec 20 10:52:04 rich kernel: [ 31.747949] nxt200x: nxt200x_writebytes: i2c wr reg=002c: len=255 is too big! ... Increase it to 256 bytes. Reported-by: Rich Freeman Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 4902b4cf93f6453ec61ffd7ae49b3c2096d9ed73 Author: Malcolm Priestley Date: Thu Dec 12 16:38:51 2013 -0300 it913x: Add support for Avermedia H335 id 0x0335 commit 17f335c304ac19d9b11814238fe8a7519d80e2ff upstream. Trivial USB ID addition for Avermedia H335. Signed-off-by: Malcolm Priestley Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit f6306cd4efd97884e4db59b76baea167bd8895df Author: Marek Szyprowski Date: Tue Dec 3 10:12:51 2013 -0300 media: s5p_mfc: remove s5p_mfc_get_node_type() function commit b80cb8dc4162bc954cc71efec192ed89f2061573 upstream. s5p_mfc_get_node_type() relies on get_index() helper function, which in turn relies on video_device index numbers assigned on driver registration. All this code is not really needed, because there is already access to respective video_device structures via common s5p_mfc_dev structure. This fixes the issues introduced by patch 1056e4388b0454917a512618c8416a98628fc9ce ("v4l2-dev: Fix race condition on __video_register_device"), which has been merged in v3.12-rc1. Signed-off-by: Marek Szyprowski Signed-off-by: Kamil Debski Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 658c2f0d080d3eb6b3938e6de57712f1951bcce2 Author: Mauro Carvalho Chehab Date: Fri Dec 13 10:35:03 2013 -0300 dib8000: make 32 bits read atomic commit 5ac64ba12aca3bef18e61c866583155a3bbf81c4 upstream. As the dvb-frontend kthread can be called anytime, it can race with some get status ioctl. So, it seems better to avoid one to race with the other while reading a 32 bits register. I can't see any other reason for having a mutex there at I2C, except to provide such kind of protection, as the I2C core already has a mutex to protect I2C transfers. Note: instead of this approach, it could eventually remove the dib8000 specific mutex for it, and either group the 4 ops into one xfer or to manually control the I2C mutex. The main advantage of the current approach is that the changes are smaller and more puntual. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Mauro Carvalho Chehab Acked-by: Patrick Boettcher Signed-off-by: Greg Kroah-Hartman commit f7c7fd5ac833aa013e046e3dcdb4195b2f5efe9e Author: Antti Palosaari Date: Mon Dec 16 21:08:04 2013 -0300 media: anysee: fix non-working E30 Combo Plus DVB-T commit c57f87e62368c33ebda11a4993380c8e5a19a5c5 upstream. PLL was attached twice to frontend0 leaving frontend1 without a tuner. frontend0 is DVB-C and frontend1 is DVB-T. Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit d7a0fbadd815376fb4a75a3de51acbc22f305468 Author: Marek Szyprowski Date: Tue Dec 3 10:14:29 2013 -0300 media: media: v4l2-dev: fix video device index assignment commit 6c3df5da67f1f53df78c7e20cd53a481dc28eade upstream. The side effect of commit 1056e4388b045 ("v4l2-dev: Fix race condition on __video_register_device") is the increased number of index value assigned on video_device registration. Before that commit video_devices were numbered from 0, after it, the indexes starts from 1, because get_index() always count the device, which is being registered. Some device drivers rely on video_device index number for internal purposes, i.e. s5p-mfc driver stopped working after that patch. This patch restores the old method of numbering the video_device indexes. Signed-off-by: Marek Szyprowski Acked-by: Sakari Ailus Acked-by: Ricardo Ribalda Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit b9627a5dd610c6d5f55f15dd7c4296e175485e7a Author: David Rientjes Date: Thu Jan 30 15:46:11 2014 -0800 mm, oom: base root bonus on current usage commit 778c14affaf94a9e4953179d3e13a544ccce7707 upstream. A 3% of system memory bonus is sometimes too excessive in comparison to other processes. With commit a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer tries to avoid killing privileged tasks by subtracting 3% of overall memory (system or cgroup) from their per-task consumption. But as a result, all root tasks that consume less than 3% of overall memory are considered equal, and so it only takes 33+ privileged tasks pushing the system out of memory for the OOM killer to do something stupid and kill dhclient or other root-owned processes. For example, on a 32G machine it can't tell the difference between the 1M agetty and the 10G fork bomb member. The changelog describes this 3% boost as the equivalent to the global overcommit limit being 3% higher for privileged tasks, but this is not the same as discounting 3% of overall memory from _every privileged task individually_ during OOM selection. Replace the 3% of system memory bonus with a 3% of current memory usage bonus. By giving root tasks a bonus that is proportional to their actual size, they remain comparable even when relatively small. In the example above, the OOM killer will discount the 1M agetty's 256 badness points down to 179, and the 10G fork bomb's 262144 points down to 183500 points and make the right choice, instead of discounting both to 0 and killing agetty because it's first in the task list. Signed-off-by: David Rientjes Reported-by: Johannes Weiner Acked-by: Johannes Weiner Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit dedd6ba4c2e82766b65efbc4681647f1b8107e2e Author: Nicholas Bellinger Date: Mon Jan 20 03:36:24 2014 +0000 iscsi-target: Fix connection reset hang with percpu_ida_alloc commit 555b270e25b0279b98083518a85f4b1da144a181 upstream. This patch addresses a bug where connection reset would hang indefinately once percpu_ida_alloc() was starved for tags, due to the fact that it always assumed uninterruptible sleep mode. So now make percpu_ida_alloc() check for signal_pending_state() for making interruptible sleep optional, and convert iscsit_allocate_cmd() to set TASK_INTERRUPTIBLE for GFP_KERNEL, or TASK_RUNNING for GFP_ATOMIC. Reported-by: Linus Torvalds Cc: Kent Overstreet Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman commit 8cf2461787d8870b637f04c3c1cc9e7b2c3ade7d Author: Kent Overstreet Date: Sun Jan 19 08:26:37 2014 +0000 percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask commit 6f6b5d1ec56acdeab0503d2b823f6f88a0af493e upstream. This patch changes percpu_ida_alloc() + callers to accept task state bitmask for prepare_to_wait() for code like target/iscsi that needs it for interruptible sleep, that is provided in a subsequent patch. It now expects TASK_UNINTERRUPTIBLE when the caller is able to sleep waiting for a new tag, or TASK_RUNNING when the caller cannot sleep, and is forced to return a negative value when no tags are available. v2 changes: - Include blk-mq + tcm_fc + vhost/scsi + target/iscsi changes - Drop signal_pending_state() call v3 changes: - Only call prepare_to_wait() + finish_wait() when != TASK_RUNNING (PeterZ) Reported-by: Linus Torvalds Cc: Linus Torvalds Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Jens Axboe Signed-off-by: Kent Overstreet Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman commit 13f048657763363335d585ec5ffc1212df77bc97 Author: Paul Bolle Date: Thu Feb 6 22:53:29 2014 +0100 mei: mei_hbm_dispatch() returns void Building hbm.o for v3.13.2 triggers a GCC warning: drivers/misc/mei/hbm.c: In function 'mei_hbm_dispatch': drivers/misc/mei/hbm.c:596:3: warning: 'return' with a value, in function returning void [enabled by default] return 0; ^ GCC is correct, obviously. So let's return void instead of zero here. Signed-off-by: Paul Bolle Acked-by: Tomas Winkler Cc: Alexander Usyskin Signed-off-by: Greg Kroah-Hartman commit bcfd15c02eebeb514eea6abfd9c41cd98aa4db46 Author: Michel Dänzer Date: Wed Jan 8 11:40:20 2014 +0900 radeon/pm: Guard access to rdev->pm.power_state array commit 370169516e736edad3b3c5aa49858058f8b55195 upstream. It's never allocated on systems without an ATOMBIOS or COMBIOS ROM. Should fix an oops I encountered while resetting the GPU after a lockup on my PowerBook with an RV350. Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 776c716b30696438465865697bba1ad8ba479e99 Author: Alex Deucher Date: Tue Jan 7 13:51:51 2014 -0500 drm/radeon/dpm: disable mclk switching on desktop RV770 commit 8097d94116d0c17e774ba4c8256e774018dc2a46 upstream. Mclk switching doesn't seem to work reliably on these cards. Most RV770 boards specify the same mclk for all performance levels anyway so in most cases, this has no affect. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73067 Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit ce5455d172e5bf743dda74a952eaf040c4db2c5f Author: Alex Deucher Date: Tue Jan 7 10:05:02 2014 -0500 drm/radeon: warn users when hw_i2c is enabled (v2) commit d195178297de9a91246519dbfa98952b70f9a9b6 upstream. The hw i2c engines are disabled by default as the current implementation is still experimental. Print a warning when users enable it so that it's obvious when the option is enabled. v2: check for non-0 rather than 1 Signed-off-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Greg Kroah-Hartman commit f78c448024f25c7121493203a09b527ee8eeae8a Author: Joe Thornber Date: Tue Jan 21 11:07:32 2014 +0000 dm space map metadata: fix bug in resizing of thin metadata commit fca028438fb903852beaf7c3fe1cd326651af57d upstream. This bug was introduced in commit 7e664b3dec431e ("dm space map metadata: fix extending the space map"). When extending a dm-thin metadata volume we: - Switch the space map into a simple bootstrap mode, which allocates all space linearly from the newly added space. - Add new bitmap entries for the new space - Increment the reference counts for those newly allocated bitmap entries - Commit changes to disk - Switch back out of bootstrap mode. But, the disk commit may allocate space itself, if so this fact will be lost when switching out of bootstrap mode. The bug exhibited itself as an error when the bitmap_root, with an erroneous ref count of 0, was subsequently decremented as part of a later disk commit. This would cause the disk commit to fail, and thinp to enter read_only mode. The metadata was not damaged (thin_check passed). The fix is to put the increments + commit into a loop, running until the commit has not allocated extra space. In practise this loop only runs twice. With this fix the following device mapper testsuite test passes: dmtest run --suite thin-provisioning -n thin_remove_works_after_resize Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 43b9f1c8cac351b8a5f9fb923a357e276e59e9c4 Author: Joe Thornber Date: Tue Jan 7 15:49:02 2014 +0000 dm space map metadata: fix extending the space map commit 7e664b3dec431eebf0c5df5ff704d6197634cf35 upstream. When extending a metadata space map we should do the first commit whilst still in bootstrap mode -- a mode where all blocks get allocated in the new area. That way the commit overhead is allocated from the newly added space. Otherwise we risk running out of space. With this fix, and the previous commit "dm space map common: make sure new space is used during extend", the following device mapper testsuite test passes: dmtest run --suite thin-provisioning -n /resize_metadata_no_io/ Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 650afddad9fefe54cc3a3daf4cb7b21b172294db Author: Joe Thornber Date: Tue Jan 7 15:47:59 2014 +0000 dm space map common: make sure new space is used during extend commit 12c91a5c2d2a8e8cc40a9552313e1e7b0a2d9ee3 upstream. When extending a low level space map we should update nr_blocks at the start so the new space is used for the index entries. Otherwise extend can fail, e.g.: sm_metadata_extend call sequence that fails: -> sm_ll_extend -> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block => returns -ENOSPC because smm->begin == smm->ll.nr_blocks Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 7cc8d93049998a20834b50fd4a7bd0f14feb59b3 Author: Mikulas Patocka Date: Mon Jan 6 23:01:22 2014 -0500 dm: wait until embedded kobject is released before destroying a device commit be35f486108227e10fe5d96fd42fb2b344c59983 upstream. There may be other parts of the kernel holding a reference on the dm kobject. We must wait until all references are dropped before deallocating the mapped_device structure. The dm_kobject_release method signals that all references are dropped via completion. But dm_kobject_release doesn't free the kobject (which is embedded in the mapped_device structure). This is the sequence of operations: * when destroying a DM device, call kobject_put from dm_sysfs_exit * wait until all users stop using the kobject, when it happens the release method is called * the release method signals the completion and should return without delay * the dm device removal code that waits on the completion continues * the dm device removal code drops the dm_mod reference the device had * the dm device removal code frees the mapped_device structure that contains the kobject Using kobject this way should avoid the module unload race that was mentioned at the beginning of this thread: https://lkml.org/lkml/2014/1/4/83 Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 465c661f3e4957f1014cbc3e3dcdfd542a7b3c7a Author: Mike Snitzer Date: Fri Dec 20 14:27:28 2013 -0500 dm thin: fix set_pool_mode exposed pool operation races commit 8b64e881eb40ac8b9bfcbce068a97eef819044ee upstream. The pool mode must not be switched until after the corresponding pool process_* methods have been established. Otherwise, because set_pool_mode() isn't interlocked with the IO path for performance reasons, the IO path can end up executing process_* operations that don't match the mode. This patch eliminates problems like the following (as seen on really fast PCIe SSD storage when transitioning the pool's mode from PM_READ_ONLY to PM_WRITE): kernel: device-mapper: thin: 253:2: reached low water mark for data device: sending event. kernel: device-mapper: thin: 253:2: no free data space available. kernel: device-mapper: thin: 253:2: switching pool to read-only mode kernel: device-mapper: thin: 253:2: switching pool to write mode kernel: ------------[ cut here ]------------ kernel: WARNING: CPU: 11 PID: 7564 at drivers/md/dm-thin.c:995 handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]() ... kernel: Workqueue: dm-thin do_worker [dm_thin_pool] kernel: 00000000000003e3 ffff880308831cc8 ffffffff8152ebcb 00000000000003e3 kernel: 0000000000000000 ffff880308831d08 ffffffff8104c46c ffff88032502a800 kernel: ffff880036409000 ffff88030ec7ce00 0000000000000001 00000000ffffffc3 kernel: Call Trace: kernel: [] dump_stack+0x49/0x5e kernel: [] warn_slowpath_common+0x8c/0xc0 kernel: [] warn_slowpath_null+0x1a/0x20 kernel: [] handle_unserviceable_bio+0x146/0x160 [dm_thin_pool] kernel: [] process_bio_read_only+0x136/0x180 [dm_thin_pool] kernel: [] process_deferred_bios+0xc5/0x230 [dm_thin_pool] kernel: [] do_worker+0x51/0x60 [dm_thin_pool] kernel: [] process_one_work+0x183/0x490 kernel: [] worker_thread+0x120/0x3a0 kernel: [] ? manage_workers+0x160/0x160 kernel: [] kthread+0xce/0xf0 kernel: [] ? kthread_freezable_should_stop+0x70/0x70 kernel: [] ret_from_fork+0x7c/0xb0 kernel: [] ? kthread_freezable_should_stop+0x70/0x70 kernel: ---[ end trace 3f00528e08ffa55c ]--- kernel: device-mapper: thin: pool mode is PM_WRITE not PM_READ_ONLY like expected!? dm-thin.c:995 was the WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY); at the top of handle_unserviceable_bio(). And as the additional debugging I had conveys: the pool mode was _not_ PM_READ_ONLY like expected, it was already PM_WRITE, yet pool->process_bio was still set to process_bio_read_only(). Also, while fixing this up, reduce logging of redundant pool mode transitions by checking new_mode is different from old_mode. Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 9d673627997147ab50c384a5c5b759a5f95d6f38 Author: Mike Snitzer Date: Tue Dec 17 13:19:11 2013 -0500 dm thin: initialize dm_thin_new_mapping returned by get_next_mapping commit 16961b042db8cc5cf75d782b4255193ad56e1d4f upstream. As additional members are added to the dm_thin_new_mapping structure care should be taken to make sure they get initialized before use. Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Signed-off-by: Greg Kroah-Hartman commit 0905e88ed5ec3eaa249b7693e4a06fa9f38b9c1a Author: Joe Thornber Date: Tue Dec 17 12:09:40 2013 -0500 dm thin: fix discard support to a previously shared block commit 19fa1a6756ed9e92daa9537c03b47d6b55cc2316 upstream. If a snapshot is created and later deleted the origin dm_thin_device's snapshotted_time will have been updated to reflect the snapshot's creation time. The 'shared' flag in the dm_thin_lookup_result struct returned from dm_thin_find_block() is an approximation based on snapshotted_time -- this is done to avoid 0(n), or worse, time complexity. In this case, the shared flag would be true. But because the 'shared' flag reflects an approximation a block can be incorrectly assumed to be shared (e.g. false positive for 'shared' because the snapshot no longer exists). This could result in discards issued to a thin device not being passed down to the pool's underlying data device. To fix this we double check that a thin block is really still in-use after a mapping is removed using dm_pool_block_is_used(). If the reference count for a block is now zero the discard is allowed to be passed down. Also add a 'definitely_not_shared' member to the dm_thin_new_mapping structure -- reflects that the 'shared' flag in the response from dm_thin_find_block() can only be held as definitive if false is returned. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1043527 Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit bc2bab994cc280ba8bb03b6c10e70f00127a71dc Author: Jeff Layton Date: Sat Jan 4 07:18:03 2014 -0500 sunrpc: don't wait for write before allowing reads from use-gss-proxy file commit 1654a04cd702fd19c297c36300a6ab834cf8c072 upstream. It doesn't make much sense to make reads from this procfile hang. As far as I can tell, only gssproxy itself will open this file and it never reads from it. Change it to just give the present setting of sn->use_gss_proxy without waiting for anything. Note that we do not want to call use_gss_proxy() in this codepath since an inopportune read of this file could cause it to be disabled prematurely. Signed-off-by: Jeff Layton Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 61b660256f542166762b562fdf7541e0f22cbb46 Author: Weston Andros Adamson Date: Tue Dec 17 12:16:11 2013 -0500 sunrpc: Fix infinite loop in RPC state machine commit 6ff33b7dd0228b7d7ed44791bbbc98b03fd15d9d upstream. When a task enters call_refreshresult with status 0 from call_refresh and !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting or max number of retries. Instead of trying forever, make use of the retry path that other errors use. This only seems to be possible when the crrefresh callback is gss_refresh_null, which only happens when destroying the context. To reproduce: 1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific operations). 2) reboot - the client will be stuck and will need to be hard rebooted BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy irq event stamp: 195724 hardirqs last enabled at (195723): [] restore_args+0x0/0x30 hardirqs last disabled at (195724): [] apic_timer_interrupt+0x6a/0x80 softirqs last enabled at (195722): [] __do_softirq+0x1df/0x276 softirqs last disabled at (195717): [] irq_exit+0x53/0x9a CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Workqueue: rpciod rpc_async_schedule [sunrpc] task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000 RIP: 0010:[] [] __rpc_execute+0x8a/0x362 [sunrpc] RSP: 0018:ffff880079003d18 EFLAGS: 00000246 RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007 RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900 RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7 R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900 FS: 0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0 Stack: ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830 ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260 ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000 Call Trace: [] ? trace_hardirqs_on_caller+0x145/0x1a1 [] rpc_async_schedule+0x27/0x32 [sunrpc] [] process_one_work+0x211/0x3a5 [] ? process_one_work+0x172/0x3a5 [] worker_thread+0x134/0x202 [] ? rescuer_thread+0x280/0x280 [] ? rescuer_thread+0x280/0x280 [] kthread+0xc9/0xd1 [] ? __kthread_parkme+0x61/0x61 [] ret_from_fork+0x7c/0xb0 [] ? __kthread_parkme+0x61/0x61 Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00 And the output of "rpcdebug -m rpc -s all": RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refresh (status 0) RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 3973a15d37dc112a1aed666f6025104e8d5f597d Author: Trond Myklebust Date: Wed Jan 29 12:12:15 2014 -0500 NFSv4: Fix a slot leak in nfs40_sequence_done commit cab92c19821a814ecf5a5279e2699bf28e66caee upstream. The check for whether or not we sent an RPC call in nfs40_sequence_done is insufficient to decide whether or not we are holding a session slot, and thus should not be used to decide when to free that slot. This patch replaces the RPC_WAS_SENT() test with the correct test for whether or not slot == NULL. Cc: Chuck Lever Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 9e246e00db332795675d65fb030bf6f50240048a Author: Boaz Harrosh Date: Wed Jan 22 20:34:54 2014 +0200 pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done commit ed7e5423014ad89720fcf315c0b73f2c5d0c7bd2 upstream. An NFS4ERR_RECALLCONFLICT is returned by server from a GET_LAYOUT only when a Server Sent a RECALL do to that GET_LAYOUT, or the RECALL and GET_LAYOUT crossed on the wire. In any way this means we want to wait at most until in-flight IO is finished and the RECALL can be satisfied. So a proper wait here is more like 1/10 of a second, not 15 seconds like we have now. In case of a server bug we delay exponentially longer on each retry. Current code totally craps out performance of very large files on most pnfs-objects layouts, because of how the map changes when the file has grown into the next raid group. [Stable: This will patch back to 3.9. If there are earlier still maintained trees, please tell me I'll send a patch] Signed-off-by: Boaz Harrosh Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit c39e05efd61f3159ab93d6f6abc55388ba680df7 Author: Weston Andros Adamson Date: Sun Jan 19 22:45:36 2014 -0500 nfs4: fix discover_server_trunking use after free commit abad2fa5ba67725a3f9c376c8cfe76fbe94a3041 upstream. If clp is new (cl_count = 1) and it matches another client in nfs4_discover_server_trunking, the nfs_put_client will free clp before ->cl_preserve_clid is set. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 1ec9b4651a4a5c2e50aafb4812dec2f7f0e65b0f Author: Trond Myklebust Date: Fri Jan 17 17:03:41 2014 -0500 NFSv4.1: Handle errors correctly in nfs41_walk_client_list commit 64590daa9e0dfb3aad89e3ab9230683b76211d5b upstream. Both nfs41_walk_client_list and nfs40_walk_client_list expect the 'status' variable to be set to the value -NFS4ERR_STALE_CLIENTID if the loop fails to find a match. The problem is that the 'pos->cl_cons_state > NFS_CS_READY' changes the value of 'status', and sets it either to the value '0' (which indicates success), or to the value EINTR. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 4a3cbb28c3bbb4cb24ea59a91c2607d806818b73 Author: Scott Mayhew Date: Fri Jan 17 15:12:05 2014 -0500 nfs: always make sure page is up-to-date before extending a write to cover the entire page commit 263b4509ec4d47e0da3e753f85a39ea12d1eff24 upstream. We should always make sure the cached page is up-to-date when we're determining whether we can extend a write to cover the full page -- even if we've received a write delegation from the server. Commit c7559663 added logic to skip this check if we have a write delegation, which can lead to data corruption such as the following scenario if client B receives a write delegation from the NFS server: Client A: # echo 123456789 > /mnt/file Client B: # echo abcdefghi >> /mnt/file # cat /mnt/file 0�D0�abcdefghi Just because we hold a write delegation doesn't mean that we've read in the entire page contents. Signed-off-by: Scott Mayhew Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 0c402bf59c6eb85d4b7dfd8d49083ea675d34c96 Author: Weston Andros Adamson Date: Mon Jan 13 16:54:45 2014 -0500 nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME commit 78b19bae0813bd6f921ca58490196abd101297bd upstream. Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP by nfs4_stat_to_errno. This allows the client to mount v4.1 servers that don't support SECINFO_NO_NAME by falling back to the "guess and check" method of nfs4_find_root_sec. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 8a362bd4ca793948a688f123bc895d7aca759a55 Author: Trond Myklebust Date: Wed Dec 4 17:39:23 2013 -0500 NFSv4: OPEN must handle the NFS4ERR_IO return code correctly commit c7848f69ec4a8c03732cde5c949bd2aa711a9f4b upstream. decode_op_hdr() cannot distinguish between an XDR decoding error and the perfectly valid errorcode NFS4ERR_IO. This is normally not a problem, but for the particular case of OPEN, we need to be able to increment the NFSv4 open sequence id when the server returns a valid response. Reported-by: J Bruce Fields Link: http://lkml.kernel.org/r/20131204210356.GA19452@fieldses.org Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit b1d74b1836ecf6d1a4f19c7b2bd89b7c11ce101b Author: Mika Westerberg Date: Mon Jan 13 11:17:04 2014 +0200 spi/pxa2xx: initialize DMA channels to -1 to prevent inadvertent match commit 483c319188c74e82b29a0ed7a7fa7065570f2193 upstream. Commit cddb339badb0 (spi/pxa2xx: convert to dma_request_slave_channel_compat()) converted the driver to use ACPI provided DMA helpers but it forgot to initialize the platform data for the channels to -1. Failing to do so will result inadvertent match in the filter function because 0 is a valid channel number. Prevent this from happening by initializing both platform data channels correctly to -1. Fixes: cddb339badb0 (spi/pxa2xx: convert to dma_request_slave_channel_compat()) Signed-off-by: Mika Westerberg Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 24175707cb728921b5ad1cf1095d1600a07dda2e Author: Daniel Santos Date: Sun Jan 5 17:39:26 2014 -0600 spidev: fix hang when transfer_one_message fails commit e120cc0dcf2880a4c5c0a6cb27b655600a1cfa1d upstream. This corrects a problem in spi_pump_messages() that leads to an spi message hanging forever when a call to transfer_one_message() fails. This failure occurs in my MCP2210 driver when the cs_change bit is set on the last transfer in a message, an operation which the hardware does not support. Rationale Since the transfer_one_message() returns an int, we must presume that it may fail. If transfer_one_message() should never fail, it should return void. Thus, calls to transfer_one_message() should properly manage a failure. Fixes: ffbbdd21329f3 (spi: create a message queueing infrastructure) Signed-off-by: Daniel Santos Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit dd00c35310801bcdb164a40e745f7def5cef8188 Author: Jonas Gorski Date: Tue Dec 17 21:42:07 2013 +0100 spi/bcm63xx: don't substract prepend length from total length commit 86b3bde003e6bf60ccb9c09b4115b8a2f533974c upstream. The spi command must include the full message length including any prepended writes, else transfers larger than 256 bytes will be incomplete. Signed-off-by: Jonas Gorski Acked-by: Florian Fainelli Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 6b390aa3d3daef7fbe9b085a918fdf84b3decc79 Author: Ira Weiny Date: Wed Dec 18 08:41:37 2013 -0800 IB/qib: Fix QP check when looping back to/from QP1 commit 6e0ea9e6cbcead7fa8c76e3e3b9de4a50c5131c5 upstream. The GSI QP type is compatible with and should be allowed to send data to/from any UD QP. This was found when testing ibacm on the same node as an SA. Reviewed-by: Mike Marciniszyn Signed-off-by: Ira Weiny Signed-off-by: Roland Dreier Signed-off-by: Greg Kroah-Hartman commit 452dc5f39bfc2b506f656e213828808316cadf0f Author: Max Filippov Date: Wed Dec 25 05:20:36 2013 +0400 xtensa: xtfpga: fix definitions of platform devices commit a558d99263936b8a67d4eff8918745a77bfd8c31 upstream. Remove __initdata attribute, as the devices may be used after init sections are freed. Signed-off-by: Max Filippov Signed-off-by: Greg Kroah-Hartman commit f149f695629d395117d3de473b05fe71937491af Author: Boaz Harrosh Date: Thu Nov 21 17:58:08 2013 +0200 ore: Fix wrong math in allocation of per device BIO commit aad560b7f63b495f48a7232fd086c5913a676e6f upstream. At IO preparation we calculate the max pages at each device and allocate a BIO per device of that size. The calculation was wrong on some unaligned corner cases offset/length combination and would make prepare return with -ENOMEM. This would be bad for pnfs-objects that would in that case IO through MDS. And fatal for exofs were it would fail writes with EIO. Fix it by doing the proper math, that will work in all cases. (I ran a test with all possible offset/length combinations this time round). Also when reading we do not need to allocate for the parity units since we jump over them. Also lower the max_io_length to take into account the parity pages so not to allocate BIOs bigger than PAGE_SIZE Signed-off-by: Boaz Harrosh Signed-off-by: Greg Kroah-Hartman commit 9af6a8e68fd7c2469cfe50d1c982a8b8194259e8 Author: Michael Grzeschik Date: Fri Nov 29 14:14:29 2013 +0100 mtd: mxc_nand: remove duplicated ecc_stats counting commit 0566477762f9e174e97af347ee9c865f908a5647 upstream. The ecc_stats.corrected count variable will already be incremented in the above framework-layer just after this callback. Signed-off-by: Michael Grzeschik Signed-off-by: Brian Norris Signed-off-by: Greg Kroah-Hartman commit fb6071a0f2e5674bc47d55905ab1baf89284ce2d Author: Heiko Carstens Date: Fri Jan 31 07:50:36 2014 +0100 tile: remove compat_sys_lookup_dcookie declaration to fix compile error commit 5a5e75f4714a592f31e57f248b8f5c866f278b8d upstream. With commit d8d14bd09cdd ("fs/compat: fix lookup_dcookie() parameter handling") I changed the type of the len parameter of the lookup_dcookie() syscall. However I missed that there was still a stale declaration in arch/tile/.. which now causes a compile error on tile: In file included from fs/dcookies.c:28:0: include/linux/compat.h:425:17: error: conflicting types for 'compat_sys_lookup_dcookie' fs/dcookies.c:207:1: error: conflicting types for 'compat_sys_lookup_dcookie' Simply remove the declaration in the tile architecture, which is only a leftover from before the different compat lookup_dcookie() versions have been merged. The correct declaration is now in include/linux/compat.h The build error was reported by Fenguang's build bot. Signed-off-by: Heiko Carstens Acked-by: Chris Metcalf Signed-off-by: Linus Torvalds Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit e5c84a579a5351b84bff572a021152ede1057d45 Author: Heiko Carstens Date: Wed Jan 29 14:05:46 2014 -0800 fs/compat: fix lookup_dcookie() parameter handling commit d8d14bd09cddbaf0168d61af638455a26bd027ff upstream. Commit d5dc77bfeeab ("consolidate compat lookup_dcookie()") coverted all architectures to the new compat_sys_lookup_dcookie() syscall. The "len" paramater of the new compat syscall must have the type compat_size_t in order to enforce zero extension for architectures where the ABI requires that the caller of a function performed zero and/or sign extension to 64 bit of all parameters. Signed-off-by: Heiko Carstens Cc: Al Viro Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Hendrik Brueckner Cc: Martin Schwidefsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 1670f13d201e4c683a42fe0f19f8e7fc309e06df Author: Heiko Carstens Date: Wed Jan 29 14:05:44 2014 -0800 fs/compat: fix parameter handling for compat readv/writev syscalls commit dfd948e32af2e7b28bcd7a490c0a30d4b8df2a36 upstream. We got a report that the pwritev syscall does not work correctly in compat mode on s390. It turned out that with commit 72ec35163f9f ("switch compat readv/writev variants to COMPAT_SYSCALL_DEFINE") we lost the zero extension of a couple of syscall parameters because the some parameter types haven't been converted from unsigned long to compat_ulong_t. This is needed for architectures where the ABI requires that the caller of a function performed zero and/or sign extension to 64 bit of all parameters. Signed-off-by: Heiko Carstens Cc: Al Viro Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Hendrik Brueckner Cc: Martin Schwidefsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit f6150c458f9645950895fa5886d07dcce218733d Author: Heiko Carstens Date: Mon Jan 27 17:07:19 2014 -0800 compat: fix sys_fanotify_mark commit 592f6b842f64e416c7598a1b97c649b34241e22d upstream. Commit 91c2e0bcae72 ("unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE") added a new unified compat fanotify_mark syscall to be used by all architectures. Unfortunately the unified version merges the split mask parameter in a wrong way: the lower and higher word got swapped. This was discovered with glibc's tst-fanotify test case. Signed-off-by: Heiko Carstens Reported-by: Andreas Krebbel Cc: "James E.J. Bottomley" Acked-by: "David S. Miller" Acked-by: Al Viro Cc: Benjamin Herrenschmidt Cc: Ingo Molnar Cc: Ralf Baechle Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 49d42873cc4427a335917bf7cc4269b9ea191b4b Author: Mark Brown Date: Mon Jan 27 00:32:14 2014 +0000 ACPI / init: Flag use of ACPI and ACPI idioms for power supplies to regulator API commit 49a12877d2777cadcb838981c3c4f5a424aef310 upstream. There is currently no facility in ACPI to express the hookup of voltage regulators, the expectation is that the regulators that exist in the system will be handled transparently by firmware if they need software control at all. This means that if for some reason the regulator API is enabled on such a system it should assume that any supplies that devices need are provided by the system at all relevant times without any software intervention. Tell the regulator core to make this assumption by calling regulator_has_full_constraints(). Do this as soon as we know we are using ACPI so that the information is available to the regulator core as early as possible. This will cause the regulator core to pretend that there is an always on regulator supplying any supply that is requested but that has not otherwise been mapped which is the behaviour expected on a system with ACPI. Should the ability to specify regulators be added in future revisions of ACPI then once we have support for ACPI mappings in the kernel the same assumptions will apply. It is also likely that systems will default to a mode of operation which does not require any interpretation of these mappings in order to be compatible with existing operating system releases so it should remain safe to make these assumptions even if the mappings exist but are not supported by the kernel. Signed-off-by: Mark Brown Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit ac7b8b31436d5c86e4c5da354d3faf83a96b44f5 Author: Josh Triplett Date: Tue Aug 20 17:20:14 2013 -0700 turbostat: Use GCC's CPUID functions to support PIC commit 2b92865e648ce04a39fda4f903784a5d01ecb0dc upstream. turbostat uses inline assembly to call cpuid. On 32-bit x86, on systems that have certain security features enabled by default that make -fPIC the default, this causes a build error: turbostat.c: In function ‘check_cpuid’: turbostat.c:1906:2: error: PIC register clobbered by ‘ebx’ in ‘asm’ asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx"); ^ GCC provides a header cpuid.h, containing a __get_cpuid function that works with both PIC and non-PIC. (On PIC, it saves and restores ebx around the cpuid instruction.) Use that instead. Signed-off-by: Josh Triplett Signed-off-by: Len Brown Signed-off-by: Greg Kroah-Hartman commit f5b6d821e80b7d2086a58e4954f2ac2c1bdc94d1 Author: Josh Triplett Date: Tue Aug 20 17:20:12 2013 -0700 turbostat: Don't put unprocessed uapi headers in the include path commit b731f3119de57144e16c19fd593b8daeb637843e upstream. turbostat's Makefile puts arch/x86/include/uapi/ in the include path, so that it can include from it. It isn't in general safe to include even uapi headers directly from the kernel tree without processing them through scripts/headers_install.sh, but asm/msr.h happens to work. However, that include path can break with some versions of system headers, by overriding some system headers with the unprocessed versions directly from the kernel source. For instance: In file included from /build/x86-generic/usr/include/bits/sigcontext.h:28:0, from /build/x86-generic/usr/include/signal.h:339, from /build/x86-generic/usr/include/sys/wait.h:31, from turbostat.c:27: ../../../../arch/x86/include/uapi/asm/sigcontext.h:4:28: fatal error: linux/compiler.h: No such file or directory This occurs because the system bits/sigcontext.h on that build system includes , and asm/sigcontext.h in the kernel source includes , which scripts/headers_install.sh would have filtered out. Since turbostat really only wants a single header, just include that one header rather than putting an entire directory of kernel headers on the include path. In the process, switch from msr.h to msr-index.h, since turbostat just wants the MSR numbers. Signed-off-by: Josh Triplett Signed-off-by: Len Brown Signed-off-by: Greg Kroah-Hartman commit f350eea0ce901237bf2450ef29ffc7edad461a6d Author: Li Zefan Date: Tue Sep 10 11:43:37 2013 +0800 slub: Fix calculation of cpu slabs commit 8afb1474db4701d1ab80cd8251137a3260e6913e upstream. /sys/kernel/slab/:t-0000048 # cat cpu_slabs 231 N0=16 N1=215 /sys/kernel/slab/:t-0000048 # cat slabs 145 N0=36 N1=109 See, the number of slabs is smaller than that of cpu slabs. The bug was introduced by commit 49e2258586b423684f03c278149ab46d8f8b6700 ("slub: per cpu cache for partial pages"). We should use page->pages instead of page->pobjects when calculating the number of cpu partial slabs. This also fixes the mapping of slabs and nodes. As there's no variable storing the number of total/active objects in cpu partial slabs, and we don't have user interfaces requiring those statistics, I just add WARN_ON for those cases. Acked-by: Christoph Lameter Reviewed-by: Wanpeng Li Signed-off-by: Li Zefan Signed-off-by: Pekka Enberg Signed-off-by: Greg Kroah-Hartman commit 020043ee82cf6f8b61b76cb90af73818f66e9f60 Author: Gregory CLEMENT Date: Mon Jan 20 15:59:50 2014 +0100 ARM: mvebu: Fix kernel hang in mvebu_soc_id_init() when of_iomap failed commit dc4910d9e93f8cc56b190dd8fc9e789135978216 upstream. When pci_base is accessed whereas it has not been properly mapped by of_iomap() the kernel hang. The check of this pointer made an improper use of IS_ERR() instead of comparing to NULL. This patch fix this issue. Signed-off-by: Gregory CLEMENT Reported-by: Ezequiel Garcia Fixes: 930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support) Signed-off-by: Jason Cooper Signed-off-by: Greg Kroah-Hartman commit fd4042ce17f45dda46149a235a5e620bffbcafe6 Author: Sebastian Hesselbarth Date: Thu Jan 16 09:10:31 2014 +0100 ARM: orion: provide C-style interrupt handler for MULTI_IRQ_HANDLER commit f28d7de6bd4d41774744e011141945affa127da4 upstream. DT-enabled Marvell Kirkwood and Dove SoCs make use of an irqchip driver. As expected for irqchip drivers, it uses a C-style interrupt handler and therefore selects MULTI_IRQ_HANDLER. Now, compiling a kernel with both non-DT and DT support enabled, selecting MULTI_IRQ_HANDLER will break ASM irq handler used by non-DT boards. Therefore, we provide a C-style irq handler even for non-DT boards, if MULTI_IRQ_HANDLER is set. By installing the C-style irq handler in orion_irq_init this is transparent to all non-DT board files. While the regression report was filed on Marvell Kirkwood, also Marvell Dove non-DT boards are affected and fixed by this patch. Signed-off-by: Sebastian Hesselbarth Tested-by: Ian Campbell Reported-by: Ian Campbell Fixes: 2326f04321a9 ("ARM: kirkwood: convert to DT irqchip and clocksource") Fixes: f07d73e33d0e ("ARM: dove: convert to DT irqchip and clocksource") Acked-by: Andrew Lunn Signed-off-by: Jason Cooper Signed-off-by: Greg Kroah-Hartman commit d72025f6c31b108ecce46dd163a5e178f8aa870b Author: Wolfram Sang Date: Tue Nov 26 02:16:25 2013 +0100 mmc: core: sd: implement proper support for sd3.0 au sizes commit 9288cac05405a7da406097a44721aa4004609b4d upstream. This reverts and updates commit 77776fd0a4cc541b9 ("mmc: sd: fix the maximum au_size for SD3.0"). The au_size for SD3.0 cannot be achieved by a simple bit shift, so this needs to be implemented differently. Also, don't print the warning in case of 0 since 'not defined' is different from 'invalid'. Signed-off-by: Wolfram Sang Acked-by: Jaehoon Chung Reviewed-by: H Hartley Sweeten Signed-off-by: Chris Ball Signed-off-by: Greg Kroah-Hartman commit dc00440ad684d1df39941efe0addb997dc1fea82 Author: Ludovic Desroches Date: Wed Nov 20 16:01:11 2013 +0100 mmc: atmel-mci: fix timeout errors in SDIO mode when using DMA commit 66b512eda74d59b17eac04c4da1b38d82059e6c9 upstream. With some SDIO devices, timeout errors can happen when reading data. To solve this issue, the DMA transfer has to be activated before sending the command to the device. This order is incorrect in PDC mode. So we have to take care if we are using DMA or PDC to know when to send the MMC command. Signed-off-by: Ludovic Desroches Acked-by: Nicolas Ferre Signed-off-by: Chris Ball Signed-off-by: Greg Kroah-Hartman commit 0efb36514cc718c62d6c48885b4f332a5bb07348 Author: Ray Jui Date: Sat Oct 26 11:03:44 2013 -0700 mmc: fix host release issue after discard operation commit f662ae48ae67dfd42739e65750274fe8de46240a upstream. Under function mmc_blk_issue_rq, after an MMC discard operation, the MMC request data structure may be freed in memory. Later in the same function, the check of req->cmd_flags & MMC_REQ_SPECIAL_MASK is dangerous and invalid. It causes the MMC host not to be released when it should. This patch fixes the issue by marking the special request down before the discard/flush operation. Reported by: Harold (SoonYeal) Yang Signed-off-by: Ray Jui Reviewed-by: Seungwon Jeon Acked-by: Seungwon Jeon Signed-off-by: Chris Ball Signed-off-by: Greg Kroah-Hartman commit 5fc120a504ecd25b7fd33edff8074fac3baffe74 Author: Andrey Vagin Date: Thu Jan 30 15:46:10 2014 -0800 mm: don't lose the SOFT_DIRTY flag on mprotect commit 24f91eba18bbfdb27e71a1aae5b3a61b67fcd091 upstream. The SOFT_DIRTY bit shows that the content of memory was changed after a defined point in the past. mprotect() doesn't change the content of memory, so it must not change the SOFT_DIRTY bit. This bug causes a malfunction: on the first iteration all pages are dumped. On other iterations only pages with the SOFT_DIRTY bit are dumped. So if the SOFT_DIRTY bit is cleared from a page by mistake, the page is not dumped and its content will be restored incorrectly. This patch does nothing with _PAGE_SWP_SOFT_DIRTY, becase pte_modify() is called only for present pages. Fixes commit 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes tracking"). Signed-off-by: Andrey Vagin Acked-by: Cyrill Gorcunov Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Pavel Emelyanov Cc: Borislav Petkov Cc: Wen Congyang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit b64ba2f3246c0a9d345f8a2cb5c8c2a58e4595f6 Author: Cyrill Gorcunov Date: Thu Jan 23 15:53:42 2014 -0800 mm: ignore VM_SOFTDIRTY on VMA merging commit 34228d473efe764d4db7c0536375f0c993e6e06a upstream. The VM_SOFTDIRTY bit affects vma merge routine: if two VMAs has all bits in vm_flags matched except dirty bit the kernel can't longer merge them and this forces the kernel to generate new VMAs instead. It finally may lead to the situation when userspace application reaches vm.max_map_count limit and get crashed in worse case | (gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes | | (file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load: gimp_wire_read(): error | xinit: connection to X server lost | | waiting for X server to shut down | /usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup | /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup | /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup https://bugzilla.kernel.org/show_bug.cgi?id=67651 https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0 Initial problem came from missed VM_SOFTDIRTY in do_brk() routine but even if we would set up VM_SOFTDIRTY here, there is still a way to prevent VMAs from merging: one can call | echo 4 > /proc/$PID/clear_refs and clear all VM_SOFTDIRTY over all VMAs presented in memory map, then new do_brk() will try to extend old VMA and finds that dirty bit doesn't match thus new VMA will be generated. As discussed with Pavel, the right approach should be to ignore VM_SOFTDIRTY bit when we're trying to merge VMAs and if merge successed we mark extended VMA with dirty bit where needed. Signed-off-by: Cyrill Gorcunov Reported-by: Bastian Hougaard Reported-by: Mel Gorman Cc: Pavel Emelyanov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 86e770e1b67c6fd6ee2de745210335211c65f839 Author: Michal Hocko Date: Thu Jan 23 15:53:37 2014 -0800 memcg: fix css reference leak and endless loop in mem_cgroup_iter commit 0eef615665ede1e0d603ea9ecca88c1da6f02234 upstream. Commit 19f39402864e ("memcg: simplify mem_cgroup_iter") has reorganized mem_cgroup_iter code in order to simplify it. A part of that change was dropping an optimization which didn't call css_tryget on the root of the walked tree. The patch however didn't change the css_put part in mem_cgroup_iter which excludes root. This wasn't an issue at the time because __mem_cgroup_iter_next bailed out for root early without taking a reference as cgroup iterators (css_next_descendant_pre) didn't visit root themselves. Nevertheless cgroup iterators have been reworked to visit root by commit bd8815a6d802 ("cgroup: make css_for_each_descendant() and friends include the origin css in the iteration") when the root bypass have been dropped in __mem_cgroup_iter_next. This means that css_put is not called for root and so css along with mem_cgroup and other cgroup internal object tied by css lifetime are never freed. Fix the issue by reintroducing root check in __mem_cgroup_iter_next and do not take css reference for it. This reference counting magic protects us also from another issue, an endless loop reported by Hugh Dickins when reclaim races with root removal and css_tryget called by iterator internally would fail. There would be no other nodes to visit so __mem_cgroup_iter_next would return NULL and mem_cgroup_iter would interpret it as "start looping from root again" and so mem_cgroup_iter would loop forever internally. Signed-off-by: Michal Hocko Reported-by: Hugh Dickins Tested-by: Hugh Dickins Cc: Johannes Weiner Cc: Greg Thelen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit b0a1f4eca939c37031ca7f91b7e4ef12f6c55e2a Author: Michal Hocko Date: Thu Jan 23 15:53:35 2014 -0800 memcg: fix endless loop caused by mem_cgroup_iter commit ecc736fc3c71c411a9d201d8588c9e7e049e5d8c upstream. Hugh has reported an endless loop when the hardlimit reclaim sees the same group all the time. This might happen when the reclaim races with the memcg removal. shrink_zone [rmdir root] mem_cgroup_iter(root, NULL, reclaim) // prev = NULL rcu_read_lock() mem_cgroup_iter_load last_visited = iter->last_visited // gets root || NULL css_tryget(last_visited) // failed last_visited = NULL [1] memcg = root = __mem_cgroup_iter_next(root, NULL) mem_cgroup_iter_update iter->last_visited = root; reclaim->generation = iter->generation mem_cgroup_iter(root, root, reclaim) // prev = root rcu_read_lock mem_cgroup_iter_load last_visited = iter->last_visited // gets root css_tryget(last_visited) // failed [1] The issue seemed to be introduced by commit 5f5781619718 ("memcg: relax memcg iter caching") which has replaced unconditional css_get/css_put by css_tryget/css_put for the cached iterator. This patch fixes the issue by skipping css_tryget on the root of the tree walk in mem_cgroup_iter_load and symmetrically doesn't release it in mem_cgroup_iter_update. Signed-off-by: Michal Hocko Reported-by: Hugh Dickins Tested-by: Hugh Dickins Cc: Johannes Weiner Cc: Greg Thelen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 414f6b9f9fa13c636c7dc136f958cd1911fa62d0 Author: Johannes Weiner Date: Wed Jan 29 14:05:41 2014 -0800 mm/page-writeback.c: do not count anon pages as dirtyable memory commit a1c3bfb2f67ef766de03f1f56bdfff9c8595ab14 upstream. The VM is currently heavily tuned to avoid swapping. Whether that is good or bad is a separate discussion, but as long as the VM won't swap to make room for dirty cache, we can not consider anonymous pages when calculating the amount of dirtyable memory, the baseline to which dirty_background_ratio and dirty_ratio are applied. A simple workload that occupies a significant size (40+%, depending on memory layout, storage speeds etc.) of memory with anon/tmpfs pages and uses the remainder for a streaming writer demonstrates this problem. In that case, the actual cache pages are a small fraction of what is considered dirtyable overall, which results in an relatively large portion of the cache pages to be dirtied. As kswapd starts rotating these, random tasks enter direct reclaim and stall on IO. Only consider free pages and file pages dirtyable. Signed-off-by: Johannes Weiner Reported-by: Tejun Heo Tested-by: Tejun Heo Reviewed-by: Rik van Riel Cc: Mel Gorman Cc: Wu Fengguang Reviewed-by: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 69b896d68ac85ba0f8524ea035314bda43353b12 Author: Johannes Weiner Date: Wed Jan 29 14:05:39 2014 -0800 mm/page-writeback.c: fix dirty_balance_reserve subtraction from dirtyable memory commit a804552b9a15c931cfc2a92a2e0aed1add8b580a upstream. Tejun reported stuttering and latency spikes on a system where random tasks would enter direct reclaim and get stuck on dirty pages. Around 50% of memory was occupied by tmpfs backed by an SSD, and another disk (rotating) was reading and writing at max speed to shrink a partition. : The problem was pretty ridiculous. It's a 8gig machine w/ one ssd and 10k : rpm harddrive and I could reliably reproduce constant stuttering every : several seconds for as long as buffered IO was going on on the hard drive : either with tmpfs occupying somewhere above 4gig or a test program which : allocates about the same amount of anon memory. Although swap usage was : zero, turning off swap also made the problem go away too. : : The trigger conditions seem quite plausible - high anon memory usage w/ : heavy buffered IO and swap configured - and it's highly likely that this : is happening in the wild too. (this can happen with copying large files : to usb sticks too, right?) This patch (of 2): The dirty_balance_reserve is an approximation of the fraction of free pages that the page allocator does not make available for page cache allocations. As a result, it has to be taken into account when calculating the amount of "dirtyable memory", the baseline to which dirty_background_ratio and dirty_ratio are applied. However, currently the reserve is subtracted from the sum of free and reclaimable pages, which is non-sensical and leads to erroneous results when the system is dominated by unreclaimable pages and the dirty_balance_reserve is bigger than free+reclaimable. In that case, at least the already allocated cache should be considered dirtyable. Fix the calculation by subtracting the reserve from the amount of free pages, then adding the reclaimable pages on top. [akpm@linux-foundation.org: fix CONFIG_HIGHMEM build] Signed-off-by: Johannes Weiner Reported-by: Tejun Heo Tested-by: Tejun Heo Reviewed-by: Rik van Riel Cc: Mel Gorman Cc: Wu Fengguang Reviewed-by: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit ab9682ffb14f69c7547f5f13cd8cf5f37a76747a Author: Hugh Dickins Date: Thu Jan 23 15:53:32 2014 -0800 mm/memcg: iteration skip memcgs not yet fully initialized commit d8ad30559715ce97afb7d1a93a12fd90e8fff312 upstream. It is surprising that the mem_cgroup iterator can return memcgs which have not yet been fully initialized. By accident (or trial and error?) this appears not to present an actual problem; but it may be better to prevent such surprises, by skipping memcgs not yet online. Signed-off-by: Hugh Dickins Cc: Tejun Heo Acked-by: Michal Hocko Cc: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 4d712df1169cf3e3cd08395e25eea9fcbe39e354 Author: Naoya Horiguchi Date: Thu Jan 23 15:53:14 2014 -0800 mm/memory-failure.c: shift page lock from head page to tail page after thp split commit 54b9dd14d09f24927285359a227aa363ce46089e upstream. After thp split in hwpoison_user_mappings(), we hold page lock on the raw error page only between try_to_unmap, hence we are in danger of race condition. I found in the RHEL7 MCE-relay testing that we have "bad page" error when a memory error happens on a thp tail page used by qemu-kvm: Triggering MCE exception on CPU 10 mce: [Hardware Error]: Machine check events logged MCE exception done on CPU 10 MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption MCE 0x38c535: dirty LRU page recovery: Recovered qemu-kvm[8418]: segfault at 20 ip 00007ffb0f0f229a sp 00007fffd6bc5240 error 4 in qemu-kvm[7ffb0ef14000+420000] BUG: Bad page state in process qemu-kvm pfn:38c400 page:ffffea000e310000 count:0 mapcount:0 mapping: (null) index:0x7ffae3c00 page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked) Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ... CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G M -------------- 3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1 Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011 Call Trace: dump_stack+0x19/0x1b bad_page.part.59+0xcf/0xe8 free_pages_prepare+0x148/0x160 free_hot_cold_page+0x31/0x140 free_hot_cold_page_list+0x46/0xa0 release_pages+0x1c1/0x200 free_pages_and_swap_cache+0xad/0xd0 tlb_flush_mmu.part.46+0x4c/0x90 tlb_finish_mmu+0x55/0x60 exit_mmap+0xcb/0x170 mmput+0x67/0xf0 vhost_dev_cleanup+0x231/0x260 [vhost_net] vhost_net_release+0x3f/0x90 [vhost_net] __fput+0xe9/0x270 ____fput+0xe/0x10 task_work_run+0xc4/0xe0 do_exit+0x2bb/0xa40 do_group_exit+0x3f/0xa0 get_signal_to_deliver+0x1d0/0x6e0 do_signal+0x48/0x5e0 do_notify_resume+0x71/0xc0 retint_signal+0x48/0x8c The reason of this bug is that a page fault happens before unlocking the head page at the end of memory_failure(). This strange page fault is trying to access to address 0x20 and I'm not sure why qemu-kvm does this, but anyway as a result the SIGSEGV makes qemu-kvm exit and on the way we catch the bad page bug/warning because we try to free a locked page (which was the former head page.) To fix this, this patch suggests to shift page lock from head page to tail page just after thp split. SIGSEGV still happens, but it affects only error affected VMs, not a whole system. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Wanpeng Li Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit d7c80b2d79624efd8fb6c2bfb44bd41ea2831cd4 Author: Konrad Rzeszutek Wilk Date: Tue Nov 26 15:05:40 2013 -0500 xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4). commit 51c71a3bbaca868043cc45b3ad3786dd48a90235 upstream. The user has the option of disabling the platform driver: 00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01) which is used to unplug the emulated drivers (IDE, Realtek 8169, etc) and allow the PV drivers to take over. If the user wishes to disable that they can set: xen_platform_pci=0 (in the guest config file) or xen_emul_unplug=never (on the Linux command line) except it does not work properly. The PV drivers still try to load and since the Xen platform driver is not run - and it has not initialized the grant tables, most of the PV drivers stumble upon: input: Xen Virtual Keyboard as /devices/virtual/input/input5 input: Xen Virtual Pointer as /devices/virtual/input/input6M ------------[ cut here ]------------ kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206! invalid opcode: 0000 [#1] SMP Modules linked in: xen_kbdfront(+) xenfs xen_privcmd CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1 Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013 RIP: 0010:[] [] get_free_entries+0x2e0/0x300 Call Trace: [] ? evdev_connect+0x1e3/0x240 [] gnttab_grant_foreign_access+0x2e/0x70 [] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront] [] xenkbd_probe+0x2f2/0x324 [xen_kbdfront] [] xenbus_dev_probe+0x77/0x130 [] xenbus_frontend_dev_probe+0x47/0x50 [] driver_probe_device+0x89/0x230 [] __driver_attach+0x9b/0xa0 [] ? driver_probe_device+0x230/0x230 [] ? driver_probe_device+0x230/0x230 [] bus_for_each_dev+0x8c/0xb0 [] driver_attach+0x19/0x20 [] bus_add_driver+0x1a0/0x220 [] driver_register+0x5f/0xf0 [] xenbus_register_driver_common+0x15/0x20 [] xenbus_register_frontend+0x23/0x40 [] ? 0xffffffffa0014fff [] xenkbd_init+0x2b/0x1000 [xen_kbdfront] [] do_one_initcall+0x49/0x170 .. snip.. which is hardly nice. This patch fixes this by having each PV driver check for: - if running in PV, then it is fine to execute (as that is their native environment). - if running in HVM, check if user wanted 'xen_emul_unplug=never', in which case bail out and don't load any PV drivers. - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci) does not exist, then bail out and not load PV drivers. - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks', then bail out for all PV devices _except_ the block one. Ditto for the network one ('nics'). - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary' then load block PV driver, and also setup the legacy IDE paths. In (v3) make it actually load PV drivers. Reported-by: Sander Eikelenboom Reported-and-Tested-by: Fabio Fantoni Signed-off-by: Konrad Rzeszutek Wilk [v2: Add extra logic to handle the myrid ways 'xen_emul_unplug' can be used per Ian and Stefano suggestion] [v3: Make the unnecessary case work properly] [v4: s/disks/ide-disks/ spotted by Fabio] Reviewed-by: Stefano Stabellini Acked-by: Bjorn Helgaas [for PCI parts] Signed-off-by: Greg Kroah-Hartman commit b224b01f8779f9d9e431133a488aaf7785dceb83 Author: AKASHI Takahiro Date: Mon Jan 13 13:33:09 2014 -0800 audit: correct a type mismatch in audit_syscall_exit() commit 06bdadd7634551cfe8ce071fe44d0311b3033d9e upstream. audit_syscall_exit() saves a result of regs_return_value() in intermediate "int" variable and passes it to __audit_syscall_exit(), which expects its second argument as a "long" value. This will result in truncating the value returned by a system call and making a wrong audit record. I don't know why gcc compiler doesn't complain about this, but anyway it causes a problem at runtime on arm64 (and probably most 64-bit archs). Signed-off-by: AKASHI Takahiro Cc: Al Viro Cc: Eric Paris Signed-off-by: Andrew Morton Signed-off-by: Eric Paris Signed-off-by: Greg Kroah-Hartman commit a0779a24dda8a1f3c75fb73724ad0c3cf7235d97 Author: Richard Guy Briggs Date: Thu Sep 12 23:03:51 2013 -0400 audit: reset audit backlog wait time after error recovery commit e789e561a50de0aaa8c695662d97aaa5eac9d55f upstream. When the audit queue overflows and times out (audit_backlog_wait_time), the audit queue overflow timeout is set to zero. Once the audit queue overflow timeout condition recovers, the timeout should be reset to the original value. See also: https://lkml.org/lkml/2013/9/2/473 Signed-off-by: Luiz Capitulino Signed-off-by: Dan Duval Signed-off-by: Chuck Anderson Signed-off-by: Richard Guy Briggs Signed-off-by: Eric Paris Signed-off-by: Greg Kroah-Hartman commit 5ea9649adacf8869e759e6859480338aa6637b6c Author: Miklos Szeredi Date: Wed Jan 22 19:36:57 2014 +0100 fuse: fix pipe_buf_operations commit 28a625cbc2a14f17b83e47ef907b2658576a32aa upstream. Having this struct in module memory could Oops when if the module is unloaded while the buffer still persists in a pipe. Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal merge them into nosteal_pipe_buf_ops (this is the same as default_pipe_buf_ops except stealing the page from the buffer is not allowed). Reported-by: Al Viro Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit cdb18ebfef15079928d4c4408dc91967d0543816 Author: Bjorn Helgaas Date: Fri Jan 17 14:57:29 2014 -0700 Revert "EISA: Initialize device before its resources" commit 765ee51f9a3f652959b4c7297d198a28e37952b4 upstream. This reverts commit 26abfeed4341872364386c6a52b9acef8c81a81a. In the eisa_probe() force_probe path, if we were unable to request slot resources (e.g., [io 0x800-0x8ff]), we skipped the slot with "Cannot allocate resource for EISA slot %d" before reading the EISA signature in eisa_init_device(). Commit 26abfeed4341 moved eisa_init_device() earlier, so we tried to read the EISA signature before requesting the slot resources, and this caused hangs during boot. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1251816 Signed-off-by: Bjorn Helgaas Signed-off-by: Greg Kroah-Hartman commit 94e7d0713185dff1a5818a4346fb5326355c708d Author: Alex Williamson Date: Tue Jan 21 15:48:18 2014 -0800 intel-iommu: fix off-by-one in pagetable freeing commit 08336fd218e087cc4fcc458e6b6dcafe8702b098 upstream. dma_pte_free_level() has an off-by-one error when checking whether a pte is completely covered by a range. Take for example the case of attempting to free pfn 0x0 - 0x1ff, ie. 512 entries covering the first 2M superpage. The level_size() is 0x200 and we test: static void dma_pte_free_level(... ... if (!(0 > 0 || 0x1ff < 0 + 0x200)) { ... } Clearly the 2nd test is true, which means we fail to take the branch to clear and free the pagetable entry. As a result, we're leaking pagetables and failing to install new pages over the range. This was found with a PCI device assigned to a QEMU guest using vfio-pci without a VGA device present. The first 1M of guest address space is mapped with various combinations of 4K pages, but eventually the range is entirely freed and replaced with a 2M contiguous mapping. intel-iommu errors out with something like: ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083) In this case 5c2b8003 is the pointer to the previous leaf page that was neither freed nor cleared and 849c00083 is the superpage entry that we're trying to replace it with. Signed-off-by: Alex Williamson Cc: David Woodhouse Cc: Joerg Roedel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 550c4ce2935b242bd3c61b436444cee1f27ef8ee Author: Wanlong Gao Date: Tue Jan 21 15:48:41 2014 -0800 arch/sh/kernel/kgdb.c: add missing #include commit 53a52f17d96c8d47c79a7dafa81426317e89c7c1 upstream. arch/sh/kernel/kgdb.c: In function 'sleeping_thread_to_gdb_regs': arch/sh/kernel/kgdb.c:225:32: error: implicit declaration of function 'task_stack_page' [-Werror=implicit-function-declaration] arch/sh/kernel/kgdb.c:242:23: error: dereferencing pointer to incomplete type arch/sh/kernel/kgdb.c:243:22: error: dereferencing pointer to incomplete type arch/sh/kernel/kgdb.c: In function 'singlestep_trap_handler': arch/sh/kernel/kgdb.c:310:27: error: 'SIGTRAP' undeclared (first use in this function) arch/sh/kernel/kgdb.c:310:27: note: each undeclared identifier is reported only once for each function it appears in This was introduced by commit 16559ae48c76 ("kgdb: remove #include from kgdb.h"). [geert@linux-m68k.org: reworded and reformatted] Signed-off-by: Wanlong Gao Signed-off-by: Geert Uytterhoeven Reported-by: Fengguang Wu Acked-by: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 8c0b860ecba3c32e8c784bb6840363301f6a1fe3 Author: Steven Rostedt (Red Hat) Date: Thu Jan 23 12:27:59 2014 -0500 tracing: Check if tracing is enabled in trace_puts() commit 3132e107d608f8753240d82d61303c500fd515b4 upstream. If trace_puts() is used very early in boot up, it can crash the machine if it is called before the ring buffer is allocated. If a trace_printk() is used with no arguments, then it will be converted into a trace_puts() and suffer the same fate. Fixes: 09ae72348ecc "tracing: Add trace_puts() for even faster trace_printk() tracing" Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit f01b215f5a9e3eb922a64be93220f68b43c5b205 Author: Steven Rostedt (Red Hat) Date: Tue Jan 14 10:19:46 2014 -0500 tracing: Have trace buffer point back to trace_array commit dced341b2d4f06668efaab33f88de5d287c0f45b upstream. The trace buffer has a descriptor pointer that goes back to the trace array. But it was never assigned. Luckily, nothing uses it (yet), but it will in the future. Although nothing currently uses this, if any of the new features get backported to older kernels, and because this is such a simple change, I'm marking it for stable too. Fixes: 12883efb670c "tracing: Consolidate max_tr into main trace_array structure" Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 0b38e613ebf9199f55fc5b315d5e1ea755960196 Author: Tetsuo Handa Date: Mon Jan 6 21:28:15 2014 +0900 SELinux: Fix memory leak upon loading policy commit 8ed814602876bec9bad2649ca17f34b499357a1c upstream. Hello. I got below leak with linux-3.10.0-54.0.1.el7.x86_64 . [ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak) Below is a patch, but I don't know whether we need special handing for undoing ebitmap_set_bit() call. ---------- >>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Mon, 6 Jan 2014 16:30:21 +0900 Subject: SELinux: Fix memory leak upon loading policy Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not check return value from hashtab_insert() in filename_trans_read(). It leaks memory if hashtab_insert() returns error. unreferenced object 0xffff88005c9160d0 (size 8): comm "systemd", pid 1, jiffies 4294688674 (age 235.265s) hex dump (first 8 bytes): 57 0b 00 00 6b 6b 6b a5 W...kkk. backtrace: [] kmemleak_alloc+0x4e/0xb0 [] kmem_cache_alloc_trace+0x12e/0x360 [] policydb_read+0xd1d/0xf70 [] security_load_policy+0x6c/0x500 [] sel_write_load+0xac/0x750 [] vfs_write+0xc0/0x1f0 [] SyS_write+0x4c/0xa0 [] system_call_fastpath+0x16/0x1b [] 0xffffffffffffffff However, we should not return EEXIST error to the caller, or the systemd will show below message and the boot sequence freezes. systemd[1]: Failed to load SELinux policy. Freezing. Signed-off-by: Tetsuo Handa Acked-by: Eric Paris Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman