From: Michael Wang Date: Wed, 5 Jun 2013 08:49:37 +0000 (+0000) Subject: cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work() X-Git-Tag: firefly_0821_release~3680^2~354^2^2~1 X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=2f7021a815f20f3481c10884fe9735ce2a56db35;p=firefly-linux-kernel-4.4.55.git cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work() Jiri Kosina and Borislav Petkov reported the warning: [ 51.616759] ------------[ cut here ]------------ [ 51.621460] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60() [ 51.629638] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode [ 51.675581] CPU: 0 PID: 244 Comm: kworker/1:1 Tainted: G W 3.10.0-rc1+ #10 [ 51.683407] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013 [ 51.690901] Workqueue: events od_dbs_timer [ 51.695069] 0000000000000009 ffff88043a2f5b68 ffffffff8161441c ffff88043a2f5ba8 [ 51.702602] ffffffff8103e540 0000000000000033 0000000000000001 ffff88043d5f8000 [ 51.710136] 00000000ffff0ce1 0000000000000001 ffff88044fc4fc08 ffff88043a2f5bb8 [ 51.717691] Call Trace: [ 51.720191] [] dump_stack+0x19/0x1b [ 51.725396] [] warn_slowpath_common+0x70/0xa0 [ 51.731473] [] warn_slowpath_null+0x1a/0x20 [ 51.737378] [] native_smp_send_reschedule+0x58/0x60 [ 51.744013] [] wake_up_nohz_cpu+0x2d/0xa0 [ 51.749745] [] add_timer_on+0x8f/0x110 [ 51.755214] [] __queue_delayed_work+0x16e/0x1a0 [ 51.761470] [] ? try_to_grab_pending+0xd1/0x1a0 [ 51.767724] [] mod_delayed_work_on+0x5a/0xa0 [ 51.773719] [] gov_queue_work+0x4d/0xc0 [ 51.779271] [] od_dbs_timer+0xcb/0x170 [ 51.784734] [] process_one_work+0x1fd/0x540 [ 51.790634] [] ? process_one_work+0x192/0x540 [ 51.796711] [] worker_thread+0x122/0x380 [ 51.802350] [] ? rescuer_thread+0x320/0x320 [ 51.808264] [] kthread+0xea/0xf0 [ 51.813200] [] ? flush_kthread_worker+0x150/0x150 [ 51.819644] [] ret_from_fork+0x7c/0xb0 [ 51.918165] nouveau E[ DRM] GPU lockup - switching to software fbcon [ 51.930505] [] ? flush_kthread_worker+0x150/0x150 [ 51.936994] ---[ end trace f419538ada83b5c5 ]--- It was caused by the policy->cpus changed during the process of __gov_queue_work(), in other word, cpu offline happened. Use get/put_online_cpus() to prevent the offline from happening while __gov_queue_work() is running. [rjw: The problem has been present since recent commit 031299b (cpufreq: governors: Avoid unnecessary per cpu timer interrupts)] References: https://lkml.org/lkml/2013/6/5/88 Reported-by: Borislav Petkov Reported-and-tested-by: Jiri Kosina Signed-off-by: Michael Wang Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 5af40ad82d23..dc9b72e25c1a 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -26,6 +26,7 @@ #include #include #include +#include #include "cpufreq_governor.h" @@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, if (!all_cpus) { __gov_queue_work(smp_processor_id(), dbs_data, delay); } else { + get_online_cpus(); for_each_cpu(i, policy->cpus) __gov_queue_work(i, dbs_data, delay); + put_online_cpus(); } } EXPORT_SYMBOL_GPL(gov_queue_work);