1 Documentation for /proc/sys/kernel/* kernel version 2.2.10
2 (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
3 (c) 2009, Shen Feng <shen@cn.fujitsu.com>
5 For general info and legal blurb, please look in README.
7 ==============================================================
9 This file contains documentation for the sysctl files in
10 /proc/sys/kernel/ and is valid for Linux kernel version 2.2.
12 The files in this directory can be used to tune and monitor
13 miscellaneous and general things in the operation of the Linux
14 kernel. Since some of the files _can_ be used to screw up your
15 system, it is advisable to read both documentation and source
16 before actually making adjustments.
18 Currently, these files might (depending on your configuration)
19 show up in /proc/sys/kernel:
24 - bootloader_type [ X86 only ]
25 - bootloader_version [ X86 only ]
26 - callhome [ S390 only ]
37 - hung_task_check_count
38 - hung_task_timeout_secs
41 - kstack_depth_to_print [ X86 only ]
43 - modprobe ==> Documentation/debugging-modules.txt
45 - msg_next_id [ sysv ipc ]
56 - panic_on_unrecovered_nmi
57 - panic_on_stackoverflow
59 - powersave-nap [ PPC only ]
63 - printk_ratelimit_burst
65 - real-root-dev ==> Documentation/initrd.txt
66 - reboot-cmd [ SPARC only ]
70 - sem_next_id [ sysv ipc ]
71 - sg-big-buff [ generic SCSI device (sg) ]
72 - shm_next_id [ sysv ipc ]
77 - stop-a [ SPARC only ]
78 - sysrq ==> Documentation/sysrq.txt
85 ==============================================================
89 highwater lowwater frequency
91 If BSD-style process accounting is enabled, these values control
92 its behaviour. If free space on the filesystem where the log lives
93 goes below <lowwater>%, accounting suspends. If free space gets
94 above <highwater>%, accounting resumes. <Frequency> determines
95 how often the amount of free space is checked (value is in
98 That is, suspend accounting if <= 2% free space is left; resume it
99 if >= 4% is free again; consider information about the amount of
100 free space valid for 30 seconds.
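
The suspend/resume rule above behaves like a simple hysteresis check. A
minimal sketch in Python, assuming the comparisons work as in the example
(suspend at <= lowwater, resume at >= highwater). The function name and its
arguments are illustrative only and are not kernel interfaces:

```python
def acct_next_state(active, free_percent, highwater=4, lowwater=2):
    """Return True if accounting should be active after this check.

    Illustrative model only: 'active' is the current state and
    'free_percent' is the free space on the log's filesystem.
    The default marks are the ones used in the example above.
    """
    if active and free_percent <= lowwater:
        return False  # too little space left: suspend accounting
    if not active and free_percent >= highwater:
        return True   # enough space again: resume accounting
    return active     # between the two marks nothing changes
```

Between the two marks the previous state is kept, which is what stops
accounting from flapping on and off when free space hovers around one value.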
102 ==============================================================
108 See Doc*/kernel/power/video.txt, it allows mode of video boot to be
111 ==============================================================
115 Enables/Disables automatic recomputing of msgmni upon memory add/remove
116 or upon ipc namespace creation/removal (see the msgmni description
117 above). Echoing "1" into this file enables msgmni automatic recomputing.
118 Echoing "0" turns it off. auto_msgmni default value is 1.
121 ==============================================================
125 x86 bootloader identification
127 This gives the bootloader type number as indicated by the bootloader,
128 shifted left by 4, and OR'd with the low four bits of the bootloader
129 version. The reason for this encoding is that this used to match the
130 type_of_loader field in the kernel header; the encoding is kept for
131 backwards compatibility. That is, if the full bootloader type number
132 is 0x15 and the full version number is 0x234, this file will contain
133 the value 340 = 0x154.
135 See the type_of_loader and ext_loader_type fields in
136 Documentation/x86/boot.txt for additional information.
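
The encoding described above can be reproduced with a shift and a mask. The
following Python sketch only illustrates the arithmetic; the function name is
made up for this example:

```python
def encode_bootloader_type(loader_type, loader_version):
    """Pack the values the way the bootloader_type file reports them:
    type number shifted left by 4, OR'd with the low four version bits."""
    return (loader_type << 4) | (loader_version & 0xF)


# The example from the text: type 0x15, version 0x234.
print(encode_bootloader_type(0x15, 0x234))       # 340
print(hex(encode_bootloader_type(0x15, 0x234)))  # 0x154
```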
138 ==============================================================
142 x86 bootloader version
144 The complete bootloader version number. In the example above, this
145 file will contain the value 564 = 0x234.
147 See the type_of_loader and ext_loader_ver fields in
148 Documentation/x86/boot.txt for additional information.
150 ==============================================================
154 Controls the kernel's callhome behavior in case of a kernel panic.
156 The s390 hardware allows an operating system to send a notification
157 to a service organization (callhome) in case of an operating system panic.
159 When the value in this file is 0 (which is the default behavior)
160 nothing happens in case of a kernel panic. If this value is set to "1"
161 the complete kernel oops message is sent to the IBM customer service
162 organization, provided that the mainframe the Linux operating system is
163 running on has a service contract with IBM.
165 ==============================================================
169 Highest valid capability of the running kernel. Exports
170 CAP_LAST_CAP from the kernel.
172 ==============================================================
176 core_pattern is used to specify a core dumpfile pattern name.
177 . max length 128 characters; default value is "core"
178 . core_pattern is used as a pattern template for the output filename;
179 certain string patterns (beginning with '%') are substituted with
181 . backward compatibility with core_uses_pid:
182 If core_pattern does not include "%p" (default does not)
183 and core_uses_pid is set, then .PID will be appended to
185 . corename format specifiers:
186 %<NUL> '%' is dropped
189 %P global pid (init PID namespace)
192 %d dump mode, matches PR_SET_DUMPABLE and
193 /proc/sys/fs/suid_dumpable
197 %e executable filename (may be shortened)
199 %<OTHER> both are dropped
200 . If the first character of the pattern is a '|', the kernel will treat
201 the rest of the pattern as a command to run. The core dump will be
202 written to the standard input of that program instead of to a file.
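
The substitution rules above can be modelled roughly as follows. This is a
sketch in Python, not the kernel's implementation: the caller supplies the
replacement text for each specifier letter, '%%' is assumed to produce a
single '%', and skipping the .PID suffix for piped patterns is an assumption
of this example.

```python
def expand_core_pattern(pattern, values, core_uses_pid=False):
    """Expand a core_pattern template (illustrative model only).

    'values' maps a specifier letter such as 'p' or 'e' to its
    replacement text; it must contain 'p' if core_uses_pid is set.
    Returns (is_pipe, expanded_string).
    """
    is_pipe = pattern.startswith("|")
    if is_pipe:
        pattern = pattern[1:]   # the rest is the command to run

    out = []
    pid_in_pattern = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(pattern):
            break               # '%' at the very end: '%' is dropped
        spec = pattern[i + 1]
        if spec == "%":
            out.append("%")     # assumed: '%%' yields a single '%'
        elif spec in values:
            out.append(str(values[spec]))
            pid_in_pattern = pid_in_pattern or spec == "p"
        # any other specifier: both characters are dropped
        i += 2

    name = "".join(out)
    # Backward compatibility with core_uses_pid (applied to files only
    # in this sketch).
    if core_uses_pid and not pid_in_pattern and not is_pipe:
        name += "." + str(values["p"])
    return is_pipe, name
```

With values {"p": 1234, "e": "myprog"}, the default pattern "core" expands to
"core", or to "core.1234" when core_uses_pid is set.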
204 ==============================================================
208 This sysctl is only applicable when core_pattern is configured to pipe
209 core files to a user space helper (when the first character of
210 core_pattern is a '|', see above). When collecting cores via a pipe
211 to an application, it is occasionally useful for the collecting
212 application to gather data about the crashing process from its
213 /proc/pid directory. In order to do this safely, the kernel must wait
214 for the collecting process to exit, so as not to remove the crashing
215 process's proc files prematurely. This in turn creates the
216 possibility that a misbehaving userspace collecting process can block
217 the reaping of a crashed process simply by never exiting. This sysctl
218 defends against that. It defines how many concurrent crashing
219 processes may be piped to user space applications in parallel. If
220 this value is exceeded, then those crashing processes above that value
221 are noted via the kernel log and their cores are skipped. 0 is a
222 special value, indicating that unlimited processes may be captured in
223 parallel, but that no waiting will take place (i.e. the collecting
224 process is not guaranteed access to /proc/<crashing pid>/). This
227 ==============================================================
231 The default coredump filename is "core". By setting
232 core_uses_pid to 1, the coredump filename becomes core.PID.
233 If core_pattern does not include "%p" (default does not)
234 and core_uses_pid is set, then .PID will be appended to
237 ==============================================================
241 When the value in this file is 0, ctrl-alt-del is trapped and
242 sent to the init(1) program to handle a graceful restart.
243 When, however, the value is > 0, Linux's reaction to a Vulcan
244 Nerve Pinch (tm) will be an immediate reboot, without even
245 syncing its dirty buffers.
247 Note: when a program (like dosemu) has the keyboard in 'raw'
248 mode, the ctrl-alt-del is intercepted by the program before it
249 ever reaches the kernel tty layer, and it's up to the program
250 to decide what to do with it.
252 ==============================================================
256 This toggle indicates whether unprivileged users are prevented
257 from using dmesg(8) to view messages from the kernel's log buffer.
258 When dmesg_restrict is set to (0) there are no restrictions. When
259 dmesg_restrict is set to (1), users must have CAP_SYSLOG to use
262 The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
263 default value of dmesg_restrict.
265 ==============================================================
267 domainname & hostname:
269 These files can be used to set the NIS/YP domainname and the
270 hostname of your box in exactly the same way as the commands
271 domainname and hostname, i.e.:
272 # echo "darkstar" > /proc/sys/kernel/hostname
273 # echo "mydomain" > /proc/sys/kernel/domainname
274 has the same effect as
275 # hostname "darkstar"
276 # domainname "mydomain"
278 Note, however, that the classic darkstar.frop.org has the
279 hostname "darkstar" and DNS (Internet Domain Name Server)
280 domainname "frop.org", not to be confused with the NIS (Network
281 Information Service) or YP (Yellow Pages) domainname. These two
282 domain names are in general different. For a detailed discussion
283 see the hostname(1) man page.
285 ==============================================================
289 Path for the hotplug policy agent.
290 Default value is "/sbin/hotplug".
292 ==============================================================
296 Controls the kernel's behavior when a hung task is detected.
297 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
299 0: continue operation. This is the default behavior.
301 1: panic immediately.
303 ==============================================================
305 hung_task_check_count:
307 The upper bound on the number of tasks that are checked.
308 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
310 ==============================================================
312 hung_task_timeout_secs:
314 Check interval. When a task in D state has not been scheduled
315 for more than this many seconds, a warning is reported.
316 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
318 0: means infinite timeout - no checking done.
320 ==============================================================
324 The maximum number of warnings to report. When this value is
325 reached during a check interval, no more warnings will be reported.
326 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
328 -1: report an infinite number of warnings.
330 ==============================================================
334 This toggle indicates whether restrictions are placed on
335 exposing kernel addresses via /proc and other interfaces.
337 When kptr_restrict is set to (0), the default, there are no restrictions.
339 When kptr_restrict is set to (1), kernel pointers printed using the %pK
340 format specifier will be replaced with 0's unless the user has CAP_SYSLOG
341 and effective user and group ids are equal to the real ids. This is
342 because %pK checks are done at read() time rather than open() time, so
343 if permissions are elevated between the open() and the read() (e.g. via
344 a setuid binary) then %pK will not leak kernel pointers to unprivileged
345 users. Note, this is a temporary solution only. The correct long-term
346 solution is to do the permission checks at open() time. Consider removing
347 world read permissions from files that use %pK, and using dmesg_restrict
348 to protect against uses of %pK in dmesg(8) if leaking kernel pointer
349 values to unprivileged users is a concern.
351 When kptr_restrict is set to (2), kernel pointers printed using
352 %pK will be replaced with 0's regardless of privileges.
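
The three settings can be summarised as a small decision function. The
Python sketch below only models the rules stated above; the function and
parameter names are illustrative and it is not kernel code:

```python
def pk_shows_real_pointer(kptr_restrict, has_cap_syslog,
                          uid, euid, gid, egid):
    """Return True if a %pK pointer would be printed unmodified."""
    if kptr_restrict == 0:
        return True                      # no restrictions
    if kptr_restrict == 1:
        # CAP_SYSLOG is required, and the effective ids must equal
        # the real ids (so a setuid helper does not qualify).
        ids_match = (uid == euid) and (gid == egid)
        return has_cap_syslog and ids_match
    return False                         # 2: always replaced with 0's
```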
354 ==============================================================
356 kstack_depth_to_print: (X86 only)
358 Controls the number of words to print when dumping the raw
361 ==============================================================
365 This flag controls the L2 cache of G3 processor boards. If
366 0, the cache is disabled. Enabled if nonzero.
368 ==============================================================
372 A toggle value indicating if modules are allowed to be loaded
373 in an otherwise modular kernel. This toggle defaults to off
374 (0), but can be set true (1). Once true, modules can be
375 neither loaded nor unloaded, and the toggle cannot be set back
378 ==============================================================
380 msg_next_id, sem_next_id, and shm_next_id:
382 These three toggles allow the desired id to be specified for the next
383 allocated IPC object: message, semaphore or shared memory respectively.
385 By default they are equal to -1, which means generic allocation logic.
386 Possible values to set are in range {0..INT_MAX}.
389 1) The kernel doesn't guarantee that the new object will have the desired
390 id. So it's up to userspace how to handle an object with a "wrong" id.
391 2) A toggle with a non-default value will be set back to -1 by the kernel
392 after successful IPC object allocation.
394 ==============================================================
398 Enables/Disables the NMI watchdog on x86 systems. When the value is
399 non-zero the NMI watchdog is enabled and will continuously test all
400 online cpus to determine whether or not they are still functioning
401 properly. Currently, passing "nmi_watchdog=" parameter at boot time is
402 required for this function to work.
404 If LAPIC NMI watchdog method is in use (nmi_watchdog=2 kernel
405 parameter), the NMI watchdog shares registers with oprofile. By
406 disabling the NMI watchdog, oprofile may have more registers to
409 ==============================================================
413 Enables/disables automatic page fault based NUMA memory
414 balancing. Memory is moved automatically to nodes
415 that access it often.
417 Enables/disables automatic NUMA memory balancing. On NUMA machines, there
418 is a performance penalty if remote memory is accessed by a CPU. When this
419 feature is enabled the kernel samples what task thread is accessing memory
420 by periodically unmapping pages and later trapping a page fault. At the
421 time of the page fault, it is determined if the data being accessed should
422 be migrated to a local memory node.
424 The unmapping of pages and trapping faults incur additional overhead that
425 ideally is offset by improved memory locality but there is no universal
426 guarantee. If the target workload is already bound to NUMA nodes then this
427 feature should be disabled. Otherwise, if the system overhead from the
428 feature is too high then the rate at which the kernel samples for NUMA
429 hinting faults may be controlled by the numa_balancing_scan_period_min_ms,
430 numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
431 numa_balancing_scan_size_mb, numa_balancing_settle_count and
432 numa_balancing_migrate_deferred sysctls.
434 ==============================================================
436 numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
437 numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
439 Automatic NUMA balancing scans a task's address space and unmaps pages to
440 detect if pages are properly placed or if the data should be migrated to a
441 memory node local to where the task is running. Every "scan delay" the task
442 scans the next "scan size" number of pages in its address space. When the
443 end of the address space is reached the scanner restarts from the beginning.
445 In combination, the "scan delay" and "scan size" determine the scan rate.
446 When "scan delay" decreases, the scan rate increases. The scan delay and
447 hence the scan rate of every task is adaptive and depends on historical
448 behaviour. If pages are properly placed then the scan delay increases,
449 otherwise the scan delay decreases. The "scan size" is not adaptive but
450 the higher the "scan size", the higher the scan rate.
452 Higher scan rates incur higher system overhead as page faults must be
453 trapped and potentially data must be migrated. However, the higher the scan
454 rate, the more quickly a task's memory is migrated to a local node if the
455 workload pattern changes, which minimises the performance impact of remote
456 memory accesses. These sysctls control the thresholds for scan delays and
457 the number of pages scanned.
459 numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
460 scan a task's virtual memory. It effectively controls the maximum scanning
463 numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
464 when it initially forks.
466 numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
467 scan a task's virtual memory. It effectively controls the minimum scanning
470 numa_balancing_scan_size_mb is how many megabytes worth of pages are
471 scanned for a given scan.
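
To get a feel for how these values bound the scan rate, the relationship can
be written out as arithmetic. The Python sketch below is a simplified model
that assumes one scan of "scan size" megabytes per scan period; the function
names are made up for this example and the numbers in the usage note are
arbitrary, not defaults.

```python
def scan_rate_mb_per_sec(scan_size_mb, scan_period_ms):
    """MB of address space scanned per second for one task."""
    return scan_size_mb * 1000.0 / scan_period_ms


def scan_rate_bounds(scan_size_mb, period_min_ms, period_max_ms):
    """Return (minimum rate, maximum rate) in MB/s.

    The longest period gives the slowest rate and the shortest
    period gives the fastest rate.
    """
    slowest = scan_rate_mb_per_sec(scan_size_mb, period_max_ms)
    fastest = scan_rate_mb_per_sec(scan_size_mb, period_min_ms)
    return slowest, fastest
```

For instance, scan_rate_bounds(100, 1000, 10000) gives a range of 10 to 100
MB/s under this model. Raising the scan size raises both bounds.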
473 numa_balancing_migrate_deferred is how many page migrations get skipped
474 unconditionally, after a page migration is skipped because a page is shared
475 with other tasks. This reduces page migration overhead, and determines
476 how much stronger the "move task near its memory" policy scheduler becomes,
477 versus the "move memory near its task" memory management policy, for workloads
480 ==============================================================
482 osrelease, ostype & version:
489 #5 Wed Feb 25 21:49:24 MET 1998
491 The files osrelease and ostype should be clear enough. Version
492 needs a little more clarification however. The '#5' means that
493 this is the fifth kernel built from this source base and the
494 date behind it indicates the time the kernel was built.
495 The only way to tune these values is to rebuild the kernel :-)
497 ==============================================================
499 overflowgid & overflowuid:
501 If your architecture did not always support 32-bit UIDs (i.e. arm,
502 i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
503 applications that use the old 16-bit UID/GID system calls, if the
504 actual UID or GID would exceed 65535.
506 These sysctls allow you to change the value of the fixed UID and GID.
507 The default is 65534.
509 ==============================================================
513 The value in this file represents the number of seconds the kernel
514 waits before rebooting on a panic. When you use the software watchdog,
515 the recommended setting is 60.
517 ==============================================================
519 panic_on_unrecovered_nmi:
521 The default Linux behaviour on an NMI caused by either a memory error or
522 an unknown source is to continue operation. For many environments such as
523 scientific computing it is preferable that the box is taken out and the
524 error dealt with rather than an uncorrected parity/ECC error get propagated.
526 A small number of systems do generate NMIs for bizarre random reasons
527 such as power management, so the default is off. This sysctl works like
528 the existing panic controls already in that directory.
530 ==============================================================
534 Controls the kernel's behaviour when an oops or BUG is encountered.
536 0: try to continue operation
538 1: panic immediately. If the `panic' sysctl is also non-zero then the
539 machine will be rebooted.
541 ==============================================================
543 panic_on_stackoverflow:
545 Controls the kernel's behavior when it detects an overflow of the
546 kernel, IRQ or exception stacks (user stacks are not covered).
547 This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.
549 0: try to continue operation.
551 1: panic immediately.
553 ==============================================================
555 perf_cpu_time_max_percent:
557 Hints to the kernel how much CPU time it should be allowed to
558 use to handle perf sampling events. If the perf subsystem
559 is informed that its samples are exceeding this limit, it
560 will drop its sampling frequency to attempt to reduce its CPU
563 Some perf sampling happens in NMIs. If these samples
564 unexpectedly take too long to execute, the NMIs can become
565 stacked up next to each other so much that nothing else is
568 0: disable the mechanism. Do not monitor or correct perf's
569 sampling rate no matter how much CPU time it takes.
571 1-100: attempt to throttle perf's sample rate to this
572 percentage of CPU. Note: the kernel calculates an
573 "expected" length of each sample event. 100 here means
574 100% of that expected length. Even if this is set to
575 100, you may still see sample throttling if this
576 length is exceeded. Set to 0 if you truly do not care
577 how much CPU is consumed.
579 ==============================================================
584 PID allocation wrap value. When the kernel's next PID value
585 reaches this value, it wraps back to a minimum PID value.
586 PIDs of value pid_max or larger are not allocated.
588 ==============================================================
592 The last pid allocated in the current pid namespace (the one in which the
593 task using this sysctl lives). When selecting a pid for the next task on
594 fork, the kernel tries to allocate a number starting from this one.
596 ==============================================================
598 powersave-nap: (PPC only)
600 If set, Linux-PPC will use the 'nap' mode of powersaving,
601 otherwise the 'doze' mode will be used.
603 ==============================================================
607 The four values in printk denote: console_loglevel,
608 default_message_loglevel, minimum_console_loglevel and
609 default_console_loglevel respectively.
611 These values influence printk() behavior when printing or
612 logging error messages. See 'man 2 syslog' for more info on
613 the different loglevels.
615 - console_loglevel: messages with a higher priority than
616 this will be printed to the console
617 - default_message_loglevel: messages without an explicit priority
618 will be printed with this priority
619 - minimum_console_loglevel: minimum (highest) value to which
620 console_loglevel can be set
621 - default_console_loglevel: default value for console_loglevel
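
As an illustration of how the four values relate, the Python sketch below
parses the contents of the printk file and applies the console_loglevel rule
from the list above. It is a model for illustration only; the function names
are made up, and the string "4 4 1 7" in the usage note is just a sample.

```python
def parse_printk(text):
    """Split the four values of the printk file into a dict."""
    names = ("console_loglevel", "default_message_loglevel",
             "minimum_console_loglevel", "default_console_loglevel")
    values = [int(field) for field in text.split()]
    if len(values) != len(names):
        raise ValueError("expected four integers")
    return dict(zip(names, values))


def reaches_console(message_level, levels):
    """Lower numbers mean higher priority, as in syslog(2)."""
    if message_level is None:            # no explicit priority given
        message_level = levels["default_message_loglevel"]
    return message_level < levels["console_loglevel"]
```

With levels = parse_printk("4 4 1 7"), a message at level 3 reaches the
console while a message at level 4, or one without an explicit priority,
does not. On a live system the text would come from reading
/proc/sys/kernel/printk.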
623 ==============================================================
627 Delay each printk message by printk_delay milliseconds.
629 Value from 0 - 10000 is allowed.
631 ==============================================================
635 Some warning messages are rate limited. printk_ratelimit specifies
636 the minimum length of time between these messages (in seconds). By
637 default we allow one every 5 seconds.
639 A value of 0 will disable rate limiting.
641 ==============================================================
643 printk_ratelimit_burst:
645 While long term we enforce one message per printk_ratelimit
646 seconds, we do allow a burst of messages to pass through.
647 printk_ratelimit_burst specifies the number of messages we can
648 send before ratelimiting kicks in.
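
The interplay of the two values can be pictured with a simplified
fixed-window model: up to "burst" messages pass in each window of "interval"
seconds, and the rest are suppressed until the window ends. The Python sketch
below is that simplified model only and does not claim to match the kernel's
algorithm; the class name and the values in the usage note are illustrative.

```python
class RateLimit:
    """Allow at most 'burst' messages per 'interval' seconds."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.printed = 0

    def allow(self, now):
        """Return True if a message arriving at time 'now' may pass."""
        if self.interval == 0:
            return True                  # 0 disables rate limiting
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now      # start a new window
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        return False                     # suppressed until the window ends
```

With RateLimit(interval=5, burst=3), the first three messages at time 0 pass,
further ones are dropped, and messages pass again from time 5 onwards.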
650 ==============================================================
654 This option can be used to select the type of process address
655 space randomization that is used in the system, for architectures
656 that support this feature.
658 0 - Turn the process address space randomization off. This is the
659 default for architectures that do not support this feature anyways,
660 and kernels that are booted with the "norandmaps" parameter.
662 1 - Make the addresses of mmap base, stack and VDSO page randomized.
663 This, among other things, implies that shared libraries will be
664 loaded to random addresses. Also for PIE-linked binaries, the
665 location of code start is randomized. This is the default if the
666 CONFIG_COMPAT_BRK option is enabled.
668 2 - Additionally enable heap randomization. This is the default if
669 CONFIG_COMPAT_BRK is disabled.
671 There are a few legacy applications out there (such as some ancient
672 versions of libc.so.5 from 1996) that assume that brk area starts
673 just after the end of the code+bss. These applications break when
674 start of the brk area is randomized. There are however no known
675 non-legacy applications that would be broken this way, so for most
676 systems it is safe to choose full randomization.
678 Systems with ancient and/or broken binaries should be configured
679 with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
680 address space randomization.
682 ==============================================================
684 reboot-cmd: (Sparc only)
686 ??? This seems to be a way to give an argument to the Sparc
687 ROM/Flash boot loader. Maybe to tell it what to do after
690 ==============================================================
692 rtsig-max & rtsig-nr:
694 The file rtsig-max can be used to tune the maximum number
695 of POSIX realtime (queued) signals that can be outstanding
698 rtsig-nr shows the number of RT signals currently queued.
700 ==============================================================
704 This file shows the size of the generic SCSI (sg) buffer.
705 You can't tune it just yet, but you could change it at
706 compile time by editing include/scsi/sg.h and changing
707 the value of SG_BIG_BUFF.
709 There shouldn't be any reason to change this value. If
710 you can come up with one, you probably know what you
713 ==============================================================
717 This parameter sets the total number of shared memory pages that
718 can be used system wide. Hence, SHMALL should always be at least
719 ceil(shmmax/PAGE_SIZE).
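
The lower bound ceil(shmmax/PAGE_SIZE) can be computed as in the Python
sketch below. The function name is made up for this example; when no page
size is passed in, it asks the running system through os.sysconf().

```python
import os


def min_shmall(shmmax_bytes, page_size=None):
    """Smallest shmall (in pages) that still covers shmmax bytes."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return -(-shmmax_bytes // page_size)   # integer ceiling division
```

For a shmmax of 33554432 bytes and 4096-byte pages this gives 8192 pages.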
721 If you are not sure what the default PAGE_SIZE is on your Linux
722 system, you can run the following command:
726 ==============================================================
730 This value can be used to query and set the run time limit
731 on the maximum shared memory segment size that can be created.
732 Shared memory segments up to 1Gb are now supported in the
733 kernel. This value defaults to SHMMAX.
735 ==============================================================
739 Linux lets you set resource limits, including how much memory one
740 process can consume, via setrlimit(2). Unfortunately, shared memory
741 segments are allowed to exist without association with any process, and
742 thus might not be counted against any resource limits. If enabled,
743 shared memory segments are automatically destroyed when their attach
744 count becomes zero after a detach or a process termination. It will
745 also destroy segments that were created, but never attached to, on exit
746 from the process. The only use left for IPC_RMID is to immediately
747 destroy an unattached segment. Of course, this breaks the way things are
748 defined, so some applications might stop working. Note that this
749 feature will do you no good unless you also configure your resource
750 limits (in particular, RLIMIT_AS and RLIMIT_NPROC). Most systems don't
753 Note that if you change this from 0 to 1, already created segments
754 without users and with a dead originating process will be destroyed.
756 ==============================================================
760 Non-zero if the kernel has been tainted. Numeric values, which
761 can be ORed together:
763 1 - A module with a non-GPL license has been loaded, this
764 includes modules with no license.
765 Set by modutils >= 2.4.9 and module-init-tools.
766 2 - A module was force loaded by insmod -f.
767 Set by modutils >= 2.4.9 and module-init-tools.
768 4 - Unsafe SMP processors: SMP with CPUs not designed for SMP.
769 8 - A module was forcibly unloaded from the system by rmmod -f.
770 16 - A hardware machine check error occurred on the system.
771 32 - A bad page was discovered on the system.
772 64 - The user has asked that the system be marked "tainted". This
773 could be because they are running software that directly modifies
774 the hardware, or for other reasons.
775 128 - The system has died.
776 256 - The ACPI DSDT has been overridden with one supplied by the user
777 instead of using the one provided by the hardware.
778 512 - A kernel warning has occurred.
779 1024 - A module from drivers/staging was loaded.
780 2048 - The system is working around a severe firmware bug.
781 4096 - An out-of-tree module has been loaded.
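
Since the values are ORed together, a reading from this file can be split
back into its individual flags. The Python sketch below uses the flag list
above; the table and function names are illustrative only.

```python
TAINT_FLAGS = {
    1: "non-GPL (or unlicensed) module loaded",
    2: "module force loaded by insmod -f",
    4: "unsafe SMP processors",
    8: "module forcibly unloaded by rmmod -f",
    16: "hardware machine check error occurred",
    32: "bad page discovered",
    64: "user asked for the system to be marked tainted",
    128: "system has died",
    256: "ACPI DSDT overridden by the user",
    512: "kernel warning occurred",
    1024: "module from drivers/staging loaded",
    2048: "working around a severe firmware bug",
    4096: "out-of-tree module loaded",
}


def decode_tainted(value):
    """Return the descriptions of all flags set in 'value'."""
    return [text for bit, text in sorted(TAINT_FLAGS.items())
            if value & bit]
```

A value of 4097 (4096 + 1), for example, decodes to the non-GPL module flag
and the out-of-tree module flag; a value of 0 yields an empty list.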
783 ==============================================================
787 The value in this file affects behavior of handling NMI. When the
788 value is non-zero, unknown NMI is trapped and then panic occurs. At
789 that time, kernel debugging information is displayed on console.
791 For example, the NMI switch that most IA32 servers have triggers an
792 unknown NMI. If a system hangs up, try pressing the NMI switch.
794 ==============================================================
798 This value can be used to control the frequency of hrtimer and NMI
799 events and the soft and hard lockup thresholds. The default threshold
802 The softlockup threshold is (2 * watchdog_thresh). Setting this
803 tunable to zero will disable lockup detection altogether.
805 ==============================================================