From: Ingo Molnar Date: Mon, 23 Mar 2009 20:26:52 +0000 (+0100) Subject: perf_counter: create Documentation/perf_counter/ and move perfcounters.txt there X-Git-Tag: firefly_0821_release~13973^2~462 X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=6f9f791eb53b56097cd311a1a5517a8e89bdaf35;p=firefly-linux-kernel-4.4.55.git perf_counter: create Documentation/perf_counter/ and move perfcounters.txt there We'll have more files in that directory, prepare for that. Acked-by: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Ingo Molnar --- diff --git a/Documentation/perf-counters.txt b/Documentation/perf-counters.txt deleted file mode 100644 index fddd32189a50..000000000000 --- a/Documentation/perf-counters.txt +++ /dev/null @@ -1,147 +0,0 @@ - -Performance Counters for Linux ------------------------------- - -Performance counters are special hardware registers available on most modern -CPUs. These registers count the number of certain types of hw events: such -as instructions executed, cachemisses suffered, or branches mis-predicted - -without slowing down the kernel or applications. These registers can also -trigger interrupts when a threshold number of events have passed - and can -thus be used to profile the code that runs on that CPU. - -The Linux Performance Counter subsystem provides an abstraction of these -hardware capabilities. It provides per task and per CPU counters, counter -groups, and it provides event capabilities on top of those. - -Performance counters are accessed via special file descriptors. -There's one file descriptor per virtual counter used. - -The special file descriptor is opened via the perf_counter_open() -system call: - - int sys_perf_counter_open(struct perf_counter_hw_event *hw_event_uptr, - pid_t pid, int cpu, int group_fd); - -The syscall returns the new fd. The fd can be used via the normal -VFS system calls: read() can be used to read the counter, fcntl() -can be used to set the blocking mode, etc. - -Multiple counters can be kept open at a time, and the counters -can be poll()ed. - -When creating a new counter fd, 'perf_counter_hw_event' is: - -/* - * Hardware event to monitor via a performance monitoring counter: - */ -struct perf_counter_hw_event { - s64 type; - - u64 irq_period; - u32 record_type; - - u32 disabled : 1, /* off by default */ - nmi : 1, /* NMI sampling */ - raw : 1, /* raw event type */ - __reserved_1 : 29; - - u64 __reserved_2; -}; - -/* - * Generalized performance counter event types, used by the hw_event.type - * parameter of the sys_perf_counter_open() syscall: - */ -enum hw_event_types { - /* - * Common hardware events, generalized by the kernel: - */ - PERF_COUNT_CYCLES = 0, - PERF_COUNT_INSTRUCTIONS = 1, - PERF_COUNT_CACHE_REFERENCES = 2, - PERF_COUNT_CACHE_MISSES = 3, - PERF_COUNT_BRANCH_INSTRUCTIONS = 4, - PERF_COUNT_BRANCH_MISSES = 5, - - /* - * Special "software" counters provided by the kernel, even if - * the hardware does not support performance counters. These - * counters measure various physical and sw events of the - * kernel (and allow the profiling of them as well): - */ - PERF_COUNT_CPU_CLOCK = -1, - PERF_COUNT_TASK_CLOCK = -2, - /* - * Future software events: - */ - /* PERF_COUNT_PAGE_FAULTS = -3, - PERF_COUNT_CONTEXT_SWITCHES = -4, */ -}; - -These are standardized types of events that work uniformly on all CPUs -that implements Performance Counters support under Linux. If a CPU is -not able to count branch-misses, then the system call will return --EINVAL. - -More hw_event_types are supported as well, but they are CPU -specific and are enumerated via /sys on a per CPU basis. Raw hw event -types can be passed in under hw_event.type if hw_event.raw is 1. -For example, to count "External bus cycles while bus lock signal asserted" -events on Intel Core CPUs, pass in a 0x4064 event type value and set -hw_event.raw to 1. - -'record_type' is the type of data that a read() will provide for the -counter, and it can be one of: - -/* - * IRQ-notification data record type: - */ -enum perf_counter_record_type { - PERF_RECORD_SIMPLE = 0, - PERF_RECORD_IRQ = 1, - PERF_RECORD_GROUP = 2, -}; - -a "simple" counter is one that counts hardware events and allows -them to be read out into a u64 count value. (read() returns 8 on -a successful read of a simple counter.) - -An "irq" counter is one that will also provide an IRQ context information: -the IP of the interrupted context. In this case read() will return -the 8-byte counter value, plus the Instruction Pointer address of the -interrupted context. - -The parameter 'hw_event_period' is the number of events before waking up -a read() that is blocked on a counter fd. Zero value means a non-blocking -counter. - -The 'pid' parameter allows the counter to be specific to a task: - - pid == 0: if the pid parameter is zero, the counter is attached to the - current task. - - pid > 0: the counter is attached to a specific task (if the current task - has sufficient privilege to do so) - - pid < 0: all tasks are counted (per cpu counters) - -The 'cpu' parameter allows a counter to be made specific to a full -CPU: - - cpu >= 0: the counter is restricted to a specific CPU - cpu == -1: the counter counts on all CPUs - -(Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.) - -A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts -events of that task and 'follows' that task to whatever CPU the task -gets schedule to. Per task counters can be created by any user, for -their own tasks. - -A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts -all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege. - -Group counters are created by passing in a group_fd of another counter. -Groups are scheduled at once and can be used with PERF_RECORD_GROUP -to record multi-dimensional timestamps. - diff --git a/Documentation/perf_counter/design.txt b/Documentation/perf_counter/design.txt new file mode 100644 index 000000000000..fddd32189a50 --- /dev/null +++ b/Documentation/perf_counter/design.txt @@ -0,0 +1,147 @@ + +Performance Counters for Linux +------------------------------ + +Performance counters are special hardware registers available on most modern +CPUs. These registers count the number of certain types of hw events: such +as instructions executed, cachemisses suffered, or branches mis-predicted - +without slowing down the kernel or applications. These registers can also +trigger interrupts when a threshold number of events have passed - and can +thus be used to profile the code that runs on that CPU. + +The Linux Performance Counter subsystem provides an abstraction of these +hardware capabilities. It provides per task and per CPU counters, counter +groups, and it provides event capabilities on top of those. + +Performance counters are accessed via special file descriptors. +There's one file descriptor per virtual counter used. + +The special file descriptor is opened via the perf_counter_open() +system call: + + int sys_perf_counter_open(struct perf_counter_hw_event *hw_event_uptr, + pid_t pid, int cpu, int group_fd); + +The syscall returns the new fd. The fd can be used via the normal +VFS system calls: read() can be used to read the counter, fcntl() +can be used to set the blocking mode, etc. + +Multiple counters can be kept open at a time, and the counters +can be poll()ed. + +When creating a new counter fd, 'perf_counter_hw_event' is: + +/* + * Hardware event to monitor via a performance monitoring counter: + */ +struct perf_counter_hw_event { + s64 type; + + u64 irq_period; + u32 record_type; + + u32 disabled : 1, /* off by default */ + nmi : 1, /* NMI sampling */ + raw : 1, /* raw event type */ + __reserved_1 : 29; + + u64 __reserved_2; +}; + +/* + * Generalized performance counter event types, used by the hw_event.type + * parameter of the sys_perf_counter_open() syscall: + */ +enum hw_event_types { + /* + * Common hardware events, generalized by the kernel: + */ + PERF_COUNT_CYCLES = 0, + PERF_COUNT_INSTRUCTIONS = 1, + PERF_COUNT_CACHE_REFERENCES = 2, + PERF_COUNT_CACHE_MISSES = 3, + PERF_COUNT_BRANCH_INSTRUCTIONS = 4, + PERF_COUNT_BRANCH_MISSES = 5, + + /* + * Special "software" counters provided by the kernel, even if + * the hardware does not support performance counters. These + * counters measure various physical and sw events of the + * kernel (and allow the profiling of them as well): + */ + PERF_COUNT_CPU_CLOCK = -1, + PERF_COUNT_TASK_CLOCK = -2, + /* + * Future software events: + */ + /* PERF_COUNT_PAGE_FAULTS = -3, + PERF_COUNT_CONTEXT_SWITCHES = -4, */ +}; + +These are standardized types of events that work uniformly on all CPUs +that implements Performance Counters support under Linux. If a CPU is +not able to count branch-misses, then the system call will return +-EINVAL. + +More hw_event_types are supported as well, but they are CPU +specific and are enumerated via /sys on a per CPU basis. Raw hw event +types can be passed in under hw_event.type if hw_event.raw is 1. +For example, to count "External bus cycles while bus lock signal asserted" +events on Intel Core CPUs, pass in a 0x4064 event type value and set +hw_event.raw to 1. + +'record_type' is the type of data that a read() will provide for the +counter, and it can be one of: + +/* + * IRQ-notification data record type: + */ +enum perf_counter_record_type { + PERF_RECORD_SIMPLE = 0, + PERF_RECORD_IRQ = 1, + PERF_RECORD_GROUP = 2, +}; + +a "simple" counter is one that counts hardware events and allows +them to be read out into a u64 count value. (read() returns 8 on +a successful read of a simple counter.) + +An "irq" counter is one that will also provide an IRQ context information: +the IP of the interrupted context. In this case read() will return +the 8-byte counter value, plus the Instruction Pointer address of the +interrupted context. + +The parameter 'hw_event_period' is the number of events before waking up +a read() that is blocked on a counter fd. Zero value means a non-blocking +counter. + +The 'pid' parameter allows the counter to be specific to a task: + + pid == 0: if the pid parameter is zero, the counter is attached to the + current task. + + pid > 0: the counter is attached to a specific task (if the current task + has sufficient privilege to do so) + + pid < 0: all tasks are counted (per cpu counters) + +The 'cpu' parameter allows a counter to be made specific to a full +CPU: + + cpu >= 0: the counter is restricted to a specific CPU + cpu == -1: the counter counts on all CPUs + +(Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.) + +A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts +events of that task and 'follows' that task to whatever CPU the task +gets schedule to. Per task counters can be created by any user, for +their own tasks. + +A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts +all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege. + +Group counters are created by passing in a group_fd of another counter. +Groups are scheduled at once and can be used with PERF_RECORD_GROUP +to record multi-dimensional timestamps. +