Performance analysis
Encyclopedia
In software engineering
, profiling ("program profiling", "software profiling") is a form of dynamic program analysis
that measures, for example, the usage of memory, the usage of particular instructions
, or frequency and duration of function calls. The most common use of profiling information is to aid program optimization
.
Profiling is achieved by instrumenting either the program source code
or its binary executable form using a tool called a profiler (or code profiler).
The methodology of the profiler itself classify the profiler as event-based, as statistical, as instrumentation, or as simulation.
, instruction set simulation
, operating system hooks
, and performance counter
s. The usage of profilers is 'called out' in the performance engineering
process.
. Software writers need tools to analyze their programs and identify critical sections of code. Compiler
writers often use such tools to find out how well their instruction scheduling
or branch prediction algorithm is performing... (ATOM, PLDI, '94)
The output of a profiler may be:-
/* ------------ source------------------------- count */
0001 IF X = "A" 0055
0002 THEN DO
0003 ADD 1 to XCOUNT 0032
0004 ELSE
0005 IF X = "B" 0055
(PSW) at set timer intervals to detect "hot spots" in executing code. This was an early example of sampling
(see below). In early 1974, Instruction Set Simulator
s permitted full trace and other performance monitoring features.
Profiler-driven program analysis on Unix dates back to at least 1979, when Unix systems included a basic tool "prof" that listed each function and how much of program execution time it used. In 1982, gprof extended the concept to a complete call graph
analysis.
In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation
published a paper describing ATOM. ATOM is a platform for converting a program into its own profiler. That is, at compile time
, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique - modifying a program to analyze itself - is known as "instrumentation
".
In 2004, both the gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers of all time.
profilers show the call times, and frequencies of the functions, and also the call-chains involved based on the callee. However context is not preserved.
. A sampling profiler probes the target program's program counter
at regular intervals using operating system
interrupt
s. Sampling profiles are typically less numerically accurate and specific, but allow the target program to run at near full speed.
The resulting data are not exact, but a statistical approximation. The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods.
In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program, and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode such as system call
processing.
Still, kernel code to handle the interrupts entails a minor loss of CPU cycles, diverted cache usage, and is unable to distinguish the various tasks occurring in uninterruptible kernel code (microsecond-range activity).
Dedicated hardware can go beyond this: some recent MIPS processors JTAG interface have a PCSAMPLE register, which samples the program counter
in a truly undetectable manner.
Some of the most commonly used statistical profilers are AMD CodeAnalyst
, Apple Inc. Shark, gprof, Intel VTune
and Parallel Amplifier (part of Intel Parallel Studio
).
Instrumenting the program can cause changes in the performance of the program, potentially causing inaccurate results and heisenbugs. Instrumenting will always have some impact on the program execution, typically always slowing it. However, instrumentation can be very specific and be carefully controlled to have a minimal impact. The impact on a particular program depends on the placement of instrumentation points and the mechanism used to capture the trace. Hardware support for trace capture means that on some targets, instrumentation can be on just one machine instruction. The impact of instrumentation can often be deducted (i.e. eliminated by subtraction) from the results.
gprof is an example of a profiler that uses both instrumentation and sampling. Instrumentation is used to gather caller information and the actual timing values are obtained by statistical sampling.
Software engineering
Software Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software...
, profiling ("program profiling", "software profiling") is a form of dynamic program analysis
Dynamic program analysis
Dynamic program analysis is the analysis of computer software that is performed by executing programs built from that software system on a real or virtual processor. For dynamic program analysis to be effective, the target program must be executed with sufficient test inputs to produce interesting...
that measures, for example, the usage of memory, the usage of particular instructions
Instruction Set Simulator
An instruction set simulator is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.Instruction simulation is a...
, or frequency and duration of function calls. The most common use of profiling information is to aid program optimization
Optimization (computer science)
In computer science, program optimization or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources...
.
Profiling is achieved by instrumenting either the program source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...
or its binary executable form using a tool called a profiler (or code profiler).
The methodology of the profiler itself classify the profiler as event-based, as statistical, as instrumentation, or as simulation.
Gathering program events
Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentationInstrumentation (computer programming)
In context of computer programming, instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system...
, instruction set simulation
Instruction Set Simulator
An instruction set simulator is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.Instruction simulation is a...
, operating system hooks
Hooking
In computer programming, the term hooking covers a range of techniques used to alter or augment the behavior of an operating system, of applications, or of other software components by intercepting function calls or messages or events passed between software components...
, and performance counter
Hardware performance counter
In computers, hardware performance counters, or hardware counters are a set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities within computer systems...
s. The usage of profilers is 'called out' in the performance engineering
Performance Engineering
Performance engineering within systems engineering, encompasses the set of roles, skills, activities, practices, tools, and deliverables applied at every phase of the Systems Development Life Cycle which ensures that a solution will be designed, implemented, and operationally supported to meet the...
process.
Use of profilers
Program analysis tools are extremely important for understanding program behavior. Computer architects need such tools to evaluate how well programs will perform on new architecturesComputer architecture
In computer science and engineering, computer architecture is the practical art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals and the formal modelling of those systems....
. Software writers need tools to analyze their programs and identify critical sections of code. Compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...
writers often use such tools to find out how well their instruction scheduling
Instruction scheduling
In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines...
or branch prediction algorithm is performing... (ATOM, PLDI, '94)
The output of a profiler may be:-
- A statistical summary of the events observed (a profile)
- Summary profile information is often shown annotated against the source code statements where the events occur, so the size of measurement data is linear to the code size of the program.
/* ------------ source------------------------- count */
0001 IF X = "A" 0055
0002 THEN DO
0003 ADD 1 to XCOUNT 0032
0004 ELSE
0005 IF X = "B" 0055
- A stream of recorded events (a trace)
- For sequential programs, a summary profile is usually sufficient, but performance problems in parallel programs (waiting for messages or synchronization issues) often depend on the time relationship of events, thus requiring a full trace to get an understanding of what is happening.
- The size of a (full) trace is linear to the program's instruction path lengthInstruction path lengthIn computer performance, the instruction path length is the number of machine code instructions required to execute a section of a computer program. The total path length for the entire program could be deemed a measure of the algorithm's performance on a particular computer hardware...
, making it somewhat impractical. A trace may therefore be initiated at one point in a program and terminated at another point to limit the output.- An ongoing interaction with the hypervisorHypervisorIn computing, a hypervisor, also called virtual machine manager , is one of many hardware virtualization techniques that allow multiple operating systems, termed guests, to run concurrently on a host computer. It is so named because it is conceptually one level higher than a supervisory program...
(continuous or periodic monitoring via on-screen display for instance)
- An ongoing interaction with the hypervisor
- This provides the opportunity to switch a trace on or off at any desired point during execution in addition to viewing on-going metrics about the (still executing) program. It also provides the opportunity to suspend asynchronous processes at critical points to examine interactions with other parallel processes in more detail.
History
Performance analysis tools existed on IBM/360 and IBM/370 platforms from the early 1970s, usually based on timer interrupts which recorded the Program status wordProgram status word
The Program status word is an IBM System/360 architecture and successors control register which performs the function of a Status register in other architectures, and more....
(PSW) at set timer intervals to detect "hot spots" in executing code. This was an early example of sampling
Sampling (statistics)
In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population....
(see below). In early 1974, Instruction Set Simulator
Instruction Set Simulator
An instruction set simulator is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.Instruction simulation is a...
s permitted full trace and other performance monitoring features.
Profiler-driven program analysis on Unix dates back to at least 1979, when Unix systems included a basic tool "prof" that listed each function and how much of program execution time it used. In 1982, gprof extended the concept to a complete call graph
Call graph
A call graph is a directed graph that represents calling relationships between subroutines in a computer program. Specifically, each node represents a procedure and each edge indicates that procedure f calls procedure g...
analysis.
In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation
Digital Equipment Corporation
Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...
published a paper describing ATOM. ATOM is a platform for converting a program into its own profiler. That is, at compile time
Compile time
In computer science, compile time refers to either the operations performed by a compiler , programming language requirements that must be met by source code for it to be successfully compiled , or properties of the program that can be reasoned about at compile time.The operations performed at...
, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique - modifying a program to analyze itself - is known as "instrumentation
Instrumentation (computer programming)
In context of computer programming, instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system...
".
In 2004, both the gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers of all time.
Flat profiler
Flat profilers compute the average call times, from the calls, and do not break down the call times based on the callee or the context.Call-graph profiler
Call graphCall graph
A call graph is a directed graph that represents calling relationships between subroutines in a computer program. Specifically, each node represents a procedure and each edge indicates that procedure f calls procedure g...
profilers show the call times, and frequencies of the functions, and also the call-chains involved based on the callee. However context is not preserved.
Event-based profilers
The programming languages listed here have event-based profilers:- JavaJava (programming language)Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
: the JVMTIJava Virtual Machine Tools InterfaceJava Virtual Machine Tool Interface was introduced in J2SE 5.0 . This interface allows a program to inspect the state and to control the execution of applications running in the Java Virtual Machine...
(JVM Tools Interface) API, formerly JVMPI (JVM Profiling Interface), provides hooks to profilers, for trapping events like calls, class-load, unload, thread enter leave. - .NET.NET FrameworkThe .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...
: Can attach a profiling agent as a COM server to the CLR using Profiling API. Like Java, the runtime then provides various callbacks into the agent, for trapping events like method JIT / enter / leave, object creation, etc. Particularly powerful in that the profiling agent can rewrite the target application's bytecode in arbitrary ways. - PythonPython (programming language)Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...
: Python profiling includes the profile module, hotshot (which is call-graph based), and using the 'sys.setprofile' function to trap events like c_{call,return,exception}, python_{call,return,exception}. - RubyRuby (programming language)Ruby is a dynamic, reflective, general-purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was first developed and designed by Yukihiro "Matz" Matsumoto...
: Ruby also uses a similar interface like Python for profiling. Flat-profiler in profile.rb, module, and ruby-prof a C-extension are present.
Statistical profilers
Some profilers operate by samplingSampling (statistics)
In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population....
. A sampling profiler probes the target program's program counter
Program counter
The program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...
at regular intervals using operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...
interrupt
Interrupt
In computing, an interrupt is an asynchronous signal indicating the need for attention or a synchronous event in software indicating the need for a change in execution....
s. Sampling profiles are typically less numerically accurate and specific, but allow the target program to run at near full speed.
The resulting data are not exact, but a statistical approximation. The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods.
In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program, and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode such as system call
System call
In computing, a system call is how a program requests a service from an operating system's kernel. This may include hardware related services , creating and executing new processes, and communicating with integral kernel services...
processing.
Still, kernel code to handle the interrupts entails a minor loss of CPU cycles, diverted cache usage, and is unable to distinguish the various tasks occurring in uninterruptible kernel code (microsecond-range activity).
Dedicated hardware can go beyond this: some recent MIPS processors JTAG interface have a PCSAMPLE register, which samples the program counter
Program counter
The program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...
in a truly undetectable manner.
Some of the most commonly used statistical profilers are AMD CodeAnalyst
CodeAnalyst
AMD CodeAnalyst is a GUI-based code profiler for x86-based machines. CodeAnalyst has similar look and feel on both Linux and Microsoft Windows platforms...
, Apple Inc. Shark, gprof, Intel VTune
VTune
Intel VTune Amplifier XE is a commercial application for software performance analysis for 32 and 64-bit x86 based machines, and has both GUI and command line interfaces. It is available for both Linux and Microsoft Windows operating systems...
and Parallel Amplifier (part of Intel Parallel Studio
Intel Parallel Studio
Intel Parallel Studio is a software development product developed by Intel that plugs into the Microsoft Visual Studio Integrated Development Environment. Its purpose is to facilitate developing programs for parallel computing...
).
Instrumenting profilers
Some profilers instrument the target program with additional instructions to collect the required information.Instrumenting the program can cause changes in the performance of the program, potentially causing inaccurate results and heisenbugs. Instrumenting will always have some impact on the program execution, typically always slowing it. However, instrumentation can be very specific and be carefully controlled to have a minimal impact. The impact on a particular program depends on the placement of instrumentation points and the mechanism used to capture the trace. Hardware support for trace capture means that on some targets, instrumentation can be on just one machine instruction. The impact of instrumentation can often be deducted (i.e. eliminated by subtraction) from the results.
gprof is an example of a profiler that uses both instrumentation and sampling. Instrumentation is used to gather caller information and the actual timing values are obtained by statistical sampling.
Instrumentation
- Manual: Performed by the programmer, e.g. by adding instructions to explicitly calculate runtimes, simply count events or calls to measurement APIs such as the Application Response MeasurementApplication Response MeasurementApplication Response Measurement is an open standard published by the Open Group for monitoring and diagnosing performance bottlenecks within complex enterprise applications that use loosely-coupled designs or service-oriented architectures....
standard. - Automatic source level: instrumentation added to the source code by an automatic tool according to an instrumentation policy.
- Compiler assisted: Example: "gcc -pg ..." for gprof, "quantify g++ ..." for Quantify
- Binary translation: The tool adds instrumentation to a compiled binary. Example: ATOM
- Runtime instrumentation: Directly before execution the code is instrumented. The program run is fully supervised and controlled by the tool. Examples: PinPin (computer program)Pin is a dynamic binary instrumentation framework for the IA-32 and x86-64 instruction set architectures that enables the creation of dynamic program analysis tools...
, ValgrindValgrindValgrind is a GPL licensed programming tool for memory debugging, memory leak detection, and profiling. The name valgrind comes from the main entrance to Valhalla in Norse mythology.... - Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions. Example: DynInstDynInstDynInst is a multi-platform runtime code-patching library developed at the University of Wisconsin–Madison and University of Maryland, College Park. It may be useful in the development of performance measurement tools, debuggers, and simulators. The most recent release is Version 7.0....
Interpreter instrumentation
- Interpreter debug options can enable the collection of performance metrics as the interpreter encounters each target statement. A bytecodeBytecodeBytecode, also known as p-code , is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software interpreter as well as being suitable for further compilation into machine code...
, control tableControl tableControl tables are tables that control the program flow or play a major part in program control. There are no rigid rules concerning the structure or content of a control table - its only qualifying attribute is its ability to direct program flow in some way through its 'execution' by an associated...
or JITJITJIT may refer to:* Various meanings of Just In Time:** Just-in-time compilation - a technique for improving the performance of virtual machines in computing.** Just-in-time - a business inventory strategy.* Jabber ICQ Transport....
interpreters are three examples that usually have complete control over execution of the target code, thus enabling extremely comprehensive data collection opportunities.
Hypervisor/Simulator
- Hypervisor: Data are collected by running the (usually) unmodified program under a hypervisorHypervisorIn computing, a hypervisor, also called virtual machine manager , is one of many hardware virtualization techniques that allow multiple operating systems, termed guests, to run concurrently on a host computer. It is so named because it is conceptually one level higher than a supervisory program...
. Example: SIMMONSIMMONSIMMON was a proprietary software testing system developed in the late 1960s in the IBM Product Test Laboratory, then at Poughkeepsie, N.Y... - Simulator and Hypervisor: Data collected interactively and selectively by running the unmodified program under an Instruction Set SimulatorInstruction Set SimulatorAn instruction set simulator is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.Instruction simulation is a...
. Examples: SIMONSIMON (Batch Interactive test/debug)SIMON was a proprietary test/debugging toolkit for interactively testing Batch programs designed to run on IBM's System 360/370/390 architecture....
and OLIVER.
See also
- Algorithmic efficiencyAlgorithmic efficiencyIn computer science, efficiency is used to describe properties of an algorithm relating to how much of various types of resources it consumes. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process, where the goal is to reduce...
- Static code analysisStatic code analysisStatic program analysis is the analysis of computer software that is performed without actually executing programs built from that software In most cases the analysis is performed on some version of the source code and in the other cases some form of the object code...
- BenchmarkBenchmark (computing)In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it...
- List of performance analysis tools
- Performance engineeringPerformance EngineeringPerformance engineering within systems engineering, encompasses the set of roles, skills, activities, practices, tools, and deliverables applied at every phase of the Systems Development Life Cycle which ensures that a solution will be designed, implemented, and operationally supported to meet the...
- Performance predictionPerformance predictionIn computer science, performance prediction means to estimate the execution time or other performance factors of a program on a given computer...
- PAPIPerformance Application Programming InterfaceIn computer science, Performance Application Programming Interface is a portable interface to hardware performance counters on modern microprocessors. It is being widely used to collect low level performance metrics In computer science, Performance Application Programming Interface (PAPI) is a...
is a portable interface (in the form of a library) to hardware performance counters on modern microprocessors. - Performance tuningPerformance tuningPerformance tuning is the improvement of system performance. This is typically a computer application, but the same methods can be applied to economic markets, bureaucracies or other complex systems. The motivation for such activity is called a performance problem, which can be real or anticipated....
- Worst-case execution time (WCET)
- Java performanceJava performanceThe performance of a compiled Java program will depend on how smartly its particular tasks are going to be managed by the host JVM, and how well the JVM takes advantage of the features of the hardware and OS in doing so. Thus, any Java performance test or comparison has to always report the...
- Software archaeologySoftware archaeologySoftware archaeology or software archeology is the study of poorly documented or undocumented legacy software implementations, as part of software maintenance...
External links
- Article "Need for speed — Eliminating performance bottlenecks" on doing execution time analysis of Java applications using IBM Rational Application DeveloperIBM Rational Application DeveloperIBM Rational Application Developer for WebSphere Software is a commercial Eclipse-based integrated development environment , made by IBM's Rational Software division, for visually designing, constructing, testing, and deploying Web services, portals, and Java Enterprise Edition ...
. - Profiling Runtime Generated and Interpreted Code using the VTune Performance Analyzer