Serviceability (computer)
In software engineering and hardware engineering, serviceability (also known as supportability) is one of the -ilities or aspects (from IBM's RASU: Reliability, Availability, Serviceability, and Usability). It refers to the ability of technical support personnel to install, configure, and monitor computer products, identify exceptions or faults, debug or isolate faults through root cause analysis, and provide hardware or software maintenance in order to solve a problem and restore the product to service. Incorporating features that facilitate serviceability typically makes product maintenance more efficient, reduces operational costs, and helps maintain business continuity.
Examples of features that facilitate serviceability include:
- Help desk
- Notification of exceptional events (e.g., by electronic mail or by sending text to a pager)
- Network monitoring
- Documentation
- Event logging / tracing
- Logging of program state, such as:
  - Execution path and/or local and global variables
  - Procedure entry and exit, optionally with incoming and return variable values (see: subroutine)
  - Exception block entry, optionally with local state (see: exception handling)
- Software upgrade
- Graceful degradation, where the product is designed to allow recovery from exceptional events without intervention by technical support staff
- Hardware replacement or upgrade planning, where the product is designed to allow efficient hardware upgrades with minimal computer system downtime (e.g., hot-swappable components)
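The program-state logging items above can be illustrated with a minimal Python sketch, assuming a decorator-based approach. The logger name `service.trace`, the decorator `traced`, and the sample function `divide` are hypothetical and not taken from any product described here. The decorator logs procedure entry and exit with argument and return values and, when an exception escapes, logs the local variables of the frame that raised it before re-raising.

```python
import functools
import logging

# Hypothetical logger name for the trace facility.
logger = logging.getLogger("service.trace")

def traced(func):
    """Log procedure entry/exit with argument and return values, and log
    exception-block entry together with the local state of the failing frame."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("ENTER %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # Walk to the innermost traceback entry: the frame that raised.
            tb = exc.__traceback__
            while tb.tb_next is not None:
                tb = tb.tb_next
            local_state = {name: repr(value) for name, value in tb.tb_frame.f_locals.items()}
            logger.error("EXCEPTION in %s: %r locals=%s", func.__qualname__, exc, local_state)
            raise  # re-raise unchanged so tracing does not alter behavior
        logger.debug("EXIT %s return=%r", func.__qualname__, result)
        return result
    return wrapper

# Hypothetical example procedure to be traced.
@traced
def divide(a, b):
    quotient = a / b
    return quotient
```

Because the decorator re-raises the original exception, tracing can be added to a procedure without changing its behavior. Recording the `repr` of every local may be costly or expose sensitive data, so a real service tool would likely filter or bound what it captures.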
Serviceability engineering may also incorporate features related to routine system maintenance (see: Operations, Administration and Maintenance (OA&M)).
A service tool is a facility or feature, closely tied to a product, that provides the capabilities and data needed to service (analyze, monitor, debug, repair, etc.) that product. Service tools can offer a broad range of capabilities. For diagnosis, one proposed taxonomy of service tools is as follows:
- Level 1: A service tool that indicates whether a product is functional or not. For computer servers, the two states are often referred to as ‘up’ and ‘down’; the result is a binary value.
- Level 2: A service tool that provides some detailed diagnostic data, often called a problem ‘signature’: a representation of key values such as the system environment and the name of the running program. This level of data is used to compare one problem’s signature with another’s; matching a new problem to an old one allows the solution already created for the earlier problem to be reused. Such screening is valuable when a problem matches a pre-existing one, but the data is not sufficient to debug a new problem.
- Level 3: A service tool that provides detailed diagnostic data sufficient to debug a new and unique problem.
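The difference between level 1 and level 2 tools can be sketched in Python as follows. This is an illustrative assumption, not a tool from the taxonomy's source: `ProblemReport`, its fields, the heartbeat timeout, and the `KNOWN_PROBLEMS` store are all hypothetical. `is_up` returns only the binary up/down value, while `signature` condenses a few key values into a short hash that `screen` compares against the signatures of previously solved problems.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class ProblemReport:
    # Hypothetical key values; a real product would choose its own fields.
    environment: str     # e.g. OS or firmware level
    program: str         # name of the running program
    failing_module: str
    error_code: str

def is_up(heartbeat_age_s: float, timeout_s: float = 30.0) -> bool:
    """Level 1: a binary 'up'/'down' indication based on a heartbeat."""
    return heartbeat_age_s <= timeout_s

def signature(report: ProblemReport) -> str:
    """Level 2: reduce the key values to a short, comparable signature."""
    key = "|".join((report.environment, report.program,
                    report.failing_module, report.error_code))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

# Hypothetical store of solutions for previously diagnosed problems.
KNOWN_PROBLEMS: Dict[str, str] = {}

def record_solution(report: ProblemReport, solution: str) -> None:
    KNOWN_PROBLEMS[signature(report)] = solution

def screen(report: ProblemReport) -> Optional[str]:
    """Return the earlier solution if the signature matches a known problem.
    None means the problem is new and needs level 3 data to debug."""
    return KNOWN_PROBLEMS.get(signature(report))
```

Hashing keeps signatures compact and easy to exchange between sites. A `None` result from `screen` corresponds to the case described for level 2, where the signature alone is not enough to debug a new problem. Joining fields with `|` is a simplification that assumes the fields do not themselves contain that character.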
As a rough rule of thumb for this taxonomy, the amount of diagnostic data differs by several orders of magnitude between level 1, level 2, and level 3 service tools.
Additional characteristics and capabilities that have been observed in service tools:
- Time of data collection: some tools can collect data immediately, as soon as a problem occurs, while others collect data only after a delay.
- Pre-analyzed or not-yet-analyzed data: some tools collect ‘external’ data, while others collect ‘internal’ data. The contrast is visible when comparing system messages (natural-language statements in the user’s native language) with ‘binary’ storage dumps.
- Partial or full set of system state data: some tools collect the complete system state, while others collect only part of it (e.g., a user or partial ‘binary’ storage dump vs. a complete system dump).
- Raw or analyzed data: some tools display raw data, while others analyze it. For example, storage dump formatters simply format the data, whereas ‘intelligent’ data formatters (“ANALYZE” is a common verb) combine product knowledge with analysis of state variables to indicate the ‘meaning’ of the data.
- Programmable or fixed-function tools: some tools can be altered to collect varying amounts of data at varying times, while others have only a fixed function.
- Automatic or manual: some tools are built into a product to collect data automatically when a fault or failure occurs, while others must be explicitly invoked to start data collection.
- Repair or non-repair: some tools collect data as a forerunner to an automatic repair process (self-healing or fault-tolerant systems). Such tools must quickly capture unaltered data before the repair process starts.
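Several of these characteristics (automatic collection, immediate collection, and capture ahead of repair) can be combined in a short Python sketch. It assumes a hypothetical `run_with_capture_and_repair` helper and an in-memory `CAPTURED` list standing in for a real dump or log destination; `max_attempts` is a simple example of a programmable setting. On each failure the state is deep-copied before `repair` runs, so the snapshot reflects the unaltered failing state.

```python
import copy
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store for captured diagnostic snapshots.
CAPTURED: List[Dict[str, Any]] = []

def run_with_capture_and_repair(
    operation: Callable[[dict], Any],
    state: dict,
    repair: Callable[[dict], None],
    max_attempts: int = 2,
) -> Any:
    """Run operation(state); on failure, automatically capture diagnostic data
    before invoking repair(state), then retry up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(state)
        except Exception as exc:
            # Capture immediately, as a deep copy, so the repair step
            # below cannot alter the data that support staff will see.
            CAPTURED.append({
                "time": time.time(),
                "attempt": attempt,
                "error": repr(exc),
                "state": copy.deepcopy(state),
            })
            if attempt == max_attempts:
                raise  # repair did not help; surface the original failure
            repair(state)
```

A deep copy is used because a shallow copy would share nested objects, such as lists, that the repair step might modify in place.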
External links
- Sun Gathering Debug Data (Sun GDD), an example of serviceability feature requirements: a set of tools developed by Sun's support staff to aid problem resolution by applying proactive actions and best practices to gather the debug data needed for further analysis.
- "Carrier Grade Linux Serviceability Requirements Definition Version 4," Copyright (c) 2005-2007 by Open Source Development Labs, Inc. Beaverton, OR 97005 USA http://devresources.linux-foundation.org/dev/cgl/cgl40/cgl40-serviceability.pdf