System Monitoring
A system monitor in systems engineering is a process within a distributed system for collecting and storing state data.
Overview
The configuration for the system monitor takes two forms:
- configuration data for the monitor application itself, and
- configuration data for the system being monitored. See: System configuration
The monitoring application needs information such as its log file path and the number of threads to run with. Once the application is running, it needs to know what to monitor and must deduce how to monitor it. Because the configuration data describing what to monitor is also needed in other areas of the system, such as deployment, it should not be tailored specifically to the system monitor but should be a generalized system configuration model.
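The separation between the two forms of configuration can be sketched as follows. This is a minimal illustration only: the class names (MonitorConfig, SystemElement, SystemConfig), the field names, and the mapping from element kind to monitoring method are all assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MonitorConfig:
    """Settings for the monitor application itself (hypothetical fields)."""
    log_file_path: str
    thread_count: int


@dataclass(frozen=True)
class SystemElement:
    """One host, device, or application in the generalized system model."""
    name: str
    kind: str      # assumed values: "host", "device", "application"
    address: str


@dataclass
class SystemConfig:
    """Shared model of the system; usable by deployment as well as monitoring."""
    elements: List[SystemElement] = field(default_factory=list)

    def of_kind(self, kind: str) -> List[SystemElement]:
        return [e for e in self.elements if e.kind == kind]


def choose_method(element: SystemElement) -> str:
    """Deduce *how* to monitor from *what* the element is (assumed mapping)."""
    methods = {"device": "snmp", "host": "ssh", "application": "app-protocol"}
    return methods.get(element.kind, "unsupported")


system = SystemConfig([
    SystemElement("router-1", "device", "10.0.0.1"),
    SystemElement("web-1", "host", "10.0.0.10"),
    SystemElement("billing", "application", "10.0.0.10:9000"),
])
monitor = MonitorConfig(log_file_path="/var/log/monitor.log", thread_count=4)
plan = {e.name: choose_method(e) for e in system.elements}
```

Nothing in SystemConfig refers to monitoring, so the same model could be read by a deployment tool; only choose_method carries monitor-specific knowledge.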
The performance of the monitoring system has two aspects:
- Impact on system domain or impact on domain functionality: Any element of the monitoring system that prevents the main domain functionality from working is inappropriate. Ideally the monitoring is a tiny fraction of each application's footprint, which requires simplicity. The monitoring function must be highly tunable to allow for such issues as network performance, improvements to applications during the development life cycle, and appropriate levels of detail. Impact on the system's primary function must always be considered.
- Efficient monitoring or ability to monitor efficiently: Monitoring must be efficient, able to meet all monitoring goals within the desired period. This is most closely related to scalability. Various monitoring modes are discussed below.
There are many issues involved with designing and implementing a system monitor. Here are a few issues to be dealt with:
- configuration
- protocol
- performance
- data access
Protocol
There are many tools for collecting system data from hosts and devices using SNMP (Simple Network Management Protocol). Most computers and networked devices have some form of SNMP access. Interpreting the SNMP data from a host or device requires either a specialized tool (typically extra software from the vendor) or a management information base (MIB), a mapping of commands and data references to the various data elements the host or device provides. The advantages of SNMP for monitoring are its low bandwidth requirements and its widespread use across industries.
SNMP is not suitable for collecting application data unless the application itself provides a MIB and outputs its data via SNMP.
Other protocols are suitable for monitoring applications, such as CORBA (language/OS-independent), JMX (Java-specific management and monitoring protocol), or proprietary TCP/IP or UDP protocols (language/OS-independent for the most part).
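As an illustration of the proprietary UDP option, the sketch below defines a small message format for one state sample. The JSON layout, the field names, and the function names are assumptions made for this example; a real protocol would also need versioning and some answer to lost datagrams.

```python
import json
import socket


def encode_sample(element: str, metric: str, value: float, timestamp: float) -> bytes:
    """Serialize one state sample as a compact JSON datagram (assumed format)."""
    message = {"element": element, "metric": metric, "value": value, "ts": timestamp}
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


def decode_sample(datagram: bytes) -> dict:
    """Parse a datagram produced by encode_sample; reject incomplete messages."""
    message = json.loads(datagram.decode("utf-8"))
    missing = {"element", "metric", "value", "ts"} - set(message)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return message


def send_sample(datagram: bytes, host: str, port: int) -> None:
    """Fire-and-forget delivery over UDP; no retry or acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))


# Round trip without touching the network.
packet = encode_sample("billing", "queue_depth", 17, timestamp=1700000000.0)
sample = decode_sample(packet)
```

Because the payload is plain JSON over a datagram socket, any language with a socket library can produce or consume it, which is the sense in which such protocols are language/OS-independent for the most part.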
Data access
Data access refers to the interface by which the monitor data can be used by other processes. For example, if the system monitor is a CORBA server, clients can connect and make calls on the monitor for the current state of an element, or for the historical states of an element over some time period.
The system monitor may write data directly into a database, allowing other processes to access the database outside the context of the system monitor. This is dangerous, however, because the database table design then dictates how the data can be shared. Ideally the system monitor is a wrapper for whatever persistence mechanism is used, providing a consistent and 'safe' interface for others to access the data.
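A wrapper of this kind might look like the sketch below, which assumes SQLite as the persistence mechanism purely for illustration. The class name MonitorStore, its method names, and the single-table layout are hypothetical; the point is that callers see only current_state and history, never the table design.

```python
import sqlite3
from typing import List, Optional, Tuple


class MonitorStore:
    """Access interface that hides the persistence mechanism (here SQLite)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS state (element TEXT, ts REAL, value TEXT)"
        )

    def record(self, element: str, ts: float, value: str) -> None:
        """Store one observed state for an element."""
        self._db.execute("INSERT INTO state VALUES (?, ?, ?)", (element, ts, value))
        self._db.commit()

    def current_state(self, element: str) -> Optional[str]:
        """Most recent state of an element, or None if never observed."""
        row = self._db.execute(
            "SELECT value FROM state WHERE element = ? ORDER BY ts DESC LIMIT 1",
            (element,),
        ).fetchone()
        return row[0] if row else None

    def history(self, element: str, start: float, end: float) -> List[Tuple[float, str]]:
        """States of an element with start <= ts <= end, oldest first."""
        rows = self._db.execute(
            "SELECT ts, value FROM state "
            "WHERE element = ? AND ts BETWEEN ? AND ? ORDER BY ts",
            (element, start, end),
        ).fetchall()
        return [(ts, value) for ts, value in rows]


store = MonitorStore()
store.record("web-1", 100.0, "up")
store.record("web-1", 160.0, "degraded")
store.record("web-1", 220.0, "up")
```

If the storage later moves to a different database or schema, only the bodies of these three methods change; clients calling store.current_state("web-1") are unaffected.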
Mode
The data collection mode of the system monitor is critical. The modes are: monitor poll, agent push, and a hybrid scheme.
Monitor poll
- In this mode, one or more processes in the monitoring system poll the system elements in a loop running in some thread. During the loop, devices are polled via SNMP calls, hosts can be accessed via Telnet/SSH to execute scripts, dump files, or run other OS-specific commands, and applications can be polled for state data or have their state output files dumped.
- The advantage of this mode is that there is little impact on the host/device being polled. The host's CPU is loaded only during the poll; the rest of the time the monitoring function plays no part in CPU loading.
- The main disadvantage of this mode is that the monitoring process can only do a limited amount of work in each poll period. If polling takes too long, the intended poll period is lengthened.
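The lengthening of the poll period can be seen in a small simulation of a single-threaded poll loop. This is a sketch under stated assumptions: the function name run_poll_cycles is hypothetical, each poller is a stand-in that simply reports how long its poll took, and time is simulated rather than measured so the example runs instantly.

```python
from typing import Callable, Dict, List


def run_poll_cycles(
    pollers: Dict[str, Callable[[], float]],
    period: float,
    cycles: int,
) -> List[float]:
    """Simulate a sequential poll loop and return each cycle's start time.

    Each poller returns the duration of its poll in seconds. A real loop
    would use time.monotonic() and time.sleep() instead of simulated time.
    """
    now = 0.0
    starts = []
    for _ in range(cycles):
        starts.append(now)
        elapsed = sum(poll() for poll in pollers.values())
        # Wait out the remainder of the period; if polling overran it,
        # the next cycle starts late and the effective period grows.
        now += max(period, elapsed)
    return starts


fast = {"router-1": lambda: 0.2, "web-1": lambda: 0.5}
slow = {"router-1": lambda: 0.2, "web-1": lambda: 12.0}  # e.g. a slow SSH call

on_time = run_poll_cycles(fast, period=10.0, cycles=3)
late = run_poll_cycles(slow, period=10.0, cycles=3)
```

With the fast pollers each cycle starts on the 10-second schedule; with one slow element the cycles drift later, because the loop cannot begin a new cycle until the previous one has finished.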
Agent push
- In agent-push mode, the monitored host pushes data from itself to the system monitoring application. This can be done periodically, or asynchronously at the request of the system monitor.
- The advantage of this mode is that the monitoring system's load can be reduced to simply accepting and storing data. It doesn't have to worry about timeouts for SSH calls, parsing OS-specific call results, etc.
- The disadvantage of this mode is that the logic for the polling cycle and its options is not centralized at the system monitor but distributed to each remote node. Thus changes to the monitoring logic must be pushed out to each node.
- Also, in agent-based monitoring, a host cannot report that it is completely "down" or powered off, nor that an intermediary system (such as a router) is preventing access to it.
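The monitor side of agent push can be sketched as a receiver that only accepts and stores data. Because a host that is down cannot say so, the sketch infers trouble from silence. The class name PushReceiver, the max_silence parameter, and the explicit now arguments are assumptions made for this example.

```python
from typing import Dict, List


class PushReceiver:
    """Monitor side of agent push: accept and store, nothing else."""

    def __init__(self, max_silence: float) -> None:
        self.max_silence = max_silence  # assumed tuning parameter, in seconds
        self.last_seen: Dict[str, float] = {}
        self.latest: Dict[str, dict] = {}

    def accept(self, host: str, data: dict, now: float) -> None:
        """Record a push from an agent together with its arrival time."""
        self.last_seen[host] = now
        self.latest[host] = data

    def silent_hosts(self, now: float) -> List[str]:
        """Hosts that have pushed before but not within max_silence.

        Silence may mean the host is down or that the network path is
        blocked; the missing data alone does not say which.
        """
        return sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )


receiver = PushReceiver(max_silence=30.0)
receiver.accept("web-1", {"cpu": 0.42}, now=0.0)
receiver.accept("db-1", {"cpu": 0.10}, now=0.0)
receiver.accept("web-1", {"cpu": 0.40}, now=25.0)  # db-1 has stopped pushing
```

Calling receiver.silent_hosts(now=40.0) would flag db-1 in this scenario. A host that has never pushed at all is invisible to this receiver; the list of expected hosts would have to come from the system configuration.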
Hybrid mode
- The middle ground between 'monitor-poll' and 'agent-push' is a hybrid approach, where the system configuration determines where monitoring occurs, either in the system monitor or in an agent. Thus when applications come up, they can determine for themselves which system elements they are responsible for polling. Ultimately, however, everything must post its monitored data to the system monitor process.
- This is especially useful when setting up a monitoring infrastructure for the first time, when not all monitoring mechanisms have been implemented. The system monitor can do all the polling by whatever simple means are available. As the agents become smarter, they can take on more of the load.
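One way to picture the hybrid scheme is a configuration table that names, for each element, which process polls it. The sketch below is illustrative only: the table layout, the owner names, and the functions elements_for and hand_over are hypothetical.

```python
from typing import Dict, List

# Hypothetical configuration: each element names who polls it, either the
# central "monitor" or an agent that has taken over the work.
ASSIGNMENTS: Dict[str, str] = {
    "router-1": "monitor",
    "web-1": "agent-web-1",
    "billing": "agent-web-1",
    "db-1": "monitor",
}


def elements_for(poller: str, assignments: Dict[str, str]) -> List[str]:
    """Elements a given process is responsible for polling."""
    return sorted(name for name, owner in assignments.items() if owner == poller)


def hand_over(element: str, agent: str, assignments: Dict[str, str]) -> Dict[str, str]:
    """Return a new assignment table with one element moved to an agent."""
    if element not in assignments:
        raise KeyError(element)
    updated = dict(assignments)
    updated[element] = agent
    return updated


central = elements_for("monitor", ASSIGNMENTS)
later = hand_over("db-1", "agent-db-1", ASSIGNMENTS)
```

On startup an agent would call elements_for with its own name to learn its duties, and the central monitor polls whatever remains assigned to "monitor". Moving work to a smarter agent is then a configuration change, as in hand_over, rather than a code change.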