News server operation
Among the operators and users of commercial Usenet news servers, common concerns are the continually increasing storage and network capacity requirements and their effects. Completion (the ability of a server to successfully receive all traffic), retention (the amount of time articles are made available to readers) and overall system performance are the topics of frequent discussion. With the increasing demands, it is common for the transit and reader server roles to be subdivided further into numbering, storage and front end systems. These server farms are continually monitored by both insiders and outsiders, and measurements of these characteristics are often used by consumers when choosing a commercial news service.
Articles and posts
End users often use the term "posting" to refer to a single message or file posted to Usenet. For articles containing plain text, this is synonymous with an article. For binary content such as pictures and files, it is often necessary to split the content among multiple articles. Typically through the use of numbered Subject: headers, the multiple-article postings are automatically reassembled into a single unit by the newsreader. Most servers do not distinguish between single and multiple-part postings, dealing only at the level of the individual component articles.
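The reassembly by numbered Subject: headers can be illustrated with a minimal Python sketch. It assumes one common convention, a subject ending in "(part/total)" such as "photo.jpg (2/3)"; real newsreaders recognize more variants, and the function names here are hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention: the subject ends with "(part/total)", e.g. "photo.jpg (2/3)".
PART_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<part>\d+)/(?P<total>\d+)\)\s*$")


def group_postings(subjects):
    """Group article subjects into postings keyed by (name, total parts)."""
    postings = defaultdict(dict)
    for subject in subjects:
        match = PART_RE.match(subject)
        if match is None:
            # No part numbering: treat the article as a one-part posting.
            postings[(subject, 1)][1] = subject
            continue
        key = (match["name"], int(match["total"]))
        postings[key][int(match["part"])] = subject
    return postings


def is_complete(parts, total):
    """True when every part from 1 to total has been seen."""
    return all(number in parts for number in range(1, total + 1))
```

The server never sees these groupings; it stores each component article separately, which is why a posting can be partly present on one server and whole on another.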
Headers and overviews
Each news article contains a complete set of header lines, but in common use the term "headers" is also used when referring to the News Overview (NOV) database. The overview is a list of the most frequently used headers, and additional information such as article sizes, typically retrieved by the client software using the NNTP XOVER command. Overviews make reading a newsgroup faster for both the client and server by eliminating the need to open each individual article to present them in list form.
If non-overview headers are required, such as when using a kill file, it may still be necessary to use the slower method of reading the complete headers of every article. Many clients are unable to do this, and limit filtering to what is available in the overview summaries.
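As a rough illustration, the sketch below parses one overview line and applies a kill file that can only see overview fields. It assumes the standard tab-separated field order (article number, Subject, From, Date, Message-ID, References, byte count, line count); the class and function names are hypothetical, and the sample data is invented.

```python
from typing import NamedTuple


class Overview(NamedTuple):
    number: int
    subject: str
    author: str
    date: str
    message_id: str
    references: str
    size_bytes: int
    lines: int


def parse_overview_line(line):
    """Parse one tab-separated overview line (assumes at least 8 fields)."""
    fields = line.rstrip("\r\n").split("\t")
    number, subject, author, date, message_id, references, size, lines = fields[:8]
    return Overview(int(number), subject, author, date, message_id,
                    references, int(size or 0), int(lines or 0))


def apply_kill_file(overviews, killed_authors):
    """Drop entries whose From field contains any killed author string."""
    killed = [name.lower() for name in killed_authors]
    return [ov for ov in overviews
            if not any(name in ov.author.lower() for name in killed)]
```

A filter on a header outside this list, such as Path: or Organization:, cannot be expressed this way, which is the case where the client must fall back to fetching full headers.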
Spools
When the server stores the body of an article, it places it in a disk storage area generically called a "spool". There are several common ways in which the spool may be organized:
- One file per article is the oldest storage scheme, still in common use on smaller servers and replicated in many clients. Its performance capability is a direct function of the underlying operating system's ability to create, remove and locate files within a directory, and often this scheme is insufficient to keep up with modern Usenet traffic. It does, however, allow for the greatest flexibility in managing the amount and location of storage used by the server. Nearly all current software using this scheme stores articles using the B News 2.10 layout.
- Cyclical storage has been in increasingly common use since the 1990s. In this storage method, articles are appended serially to large indexed container files. When the end of the file is reached, new articles are written at the beginning of the file, overwriting the oldest entries. On some servers, this overwriting is not performed, but instead new container files are created as older ones are deleted. The major advantages of this system include predictable storage requirements if an overwriting scheme is employed, and some freedom from dependency on the underlying performance of the operating system. There is, however, less flexibility to retain articles by age rather than space used, and traditional text manipulation tools such as grep are less well suited to analyzing these files. Some degree of article longevity control can be exercised by directing subsets of the newsgroups to specific sets of container files.
- In some cases, a relational database or similar is used to contain the spool. This is most commonly seen with Internet forum software that also offers an NNTP interface.
- Some servers, such as INN, allow multiple storage schemes to be used at once. Various hybrid storage schemes have also been used in news servers, including different organizations of the file-per-article method, or smaller containers carrying perhaps 100 articles apiece.
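The behavior of the overwriting form of cyclical storage can be modeled in a few lines of Python. This is an in-memory sketch only, not the on-disk layout of any real server: the class name and methods are hypothetical, and it evicts whole articles from the old end rather than overwriting bytes in a container file.

```python
from collections import OrderedDict


class CyclicSpool:
    """Fixed-capacity article store that discards the oldest entries first."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._articles = OrderedDict()  # message_id -> body, oldest first

    def store(self, message_id, body):
        if message_id in self._articles:
            return  # duplicates are ignored, as a server would refuse them
        if len(body) > self.capacity_bytes:
            raise ValueError("article is larger than the whole container")
        # Make room by dropping the oldest articles, whatever their age.
        while self.used_bytes + len(body) > self.capacity_bytes:
            _, old_body = self._articles.popitem(last=False)
            self.used_bytes -= len(old_body)
        self._articles[message_id] = body
        self.used_bytes += len(body)

    def fetch(self, message_id):
        """Return the body, or None if it was never stored or has expired."""
        return self._articles.get(message_id)
```

The model shows both properties named above: storage never exceeds the configured capacity, and an article's lifetime depends on how much traffic arrives after it rather than on its age.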
Speed
Speed, for the purpose of this article, is how quickly a server can deliver an article to the user. The server that the user connects to is typically part of a server farm that has many servers dedicated to multiple tasks. How fast the data can move in this farm is the first thing that affects the speed of delivery.
Once the farm is able to deliver the data to the network, then the provider has limited control over the speed to the user. Since the network path to each user is different, some users will have good routes and the data will flow quickly. Other users will have overloaded routers between them and the provider which will cause delays. About all a provider can do in that case is try moving the traffic through a different route. If the ISP has limited connectivity to the network, routing changes may have little effect.
Frequently a user can reduce the impact of network problems by using multiple connections. Some servers allow as many as 60 simultaneous connections, but this varies widely. Likewise, newsreaders are commonly limited to using as few as two or four connections.
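The interaction between the server's and the client's connection limits can be sketched as follows. This is a simplified illustration: `fetch_article` is a hypothetical callable standing in for whatever retrieves one article, and a real client would hold one open NNTP connection per worker rather than share a single function.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(message_ids, fetch_article, server_limit, client_limit):
    """Fetch articles in parallel, never exceeding either connection limit."""
    # The effective parallelism is the smaller of the two limits.
    workers = max(1, min(server_limit, client_limit))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as message_ids.
        return list(pool.map(fetch_article, message_ids))
```

With a server allowing 60 connections and a newsreader limited to four, only four transfers run at once, so the client's limit is the one that matters.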
Article sizes
Article sizes are limited to what the servers will accept. For text users this is generally not a problem. For binary users this can be a problem, since the maximum article size varies from site to site.
The larger the article size, the fewer articles on each server. This generally makes for a more efficient server, because fewer articles reduce the overhead needed to process them. However, the larger the article size, the fewer servers the article will arrive on.
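The trade-off is simple arithmetic, sketched here in Python. The function name and the sizes are illustrative assumptions, and the calculation ignores the overhead added by binary-to-text encoding and headers.

```python
def articles_needed(posting_bytes, max_article_bytes):
    """Number of articles required to carry a posting of the given size."""
    if max_article_bytes <= 0:
        raise ValueError("max_article_bytes must be positive")
    # Integer ceiling division; even an empty posting occupies one article.
    return max(1, (posting_bytes + max_article_bytes - 1) // max_article_bytes)
```

For a hypothetical 10,000,000-byte file, a 500,000-byte article limit gives 20 articles, while a 250,000-byte limit gives 40, doubling the number of articles every server has to number, index and expire.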
Servers
Users frequently call their service a server. In many cases this is very far from the truth. While each service is different, here is a list of the various types of server roles that a provider will have in each server farm it runs. Roles can be mixed at a given site, for example numbering and transit may be provided by the same system.
Transit server : These are the servers that handle basic article exchange. They exchange traffic with remote servers, supply articles to the numbering servers, and transmit articles posted from the local front end servers.
Numbering server (stamper) : This server inserts the RFC 1036 Xref: header into each article, so that the back and front end servers all present article lists in a uniform manner.
Back end server : This is the data storage system for the front end servers. They usually have multiple RAID disk arrays to hold the data. The provider can increase reliability by using multiple backend servers with redundant data, redundant arrays attached to the same server, or even both.
Front end server : These are the servers that a user would actually connect to. It is not unheard of for a large commercial news service provider to have more than 50 front end servers. These systems usually only store overviews locally, and retrieve article bodies from the back end servers. These systems typically carry the heaviest CPU load in the farm.
Large server farms typically also place load balancers between the front end servers and the network.
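The numbering server's job can be illustrated with a short Python sketch that builds and parses an Xref: header, which consists of the server name followed by space-separated newsgroup:number pairs. The function names and the host and group names are hypothetical.

```python
def build_xref(hostname, locations):
    """Build an Xref: header from (newsgroup, article number) pairs."""
    pairs = " ".join(f"{group}:{number}" for group, number in locations)
    return f"Xref: {hostname} {pairs}"


def parse_xref(header):
    """Split an Xref: header into (hostname, [(newsgroup, number), ...])."""
    name, _, value = header.partition(":")
    if name.strip().lower() != "xref":
        raise ValueError("not an Xref header")
    hostname, *pairs = value.split()
    locations = []
    for pair in pairs:
        # The article number follows the last colon in each pair.
        group, _, number = pair.rpartition(":")
        locations.append((group, int(number)))
    return hostname, locations
```

Because every back end and front end server reads the numbers from this one header instead of assigning its own, a client sees the same article number for the same article whichever front end it reaches.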
Retention
Retention is simply defined as how long the server keeps articles. Most users want retention to be long enough so that they don't need to access the server every day. Conversely, overly long retention can overwhelm users with slow computers or network connections by making the article lists inordinately large.
Retention is generally quoted separately for text and binary articles, though it may also vary between different groups within these categories. The times vary greatly according to the amount of storage available on the servers and continually increasing traffic. As of 2009, it is common for average news providers to have text retention of over 1000 days and binary retention of over 200 days. Large news providers offer text retention up to 2480 days and binary retention of 850 days or more.
It can be difficult for end users to accurately measure the retention of a server. One common method is to find the oldest articles in a group and examine their Date: headers, but this is not always accurate. Some articles in a group may be retained for longer than others, articles from remote servers do not always arrive promptly, and at times the date headers are simply incorrect. A sampling of many or all articles, preferably in more than one newsgroup, is required to detect such anomalies.
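One way to make such a sample robust is sketched below in Python. It is an assumption-laden illustration, not an established measurement method: unparseable and future-dated headers are discarded, and a high percentile of article age is reported instead of the single oldest article, so that a few long-lived stragglers do not inflate the result. The function names and the 95% default are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def article_ages_days(date_headers, now):
    """Return sorted article ages in days, skipping unusable Date: values."""
    ages = []
    for raw in date_headers:
        try:
            posted = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            continue  # the header could not be parsed at all
        if posted.tzinfo is None:
            posted = posted.replace(tzinfo=timezone.utc)  # assume UTC
        age = (now - posted).total_seconds() / 86400
        if age >= 0:  # a date in the future is simply incorrect
            ages.append(age)
    return sorted(ages)


def estimate_retention_days(date_headers, now, percentile=0.95):
    """Estimate retention as a high percentile of the observed ages."""
    ages = article_ages_days(date_headers, now)
    if not ages:
        return None
    index = min(len(ages) - 1, int(percentile * len(ages)))
    return ages[index]
```

Running the estimate over several newsgroups and comparing the results helps separate per-group retention settings from bad data in any one group.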
Completion
Given the large number of articles transferred between servers and the large size of individual articles, their complete propagation to any one server farm is not guaranteed. The term "completion" is used to describe how well a service is keeping up with the traffic.
The primary obstacle to calculating the completion percentage is not knowing how many articles were posted. Looking at only one server, one cannot know how many articles were actually inserted throughout the network. Articles may never make their way outside the originating server, or may fail to find their way out to the transit cloud. Very large articles are frequently dropped, and tend to propagate less well than smaller ones.
One way to measure completion is to access multiple servers and retrieve lists of articles. Because Message-ID: headers are nominally unique throughout the network, comparison of the lists is mostly a straightforward task. Practical limitations to this type of measurement include the impossibility of obtaining lists from all servers worldwide, the fact that many servers filter out spam or employ Usenet Death Penalties, and that some servers mask incompletion by hiding multipart binary sets with missing articles. It is also necessary to take into account propagation times and retention; an article may simply have not yet arrived at a given server, or it may have been present but already expired.
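The list comparison can be sketched in Python as a set operation over Message-IDs. This is a minimal illustration with hypothetical function names: it treats the union of all sampled lists as the set of articles known to exist, which is only a lower bound on what was actually posted, and it makes no allowance for spam filtering, propagation delay or expiry.

```python
def completion_report(server_lists):
    """Map each server to the percentage of all known Message-IDs it holds.

    server_lists maps a server name to an iterable of Message-ID strings.
    """
    known = set().union(*server_lists.values())
    if not known:
        return {}
    return {
        server: 100.0 * len(set(ids) & known) / len(known)
        for server, ids in server_lists.items()
    }


def missing_from(server, server_lists):
    """Message-IDs seen on some other sampled server but not on this one."""
    known = set().union(*server_lists.values())
    return known - set(server_lists[server])
```

A server can therefore score 100% in such a comparison and still be incomplete, if every sampled server missed the same articles.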