Mbox
mbox is a generic term for a family of related file formats used for holding collections of electronic mail messages. All messages in an mbox mailbox are concatenated and stored as plain text in a single file. The beginning of each message is indicated by a line whose first five characters consist of "From" followed by a space (the so-called "From_ line", "'From ' line", or simply "From line"), followed by the return-path e-mail address. A blank line is appended to the end of each message. For a while, the mbox format was popular because text-processing tools can readily be used on the plain text files that store the e-mail messages.
Unlike the Internet protocols used for the exchange of e-mail, the format used for the storage of e-mail has never been formally defined through the RFC standardization mechanism and has been left entirely to the developers of e-mail clients.
mbox (whose media type, application/mbox, is registered by RFC 4155) stores mailbox messages in their original Internet Message (RFC 2822) format, usually in files directly accessible to users. A similar approach is taken by the MH Message Handling System. Other systems, such as Microsoft Exchange Server and the Cyrus IMAP server, store mailboxes in centralised databases that are managed by the mail system and are not directly accessible by individual users.
The maildir mailbox format is often cited as an alternative to the mbox format for network e-mail storage systems.
Family
Four popular but incompatible variants on the same idea make up a family of mbox formats: mboxo, mboxrd, mboxcl, and mboxcl2. The naming scheme was developed by Daniel J. Bernstein, Rahul Dhesi, and others in 1996. Each originated from a different version of Unix. mboxcl and mboxcl2 originated from the file format used by Unix System V Release 4 mail tools. mboxrd was invented by Rahul Dhesi et al. as a rationalisation of mboxo and was subsequently adopted by some Unix mail tools, including qmail.
mboxo and mboxrd locate the message start by scanning for From lines that are typically found in the e-mail message header. If a "From " string occurs at the beginning of a line in either the headers or the body of a message (unlikely for the former for correctly formatted messages, but likely for the latter), the e-mail message must be modified before the message is stored in an mbox mailbox file or the line will be taken as a message boundary. This is typically done by prepending a greater-than sign:
>From my point of view...
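The boundary scan and the quoting step can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular mail tool, and all function names are hypothetical. It assumes the mailbox has already been decoded to text. `split_on_from_lines` performs the naive boundary scan, `quote_mboxo` prepends > only to lines that begin with "From ", and `quote_mboxrd` also re-quotes lines that are already quoted, which is the reversible rule of the mboxrd variant described in the next paragraph.

```python
import re

# "From " at the start of a line, optionally preceded by ">" characters.
_QUOTED_FROM = re.compile(r"^>*From ")


def quote_mboxo(line: str) -> str:
    """Quote only lines that start with 'From ' (not reversible)."""
    return ">" + line if line.startswith("From ") else line


def quote_mboxrd(line: str) -> str:
    """Quote 'From ', '>From ', '>>From ', ... by adding one more '>'."""
    return ">" + line if _QUOTED_FROM.match(line) else line


def unquote_mboxrd(line: str) -> str:
    """Remove exactly one '>' from a line of the form '>...>From '."""
    if line.startswith(">") and _QUOTED_FROM.match(line):
        return line[1:]
    return line


def split_on_from_lines(text: str) -> list[str]:
    """Start a new message at every line that begins with 'From '."""
    messages: list[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") or not messages:
            messages.append("")
        messages[-1] += line
    return messages
```

Splitting a mailbox whose body contains an unquoted line such as "From my point of view..." yields one message too many, whereas the same body passed line by line through either quote function is kept as a single message.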
In the mboxo format, this can lead to corruption of the message. If a line already contains >From at the beginning (such as in a quotation), it is left unchanged when written. When subsequently read by the mail software, the leading > is erroneously removed. The mboxrd format solves this by converting From to >From, converting >From to >>From, and so on. The transformation is then always reversible.
The mboxcl and mboxcl2 formats do not scan for the From line. Instead, they use a Content-Length: header to determine each message's length.
Modified mbox
Some e-mail clients use a modification of the mbox format for their mail folders.
- Eudora uses an mboxo variation where a sender's e-mail address is replaced by the constant string "???@???". Most mbox clients store incoming messages as received. Eudora separates out attachments embedded in the message, storing the attachments as separate individual files in one folder.
- The Mozilla family of MUAs (Mozilla, Netscape, Thunderbird, et al.) uses an mboxrd variation with more complex From line quoting rules.
File locking
Various mutually incompatible mechanisms have been used by different mbox formats to enable message file locking, including fcntl, lockf, and "dot locking". Locking does not work well with network-mounted file systems, such as the Network File System (NFS).
Because more than one message is stored in a single file, some form of file locking is needed to avoid the corruption that can result from two or more processes modifying the mailbox simultaneously. This could happen if a network e-mail delivery program delivers a new message at the same time as a mail reader is deleting an existing message.
mbox files should also be locked while they are being read. Otherwise the reader may see corrupted message contents if another process is modifying the mbox at the same time, even though no actual file corruption occurs.
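As an illustration of dot locking, the following Python sketch assumes the common convention of a lock file named after the mailbox with a ".lock" suffix. The names `dot_lock` and `append_message`, the timeout values, and the example addresses are hypothetical. The lock is cooperative only: it protects against other programs that follow the same convention, and it makes no attempt to address the network file system problems noted above. Body lines are assumed to have been quoted already.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def dot_lock(mbox_path, timeout=10.0, poll=0.1):
    """Hold '<mbox_path>.lock' for the duration of the with-block."""
    lock_path = mbox_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists,
            # so only one process at a time can create it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {mbox_path}")
            time.sleep(poll)
    try:
        os.close(fd)
        yield lock_path
    finally:
        # Release the lock by removing the lock file.
        os.unlink(lock_path)


def append_message(mbox_path, from_line, message):
    """Append a From line, the message text, and a blank line under the lock."""
    with dot_lock(mbox_path):
        with open(mbox_path, "a", encoding="utf-8", newline="\n") as f:
            f.write(from_line.rstrip("\n") + "\n")
            f.write(message.rstrip("\n") + "\n\n")
```

A delivery program would call, for example, `append_message("inbox", "From alice@example.org", text)`, while a reader would wrap its own access in `with dot_lock("inbox"):` so that it never sees a half-written message.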
Further reading
- qmail mbox manual page
- Internet Mail Consortium – Standards body
- mbox format specification and variations
- RFC 4155 – The application/mbox Media Type
- mbx2eml – Free Windows program for splitting mbox files into separate e-mail files
- Free mbox to eml converter – Free Windows utility for extracting of eml files from different specific mbox formats
- MBOX Batch Processor – The Windows based tool for free conversion of multiple MBOX files with MBOX type and format auto-detection. Batch tool for MBOX email extraction.