Publius Publishing System
Encyclopedia
Publius is a web protocol developed by Waldman, Rubin
& Cranor for allowing individuals the ability to publish information on the web anonymously and with a high guarantee that their publications will not be censored or modified by a third party.
The nine design goals of the Publius development team are:
The Publius system relies on a static list of m web server
s. When a publisher wishes to add a contents M to the web, it first encrypts M using some random symmetric key
K. Then K is split into n shares (parts) where at least k<n shares are required for the reconstruction of K (see also Secret sharing
). A subset of the m servers receives another share of K and the encryption result of M using the key K , E(M,K).
When a retriever whishes to obtain the original contents M, it follows a generated URL
which corresponds to the contents M combined with the portion of K as it appears on a subset of servers from the list. Gathering k different shares and a copy of E(M,K) allows the retriever to reconstruct the key K out of the shares and decrypt E(M,K) back into M. Modification or removal of the server hosted contents can be issued only by the original publishers using a combination of password and the hosting server domain name.
At present, Publius supports the hosting of HTML
pages, images and other file format
s such as PDFs and PostScript
s.
When a publisher wishes to add a web contents in the Publius web, its Publius client software
(Publius Client Proxy) executes the following steps:
Diagram describing the selection of servers out of the servers list to hold encrypted contents under hashed directory names.
After the publish operation is done, each chosen server at location at the servers list holds the following files under a directory named :
When a retriever wishes to browse for a web contents in the Publius web, its Publius client software (Publius Client Proxy)executes the following steps:
The delete operation is implemented by invoking a CGI
script running over the servers. To each server the hash result of (namely, the MD5
hash result for the concatenation of the server domain name with the publisher's password) is sent along with the corresponding string and compared with the one already stored in the password file under the directory ; if there is a match, the file file is removed from that directory.
The update operation similarly uses the hashed concatenation of the server domain name with publisher's password in order to authenticate the original ownership of the hosted contents. Under this operation, the update itself is done by adding additional update file under the which contains the new Publius URL
matching for the updated contents (recall that the Publius URL is tied with the published contents and the share of the encryption key and is verified against the contents when retrieved). In fact, the update operation is equivalent to the publish operation with the addition of adding the update file to the old directory for redirecting future retrieve request to the new URL
. When a retrieve operation will be issued for the old URL
, the Publius proxy client will be redirected to fetch the new URL
, the same will be done with the rest of the k-1 chosen servers; if the k resulting URL
s do not match, then another set of k servers will be chosen for retrieval.
s. Those have the following format:
Where is the concatenation of the hash results of the original contents combined with some key share, as were described for the publish operation in the previous section. The options section of the URL
is 16 bits represented by a two characters ASCII
string, containing:
See also: Gibbs, W. Wayt: "Speech Without Accountability", Scientific American 283:4 (Oct 2000)
Avi Rubin
Aviel David Rubin a graduate of the University of Michigan and Professor of Computer Science at Johns Hopkins University, Technical Director of the Information Security Institute at Johns Hopkins, Director of ACCURATE, President and co-founder of and an expert in systems and networking security...
& Cranor for allowing individuals the ability to publish information on the web anonymously and with a high guarantee that their publications will not be censored or modified by a third party.
The nine design goals of the Publius development team are:
- Censorship resistance - decreasing the chance that a third party will manage to modify or delete the published materials.
- Tamper evident - unauthorized changes are traceable.
- Source Anonymous - there is no way to tell who published the material once it is available on the web.
- Updatable - publishers are allowed to modify or delete their material.
- Deniable - third parties participating in publishing the materials lacks the responsibility for the hosted content.
- Fault tolerant - system should function even when some involved third parties are faulty or malicious.
- Persistent - there is no expiration date for published materials.
- Extensible - support for future protocol extensions or growth in the number of publishers.
- Freely available - all software tools required for the system should be out of charge.
Overview
The Publius web system consists of the following agents:- Publishers - participants who publish their contents on the web.
- Servers - which host the publishers contents on the web (considered as part of the third parties).
- Retrievers - participants who browse the web contents published by the publishers.
The Publius system relies on a static list of m web server
Web server
Web server can refer to either the hardware or the software that helps to deliver content that can be accessed through the Internet....
s. When a publisher wishes to add a contents M to the web, it first encrypts M using some random symmetric key
Symmetric-key algorithm
Symmetric-key algorithms are a class of algorithms for cryptography that use trivially related, often identical, cryptographic keys for both encryption of plaintext and decryption of ciphertext. The encryption key is trivially related to the decryption key, in that they may be identical or there is...
K. Then K is split into n shares (parts) where at least k<n shares are required for the reconstruction of K (see also Secret sharing
Secret sharing
Secret sharing refers to method for distributing a secret amongst a group of participants, each of whom is allocated a share of the secret. The secret can be reconstructed only when a sufficient number of shares are combined together; individual shares are of no use on their own.More formally, in a...
). A subset of the m servers receives another share of K and the encryption result of M using the key K , E(M,K).
When a retriever whishes to obtain the original contents M, it follows a generated URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
which corresponds to the contents M combined with the portion of K as it appears on a subset of servers from the list. Gathering k different shares and a copy of E(M,K) allows the retriever to reconstruct the key K out of the shares and decrypt E(M,K) back into M. Modification or removal of the server hosted contents can be issued only by the original publishers using a combination of password and the hosting server domain name.
At present, Publius supports the hosting of HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....
pages, images and other file format
File format
A file format is a particular way that information is encoded for storage in a computer file.Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa. There are different kinds of formats for...
s such as PDFs and PostScript
PostScript
PostScript is a dynamically typed concatenative programming language created by John Warnock and Charles Geschke in 1982. It is best known for its use as a page description language in the electronic and desktop publishing areas. Adobe PostScript 3 is also the worldwide printing and imaging...
s.
Operations
The Publius protocol allows the following operations:- Publish - in which a publisher spreads its contents across the Publius web servers.
- Retrieve - in which a retriever follows a specific URLUniform Resource LocatorIn computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
to gather desired contents. - Update - in which a publisher replaces its server hosted contents file by another.
- Delete - in which a publisher removes its server hosted file.
When a publisher wishes to add a web contents in the Publius web, its Publius client software
Client (computing)
A client is an application or system that accesses a service made available by a server. The server is often on another computer system, in which case the client accesses the service by way of a network....
(Publius Client Proxy) executes the following steps:
- Random symmetric keySymmetric-key algorithmSymmetric-key algorithms are a class of algorithms for cryptography that use trivially related, often identical, cryptographic keys for both encryption of plaintext and decryption of ciphertext. The encryption key is trivially related to the decryption key, in that they may be identical or there is...
K is generated. - The original content M is encrypted under Symmetric-key algorithmSymmetric-key algorithmSymmetric-key algorithms are a class of algorithms for cryptography that use trivially related, often identical, cryptographic keys for both encryption of plaintext and decryption of ciphertext. The encryption key is trivially related to the decryption key, in that they may be identical or there is...
with the key K. Resulting with the encryption E(M,K). - K is split into n shares using Shamir's Secret SharingShamir's Secret SharingShamir's Secret Sharing is an algorithm in cryptography. It is a form of secret sharing, where a secret is divided into parts, giving each participant its own unique part, where some of the parts or all of them are needed in order to reconstruct the secret....
method in such that at least k<n shares are required for the reconstruction of K under the method of interpolationInterpolationIn the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points....
. - For each of the n shares, the following computation takes place: where is the concatenation result of the original contents M with the key share ; H is the MD5MD5The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...
cryptographic hash functionCryptographic hash functionA cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the hash value, such that an accidental or intentional change to the data will change the hash value...
and wrap is the bitwise xor result of the two halves of the string which returned by H. - The hosting servers are chosen out of the servers list; the chosen locations in the servers list are determined by in order to obtain n values in the range [1,m]. If less than k unique locations were found, this step is repeated till unique locations are found.
- In each server which appears in the servers list at a directory named is created containing the encrypted contents , the chosen server's share of key K (namely, ) and additional information(a password file containing the MD5MD5The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...
hash value of the chosen server domain nameDomain nameA domain name is an identification string that defines a realm of administrative autonomy, authority, or control in the Internet. Domain names are formed by the rules and procedures of the Domain Name System ....
concatenated with a user chosen password used for authentication when a publisher wishes to update or remove its contents from the server). - A unique Publius URLUniform Resource LocatorIn computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
is constructed by concatenation of the d different identifiers of the servers containing the encrypted contents M and a key share of K.
Diagram describing the selection of servers out of the servers list to hold encrypted contents under hashed directory names.
After the publish operation is done, each chosen server at location at the servers list holds the following files under a directory named :
- file - which contains encrypted E(M,K) contents of the original contents M.
- share - which holds the share of the chosen server of the encryption key K (namely, ).
- password - which holds the MD5MD5The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...
hash value for the concatenation of the server domain name with a user chosen password. This is used for authentication for delete or update operations initiated by the publisher for the contents hosted by the chosen server.
When a retriever wishes to browse for a web contents in the Publius web, its Publius client software (Publius Client Proxy)executes the following steps:
- The URLUniform Resource LocatorIn computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
is parsed back into 8 bytes units (which are the units which were concatenated during the publish process). - For each unit parsed out of the Publius URLUniform Resource LocatorIn computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
, the hosting server is located from the servers list by computing which indicates on the server's location in the list. - k servers are chosen arbitrarily out of the located servers in order to reconstruct the key K using an interpolationInterpolationIn the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points....
over the retrieved k shares, one from each chosen server. - Among those k chosen servers, one is chosen for retrieving the encrypted contents E(M,K). This is issued using an HTTP GET request to the server for a file named file stored in the server directory named .
- The k shares of the key K are fetched in a similar way, known to be located in a server file named share under the directory.
- The original message is decrypted from E(M,K) using the reconstructed key K.
- The retriever then verifies that the contents M wasn't modified nor did the key share by recomputing and comparing it with the corresponding chunk which was parsed from the Publius URLUniform Resource LocatorIn computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
.- If a mismatch was found, another set of k servers can be tried, or maybe the contents should have been downloaded from another server.
- If verified successfully, the original contents M can be viewed by the web browser.
The delete operation is implemented by invoking a CGI
Common Gateway Interface
The Common Gateway Interface is a standard method for web servers software to delegate the generation of web pages to executable files...
script running over the servers. To each server the hash result of (namely, the MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...
hash result for the concatenation of the server domain name with the publisher's password) is sent along with the corresponding string and compared with the one already stored in the password file under the directory ; if there is a match, the file file is removed from that directory.
The update operation similarly uses the hashed concatenation of the server domain name with publisher's password in order to authenticate the original ownership of the hosted contents. Under this operation, the update itself is done by adding additional update file under the which contains the new Publius URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
matching for the updated contents (recall that the Publius URL is tied with the published contents and the share of the encryption key and is verified against the contents when retrieved). In fact, the update operation is equivalent to the publish operation with the addition of adding the update file to the old directory for redirecting future retrieve request to the new URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
. When a retrieve operation will be issued for the old URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
, the Publius proxy client will be redirected to fetch the new URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
, the same will be done with the rest of the k-1 chosen servers; if the k resulting URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
s do not match, then another set of k servers will be chosen for retrieval.
Publius URLs
Encrypted web contents in the Publius protocol are traceable by their Publius URLUniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
s. Those have the following format:
Where is the concatenation of the hash results of the original contents combined with some key share, as were described for the publish operation in the previous section. The options section of the URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
is 16 bits represented by a two characters ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
string, containing:
- version number - which allows the extension of the protocol and backwards compatibilityBackward compatibilityIn the context of telecommunications and computing, a device or technology is said to be backward or downward compatible if it can work with input generated by an older device...
when interacting between different versions. - number of shares required for key reconstruction.
- update flag - which its purpose will be mentioned in the next section.
Security Analysis
- In order for a Publius contents (or its update) to become inaccessible, it is required that all of the n copies of the encrypted contents will be deleted or corrupted so they will fail the verification against their URLUniform Resource LocatorIn computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
. Another possibility is by losing more than n-k of the encryption key K shares; leaving us with less than k shares which are insufficient for the reconstruction of K. Choosing high values for n and low values for k guarantees low chance of inaccessible contents to occur. - Tampering in the form of inserting false updates by third parties can become unlikely as the value for k increases. Larger value for k means that more shares will participate in the verification of the retrieved contetnts and by doing so, increasing the chance that such modification will be noticed.
- The update flag field in the Publius URLUniform Resource LocatorIn computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
is useful for prevention from redirecting to a false updates; even if a false update was added by a third party, it will be ignored as long as the update flag in the URLUniform Resource LocatorIn computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
is set to zero.
External links
See also: Gibbs, W. Wayt: "Speech Without Accountability", Scientific American 283:4 (Oct 2000)