PGP word list
The PGP Word List is a list of words for conveying data bytes in a clear, unambiguous way via a voice channel. The words are analogous in purpose to the NATO phonetic alphabet used by pilots, except that a longer list of words is used, each word corresponding to one of the 256 unique numeric byte values.
History and structure
The PGP Word List was designed in 1995 by Patrick Juola, a computational linguist, and Philip Zimmermann, creator of PGP. The words were carefully chosen for their phonetic distinctiveness, using genetic algorithms to select lists of words that had optimum separations in phoneme space. The candidate word lists were randomly drawn from Grady Ward's Moby Pronunciator list as raw material for the search and were successively refined by the genetic algorithms. The automated search converged to an optimized solution in about 40 hours on a DEC Alpha, a particularly fast machine in that era.
The Zimmermann/Juola list was originally designed to be used in PGPfone, a secure VoIP application, to allow the two parties to verbally compare a short authentication string to detect a man-in-the-middle attack (MiTM). It was called a biometric word list because the authentication depended on the two human users recognizing each other's distinct voices as they read and compared the words over the voice channel, binding the identity of the speaker with the words, which helped protect against the MiTM attack. The list can be used in many other situations where a biometric binding of identity is not needed, so calling it a biometric word list may be imprecise. Later, it was used in PGP to compare and verify PGP public key fingerprints over a voice channel. This is known in PGP applications as the "biometric" representation. When it was applied to PGP, the list of words was further refined, with contributions by Jon Callas. More recently, it has been used in Zfone and the ZRTP protocol, the successor to PGPfone.
The list is actually composed of two lists, each containing 256 phonetically distinct words, in which each word represents a different byte value between 0 and 255. Two lists are used because reading aloud long random sequences of words usually risks three kinds of errors: 1) transposition of two consecutive words, 2) duplicate words, or 3) omitted words. To detect all three kinds of errors, the two lists are used alternately for the even-offset bytes and the odd-offset bytes in the byte sequence. Each byte value is therefore represented by two different words, depending on whether that byte appears at an even or an odd offset from the beginning of the byte sequence. The two lists are readily distinguished by the number of syllables; the even list has words of two syllables, the odd list has three. The two lists have a maximum word length of 9 and 11 letters, respectively. Using a two-list scheme was suggested by Zhahai Stewart.
Hex | Even Word | Odd Word |
---|---|---|
00 | aardvark | adroitness |
01 | absurd | adviser |
02 | accrue | aftermath |
03 | acme | aggregate |
04 | adrift | alkali |
05 | adult | almighty |
06 | afflict | amulet |
07 | ahead | amusement |
08 | aimless | antenna |
09 | Algol | applicant |
0A | allow | Apollo |
0B | alone | armistice |
0C | ammo | article |
0D | ancient | asteroid |
0E | apple | Atlantic |
0F | artist | atmosphere |
10 | assume | autopsy |
11 | Athens | Babylon |
12 | atlas | backwater |
13 | Aztec | barbecue |
14 | baboon | belowground |
15 | backfield | bifocals |
16 | backward | bodyguard |
17 | banjo | bookseller |
18 | beaming | borderline |
19 | bedlamp | bottomless |
1A | beehive | Bradbury |
1B | beeswax | bravado |
1C | befriend | Brazilian |
1D | Belfast | breakaway |
1E | berserk | Burlington |
1F | billiard | businessman |
20 | bison | butterfat |
21 | blackjack | Camelot |
22 | blockade | candidate |
23 | blowtorch | cannonball |
24 | bluebird | Capricorn |
25 | bombast | caravan |
26 | bookshelf | caretaker |
27 | brackish | celebrate |
28 | breadline | cellulose |
29 | breakup | certify |
2A | brickyard | chambermaid |
2B | briefcase | Cherokee |
2C | Burbank | Chicago |
2D | button | clergyman |
2E | buzzard | coherence |
2F | cement | combustion |
30 | chairlift | commando |
31 | chatter | company |
32 | checkup | component |
33 | chisel | concurrent |
34 | choking | confidence |
35 | chopper | conformist |
36 | Christmas | congregate |
37 | clamshell | consensus |
38 | classic | consulting |
39 | classroom | corporate |
3A | cleanup | corrosion |
3B | clockwork | councilman |
3C | cobra | crossover |
3D | commence | crucifix |
3E | concert | cumbersome |
3F | cowbell | customer |
40 | crackdown | Dakota |
41 | cranky | decadence |
42 | crowfoot | December |
43 | crucial | decimal |
44 | crumpled | designing |
45 | crusade | detector |
46 | cubic | detergent |
47 | dashboard | determine |
48 | deadbolt | dictator |
49 | deckhand | dinosaur |
4A | dogsled | direction |
4B | dragnet | disable |
4C | drainage | disbelief |
4D | dreadful | disruptive |
4E | drifter | distortion |
4F | dropper | document |
50 | drumbeat | embezzle |
51 | drunken | enchanting |
52 | Dupont | enrollment |
53 | dwelling | enterprise |
54 | eating | equation |
55 | edict | equipment |
56 | egghead | escapade |
57 | eightball | Eskimo |
58 | endorse | everyday |
59 | endow | examine |
5A | enlist | existence |
5B | erase | exodus |
5C | escape | fascinate |
5D | exceed | filament |
5E | eyeglass | finicky |
5F | eyetooth | forever |
60 | facial | fortitude |
61 | fallout | frequency |
62 | flagpole | gadgetry |
63 | flatfoot | Galveston |
64 | flytrap | getaway |
65 | fracture | glossary |
66 | framework | gossamer |
67 | freedom | graduate |
68 | frighten | gravity |
69 | gazelle | guitarist |
6A | Geiger | hamburger |
6B | glitter | Hamilton |
6C | glucose | handiwork |
6D | goggles | hazardous |
6E | goldfish | headwaters |
6F | gremlin | hemisphere |
70 | guidance | hesitate |
71 | hamlet | hideaway |
72 | highchair | holiness |
73 | hockey | hurricane |
74 | indoors | hydraulic |
75 | indulge | impartial |
76 | inverse | impetus |
77 | involve | inception |
78 | island | indigo |
79 | jawbone | inertia |
7A | keyboard | infancy |
7B | kickoff | inferno |
7C | kiwi | informant |
7D | klaxon | insincere |
7E | locale | insurgent |
7F | lockup | integrate |
80 | merit | intention |
81 | minnow | inventive |
82 | miser | Istanbul |
83 | Mohawk | Jamaica |
84 | mural | Jupiter |
85 | music | leprosy |
86 | necklace | letterhead |
87 | Neptune | liberty |
88 | newborn | maritime |
89 | nightbird | matchmaker |
8A | Oakland | maverick |
8B | obtuse | Medusa |
8C | offload | megaton |
8D | optic | microscope |
8E | orca | microwave |
8F | payday | midsummer |
90 | peachy | millionaire |
91 | pheasant | miracle |
92 | physique | misnomer |
93 | playhouse | molasses |
94 | Pluto | molecule |
95 | preclude | Montana |
96 | prefer | monument |
97 | preshrunk | mosquito |
98 | printer | narrative |
99 | prowler | nebula |
9A | pupil | newsletter |
9B | puppy | Norwegian |
9C | python | October |
9D | quadrant | Ohio |
9E | quiver | onlooker |
9F | quota | opulent |
A0 | ragtime | Orlando |
A1 | ratchet | outfielder |
A2 | rebirth | Pacific |
A3 | reform | pandemic |
A4 | regain | Pandora |
A5 | reindeer | paperweight |
A6 | rematch | paragon |
A7 | repay | paragraph |
A8 | retouch | paramount |
A9 | revenge | passenger |
AA | reward | pedigree |
AB | rhythm | Pegasus |
AC | ribcage | penetrate |
AD | ringbolt | perceptive |
AE | robust | performance |
AF | rocker | pharmacy |
B0 | ruffled | phonetic |
B1 | sailboat | photograph |
B2 | sawdust | pioneer |
B3 | scallion | pocketful |
B4 | scenic | politeness |
B5 | scorecard | positive |
B6 | Scotland | potato |
B7 | seabird | processor |
B8 | select | provincial |
B9 | sentence | proximate |
BA | shadow | puberty |
BB | shamrock | publisher |
BC | showgirl | pyramid |
BD | skullcap | quantity |
BE | skydive | racketeer |
BF | slingshot | rebellion |
C0 | slowdown | recipe |
C1 | snapline | recover |
C2 | snapshot | repellent |
C3 | snowcap | replica |
C4 | snowslide | reproduce |
C5 | solo | resistor |
C6 | southward | responsive |
C7 | soybean | retraction |
C8 | spaniel | retrieval |
C9 | spearhead | retrospect |
CA | spellbind | revenue |
CB | spheroid | revival |
CC | spigot | revolver |
CD | spindle | sandalwood |
CE | spyglass | sardonic |
CF | stagehand | Saturday |
D0 | stagnate | savagery |
D1 | stairway | scavenger |
D2 | standard | sensation |
D3 | stapler | sociable |
D4 | steamship | souvenir |
D5 | sterling | specialist |
D6 | stockman | speculate |
D7 | stopwatch | stethoscope |
D8 | stormy | stupendous |
D9 | sugar | supportive |
DA | surmount | surrender |
DB | suspense | suspicious |
DC | sweatband | sympathy |
DD | swelter | tambourine |
DE | tactics | telephone |
DF | talon | therapist |
E0 | tapeworm | tobacco |
E1 | tempest | tolerance |
E2 | tiger | tomorrow |
E3 | tissue | torpedo |
E4 | tonic | tradition |
E5 | topmost | travesty |
E6 | tracker | trombonist |
E7 | transit | truncated |
E8 | trauma | typewriter |
E9 | treadmill | ultimate |
EA | Trojan | undaunted |
EB | trouble | underfoot |
EC | tumor | unicorn |
ED | tunnel | unify |
EE | tycoon | universe |
EF | uncut | unravel |
F0 | unearth | upcoming |
F1 | unwind | vacancy |
F2 | uproot | vagabond |
F3 | upset | vertigo |
F4 | upshot | Virginia |
F5 | vapor | visitor |
F6 | village | vocalist |
F7 | virus | voyager |
F8 | Vulcan | warranty |
F9 | waffle | Waterloo |
FA | wallet | whimsical |
FB | watchword | Wichita |
FC | wayside | Wilmington |
FD | willow | Wyoming |
FE | woodlark | yesteryear |
FF | Zulu | Yucatan |
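The error detection that the two-list scheme provides can be sketched roughly as follows. This is a minimal illustration rather than code from PGP or ZRTP: the `EVEN_WORDS` and `ODD_WORDS` sets hold only a few rows copied from the table, and `first_parity_error` is a hypothetical helper name. It assumes every spoken word is in the excerpt; a real checker would load all 256 rows and handle unknown words.

```python
# Illustrative excerpt only: a few words copied from the table above.
# A real checker would load every row of both columns.
EVEN_WORDS = {"topmost", "miser", "pluto", "uproot"}          # two syllables
ODD_WORDS = {"travesty", "istanbul", "molecule", "vagabond"}  # three syllables


def first_parity_error(words):
    """Return the index of the first word taken from the wrong list, or None.

    Hypothetical helper: the word at offset i must come from the even list
    when i is even and from the odd list when i is odd.
    """
    for index, word in enumerate(words):
        expected = EVEN_WORDS if index % 2 == 0 else ODD_WORDS
        if word.lower() not in expected:
            return index
    return None


good = ["topmost", "Istanbul", "Pluto", "vagabond"]
print(first_parity_error(good))                                        # None
# Two consecutive words swapped:
print(first_parity_error(["Istanbul", "topmost", "Pluto", "vagabond"]))  # 0
# One word repeated:
print(first_parity_error(["topmost", "Istanbul", "Istanbul", "Pluto"]))  # 2
# One word left out:
print(first_parity_error(["topmost", "Pluto", "vagabond"]))              # 1
```

A parity check of this kind only notices errors that break the even/odd alternation, so comparing the total number of words is still worthwhile: dropping two consecutive words, for instance, leaves the alternation intact.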
Examples
Each byte in a bytestring is encoded as a single word. A sequence of bytes is rendered in network byte order, from left to right. The leftmost byte (byte 0) is considered "even" and is encoded using the Even Word column of the table. The next byte to the right (byte 1) is considered "odd" and is encoded using the Odd Word column. This alternation repeats until all bytes are encoded. Thus, "E582" produces "topmost Istanbul", whereas "82E5" produces "miser travesty".
A PGP public key fingerprint that displayed in hexadecimal as
E582 94F2 E9A2 2748 6E8B
061B 31CC 528F D7FA 3F19
would display in PGP Words (the "biometric" fingerprint) as
topmost Istanbul Pluto vagabond
treadmill Pacific brackish dictator
goldfish Medusa afflict bravado
chatter revolver Dupont midsummer
stopwatch whimsical cowbell bottomless
The order of bytes in a bytestring depends on endianness.
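The encoding rule used in these examples can be written as a short sketch. It is illustrative only: `PGP_WORDS` and `encode` are names chosen here, and the dictionary contains just the four table rows needed for the inputs shown, so any other byte value raises a `KeyError`. A complete encoder would include all 256 rows.

```python
# Illustrative excerpt only: (even word, odd word) pairs copied from the
# table for the byte values used below. A full encoder needs all 256 rows.
PGP_WORDS = {
    0xE5: ("topmost", "travesty"),
    0x82: ("miser", "Istanbul"),
    0x94: ("Pluto", "molecule"),
    0xF2: ("uproot", "vagabond"),
}


def encode(hex_string):
    """Encode a hex string as PGP words, alternating even and odd lists."""
    data = bytes.fromhex(hex_string)  # spaces between byte groups are skipped
    # offset % 2 is 0 for even offsets (first column) and 1 for odd offsets.
    return " ".join(
        PGP_WORDS[byte][offset % 2] for offset, byte in enumerate(data)
    )


print(encode("E582"))       # topmost Istanbul
print(encode("82E5"))       # miser travesty
print(encode("E582 94F2"))  # topmost Istanbul Pluto vagabond
```

Storing each row as an (even, odd) pair keeps the lookup to a single index on the byte offset, which mirrors the layout of the table.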
Other word lists for data
There are several other word lists for conveying data in a clear unambiguous way via a voice channel:
- the NATO phonetic alphabet maps individual letters and digits to individual words.
- the S/KEY system maps 64-bit numbers to 6 short words of 1 to 4 characters each from a publicly accessible 2048-word dictionary. The same dictionary is used in RFC 2289.
- the Diceware system maps 5 base-6 random digits (almost 13 bits of entropy) to a word from a dictionary of 7,776 unique words.
- FIPS 181: Automated Password Generator converts random numbers into somewhat pronounceable "words".
- mnemonic encoding converts 32 bits of data into 3 words from a vocabulary of 1626 words.
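
The capacity figures quoted in this list can be checked with a little arithmetic. The sketch below only restates the dictionary sizes given above; the variable names are chosen here for illustration.

```python
import math

# PGP Word List: each word selects one of 256 byte values.
pgp_bits_per_word = math.log2(256)

# Diceware: five base-6 digits select one word out of 6**5.
diceware_words = 6 ** 5
diceware_bits_per_word = math.log2(diceware_words)

# S/KEY: six words, each from a 2048-word dictionary.
skey_capacity_bits = 6 * math.log2(2048)

# Mnemonic encoding: three words from a 1626-word vocabulary must be able
# to represent every 32-bit value.
mnemonic_combinations = 1626 ** 3

print(pgp_bits_per_word)                 # 8.0
print(diceware_words)                    # 7776
print(f"{diceware_bits_per_word:.2f}")   # 12.92
print(skey_capacity_bits)                # 66.0
print(mnemonic_combinations >= 2 ** 32)  # True
```

By this measure a PGP word carries 8 bits, compared with roughly 12.9 for a Diceware word; the smaller lists leave more room for choosing words that are easy to tell apart when spoken.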