[From IBM document C20-1707-0, Identification Techniques, 1969.]
Sounds-Like Codes
Under this type of system, in order to overcome the problem
of variation in the spelling of names and provide a "blocking
factor" for organizing a name file, the records are filed under
a phonetically coded version of the individual's name.
Filing by the phonetic code collects records of names that sound alike.
Disadvantages are the similar indexing of names that do no actually
sounds alike and very poor alphabetic sequence of the file. Different
types of sounds-alike coding systems are described below.
Alphanumeric Coding
The most commonly used phonetic coding technique adds a
three-digit numeric code to the first letter of the last name.
This phonetic code uses a coding table (Figure 16) and a set of
syntactical coding rules as shown in the next column.
Value |
Letter |
Phonic Category |
1 | B,F,P,V | (Labials) |
2 | C,G,J,K,Q,S,X,Z | (Gutterals and sibilants) |
3 | D,T | (Dentals) |
4 | L | (Long liquid) |
5 | M,N | (Nasals) |
6 | R | (Short liquid) |
Figure 16. A phonetic coding table
The basic coding rules are:
- Vowels in positions other than the first position of the name are
given no value; therefore A, E, I, O, U, and Y
are not coded.
- Consonants that sound alike are given the same numeric value.
- Consonants H and W are given no value;
- Double letters are coded as a single letter (for example,
Rugge = RG = R200).
- When two or more letters with the same numerical value
are together, they are considered as one letter (for example,
Scott = ST = S300, because C has the same value as S, and
two T's are considered as a single T; Stock = STC = S320,
since K has the same value as C).
- Names with internal spacing or punctuation (for example,
D'Amico = DMC = D520; Von Zann = VNZN = V525).
- If the name is too long for the code, only the first three valid
consonants after the initial letter are coded
(e.g. Anderson = ANDRSN = ANDR = A536).
- if the first letter of the first name or middle initial is A, E,
I, O, U, Y, W, or H, it is coded as zero. If there is no middle initial,
the code is zero (e.g., Day, Abner = D = (A)BN = D000-015).
Example: This code made up of one alphabetic
character and three digits for last name, three digits for
first name and one digit for middle initial would be as follows:
Johnson, Thomas J. = JNSN TMS J. = J525-352-2
Simple Numeric Coding
Another technique is entirely numeric and offers fewer
synonyms by using fifteen values rather than six. It uses
essentially the same basic rules as indicated above, except
that A, E, I, O, U, Y, W and H are coded when they are
the first letter of the last name, first name, or middle initial
(Figure 17). In addition, a set of constant multipliers is
used to convert the letter value to digits (Figure 18). The
products of these calculations are then summed to give the
identification code.
Value |
Letter |
0 | Nothing (blank) |
1 | B |
2 | C |
3 | D |
4 | F |
5 | G |
6 | J |
7 | K, Q |
8 | L |
9 | M |
10 | N |
11 | P, V |
12 | R |
13 | S |
14 | T |
15 | X, Z |
|
16 | A (Use only |
17 | E for first |
18 | H letter last |
19 | I name, first |
20 | O letter first |
21 | U name, and |
22 | W middle |
23 | Y initial.) |
Figure 17. Phonetic coding table (The Computable Name Code)
Value |
Letter |
41,000,000 | First letter last name |
2,560,000 | First consonant last name |
160,000 | Second consonant last name |
10,000 | Third consonant last name |
400 | First letter first name |
25 | First consonant first name |
1 | Middle initial |
Figure 18. Value table for constant multipliers
(The Computable Name Code)
Example: The code is made up of first letter and three
consonants of last name, first letter aand one consonant of
first name, and middle initial.
Johnson, Thomas J. = JNSN TM J.
J = 6 × 41,000,000 = 246,000,000
N = 10 × 2,560,000 = 25,600,000
S = 13 × 160,000 = 2,080,000
N = 10 × 10,000 = 100,000
T = 14 × 400 = 5,600
M = 9 × 25 = 225
J = 6 × 1 = 6
"Thomas J. Johnson" becomes 273,785,831
A variation of the approach adds the last letter of first
and last name to the value table for constant multipliers,
creating a twelve-position identification code.
Numeric Frequency Coding
Another system attempts to distribute better the
number of names to be scanned in any one sounds-like
grouping. The sounds-like coding technique tends to
distribute names disproportionately. To achieve a better
balance, phonetic frequency coding equates infrequently
used letters with more frequent letters by assigning them
a numeric value. Frequency of letter occurrence was
measured by counting pages of the San Francisco telephone
directory. The coding tables for this code are given in
Figures 19 and 20.
The phonetic frequency coding rules are as follows:
- Initial phonetic scan
- Make double letters single, and CK=K, DT=T, SC=S,
KN=N, MN=N.
- Eliminate vowels, W, H, and Y.
- Coding scan
- Replace first letter of name with appropriate value
from phonetic frequency table 1 (Figure 19).
- Compare the last two or three letters of last name
with the phonetic frequency common suffix table (Figure 20)
and, if a match is found, use the value as a second digit.
If no match is found, use the value found from phonetic
frequency value table 3 searching by the last phonetic letter
of last name.
- Determine the last three phonetic digits from
phonetic frequency value table 2, using the first
letter of first name and rescanning the second and
third letters of last name.
Table 1 |
Table 2 |
Table 3 |
|
0 | S, Z |
0 | S, Z |
0 | B, C, K, Q, V |
1 | C, K, Q, V |
1 | C, K, Q |
1 | D, T |
2 | F, P, U, W |
2 | F, P, X |
2 | F, L, P |
3 | A, B |
3 | A, B |
3 | G, J, X |
4 | L, O, R |
4 | O, R |
4 | M, N |
5 | D, H, I |
5 | D, H, I |
5 | R |
6 | E, M, N, X |
6 | M, N |
6 | S, Z |
7 | G, J, T, Y |
7 | G, J, T, Y |
7 | A, E, H, I, O, U, W, Y |
8 | |
8 | U, V, W |
8 | |
9 | |
9 | E, L |
9 | |
Figure 19. Phonetic frequency value tables.
Value |
Suffix |
3 | TRS |
7 | DRS, MN, TR |
8 | PRS, SN, STN |
9 | SR, STR, TD, TN |
Figure 20. Phonetic frequency common suffix table.
Example: The code is made up of one digit for first letter
of last name, one digit for suffix or last letter of last name,
one digit for first letter of first name, and two digits for
second and third letters of last name.
Johnson, Thomas J. |
Initial phonetic scan: | JNSN, T |
Coding Scan: | J |
(first letter last name - Table 1) | = 7 |
| SN |
(suffix - common suffix table) | = 8 |
| T |
(first letter first name - table 2) | = 7 |
| N |
(second letter last name - table 2) | = 6 |
| S |
(third letter last name - table 2) | = 0 |
| |
Total | = 78760 |
If the name were spelled Johnzon, Thomas J., coding would be as
shown below.
Johnzon, Thomas J. |
Initial phonetic scan: | JNZN, T |
Coding Scan: | J |
(first letter last name - Table 1) | = 7 |
| N |
(last letter last name - table 3
since neither "ZN" or "TZN"
are common suffixes) | = 4 |
| T |
(first letter first name - table 2) | = 7 |
| N |
(second letter last name - table 2) | = 6 |
| Z |
(third letter last name - table 2) | = 0 |
| |
Total | = 74760 |
Original scans:
|