Caverphone, within linguistics and computing, is a phonetic matching algorithm[1][2] invented to match English names by their sounds. It was originally built to process a custom dataset compiled between 1893 and 1938 in southern Dunedin, New Zealand.[3] It started from a concept similar to Metaphone and has since been developed to accommodate general English.[3]
Etymology
Caverphone was created by David Hood in the Caversham Project at the University of Otago in New Zealand in 2002 and revised in 2004. It was created to assist in data matching between late 19th century and early 20th century electoral rolls, where the name only needed to be in a "commonly recognisable form". The algorithm was intended for names that could not easily be matched between electoral rolls after the exact matches had been removed from the pool of potential matches. It is optimised for accents present in the study area (the southern part of the city of Dunedin, New Zealand).
Procedure
Caverphone 1.0
The rules of the algorithm are applied consecutively to any particular name, as a series of replacements.
The algorithm is as follows:
Convert to lowercase
Remove anything not a-z
If the name starts with...
cough, replace it by cou2f
rough, replace it by rou2f
tough, replace it by tou2f
enough, replace it by enou2f
gn, replace it by 2n
If the name ends with
mb, replace it by m2
Replace
cq with 2q
ci with si
ce with se
cy with sy
tch with 2ch
c with k
q with k
x with k
v with f
dg with 2g
tio with sio
tia with sia
d with t
ph with fh
b with p
sh with s2
z with s
any initial vowel with an A
all other vowels with a 3
3gh3 with 3kh3
gh with 22
g with k
groups of the letter s with a S
groups of the letter t with a T
groups of the letter p with a P
groups of the letter k with a K
groups of the letter f with a F
groups of the letter m with a M
groups of the letter n with a N
w3 with W3
wy with Wy
wh3 with Wh3
why with Why
w with 2
any initial h with an A
all other occurrences of h with a 2
r3 with R3
ry with Ry
r with 2
l3 with L3
ly with Ly
l with 2
j with y
y3 with Y3
y with 2
remove all
2
3
put six 1s on the end
take the first six characters as the code
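The rule list above can be sketched in Python as a chain of ordered substitutions. This is a minimal illustration transcribed from the steps listed here, not a reference implementation. The function name caverphone1 and the rule-table names are hypothetical, and the sketch assumes that "vowel" means a, e, i, o, u, which the list does not state explicitly.

```python
import re

# Prefix rewrites, applied only when the name starts with the given letters.
START_RULES = [("cough", "cou2f"), ("rough", "rou2f"), ("tough", "tou2f"),
               ("enough", "enou2f"), ("gn", "2n")]

# Literal substitutions applied everywhere, in the order listed.
EARLY_RULES = [("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
               ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"),
               ("dg", "2g"), ("tio", "sio"), ("tia", "sia"), ("d", "t"),
               ("ph", "fh"), ("b", "p"), ("sh", "s2"), ("z", "s")]

# Rules that run after vowels are marked; each pattern is a regular expression.
LATE_RULES = [("3gh3", "3kh3"), ("gh", "22"), ("g", "k"),
              ("s+", "S"), ("t+", "T"), ("p+", "P"), ("k+", "K"),
              ("f+", "F"), ("m+", "M"), ("n+", "N"),
              ("w3", "W3"), ("wy", "Wy"), ("wh3", "Wh3"), ("why", "Why"),
              ("w", "2"), ("^h", "A"), ("h", "2"),
              ("r3", "R3"), ("ry", "Ry"), ("r", "2"),
              ("l3", "L3"), ("ly", "Ly"), ("l", "2"),
              ("j", "y"), ("y3", "Y3"), ("y", "2"),
              ("[23]", "")]  # finally drop every placeholder 2 and 3


def caverphone1(name: str) -> str:
    """Sketch of the Caverphone 1.0 steps as listed in this article."""
    s = re.sub(r"[^a-z]", "", name.lower())
    for prefix, repl in START_RULES:
        if s.startswith(prefix):
            s = repl + s[len(prefix):]
    if s.endswith("mb"):
        s = s[:-2] + "m2"
    for old, new in EARLY_RULES:
        s = s.replace(old, new)
    # Assumption: vowels are a, e, i, o, u (y is handled by its own rules).
    s = re.sub(r"^[aeiou]", "A", s)
    s = re.sub(r"[aeiou]", "3", s)
    for pattern, repl in LATE_RULES:
        s = re.sub(pattern, repl, s)
    # Pad with six 1s and keep the first six characters.
    return (s + "111111")[:6]


print(caverphone1("David"))     # TFT111
print(caverphone1("Thompson"))  # TMPSN1
```

Keeping the rules in ordered tables mirrors the point that each replacement sees the output of the previous one, so reordering the tables would change the resulting codes.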
Caverphone 2.0
Start with a word
Convert to lowercase
Remove anything not in the standard alphabet (typically a-z)[note 1]
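The normalisation steps listed so far for Caverphone 2.0 can be sketched as below. This covers only the lowercase and strip steps; the later rules of version 2.0 are not reproduced. The function name normalise_name is hypothetical, and the sketch assumes the "standard alphabet" is a-z.

```python
import re


def normalise_name(word: str) -> str:
    """Lowercase the word, then drop every character outside a-z."""
    # Assumption: the standard alphabet is the 26 unaccented letters a-z.
    return re.sub(r"[^a-z]", "", word.lower())


print(normalise_name("O'Brien"))  # obrien
```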
Phua, Clifton; Lee, Vincent; Smith, Kate (2006). "The Personal Name Problem and a Recommended Data Mining Solution". Encyclopedia of Data Warehousing and Mining. CiteSeerX 10.1.1.127.5111.