1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
|
# C) Copyright 2002 - 2003, Creativyst, Inc. # http://www.Creativyst.com # ###########################################
sub SoundEx { my($WordString, $LengthOption) = @_; my($WordStr, $CurChar, $LastChar, $SoundExLen, $WSLen, $FirstLetter);
if($LengthOption) { $SoundExLen = $LengthOption; } if($SoundExLen > 10) { $SoundExLen = 10; } if($SoundExLen < 4) { $SoundExLen = 4; }
if(!$WordString) { return(""); }
$WordString = uc($WordString); # Clean and tidy # $WordStr = $WordString; $WordStr =~ s/[^A-Z]/ /sig; # replace non-chars with space $WordStr =~ s/^\s//sg; # remove leading space $WordStr =~ s/\s$//sg; # remove trailing space
# Some of our own improvements # $WordStr =~ s/DG/G/sg; # Change DG to G $WordStr =~ s/GH/H/sg; # Change GH to H $WordStr =~ s/KN/N/sg; # Change KN to N $WordStr =~ s/GN/N/sg; # Change GN to N $WordStr =~ s/MB/M/sg; # Change MB to M $WordStr =~ s/PH/F/sg; # Change PH to F $WordStr =~ s/TCH/CH/sg; # Change PH to F $WordStr =~ s/MP([STZ])/M$1/sg; # MP if followed by S,T,or Z $WordStr =~ s/^PS/S/sg; # Change leading PS to S $WordStr =~ s/^PF/F/sg; # Change leading PF to F
# Above improvements could # change this first letter # $FirstLetter = substr($WordStr,0,1);
# Begin Classic SoundEx # $WordStr =~ s/[AEIOUYHW]/0/sg; $WordStr =~ s/[BPFV]/1/sg; $WordStr =~ s/[CSGJKQXZ]/2/sg; $WordStr =~ s/[DT]/3/sg; $WordStr =~ s/L/4/sg; $WordStr =~ s/[MN]/5/sg; $WordStr =~ s/R/6/sg;
# Remove extra equal adjacent digits # $WSLen = length($WordStr); $LastChar = substr($WordStr, 0, 1); for($i = 1; $i < $WSLen;$i++) { $CurChar = substr($WordStr,$i,1); if($CurChar eq $LastChar) { substr($WordStr,$i,1," "); } else { $LastChar = $CurChar; } }
$WordStr = substr($WordStr,1); # Drop first letter code $WordStr =~ s/\s//sg; # remove spaces $WordStr =~ s/0//sg; # remove zeros $WordStr .= "0000000000"; # pad with zeros on right
$WordStr = "$FirstLetter$WordStr"; # Add first letter of word
$WordStr = substr($WordStr,0,$SoundExLen); # size to taste
return($WordStr); } |