The purpose of this project is to learn the basics of Lisp before we study
core areas of AI such as search and knowledge representation (KR).
The program is a spelling correction system that uses a Soundex function
(which I will provide) as the basis for suggesting candidates. Soundex maps
words onto an alphanumeric code such that words with similar phonemes
(sounds) get mapped to the same code. It is therefore good at correcting
genuine misspellings, but not typos.
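For testing before the provided function is available, the mapping can be approximated with standard American Soundex. This is a sketch under the assumption that the course's function follows the standard rules: keep the first letter; code B F P V as 1, C G J K Q S X Z as 2, D T as 3, L as 4, M N as 5, R as 6; let vowels and Y separate repeated codes while H and W do not; then pad or cut the result to four characters. The names `soundex` and `soundex-digit` and the string return type are assumptions, and the provided version may differ in details.

```lisp
;;; Sketch of standard American Soundex.  The names SOUNDEX and
;;; SOUNDEX-DIGIT are assumptions; the course-provided function may differ.

(defun soundex-digit (ch)
  "Return the Soundex digit (a character) for CH, or NIL for letters that
are not coded (A E I O U Y H W) and for non-letters."
  (case (char-upcase ch)
    ((#\B #\F #\P #\V) #\1)
    ((#\C #\G #\J #\K #\Q #\S #\X #\Z) #\2)
    ((#\D #\T) #\3)
    ((#\L) #\4)
    ((#\M #\N) #\5)
    ((#\R) #\6)
    (t nil)))

(defun soundex (word)
  "Return the four-character Soundex code of WORD (a non-empty string
designator) as a string, e.g. \"T500\"."
  (let* ((w (string-upcase (string word)))
         (code (list (char w 0)))            ; built in reverse with PUSH
         (prev (soundex-digit (char w 0))))  ; digit of the last coded letter
    (loop for ch across (subseq w 1)
          while (< (length code) 4)
          do (let ((d (soundex-digit ch)))
               (cond ((member ch '(#\H #\W)))   ; H, W: skipped, PREV kept
                     ((null d) (setf prev nil)) ; vowels and Y allow a repeat
                     ((not (eql d prev))        ; a new digit: record it
                      (push d code)
                      (setf prev d)))))
    ;; Pad with zeros and cut to exactly four characters.
    (subseq (concatenate 'string (coerce (reverse code) 'string) "000") 0 4)))
```

Under these rules "thim" and every word in the correctSX example later in this writeup get the code T500, while T, THE, THEY, TO, TOO and TWO all get T000, which is consistent with the load_dictionary example.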
In your code, you are free to use any Common Lisp function or data type. In fact,
before writing any function, I encourage you to search the Lisp documentation
online to see whether it already exists. Most will. In general, the built-in
functions will run faster than those that you write yourself.
Functions you will need to write:
1. load_dictionary
Args: a string that is the name of a file
Body: Given a file where each line contains a single word followed by
an integer giving that word's frequency in a corpus,
it should build the following two tables and return the number of words
(that is, lines) read.
*spelling_dict*, a table with an entry for each word, where the
word is the key and its frequency is the value
*soundex_dict*, a table with an entry for each Soundex code,
where the code is the key and the value is the list of words from
*spelling_dict* that map to that code
EXAMPLE:
(load_dictionary "test1000.txt")
should return 1000. Sample entries are noted below.
The value for 'the in *spelling_dict* should be 23135851162.
The value in *soundex_dict* for the Soundex code of 'the should be a list
with the following elements (possibly in a different order):
(T THE THEY TO TOO TWO)
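One way to build both tables is with hash tables, reading the file line by line. This is a minimal sketch that assumes words are stored as upper-case symbols (as the 'the example and the symbol lists suggest), that Soundex codes are strings (hence an EQUAL hash table), and that the provided function is called `soundex`. A compact standard-Soundex stand-in is included only so that the block runs on its own; it should be replaced by the course's version.

```lisp
;;; Stand-in for the provided Soundex function (standard American Soundex).
;;; Replace it with the course's version; the name SOUNDEX is an assumption.
(defun soundex (word)
  (flet ((code-digit (ch)
           (case (char-upcase ch)
             ((#\B #\F #\P #\V) #\1) ((#\C #\G #\J #\K #\Q #\S #\X #\Z) #\2)
             ((#\D #\T) #\3) ((#\L) #\4) ((#\M #\N) #\5) ((#\R) #\6) (t nil))))
    (let* ((w (string-upcase (string word)))
           (code (list (char w 0)))
           (prev (code-digit (char w 0))))
      (loop for ch across (subseq w 1)
            while (< (length code) 4)
            do (let ((d (code-digit ch)))
                 (cond ((member ch '(#\H #\W)))
                       ((null d) (setf prev nil))
                       ((not (eql d prev)) (push d code) (setf prev d)))))
      (subseq (concatenate 'string (coerce (reverse code) 'string) "000") 0 4))))

(defvar *spelling_dict* (make-hash-table :test #'eq)
  "Word (upper-case symbol) -> corpus frequency (integer).")

(defvar *soundex_dict* (make-hash-table :test #'equal)
  "Soundex code (string) -> list of word symbols from *SPELLING_DICT*.")

(defun load_dictionary (filename)
  "Read FILENAME, whose lines each hold a word and its corpus frequency,
fill *SPELLING_DICT* and *SOUNDEX_DICT*, and return the number of words read."
  (clrhash *spelling_dict*)                  ; start from empty tables
  (clrhash *soundex_dict*)
  (with-open-file (in filename :direction :input)
    (loop for raw = (read-line in nil)       ; NIL at end of file
          while raw
          sum (let* ((line (string-trim '(#\Space #\Tab #\Return) raw))
                     (split (position-if (lambda (c) (member c '(#\Space #\Tab)))
                                         line)))
                (if (null split)
                    0                        ; blank or malformed line: not counted
                    (let ((word (intern (string-upcase (subseq line 0 split))))
                          ;; PARSE-INTEGER ignores the leading whitespace.
                          (freq (parse-integer line :start split)))
                      (setf (gethash word *spelling_dict*) freq)
                      ;; PUSHNEW avoids listing a repeated word twice.
                      (pushnew word (gethash (soundex (symbol-name word))
                                             *soundex_dict*))
                      1))))))
```

Reading each line as a string and splitting it, rather than calling READ on the file, keeps words containing characters such as apostrophes from being interpreted by the Lisp reader.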
2. lookup
Args: a string corresponding to a word
Body: Checks whether the word is in the dictionary and returns its stored
frequency if it is there, and NIL otherwise
EXAMPLE:
After loading "test1000.txt",
(lookup "the") should return
23135851162
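If *spelling_dict* is a hash table keyed by upper-case symbols (an assumption matching the 'the example above), lookup only needs to turn the string into the matching symbol. This sketch uses FIND-SYMBOL rather than INTERN so that looking up an unknown word does not create a new symbol, and wraps GETHASH in VALUES so that only the frequency is returned, not GETHASH's second "found" value.

```lisp
(defvar *spelling_dict* (make-hash-table :test #'eq)
  "Word (upper-case symbol) -> corpus frequency; filled by LOAD_DICTIONARY.")

(defun lookup (word)
  "Return the stored frequency of the string WORD, or NIL if WORD is not
in *SPELLING_DICT*."
  ;; FIND-SYMBOL, unlike INTERN, never creates a symbol for an unknown word.
  (let ((sym (find-symbol (string-upcase word))))
    ;; VALUES drops GETHASH's second (found-p) value.
    (and sym (values (gethash sym *spelling_dict*)))))
```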
3. correctSX
Args: a string corresponding to a word
Body: Calculates and returns a list of candidate respellings, based on the
Soundex-equivalent words, sorted alphabetically. Returns NIL if no
Soundex equivalents are found
EXAMPLE:
After loading "test1000.txt"
(correctSX "thim") should return
(TEAM TEEN THAN THEM THEN TIME TOWN)
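Since the candidates are simply the *soundex_dict* entry for the word's code, correctSX can be little more than a hash-table lookup plus the built-in SORT. This is a minimal sketch that assumes the provided function is named `soundex` and returns the same kind of key that *soundex_dict* uses (a string here, hence the EQUAL table). STRING< works on symbols because symbols are string designators, and the list is copied first because SORT is destructive and could otherwise reorder the list stored in the table.

```lisp
;;; Assumes *SOUNDEX_DICT* has been filled by LOAD_DICTIONARY and that the
;;; provided Soundex function is available as SOUNDEX (name assumed).
(defvar *soundex_dict* (make-hash-table :test #'equal)
  "Soundex code (string) -> list of word symbols; filled by LOAD_DICTIONARY.")

(defun correctSX (word)
  "Return the Soundex-equivalent dictionary words for the string WORD,
sorted alphabetically, or NIL if there are none."
  (let ((candidates (gethash (soundex word) *soundex_dict*)))
    ;; Sort a copy so the list stored in the table is left untouched.
    (sort (copy-list candidates) #'string<)))
```

If the code has no entry, GETHASH returns NIL and sorting NIL gives NIL, which matches the required result when no equivalents are found.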
4. correctSX_SIM
Args: A string corresponding to a possible word
Body: Calculates and returns a list of candidate respellings, based on the
Soundex-equivalent words, sorted by their similarity to the provided
word (most similar first), where similarity is defined as the number of
letters in common. For example, the similarity of "cat" and "tic" is 2.
It returns NIL if no Soundex equivalents are found
EXAMPLE:
After loading "test1000.txt", (correctSX_SIM "thim") should return
(THEM TIME TEAM THEN THAN TOWN TEEN)
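The similarity ordering can be sketched with the built-ins INTERSECTION and STABLE-SORT. This sketch interprets "letters in common" as the number of distinct letters shared, ignoring case (the writeup does not say how repeated letters count), and again assumes the provided function is named `soundex`; the helper name `letters-in-common` is hypothetical. Because STABLE-SORT keeps tied words in the order they had in the table, words with equal similarity (e.g. THEN and THAN, which both share two letters with "thim") may come out in a different order from the example.

```lisp
;;; Assumes *SOUNDEX_DICT* has been filled by LOAD_DICTIONARY and that the
;;; provided Soundex function is available as SOUNDEX (name assumed).
(defvar *soundex_dict* (make-hash-table :test #'equal)
  "Soundex code (string) -> list of word symbols; filled by LOAD_DICTIONARY.")

(defun letters-in-common (a b)
  "Count the distinct letters shared by string designators A and B,
ignoring case, e.g. (letters-in-common \"cat\" \"tic\") => 2."
  (length (intersection (remove-duplicates (coerce (string-upcase a) 'list))
                        (remove-duplicates (coerce (string-upcase b) 'list)))))

(defun correctSX_SIM (word)
  "Return the Soundex-equivalent dictionary words for the string WORD,
most similar first, or NIL if there are none."
  (let ((candidates (copy-list (gethash (soundex word) *soundex_dict*))))
    ;; Sort by decreasing similarity; STABLE-SORT keeps ties in table order.
    (stable-sort candidates #'>
                 :key (lambda (candidate) (letters-in-common word candidate)))))
```

The key function is recomputed on each comparison, which is fine for short candidate lists; for longer ones the scores could be computed once and paired with the words before sorting.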
Assessment criteria:
Correctness
Your code should load and run correctly. The input and output must match the
specification exactly so that it can be tested using another Lisp program.
Incorrectly formatted input or output values will prevent the entire function
from being counted as correct.
Documentation and Testing
Your code should have internal comments to explain what each function is for
and what types of arguments it expects.
Test your program using enough cases to show that it works for different
dictionaries and input words. To submit multiple examples of input and output
files, manually rename the input and output files after each test run, e.g.
mp1.in_test1 and mp1.out_test1
You can save examples of output created interactively using dribble, which
starts logging the session to a file when given a filename and stops when
called with no argument, e.g. (dribble "mp1.out_test1") <rest of Lisp session>
(dribble).
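A small helper can make a dribble transcript easier to read by printing whether each call matched its expected value. The helper name `check` and its output format are hypothetical and not part of the assignment; it simply compares the actual and expected results with EQUAL.

```lisp
(defun check (label actual expected)
  "Print PASS or FAIL for LABEL, depending on whether ACTUAL is EQUAL to
EXPECTED, and return T or NIL accordingly."
  (let ((ok (equal actual expected)))
    ;; ~:[...~;...~] prints the first clause when OK is NIL, else the second.
    (format t "~&~:[FAIL~;PASS~] ~A: got ~S, expected ~S~%"
            ok label actual expected)
    ok))
```

For example, after (dribble "mp1.out_test1"), loading mp1.lisp and calling (load_dictionary "test1000.txt"), one could evaluate (check "lookup the" (lookup "the") 23135851162) before closing the log with (dribble).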
In a separate document, provide a brief description of your program design
(mentioning any built-in Lisp functions you used) and any limitations of your
program or deviations from the assignment writeup.
Deliverables
Your solution should be in a file called mp1.lisp. If you wish, you may divide
your code into multiple files, but mp1.lisp must load all the other files in
the correct order so that your solution can be run using a single command.