1. We distinguish between lowercase and uppercase letters. For example,
the words "The" and "the" are different because of the case of the first
letter 't'. The integer value used for each letter should be the ASCII
value of that letter.
2. You have to read the words directly from the files text1 and text2
using FileReader. No command line arguments are provided. The project
files are executed using
java project4
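
As a rough illustration of item 2, the sketch below reads a whole file
through FileReader. The class name TextLoader and its methods are
hypothetical and not part of the project specification; the Reader-based
overload is there only so the logic can be tried without a file on disk.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper; the names are illustrative, not required by the project.
class TextLoader {
    // Reads every character from the given reader into one String.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Opens the named file (for example "text1") with FileReader.
    static String readFile(String fileName) throws IOException {
        return readAll(new FileReader(fileName));
    }
}
```

With this sketch, the two inputs would be loaded with
TextLoader.readFile("text1") and TextLoader.readFile("text2"), since the
file names are fixed and no command line arguments are passed.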
3. Index 0 is the rightmost letter of the word. For example, if the
word is "Purdue", then x0 is 'e', x1 is the rightmost 'u', x2 is 'd',
and so on.
4. Every punctuation mark separates one word from another. The special
case is the hyphen ('-'): if two words are connected with a hyphen, they
form a single word. For example, "foo-bar" is a single word and NOT two
different words separated by a hyphen. Each word is inserted only once.
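
One possible way to apply the splitting rule in item 4 is sketched below.
It assumes that a word consists of ASCII letters, digits and hyphens, and
that everything else (punctuation and whitespace) is a separator; the
notes do not spell out how digits or stray hyphens are treated, so that
part is a guess. WordSplitter is a hypothetical name, and the
LinkedHashSet merely stands in for the project's own hashtable to show
the "inserted only once" behaviour.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; a Set stands in for the project's hashtable here.
class WordSplitter {
    // Splits on runs of characters that are not letters, digits or '-'
    // (an assumption), keeping each distinct word once in first-seen order.
    static Set<String> distinctWords(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : text.split("[^A-Za-z0-9-]+")) {
            if (!token.isEmpty()) { // split may yield a leading empty token
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints: [The, the, foo-bar, Alice]
        System.out.println(distinctWords("The the foo-bar, foo-bar. Alice!"));
    }
}
```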
5. Since the size of a hashtable should be a prime number, you do not
simply double the size of the hashtable whenever the load factor
increases above 0.95. Instead, proceed as follows: start with size
B = 71; if you have to increase it, go to B = 149, then to B = 307,
then to B = 617. The final size of the hashtable will be one of these
values of B only.
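
The size progression in item 5 might be encoded like this. The sketch
assumes the usual definition of load factor (stored entries divided by
table size), which the notes do not state explicitly, and it throws if
growth past 617 is requested because no larger size is given. TableSizes
and its members are hypothetical names.

```java
// Hypothetical helper; names are illustrative only.
class TableSizes {
    static final int[] SIZES = {71, 149, 307, 617};
    static final double MAX_LOAD = 0.95;

    // Assumes load factor = entries / table size.
    static boolean needsGrowth(int entries, int size) {
        return (double) entries / size > MAX_LOAD;
    }

    // Returns the next prime size in the fixed progression.
    static int nextSize(int current) {
        for (int i = 0; i < SIZES.length - 1; i++) {
            if (SIZES[i] == current) {
                return SIZES[i + 1];
            }
        }
        // Assumption: the notes give no size beyond 617.
        throw new IllegalStateException("no larger size after " + current);
    }
}
```

For example, with B = 71 the 68th stored word pushes the load factor to
68/71, which is above 0.95, so the next size would be 149.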
6. You have to calculate the average number of comparisons for different
load factor ranges. Keep five different counts, one for each range.
Whenever you find a collision in the hashtable for text1, check the load
factor at that point and increment the appropriate count.
7. Each time you pass the load factor of 0.95, you have to make a new
hashtable with about double the number of entries (use the values
provided for B). Each time you make a new hashtable, you have to rehash
the entries that were present in the old hashtable into the new one. You
also have to throw away the previous counts (used for the average number
of comparisons) and start recomputing them as you insert into the new
hashtable.
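
A minimal sketch of the grow-and-rehash step in item 7 follows. It makes
several assumptions that the notes do not confirm: collisions are handled
with separate chaining, String.hashCode() is used as a placeholder (it is
NOT the project's hash function), and the load factor is entries divided
by table size. GrowingWordTable and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table; chaining and the hash below are assumptions.
class GrowingWordTable {
    static final int[] SIZES = {71, 149, 307, 617};
    private int sizeIndex = 0;
    private List<String>[] buckets = newBuckets(SIZES[0]);
    private int entries = 0;

    @SuppressWarnings("unchecked")
    private static List<String>[] newBuckets(int size) {
        List<String>[] b = new List[size];
        for (int i = 0; i < size; i++) {
            b[i] = new ArrayList<>();
        }
        return b;
    }

    // Placeholder hash, standing in for the project's own function.
    private static int slot(String word, int size) {
        return Math.floorMod(word.hashCode(), size);
    }

    // Inserts a word once; returns false if it was already present.
    boolean insert(String word) {
        List<String> bucket = buckets[slot(word, buckets.length)];
        if (bucket.contains(word)) {
            return false;
        }
        bucket.add(word);
        entries++;
        if ((double) entries / buckets.length > 0.95 && sizeIndex < SIZES.length - 1) {
            grow();
        }
        return true;
    }

    // Builds the next larger table and rehashes every old entry into it.
    private void grow() {
        List<String>[] old = buckets;
        buckets = newBuckets(SIZES[++sizeIndex]);
        for (List<String> bucket : old) {
            for (String w : bucket) {
                buckets[slot(w, buckets.length)].add(w);
            }
        }
        // The per-range comparison counts would be reset at this point.
    }

    boolean contains(String word) {
        return buckets[slot(word, buckets.length)].contains(word);
    }

    int size() { return buckets.length; }

    int entries() { return entries; }
}
```

Each slot has to be recomputed during the rehash because it depends on
the table size, so simply copying buckets across would misplace words.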
8. To count the distinct words, count every word in a file except those
that have already occurred. For example, if the word 'ALICE' appears
twice, count it only once.
9. Finally, you have to compute the average number of comparisons per
word from the counts for each range. The count for a range divided by
the total number of words inserted in that range gives the average for
that range.
10. This project will be graded manually, so you need not worry about
extra spaces or delimiters. However, your output format should look like
this:
(Note: This is not a sample solution.)
Common words: Alice Jump She
Distinct words:
text1=10
text2=6
Average Number of Comparisons:
[0,0.5) =2.0
[0.5,0.65)= 3.2
[0.65,0.75)= 6.10
[0.75,0.85) =9.02
[0.85,0.95) =11
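
The bookkeeping from items 6 and 9 could be sketched as below, using the
five load factor ranges shown in the format above. Two points are
assumptions rather than part of the notes: what exactly counts as one
comparison, and that a load factor of 0.95 or more is placed in the last
range. RangeCounters and its members are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical bookkeeping class; names are illustrative only.
class RangeCounters {
    // Upper bounds of [0,0.5), [0.5,0.65), [0.65,0.75), [0.75,0.85), [0.85,0.95).
    static final double[] UPPER = {0.5, 0.65, 0.75, 0.85, 0.95};
    private final long[] comparisons = new long[UPPER.length];
    private final long[] wordsInserted = new long[UPPER.length];

    // Maps a load factor to the index of its half-open range.
    static int rangeIndex(double loadFactor) {
        for (int i = 0; i < UPPER.length; i++) {
            if (loadFactor < UPPER[i]) {
                return i;
            }
        }
        return UPPER.length - 1; // assumption: 0.95 and above go in the last range
    }

    // Records one inserted word and the comparisons its insertion needed.
    void record(double loadFactor, int comparisonsForWord) {
        int r = rangeIndex(loadFactor);
        comparisons[r] += comparisonsForWord;
        wordsInserted[r]++;
    }

    // Count divided by words inserted in that range; 0.0 if the range is empty.
    double average(int range) {
        return wordsInserted[range] == 0
                ? 0.0
                : (double) comparisons[range] / wordsInserted[range];
    }

    // Discards all counts, as required after moving to a new hashtable.
    void reset() {
        Arrays.fill(comparisons, 0);
        Arrays.fill(wordsInserted, 0);
    }
}
```

For instance, two words inserted at load factors 0.3 and 0.4 that needed
2 and 4 comparisons give an average of 3.0 for the range [0,0.5).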