
Computer Science 2631 – Assignment 2 - Fall 2018

Assignment 2: Data Structure Comparison


Given: Wednesday, September 26, 2018
Electronic Copy Due: Friday, October 26, 2018

Objectives:

• to practice using a hash table.

• to gain experience with experimental programming.

• to gain an understanding of how efficient hashing is.

The Problem
In this assignment, you (and up to one classmate) will write one or more programs to compare a hash
table to another data structure. The second data structure must be a linked list or a sorted array. The
chosen data structure and the hash table must be designed to function as a set capable of adding a
word and checking for the existence of a word (which will be called lookup in this handout). The goal of your work
will be to answer all of the following questions:

1. How many unique words can you add to your chosen data structure in one second?

2. How many unique words can you add to a hash table in one second?

3. How many unique words can you add to each of the two data structures in ten seconds?

4. Graph the time it takes to perform 10,000 lookup operations on the data structure of your choice
for a large variety of sizes (at a minimum use 200, 2,000, 20,000, and 200,000). Does the timing seem
to match the theoretical growth rate you would expect for that data structure?

5. Graph the time it takes to perform 10,000 lookup operations on a hash table for a large variety
of sizes (at a minimum use 200, 2,000, 20,000, and 200,000). Does the timing seem to match the
theoretical growth rate you would expect for a hash table?

6. Is one data structure definitively faster for lookups or additions? If not, in which cases should
you use each of the data structures?
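
One way to approach questions 1 to 3 is to keep adding words until a time budget runs out and count the successful additions. The following is only a sketch: the class name AddThroughput, the method countAdds, and the generated "word0", "word1", ... strings are illustrative assumptions, and java.util.HashSet stands in for the data structures you are required to implement yourself.

```java
import java.util.HashSet;
import java.util.Set;

class AddThroughput {

    /**
     * Adds generated unique words to the set until the time budget runs out
     * and returns how many words were added successfully.
     */
    static int countAdds(Set<String> set, long budgetNanos) {
        long startTime = System.nanoTime();
        int added = 0;
        // The clock is read on every iteration; that overhead is the same
        // for every data structure being compared.
        while (System.nanoTime() - startTime < budgetNanos) {
            if (set.add("word" + added)) { // placeholder words, not top333333.txt
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        // 0.1 s keeps the demo quick; use 1_000_000_000L for one second.
        long budgetNanos = 100_000_000L;
        int added = countAdds(new HashSet<>(), budgetNanos);
        System.out.println("Added " + added + " words");
    }
}
```

Because the final count is not known in advance, a pre-sized structure would need a generous capacity guess here; that choice is left open in this sketch.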

Team Work
For this assignment, you may work alone or in groups of two. For the demonstration (described below),
each group member must be able to answer questions about the operation of the group’s program or
the entire group will be penalized.

Implementation
For full marks, the group must implement the chosen data structure (or adapt it from a previous
assignment) and the hash table in their program. If you fail to implement one of the data structures,
you will also lose marks on the demonstration and question portions of the assignment.

Your data structures should be given the size for which they will be working when they are created.
Use this size to ensure that a sorted array, if you have chosen one, is big enough and that your hash
table is created with enough buckets before the experiment begins. Since a linked list grows
dynamically, you don’t need to take the final size into account when you create a linked list.
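
As an illustration of sizing a hash table at creation time, here is a minimal sketch that uses separate chaining. The class name ChainedHashWordSet, the choice of one bucket per expected word, and the use of String.hashCode() are all assumptions made for this example, not requirements of the assignment.

```java
class ChainedHashWordSet {

    /** One link in a bucket's chain. */
    private static final class Node {
        final String word;
        final Node next;

        Node(String word, Node next) {
            this.word = word;
            this.next = next;
        }
    }

    private final Node[] buckets;
    private int size;

    /** Creates the table with one bucket per expected word (an assumed ratio). */
    ChainedHashWordSet(int expectedSize) {
        buckets = new Node[Math.max(1, expectedSize)];
    }

    /** Maps a word to a bucket index; floorMod avoids negative indexes. */
    private int indexFor(String word) {
        return Math.floorMod(word.hashCode(), buckets.length);
    }

    public int size() {
        return size;
    }

    public boolean contains(String word) {
        for (Node n = buckets[indexFor(word)]; n != null; n = n.next) {
            if (n.word.equals(word)) {
                return true;
            }
        }
        return false;
    }

    public boolean add(String word) {
        if (contains(word)) {
            return false; // duplicates are not added
        }
        int i = indexFor(word);
        buckets[i] = new Node(word, buckets[i]); // insert at the head of the chain
        size++;
        return true;
    }
}
```

The table never resizes in this sketch, so its lookup time depends on the bucket count chosen at construction being close to the final number of words.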

You will not need to include a remove operation for this assignment.

Experimental Procedure

To perform the experiments, you want to ensure that both of your data structures are tested under the
same conditions. To do this, you should use a new random number generator with the same seed for
each of the experiments. The procedure would be something like the following:

1. Create a data structure with a size appropriate for the experiment.

2. Construct a random number generator with a given seed.

3. Read in or generate a number of unique words equal to twice the size. You can use the file
top333333.txt which contains the top 333,333 words used in the English language, one per line. If
you need more than 333,333 words, you can read the words in again and append a 2 to each word
to make a new "word".

4. Randomly add half of the unique words to the data structure so that it has exactly the correct
number of words. (If the words are in random order, you could just add the first half of the words.
If not, since duplicates cannot be added, be sure to loop until the correct number have been added
successfully, not just attempted.)

5. Start a timer. Note that the timer will not include the initial time to add items to the data structure.
In computing, starting a timer really involves storing the current exact time, e.g.:

long startTime = System.nanoTime();

6. Attempt to find an item 10,000 times. Use the random number generator and the unique words
above to randomly choose which word to find in each case. Note that because you have twice as
many unique words as the data structure holds, about half of the time the word will not be found.

7. End the timer by getting the end time and subtracting the start time from it, e.g.:

long totalTimeInNanoseconds = System.nanoTime() - startTime;
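
The steps above could be put together roughly as follows. This is a sketch under several assumptions: the names LookupExperiment, makeWords and run are made up for the example, the words are generated placeholders rather than the contents of top333333.txt, and any java.util.Set (for instance java.util.HashSet) stands in for your own data structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class LookupExperiment {

    /** Step 3: generates 2 * size unique placeholder words. */
    static List<String> makeWords(int size) {
        List<String> words = new ArrayList<>(2 * size);
        for (int i = 0; i < 2 * size; i++) {
            words.add("w" + i);
        }
        return words;
    }

    /**
     * Runs one experiment on an initially empty set.
     * Returns {elapsed nanoseconds, number of lookups that found the word}.
     */
    static long[] run(Set<String> set, int size, int lookups, long seed) {
        Random random = new Random(seed);        // step 2: same seed, same sequence
        List<String> words = makeWords(size);    // step 3
        Collections.shuffle(words, random);      // put the words in random order
        for (int i = 0; i < size; i++) {         // step 4: add the first half
            set.add(words.get(i));
        }
        long startTime = System.nanoTime();      // step 5
        long found = 0;
        for (int i = 0; i < lookups; i++) {      // step 6
            String word = words.get(random.nextInt(words.size()));
            if (set.contains(word)) {
                found++;
            }
        }
        long totalTimeInNanoseconds = System.nanoTime() - startTime; // step 7
        return new long[] {totalTimeInNanoseconds, found};
    }

    public static void main(String[] args) {
        long[] result = run(new HashSet<>(), 20_000, 10_000, 2631L);
        System.out.println(result[0] + " ns, " + result[1] + " words found");
    }
}
```

Counting the successful lookups gives the loop a visible result and offers a quick sanity check: with the same seed, two correct set implementations should report the same count.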

Potential Interface

It is recommended that both of your data structures implement a common interface so that you will
only need to write your experiment code once. Here is one potential interface (feel free to use or
expand upon this interface):

/**
 * A set that holds a collection of unique words.
 *
 * @author Jason Heard
 */

public interface WordSet {

  /**
  * Returns the number of words in this set.
  *
  * @return the number of words in this set.
  */
  int size();

  /**
  * Determines if this set contains the specified word.
  *
  * @param word the word to look for in the set.
  * @return <code>true</code> if this set contains the specified word;
  * <code>false</code> otherwise.
  */
  boolean contains(String word);

  /**
  * Ensures the specified word is in this set. If this set already contains
  * the word, the set is not changed and this method returns <code>false</code>.
  *
  * @param word the word to add to the set.
  * @return <code>true</code> if this set did not already contain the word;
  * <code>false</code> otherwise.
  */
  boolean add(String word);

}

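
For illustration, a sorted array with the same three methods might look like the sketch below. The class name SortedArrayWordSet and the decision to throw when the array is full are assumptions. The class does not declare "implements WordSet" only so that the example compiles on its own; in your program it would implement your common interface.

```java
import java.util.Arrays;

class SortedArrayWordSet {

    private final String[] words; // the first 'size' entries are kept sorted
    private int size;

    /** Creates the set with room for exactly 'capacity' words. */
    SortedArrayWordSet(int capacity) {
        words = new String[capacity];
    }

    public int size() {
        return size;
    }

    /** Binary search over the filled part of the array. */
    public boolean contains(String word) {
        return Arrays.binarySearch(words, 0, size, word) >= 0;
    }

    public boolean add(String word) {
        int pos = Arrays.binarySearch(words, 0, size, word);
        if (pos >= 0) {
            return false; // already present
        }
        if (size == words.length) {
            throw new IllegalStateException("set is full");
        }
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        int insertAt = -(pos + 1);
        // Shift the larger words one slot to the right to make room.
        System.arraycopy(words, insertAt, words, insertAt + 1, size - insertAt);
        words[insertAt] = word;
        size++;
        return true;
    }
}
```

Lookups here use binary search, while each addition may shift many elements, which is worth keeping in mind when comparing the addition and lookup timings.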
Data and Graph Generation

As you write your program, remember that the goal of the program is to answer the above questions.
To do that, your program should be written to produce as much of that information as possible. Part of
the marks for this assignment will be the "helpfulness" of your program. The marks for this portion
will be assigned as follows:

Mark  "Helpfulness"
0     The program does not produce any output or cannot run.
1     The program only produces the data for some of the above questions.
2     The program must be run multiple times to generate the data needed to answer the above
      questions, but is able to produce the data needed for all of the questions.
3     The program outputs all of the information for the above questions as text to the console.
      (You can present a menu so that you can test a portion of the program, but one mode must
      run all of the tests.)
4     The program outputs all of the information for the first three questions as text to the console
      and saves a CSV (comma separated values) file for graphing questions 4 and 5.
5     The program outputs all of the information for the first three questions as text to the console
      and saves or displays a graph for questions 4 and 5.

To produce a graph, consider using the JFreeChart library or the JOpenChart library. To produce a CSV
file, simply write to a text file with one line per row in the final spreadsheet and separate the columns
in each row with a comma. If the file is saved with the file extension .csv then it should open in Excel if
you double-click on it. For example, the following CSV file:

,200,2000
Linked List,1.23,2.34
Hashtable,,0.9

would produce this spreadsheet in Excel:

             200     2000
Linked List  1.23    2.34
Hashtable            0.9
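
A CSV file in that layout could be written along these lines. This is a sketch only: the class name CsvWriter, the method toCsvLines, the file name lookups.csv, and the convention of using NaN for a missing measurement are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CsvWriter {

    /**
     * Builds the CSV lines: a header row of sizes, then one row per data
     * structure. A NaN value means "no measurement" and leaves the cell empty.
     */
    static List<String> toCsvLines(int[] sizes, String[] names, double[][] times) {
        List<String> lines = new ArrayList<>();
        StringBuilder header = new StringBuilder();
        for (int size : sizes) {
            header.append(',').append(size);
        }
        lines.add(header.toString());
        for (int r = 0; r < names.length; r++) {
            StringBuilder row = new StringBuilder(names[r]);
            for (double value : times[r]) {
                row.append(',');
                if (!Double.isNaN(value)) {
                    row.append(value);
                }
            }
            lines.add(row.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = toCsvLines(
                new int[] {200, 2000},
                new String[] {"Linked List", "Hashtable"},
                new double[][] {{1.23, 2.34}, {Double.NaN, 0.9}});
        // Writes one line per row; the .csv extension lets Excel open it.
        Files.write(Paths.get("lookups.csv"), lines);
    }
}
```

Row names that contain commas would need quoting, which this sketch does not handle.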

Demonstration
Each group or individual must bring answers to the questions outlined in the Problem section above to
a demonstration during the lab hours on Tuesday, October 30. During this demonstration each group
member will be asked about the operation of the program and about the answers to the questions. In
addition, the group will be asked to demonstrate the program(s) they created to answer the
questions.

Submission and Marking
Create a folder named Lastname_Firstname_Asg02 or
Lastname1_Firstname1_and_Lastname2_Firstname2_Asg02 and place your .java files in the folder. Submit
your source code to the submit folder (I:\Labs\CompSci\Submit\2631\001). If you are submitting your
files via the web interface from off campus, you will need to compress your files into one .zip file.
There will be marks for naming your submission correctly.

Code style will not be marked for this assignment other than high-level design of your two
implemented data structures.
