As with every assignment in this course, a design document, output,
source code listing, and detailed summary of your work are
required. It is important that these be labeled clearly. If
you are taking the class on campus, you are required to submit
these assignments in clearly labeled envelopes.
I. Design Document (10%)
The design document should be written prior to coding. Clearly
describe all classes you will create. This requires a listing
of all attributes, methods, and algorithms that will be used.
Furthermore, you should clearly relate how your objects will
interact with each other. There should not be any code
in your design document. The goal of your design document
should be for any programmer to be able to implement the project
in a language of his choosing using only your design document
as a reference.
II. Source Code (10%)
Give a listing of your source code. Make use of some form
of source code utility that will print your listing in a very
clear fashion. We have been lax in taking points off for badly
formatted source code thus far, but will start to do so. Line
numbers by themselves do not make for legible source
printouts --- make sure the utility you use highlights reserved
words, comments, etc. Additionally, documentation is very
important.
III. Output (70%)
Report 1
Set the amount of memory to be used during the indexing phase
to 8MB. Output the following items:
1. The maximum number of bytes of memory used during indexing (should
not be bigger than 8MB)
2. The size of your inverted index on disk
3. The size of your relevance feedback index on disk
4. The top 50 phrases (by term frequency) found in your index
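Item 4 can be produced with a bounded selection rather than a full sort of the phrase dictionary. The following is only a minimal sketch in Python, assuming that "term frequency" here means the total number of occurrences of a phrase across the collection, and that those counts are already available as a mapping from phrase to count. The function name `top_phrases`, the toy data, and the alphabetical tie-breaking are all illustrative assumptions, not requirements of the assignment.

```python
import heapq
from collections import Counter


def top_phrases(phrase_tf, k=50):
    """Return the k (phrase, tf) pairs with the highest term frequency.

    Ties are broken alphabetically (an assumption made here so that
    repeated runs print the same report).
    """
    # nsmallest on (-tf, phrase) yields highest tf first, then a-z.
    return heapq.nsmallest(
        k, phrase_tf.items(), key=lambda item: (-item[1], item[0])
    )


# Hypothetical toy counts standing in for a real phrase index.
phrase_tf = Counter({
    "information retrieval": 12,
    "inverted index": 9,
    "relevance feedback": 9,
    "query processing": 4,
})

for phrase, tf in top_phrases(phrase_tf, k=3):
    print(f"{tf}\t{phrase}")
```

With k=50 and a real index, the same call prints the rows needed for the report. If your phrase counts live on disk rather than in memory, the mapping can be replaced by any iterable of (phrase, count) pairs.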
Report 2
Conduct query processing runs using indices constructed with
the following specifications:
1) terms-only, without relevance feedback
2) phrases-only, without relevance feedback
3) terms + phrases, without relevance feedback
4) terms + phrases, with relevance feedback
For each run, use treceval to calculate average precision. Generate
a table of average precision following the template given here.
Use the qrels file provided here. We know that it contains documents
that do not exist in the collection, but we want to have a standard
qrels file so that results can be compared. For each query, obtain
the top 100 documents for the query if at all possible.
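To sanity-check the numbers reported by treceval, it can help to compute average precision by hand for a query or two. The sketch below is in Python and assumes the usual non-interpolated definition: the sum of the precision values at each rank where a relevant document appears, divided by the total number of relevant documents listed in the qrels for that query. The function name and the document IDs are hypothetical; this is not a replacement for the official tool.

```python
def average_precision(ranked_docs, relevant_docs):
    """Non-interpolated average precision for a single query.

    ranked_docs:   document IDs in ranked order (best first).
    relevant_docs: set of IDs judged relevant in the qrels.
    """
    if not relevant_docs:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant_docs:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Divide by ALL relevant documents, retrieved or not.
    return precision_sum / len(relevant_docs)


# Hypothetical run: keep at most the top 100 documents per query.
ranked = ["d3", "d7", "d1", "d9"][:100]
relevant = {"d3", "d1", "d5"}  # "d5" is never retrieved
print(round(average_precision(ranked, relevant), 4))  # 0.5556
```

Under this definition, a relevant document that is never retrieved (such as one that does not exist in the collection) still counts in the denominator, so every submission is penalized in the same way when a shared qrels file is used.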
IV. Summary (10%)
Write a careful summary of problems you encountered. Proofread
and spellcheck this summary! The first item in the
summary must include the status of your project, i.e., a listing
of what works and what doesn't work. Failing to mention a
non-working part of your assignment is grounds for massive
point deductions.
Also include in your summary things you wish you had been told
prior to being given the assignment. It is suggested that
you keep a log of events that occurred while working on the
assignment so that you will have useful information for your
summary. Do not simply list problems; rather, discuss the
main problems encountered and what you did to resolve them.
Make sure this summary is extremely well written and very
easy to read. A typical summary will range from one to three
pages in length.