For this assignment you will evaluate two Web search engines and compare their performance: (1) Yahoo, and (2) your own search engine built for this course. Evaluation is the last step toward building a search engine. You will build upon the crawler, indexer, and retrieval systems developed in the first three assignments. You are strongly advised to start working early. The assessments will be done by your fellow students, and you will submit the results. Our evaluation system will be relatively simple.
Read and follow the directions in this document very carefully, as they are filled with details that are meant to facilitate your task. You may discuss the assignment with instructors and other students (in class or online), but not with people outside class. You may consult printed and/or online references, books, tutorials, etc. Most importantly, you are to write your code individually and any work you submit must be your own.
All the scripts you create should be world-executable and world-readable so the AI can see and run them; this applies especially to CGI scripts, so your fellow students can evaluate your search engines. All the text and Berkeley DB files you create should be world-readable so that the AI can inspect them, and for the same reason the parent directories of your working directory should be world-executable. Work in your CGI directory for this assignment, as explained under "How do I run a CGI script over the Web?" on the FAQ page. Remember that your data files need to reside in your directory under /var/www/data/.
Install the CGI script evaluate.cgi (available via Oncourse), which shows a set of queries and hits to a user (subject). The queries should be contained in a text file (call it queries.txt), one per line, and the name of the file is a parameter in the script. For each query, the script displays a set of hits, and for each hit it collects a relevance assessment from the user. Note that the hits are shuffled to avoid biasing the subject: it must be impossible during evaluation to determine the source of a page. The user can click on hit URLs, each of which opens in a new window, to judge the relevance of that page.
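One way to implement the shuffling is with the shuffle function from List::Util; the sketch below is only illustrative, and the variable names (@mine, @yahoo) and URLs are placeholders, not part of the provided evaluate.cgi:

    #!/usr/bin/perl -w
    # Sketch: interleave and shuffle hits before display, so the subject
    # cannot tell which engine produced which page.
    use strict;
    use List::Util qw(shuffle);

    # Suppose @mine and @yahoo each hold up to 5 hit URLs for the current query.
    my @mine  = ('http://example.edu/page1', 'http://example.edu/page2');
    my @yahoo = ('http://example.com/a',     'http://example.com/b');

    # Tag each hit with its source so assessments can be attributed later,
    # but never show the source to the subject.
    my @hits = shuffle(
        (map { { url => $_, engine => 'mine'  } } @mine),
        (map { { url => $_, engine => 'yahoo' } } @yahoo),
    );

    foreach my $h (@hits) {
        # Each hit opens in a new window so the subject can judge it.
        print qq{<a href="$h->{url}" target="_blank">$h->{url}</a><br/>\n};
    }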
Modify the evaluate.cgi script to enter your query file name, your search engine name, and the names of the DB_Files in which to store hits and relevance assessments. You can also modify the set of relevance scores and labels. Finally, fix the search1 and search2 subroutines so that they use, respectively, your search engine and the Yahoo API to return a ranked list of hits from those search engines. Note: since this script will be executed by the web server user (typically apache, www, or httpd) rather than by you, the server must have permission to read and write any files and directories that the script accesses. Also, the files created by the script will be owned by the web server, so if you want access to those files for analysis later, make sure that the script creates them with appropriate permissions.
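As an illustration, the script could set a permissive umask and pass an explicit mode when it creates its DB_Files; the path and file names below are placeholders (replace login with your username), not a prescribed layout:

    #!/usr/bin/perl -w
    # Sketch: create DB_Files that remain readable to you even though the
    # web-server user owns them.
    use strict;
    use DB_File;
    use Fcntl;

    umask 0022;   # files created below will be world-readable

    my %hits;
    # 0644 => owner read/write, group/other read
    tie %hits, 'DB_File', '/var/www/data/login/hits.db',
        O_CREAT | O_RDWR, 0644, $DB_HASH
        or die "Cannot open hits DB: $!";

    $hits{'query1'} = join("\t", 'http://example.com/a', 'http://example.com/b');

    untie %hits;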
Prepare 5 queries in your queries file. These should be selected to make sure that your search engine covers them, i.e., that there are some relevant pages among those you crawled.
For each query, show the subject a total of at most 10 pages, half of which must be retrieved by your search engine and half obtained from Yahoo. To obtain Yahoo hits, use Yahoo! Search Web Services via the Yahoo::Search Perl module for the Yahoo Search API (you need an AppID).
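A minimal sketch of what the Yahoo side might look like, assuming the Yahoo::Search module is installed (the AppID, query, and hit count below are placeholders):

    #!/usr/bin/perl -w
    # Sketch of a search2-style subroutine built on the Yahoo::Search CPAN module.
    use strict;
    use Yahoo::Search;

    sub search2 {
        my ($query, $count) = @_;
        my @results = Yahoo::Search->Results(
            Doc   => $query,
            AppId => 'YOUR_APP_ID_HERE',   # request your own AppID from Yahoo!
            Count => $count,               # e.g. 5 hits per query
        );
        warn $@ if $@;                     # Yahoo::Search reports errors via $@
        # Return just the URLs, in rank order.
        return map { $_->Url } @results;
    }

    my @hits = search2('information retrieval', 5);
    print "$_\n" for @hits;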
At least 3 fellow students should be enlisted to evaluate your search engines. You may use the discussion board to post a request for volunteer subjects. It is your responsibility to clearly instruct subjects on how to run your evaluation. To ensure that sufficient assessments are obtained, each student is required to evaluate 3 fellow students' search engines; respond to a request for subjects on the discussion board to indicate that you are evaluating that system. All this will work only if everyone posts their request well ahead of the deadline, so everyone is required to post requests 72 hours before the due date. Doing this is crucial in order to complete the assignment and be graded. All assessments must then be completed within one day, i.e., no later than 48 hours before the due date. To ensure that all assessments are completed in time, each student's raw score for this assignment will be discounted by 30% for each assessment not completed 48 hours before the due date. For example, if you complete only 1 of your 3 evaluations in time, you will incur a 60% penalty!
Use the script rel_sets.pl (available via Oncourse), which uses the relevance assessments collected from subjects to construct a relevant set and tabulate results. The DB_Files with hits and relevance feedback must be passed to the script as command-line arguments. Modify the rel_sets.pl script based on your chosen definition of consensus relevance --- for example, is a hit relevant if all, or a majority, of the subjects label it as somewhat relevant? Or if just one subject labels it as very relevant?
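For example, a simple majority rule could be sketched as follows; the score values and the threshold are assumptions that you should adapt to your own relevance labels:

    #!/usr/bin/perl -w
    # Sketch of one possible consensus rule: a page counts as relevant if a
    # strict majority of subjects rated it "somewhat relevant" or better.
    use strict;

    # %assessments maps a URL to the list of scores collected from subjects,
    # e.g. 0 = not relevant, 1 = somewhat relevant, 2 = very relevant.
    my %assessments = (
        'http://example.com/a' => [2, 1, 0],
        'http://example.com/b' => [0, 0, 1],
    );

    my %relevant;
    foreach my $url (keys %assessments) {
        my @scores = @{ $assessments{$url} };
        my $votes  = grep { $_ >= 1 } @scores;        # "somewhat" or better
        $relevant{$url} = 1 if $votes > @scores / 2;  # strict majority
    }

    print "$_\n" for sort keys %relevant;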
Finally, modify the rel_sets.pl
script and/or write a
separate script that reads the output of the rel_sets.pl
script (or use a spreadsheet application) to first compute
precision and recall for each query as
a function of rank level. For example, the output of the modified script
might look like this for some query:
    rank nrel e1 prec recl e2 prec recl
    ---- ---- -- ---- ---- -- ---- ----
       1    2  0 0.00 0.00  1 1.00 0.50
       2    2  1 0.50 0.50  1 1.00 1.00
       3    2  0 0.33 0.50  0 0.67 1.00
     ...

Then aggregate the results across queries. Save the results to a text file and use a plotting application (e.g., Excel or gnuplot) to build a plot with the precision-recall curves for the two engines. You may use 5- or 11-point averages with interpolation, or per-rank averages across queries, as discussed in class. The final plot should be exported and saved to a file in PNG format and posted on your capricorn website (see instructions on readme file below).
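As an illustration of the per-rank computation, the following sketch reproduces the e1 column of the sample table above from a made-up ranked list and relevant set; it is not part of rel_sets.pl:

    #!/usr/bin/perl -w
    # Sketch: precision and recall as a function of rank for one engine and
    # one query, given a ranked hit list and the consensus relevant set.
    use strict;

    my @ranked   = ('u1', 'u2', 'u3');          # hits in rank order
    my %relevant = ('u2' => 1, 'u4' => 1);      # consensus relevant set (nrel = 2)
    my $nrel     = scalar keys %relevant;

    my $found = 0;
    printf "%-4s %-3s %-5s %-5s\n", 'rank', 'rel', 'prec', 'recl';
    for my $rank (1 .. @ranked) {
        my $rel = exists $relevant{ $ranked[$rank - 1] } ? 1 : 0;
        $found += $rel;
        # precision = relevant found / rank; recall = relevant found / nrel
        printf "%-4d %-3d %.2f  %.2f\n", $rank, $rel, $found / $rank, $found / $nrel;
    }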
Make sure your code is thoroughly debugged and tested. Always use the
-w
warning flag and the strict
module.
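For example, every script should begin with:

    #!/usr/bin/perl -w
    use strict;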
Make sure your code is thoroughly legible, understandable, and commented. Cryptic or uncommented code is not acceptable.
Make sure your system works well. Test it thoroughly yourself before engaging your fellow students for actual assessments.
Place your scripts, query and data files, plot file, and any other support files created or used by your scripts in a directory named a4. Further, place in this directory an HTML file named a4-readme.html with concise but detailed documentation on your evaluation system: how you implemented it (e.g., what data structures), what parameters you used, what choices you made (e.g., what consensus rule defines relevance), etc. Link to your plot file with the precision-recall curves. Add one paragraph commenting on the results of your evaluation. It is important to also list (1) your 5 evaluation queries, (2) the names of the subjects who performed the relevance assessments to evaluate your system, and (3) the names of the students for whom you performed relevance assessments. Give credit to any source of assistance (students with whom you discussed your assignments, instructors, books, online sources, etc.) -- note there is no penalty for credited sources, but there is a penalty for uncredited sources even if admissible. Include your full name and IU network ID. Make sure this documentation is properly formatted as valid XHTML. Hint 1: if your code is properly commented, writing the readme file should be a simple matter of collating all the comments from your code and formatting them; all the documentation required in the readme file should already be contained in your code comments. Hint 2: use a text editor to write your documentation, not a WYSIWYG HTML editor, and especially not M$Word.
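As a reminder of what a valid XHTML document looks like, here is a minimal skeleton you could start from (the title, name, login, and image file name are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>A4 readme - Jane Doe (jdoe)</title></head>
      <body>
        <h1>A4: Search Engine Evaluation</h1>
        <p>Implementation notes, parameters, consensus rule, queries, subjects...</p>
        <p><img src="pr-curves.png" alt="Precision-recall curves" /></p>
      </body>
    </html>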
(Re)move any unnecessary temporary files from the a4
directory. Create a gzipped archive of the a4
directory
using the tar czf
command as usual (see Assignment 1),
naming the resulting file a4-login.tgz
where
login
is your username. Now upload the file
a4-login.tgz
to the A4 Drop Box on
Oncourse by the deadline.
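For example, the archiving and naming step for a hypothetical username jdoe would be:

    tar czf a4-jdoe.tgz a4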
The assignment's raw score will be determined based on the following criteria: