INFO I427 Search Informatics (3 CR)
Google under the hood

Assignments and Project FAQ

What machine can I use?

For assignments and projects, we will work on a unix (actually linux) machine called capricorn.informatics.indiana.edu. Accounts will be or have been created on capricorn for registered students. To log into capricorn, use your IU username (aka network ID) and password. For example, if your email is janedoe@indiana.edu, then your username is janedoe.

How do I log in and out?

To log into capricorn, you must use ssh. IU has various ssh clients depending on what machine you are working from. From a unix machine or terminal (e.g., the Terminal in Mac OS X), simply type ssh capricorn.informatics.indiana.edu, then enter your password when prompted. Note: if your client unix username (e.g., your Mac OS X username) does not match your capricorn username (IU network ID), specify your username explicitly by typing, say, ssh -l janedoe capricorn.informatics.indiana.edu. At the end of your session, type exit to log out. From a Windows machine, enter the capricorn hostname in the GUI of an ssh client; everything else is the same.
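
As a quick illustration of the username rule, here is a small shell sketch that derives the username from an IU email address and prints the matching ssh command. The function name capricorn_ssh_cmd is made up for this example; it only prints the command and does not connect anywhere.

```shell
# capricorn_ssh_cmd EMAIL: print the ssh command for the account that
# corresponds to an IU email address (the part before the @ is the username).
# Hypothetical helper, not something installed on capricorn.
capricorn_ssh_cmd() {
    user=${1%%@*}    # strip everything from the first @ onward
    printf 'ssh -l %s capricorn.informatics.indiana.edu\n' "$user"
}

capricorn_ssh_cmd janedoe@indiana.edu
# prints: ssh -l janedoe capricorn.informatics.indiana.edu
```

Copy the printed line to your prompt to log in. If your local username already matches your IU username, the -l option can be left out.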

How do I get started in UNIX?

Once logged into capricorn, you use the unix shell to enter commands and to edit, debug, and run your script files. We will give a brief primer on unix in class, but unless you are already familiar with unix, you are strongly encouraged to consult one of the many excellent online tutorials, an IT Training self-study, or books.

How do I edit and write my files?

In unix, there are several editors you can use. The easiest, but least powerful, is nano, a clone of pico. More powerful alternatives are vi and emacs. You are strongly encouraged to learn either vi or emacs; they are worth the steep learning curve. A much less efficient alternative is to use a good code editor with which you are familiar on your client machine, and then transfer your file(s) to capricorn using scp or sftp. Debugging is very inconvenient with this approach.

Can I practice creating a personal homepage?

Sure. Edit an HTML file named index.html in the public_html subdirectory of your home directory. Make sure the file is world-readable. You can then access this page at http://capricorn.informatics.indiana.edu/~myusername/.
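
The steps above can be sketched in the shell as follows. This is only a sketch: the function name make_homepage and the page contents are made up for this example, and the optional directory argument exists only so you can try it in a scratch directory before touching your real home directory.

```shell
# make_homepage [HOME_DIR]: create a world-readable public_html/index.html.
# HOME_DIR defaults to $HOME. Hypothetical helper for illustration only.
make_homepage() {
    home_dir=${1:-$HOME}
    mkdir -p "$home_dir/public_html" || return 1
    # Do not overwrite a page that already exists.
    if [ ! -e "$home_dir/public_html/index.html" ]; then
        cat > "$home_dir/public_html/index.html" <<'EOF'
<html>
  <head><title>My homepage</title></head>
  <body><p>Hello from capricorn!</p></body>
</html>
EOF
    fi
    chmod 755 "$home_dir/public_html"             # directory: others may list and enter
    chmod 644 "$home_dir/public_html/index.html"  # file: others may read
}
```

Run make_homepage with no argument to work on your own home directory. On many Apache setups the home directory itself must also be searchable by others (chmod o+x on it); whether capricorn needs this is not stated here, so check if your page comes back as forbidden.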

How do I get started with Perl?

In class we will briefly discuss how to write, compile, debug, test and run your perl scripts. Unless you have some prior programming experience with perl, you are strongly encouraged to attend one of the STEPS workshops on perl offered by IT Training & Education especially for this course. Because they are partially funded by the Student Technology Fee, these workshops are free for enrolled IU students (present your IU student ID). You will also receive a set of workshop materials at no additional charge. You can obtain the materials for free even if you cannot attend the workshop, by presenting your IU student ID.

How do I get Perl modules installed?

All the perl modules we are likely to need for the class assignments and project are already installed on capricorn. If you think you need an additional module, post a message on the Oncourse Discussion. There is likely to be a simpler alternative; otherwise, I will discuss the module installation with our sysadmin.

How do I run a CGI script over the Web?

Name the CGI script using the extension cgi, for example myscript.cgi, and place it in the /var/www/cgi-bin/myusername directory. Make sure that the script has the appropriate permissions: it should be world-readable and world-executable (755). Test your script locally (from the shell) first, to make sure it compiles and runs without errors before it is run over the Web. Then point a browser to http://capricorn.informatics.indiana.edu/cgi-bin/myusername/myscript.cgi. You can also place static HTML pages in /var/www/html/myusername, for example pages containing forms whose actions are CGI scripts. Such a page would then be accessible at a URL like http://capricorn.informatics.indiana.edu/myusername/mypage.html. Finally, if your script needs to write and read data files, these files should be located in /var/www/data/myusername. Make all of your files group-readable and group-writable so that Fil can help with debugging if needed.
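
The copy-and-permissions part of these instructions might look like this in the shell. The function names deploy_cgi and share_with_group are made up for this example; the default destination assumes that your directory under /var/www/cgi-bin is named after your login, as described above.

```shell
# deploy_cgi SCRIPT [CGI_DIR]: copy a .cgi script into the cgi-bin directory
# and make it world-readable and world-executable (755).
# Hypothetical helper; CGI_DIR defaults to /var/www/cgi-bin/<your login>.
deploy_cgi() {
    script=$1
    case "$script" in
        *.cgi) ;;                                   # required extension
        *) echo "deploy_cgi: $script must end in .cgi" >&2; return 1 ;;
    esac
    cgi_dir=${2:-/var/www/cgi-bin/$(id -un)}        # id -un prints your login name
    cp "$script" "$cgi_dir/" || return 1
    chmod 755 "$cgi_dir/$(basename "$script")"
}

# share_with_group FILE...: add group read and write permission, leaving the
# other permission bits as they are.
share_with_group() {
    chmod g+rw "$@"
}
```

Before deploying a perl script, perl -c myscript.cgi checks that it compiles without running it, and running ./myscript.cgi from the shell shows the output the browser would receive. Then deploy_cgi myscript.cgi copies it into place.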

What if I have a runaway script?

It may happen that you launch a script that is either very slow, or never terminates due to a bug. If you run such a script from the browser (see above advice about testing locally first), the script will be executed by the Web server (apache), and so you cannot directly kill its process like you would normally kill processes that you run from the shell. This is because apache would own the process, not you. To allow you to kill such a script, the sysadmin placed a kill_proc script at https://capricorn.informatics.indiana.edu/cgi-bin/kill_proc/kill_proc.pl. Find the process ID (PID) of the apache-spawned process you want to kill by running ps or top from the shell. Then run the kill_proc script from your browser, entering the PID. This script requires authentication, and we log its use; any abuse (e.g., killing somebody else's process) will not be tolerated.
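
To find the PID, you can filter the output of ps. The sketch below is one way to do it. The function name filter_procs is made up for this example, and it assumes the Web server runs CGI scripts under a user called apache; check the USER column of ps on capricorn to confirm.

```shell
# filter_procs USER PATTERN: read lines of the form "PID USER ELAPSED COMMAND"
# on standard input and print the PIDs of processes owned by USER whose line
# contains PATTERN as a plain substring. Hypothetical helper for illustration.
filter_procs() {
    awk -v user="$1" -v pat="$2" '$2 == user && index($0, pat) { print $1 }'
}

# Usage on the server (the user name apache is an assumption):
#   ps -eo pid,user,etime,args | filter_procs apache myscript.cgi
```

The etime column shows how long each process has been running, which helps tell a runaway script apart from one that just started. Double-check that the PID belongs to your own script before entering it in kill_proc.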

Where can I find more help online?

For unix commands, use man (type man command at the prompt). For perl, use perldoc module or perldoc -f function (perldoc is also online). Google can also be very helpful, of course. Finally, in case it is not obvious from this brief introduction, the IU Knowledge Base is a fantastic resource. Use it!