CSCI B656 Web Mining (3 CR)


Computing, storage and network resources

Your class project may require significant CPU, storage, and/or network bandwidth resources. If you are not a SOIC student, see the instructor to discuss your options. If you are a SOIC student, the following FAQs describe the computer systems for CPU-intensive processing and the storage facilities available to students: Please follow the guidelines for using these facilities given in those documents. In addition, observe the following rules when running jobs that are likely to generate heavy disk or network traffic:
  1. Processing that will generate sustained periods of high disk activity should be limited to a single process. If you need multiple computers and/or processes doing simultaneous, high-bandwidth I/O to central storage facilities, please get approval first.
  2. Processing that will generate high volumes of network traffic to non-IU systems should be limited to no more than 200 Kbps of sustained traffic. Please get approval before running processing that will exceed this level for more than 1 hour.
  3. Running any process that systematically scans ranges of IP addresses or TCP port numbers is prohibited. For example, using a utility like nmap to scan a remote system for open ports is prohibited, as is scanning ranges of IP addresses for accessible systems.
  4. Read and follow these guidelines on crawling/scraping online social network and other data. In the past, students who have not complied with crawling etiquette have gotten themselves, the instructor, and IU in trouble. Don't be the next one!
This list is not intended to include all possible activities that are prohibited or likely to cause system disruptions. If you are unsure whether your intended activities fall within these acceptable use policies, please ask before you proceed. Also, if you feel that your project requires resources beyond those available via the above facilities and policies, please see the instructor so that we can discuss a suitable course of action.
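As an illustration of the 200 Kbps limit in rule 2, here is a minimal Python sketch of one way a crawler might pace its own downloads. It is not an official tool or a required method. The class name `BandwidthThrottle`, its parameters, and the reading of "Kbps" as 1,000 bits per second are all assumptions made for this example. Under that reading, 200 Kbps is 25,000 bytes per second, or roughly 90 MB per hour.

```python
import time


class BandwidthThrottle:
    """Pace transfers so each chunk averages at most max_kbps.

    Hypothetical helper for illustration; assumes 1 Kbps = 1,000 bits/s.
    """

    def __init__(self, max_kbps=200.0, clock=time.monotonic, sleep=time.sleep):
        # 200 Kbps -> 200 * 1000 / 8 = 25,000 bytes per second.
        self.bytes_per_sec = max_kbps * 1000 / 8
        self._clock = clock
        self._sleep = sleep
        self._last = clock()  # time the previous chunk's budget ended

    def record(self, nbytes):
        """Call after transferring nbytes; sleeps if the chunk arrived too fast.

        Returns the number of seconds slept.
        """
        # Minimum time this chunk is allowed to occupy at the target rate.
        min_duration = nbytes / self.bytes_per_sec
        # Time already spent since the previous chunk was accounted for.
        elapsed = self._clock() - self._last
        delay = max(0.0, min_duration - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

A crawler would create one `BandwidthThrottle()` per process and call `throttle.record(len(chunk))` after each chunk it reads from a non-IU host. Idle time is not banked as credit, so a burst after a pause is limited to a single chunk. Note that this only limits one process; several processes running at once would each need a share of the 200 Kbps budget.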

Software and data resources

Note: some links may be out of date; please flag those to the instructor.