A B C D E F G H I J L M N P R S T U X

H

handleEndTag(HTML.Tag, int) - Method in class spider.util.HTMLParser
Handle the end tag.
handleStartTag(HTML.Tag, MutableAttributeSet, int) - Method in class spider.util.HTMLParser
Note the start of a tag and push the new state onto the state stack.
handleText(char[], int) - Method in class spider.util.HTMLParser
Handle text.
hasBadExtension(String) - Method in class spider.crawl.BadExtensions
Find out whether the given URL has a bad extension.
Hashing - class spider.util.Hashing.
Uses MD5 hashing to convert arbitrary strings into a 128-bit hexadecimal String.
Hashing() - Constructor for class spider.util.Hashing
 
Helper - class spider.util.Helper.
A collection of static helper functions.
Helper() - Constructor for class spider.util.Helper
 
History - class spider.crawl.History.
Helps maintain the history of a crawl, with timestamps.
History.HistoryElement - class spider.crawl.History.HistoryElement.
Inner class that records history data.
History.HistoryElement() - Constructor for class spider.crawl.History.HistoryElement
 
History() - Constructor for class spider.crawl.History
 
HTMLParser - class spider.util.HTMLParser.
Provides methods to parse an HTML page and convert it into an XML format.
HTMLParser() - Constructor for class spider.util.HTMLParser
 
HTMLParser(Stopper) - Constructor for class spider.util.HTMLParser
Constructor. If a stopper is provided, it allows stop words to be removed.
htmlToXML(String, String) - Method in class spider.util.HTMLParser
Convert the HTML into a (naive) XML format. Currently all HTML tags are kept (some corrected), but the only attribute that is stored is href.
HubSeeker - class spider.crawl.HubSeeker.
 
HubSeeker(String[], long, String) - Constructor for class spider.crawl.HubSeeker
 
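
The Hashing entry above describes converting an arbitrary string into a 128-bit hexadecimal String using MD5. The index does not show the methods of spider.util.Hashing, so the following is only a minimal sketch of that technique using the standard java.security.MessageDigest API. The class name Md5HexSketch, the method name md5Hex, and the choice of UTF-8 for encoding the input are assumptions made for illustration, not the library's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not the actual spider.util.Hashing class.
final class Md5HexSketch {

    // Returns the MD5 digest of the input as 32 lowercase hex characters (128 bits).
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: the string is encoded as UTF-8 before hashing.
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc"));
    }
}
```

A fixed-length digest like this is a compact key for remembering which URLs or pages a crawl has already seen. MD5 is not collision-resistant, so it is unsuitable where an adversary could choose the inputs.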
