Overview
Package
Class
Use
Tree
Deprecated
Index
Help
PREV LETTER
NEXT LETTER
FRAMES
NO FRAMES
All Classes
A
B
C
D
E
F
G
H
I
J
L
M
N
P
R
S
T
U
X
H
handleEndTag(HTML.Tag, int) - Method in class spider.util.HTMLParser
Handle the end tag.
handleStartTag(HTML.Tag, MutableAttributeSet, int) - Method in class spider.util.HTMLParser
Note the start of a tag and push the new state onto the state stack.
handleText(char[], int) - Method in class spider.util.HTMLParser
Handle text.
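The three callback signatures above (handleStartTag, handleEndTag, handleText) match those of javax.swing.text.html.HTMLEditorKit.ParserCallback. This suggests, though this index does not confirm it, that HTMLParser extends that class. The minimal sketch below assumes so and shows one way to keep a state stack across the callbacks. The class TagStackCallback and its "path: text" output are hypothetical and are not the spider.util.HTMLParser implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback (not spider.util.HTMLParser): keeps a stack of open
// tags and records each text run together with the path of enclosing tags.
class TagStackCallback extends HTMLEditorKit.ParserCallback {
    private final Deque<HTML.Tag> stack = new ArrayDeque<>();
    private final List<String> textWithPath = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        stack.push(t); // note the start of a tag: push the new state
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        // Pop back to the matching start tag; stray end tags are ignored.
        if (stack.contains(t)) {
            while (!stack.pop().equals(t)) {
                // discard inner tags that were never closed
            }
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // An ArrayDeque used as a stack iterates top-first, so prepend to get root-first order.
        List<String> path = new ArrayList<>();
        for (HTML.Tag tag : stack) {
            path.add(0, tag.toString());
        }
        textWithPath.add(String.join("/", path) + ": " + new String(data));
    }

    static List<String> parse(String html) {
        TagStackCallback cb = new TagStackCallback();
        try {
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cb.textWithPath;
    }
}
```

Popping until the matching tag, rather than popping once, keeps the stack consistent when the page leaves inner elements unclosed.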
hasBadExtension(String) - Method in class spider.crawl.BadExtensions
Find out whether the given URL has a bad extension.
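The index does not list which extensions BadExtensions treats as bad. The sketch below therefore uses a hypothetical set of typical non-HTML extensions and a hypothetical class name, BadExtensionsSketch. It shows one reasonable check: take only the URL path (ignoring query and fragment), then compare the last segment's extension case-insensitively.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the real spider.crawl.BadExtensions list is not documented here.
class BadExtensionsSketch {
    // Assumed extensions a text-oriented crawler would skip.
    private static final Set<String> BAD = Set.of(
            "jpg", "jpeg", "gif", "png", "pdf", "zip", "gz", "exe", "mp3", "avi");

    static boolean hasBadExtension(String url) {
        String path;
        try {
            path = URI.create(url).getPath(); // drops the query string and fragment
        } catch (IllegalArgumentException e) {
            path = url; // fall back to the raw string for malformed URLs
        }
        if (path == null) {
            return false; // opaque URIs such as mailto: have no path
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash) {
            return false; // no extension in the last path segment
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BAD.contains(ext);
    }
}
```

Restricting the check to the last path segment avoids false positives on directory names such as /dir.v2/page.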
Hashing - class spider.util.Hashing.
Uses MD5 hashing to convert arbitrary strings into a 128-bit hexadecimal String.
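A 128-bit MD5 digest is 16 bytes, which prints as 32 hexadecimal characters. The sketch below produces that string with java.security.MessageDigest. The names Md5Sketch and md5Hex, and the choice to encode the input as UTF-8, are assumptions, because this index does not show the methods of Hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of MD5 string hashing; not the spider.util.Hashing source.
class Md5Sketch {
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // 16-byte (128-bit) digest of the UTF-8 bytes (encoding is an assumption)
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

For example, md5Hex("abc") returns "900150983cd24fb0d6963f7d28e17f72", the RFC 1321 test vector.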
Hashing() - Constructor for class spider.util.Hashing
Helper - class spider.util.Helper.
A collection of static helper functions.
Helper() - Constructor for class spider.util.Helper
History - class spider.crawl.History.
Helps maintain the history of a crawl, with timestamps.
History.HistoryElement - class spider.crawl.History.HistoryElement.
Inner class to record history data.
History.HistoryElement() - Constructor for class spider.crawl.History.HistoryElement
History() - Constructor for class spider.crawl.History
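History keeps crawl history with timestamps, and History.HistoryElement is an inner class that records one piece of history data. The fields of HistoryElement are not listed here. The sketch below assumes each element holds a URL and a millisecond timestamp. CrawlHistory, Element, record and elapsedMillis are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a timestamped crawl history; not the spider.crawl.History source.
class CrawlHistory {
    // Nested class recording one history entry (assumed fields: URL and timestamp).
    static final class Element {
        final String url;
        final long timestampMillis;

        Element(String url, long timestampMillis) {
            this.url = url;
            this.timestampMillis = timestampMillis;
        }
    }

    private final List<Element> elements = new ArrayList<>();

    // Record a visit stamped with the current wall-clock time.
    void record(String url) {
        record(url, System.currentTimeMillis());
    }

    // Record a visit with an explicit timestamp (handy for replay or tests).
    void record(String url, long timestampMillis) {
        elements.add(new Element(url, timestampMillis));
    }

    // Read-only view of the entries in insertion order.
    List<Element> elements() {
        return Collections.unmodifiableList(elements);
    }

    // Milliseconds between the first and last recorded visit, or 0 with fewer than two.
    long elapsedMillis() {
        if (elements.size() < 2) {
            return 0;
        }
        return elements.get(elements.size() - 1).timestampMillis - elements.get(0).timestampMillis;
    }
}
```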
HTMLParser - class spider.util.HTMLParser.
Description: Provides methods to parse an HTML page and convert it into XML format.
HTMLParser() - Constructor for class spider.util.HTMLParser
HTMLParser(Stopper) - Constructor for class spider.util.HTMLParser
Constructor that takes a Stopper, which allows stop words to be removed.
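This page does not document the API of Stopper. The following is a hypothetical stand-in that illustrates the usual approach to stop-word removal: split the text into tokens, then drop every token found in a lowercase stop-word set. StopperSketch, isStopWord and removeStopWords are assumed names, not the spider.util API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a Stopper; the real class's API is not shown in this index.
class StopperSketch {
    private final Set<String> stopWords; // expected to contain lowercase words

    StopperSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    boolean isStopWord(String word) {
        return stopWords.contains(word.toLowerCase(Locale.ROOT));
    }

    // Split on runs of non-letter, non-digit characters and keep non-stop words.
    List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !isStopWord(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

For instance, with the stop words "the", "a" and "of", removeStopWords("The history of a crawl") keeps only "history" and "crawl".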
htmlToXML(String, String) - Method in class spider.util.HTMLParser
Convert the HTML into XML format (naive). Currently all HTML tags are kept (some are corrected), but the only attribute stored is href.
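htmlToXML is described as a naive conversion that keeps every tag but stores only href. The sketch below reproduces that behavior. It assumes, as the handleStartTag/handleEndTag/handleText signatures in this index suggest, that the parser is built on javax.swing.text.html.HTMLEditorKit.ParserCallback. NaiveHtmlToXml and convert are hypothetical names. The meaning of the original method's two String parameters is not documented, so the sketch takes only the HTML source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical sketch: re-emits parsed HTML as XML-style markup, keeping only href.
class NaiveHtmlToXml extends HTMLEditorKit.ParserCallback {
    private final StringBuilder out = new StringBuilder();

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        out.append('<').append(t);
        Object href = a.getAttribute(HTML.Attribute.HREF); // the only attribute kept
        if (href != null) {
            out.append(" href=\"").append(escape(href.toString())).append('"');
        }
        out.append('>');
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        out.append("</").append(t).append('>');
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // Empty elements such as <br> or <img> become self-closing tags.
        out.append('<').append(t).append("/>");
    }

    @Override
    public void handleText(char[] data, int pos) {
        out.append(escape(new String(data)));
    }

    static String convert(String html) {
        NaiveHtmlToXml cb = new NaiveHtmlToXml();
        try {
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cb.out.toString();
    }
}
```

Text and href values are escaped so that they cannot break the markup. As the "naive" label implies, the sketch does not guarantee well-formed XML for every malformed input.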
HubSeeker - class spider.crawl.HubSeeker.
HubSeeker(String[], long, String) - Constructor for class spider.crawl.HubSeeker