Package spider.crawl

Class Summary
ActiveThreads Keeps track of crawler threads so that they can be stopped when none of them has further URLs to crawl
BadExtensions A list of file extensions whose URLs need not be kept in the frontier
BasicCrawler Performs breadth-first crawling.
BestFirst Best-first crawler (extended from a BreadthFirst crawler)
Cache  
DOMCrawler A crawler that builds the context of a URL from the DOM tree representation of an HTML page
FetcherPool A pool of multi-threaded fetchers that can be used to fetch many pages at the same time
Frontier  
FrontierElement  
Globals Global parameters that are used by the crawlers
History Helps maintain the history of a crawl with timestamps
HubSeeker  
Statistics Maintains statistics related to a crawl
Tester Example code for running a crawler; make sure you supply a valid e-mail address
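
The summary above mentions breadth-first crawling, a frontier, and a list of bad extensions without showing how they fit together. The following is a minimal sketch of that idea, not the package's actual code: the class name BreadthFirstSketch, the methods crawl and hasBadExtension, the extension list, and the in-memory link map (used here in place of real page fetching) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration; none of these names come from spider.crawl.
public class BreadthFirstSketch {

    // Assumed stand-in for the BadExtensions list.
    static final List<String> BAD_EXTENSIONS = List.of(".jpg", ".gif", ".pdf", ".zip");

    // Returns true if the URL ends with one of the filtered extensions.
    static boolean hasBadExtension(String url) {
        String lower = url.toLowerCase();
        for (String ext : BAD_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Visits pages in breadth-first order starting from the seed.
    // The links map simulates fetching a page and extracting its out-links.
    static List<String> crawl(String seed, Map<String, List<String>> links, int maxPages) {
        Queue<String> frontier = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        Set<String> seen = new HashSet<>();          // URLs already queued or visited
        List<String> visited = new ArrayList<>();

        frontier.add(seed);
        seen.add(seed);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            visited.add(url);
            for (String out : links.getOrDefault(url, List.of())) {
                // Skip filtered extensions and URLs that were already seen.
                if (!hasBadExtension(out) && seen.add(out)) {
                    frontier.add(out);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c", "logo.gif"),
                "b", List.of("d", "a"),
                "c", List.of("d", "paper.pdf"));
        System.out.println(crawl("a", links, 10)); // prints [a, b, c, d]
    }
}
```

In a sketch like this, replacing the FIFO queue with a java.util.PriorityQueue ordered by a relevance score would give best-first rather than breadth-first ordering.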