Class Summary

| Class | Description |
|---|---|
| ActiveThreads | Keeps track of crawler threads so that they can be stopped when none of them has further URLs to crawl. |
| BadExtensions | A list of bad extensions; URLs with these extensions need not be kept in the frontier. |
| BasicCrawler | Performs breadth-first crawling. |
| BestFirst | Best-first crawler (extended from a BreadthFirst crawler). |
| Cache | |
| DOMCrawler | A crawler that builds the context of a URL from the DOM tree representation of an HTML page. |
| FetcherPool | A pool of multi-threaded fetchers that can fetch many pages concurrently. |
| Frontier | |
| FrontierElement | |
| Globals | Global parameters used by the crawlers. |
| History | Helps maintain a timestamped history of a crawl. |
| HubSeeker | |
| Statistics | Maintains statistics related to a crawl. |
| Tester | Example code for running a crawler. Make sure you supply a valid e-mail address. |
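The ActiveThreads entry describes a termination rule: crawler threads are stopped once none of them has further URLs to crawl. The subtle point is that a thread must not quit merely because the shared queue is momentarily empty, since another thread that is still busy may add new links. The Java sketch below illustrates one way to enforce that rule. `ActiveThreadsSketch`, `CrawlDemo`, the in-memory link graph, and the `wait`/`notifyAll` design are assumptions made for illustration, not the library's actual API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not the library's ActiveThreads API: a shared URL queue
 * that also counts how many crawler threads are busy. A thread that finds the
 * queue empty waits while others are still busy (they may add new URLs); once
 * the queue is empty and no thread is busy, next() returns null to every thread.
 */
class ActiveThreadsSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private int busy = 0; // threads currently processing a URL

    synchronized void add(String url) {
        queue.addLast(url);
        notifyAll(); // wake threads waiting for work
    }

    /** Returns the next URL to crawl, or null when the crawl is finished. */
    synchronized String next() throws InterruptedException {
        while (queue.isEmpty()) {
            if (busy == 0) {
                return null; // nothing queued and no thread can add more: stop
            }
            wait();
        }
        busy++;
        return queue.pollFirst();
    }

    /** Must be called once per URL returned by next(), after its links are added. */
    synchronized void done() {
        busy--;
        notifyAll(); // new URLs may be queued, or busy may have reached zero
    }
}

class CrawlDemo {
    /** Hypothetical helper: crawls an in-memory link graph with nThreads workers. */
    static Set<String> crawl(Map<String, List<String>> graph, String seed, int nThreads) {
        ActiveThreadsSketch work = new ActiveThreadsSketch();
        Set<String> seen = ConcurrentHashMap.newKeySet();
        seen.add(seed);
        work.add(seed);

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            Thread t = new Thread(() -> {
                try {
                    String url;
                    while ((url = work.next()) != null) {
                        try {
                            for (String link : graph.getOrDefault(url, List.of())) {
                                if (seen.add(link)) {
                                    work.add(link); // enqueue each URL only once
                                }
                            }
                        } finally {
                            work.done(); // always release, even if processing fails
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for crawler threads", e);
            }
        }
        return seen;
    }
}
```

Because `done()` always calls `notifyAll()`, the last thread to finish wakes every waiting thread, and each of them then observes an empty queue with no busy threads and exits.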
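The BadExtensions and BestFirst entries together suggest a frontier that orders URLs by a score and does not keep URLs with unwanted extensions. The Java sketch below illustrates that combination under stated assumptions: `BestFirstFrontierSketch`, `ScoredUrl`, the caller-supplied score, and the extension list are all hypothetical. The real contents of BadExtensions and the actual Frontier/FrontierElement API are not described in this summary.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical best-first frontier; not the library's Frontier class. */
class BestFirstFrontierSketch {
    // Hypothetical list: the library's actual bad extensions are not given here.
    private static final Set<String> BAD_EXTENSIONS =
        Set.of("jpg", "jpeg", "gif", "png", "pdf", "zip", "mp3");

    /** Hypothetical frontier element: a URL plus the score assigned to it. */
    private static final class ScoredUrl {
        final String url;
        final double score;

        ScoredUrl(String url, double score) {
            this.url = url;
            this.score = score;
        }
    }

    // Highest score first: this ordering is what makes the crawl best-first.
    private final PriorityQueue<ScoredUrl> queue =
        new PriorityQueue<>((a, b) -> Double.compare(b.score, a.score));
    private final Set<String> seen = new HashSet<>();

    /** True if the last path segment ends in one of the bad extensions. */
    static boolean hasBadExtension(String url) {
        String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return true; // treat unparsable URLs as not worth keeping
        }
        if (path == null) {
            return false; // opaque URI such as mailto:, no path to inspect
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash) {
            return false; // last path segment has no extension
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BAD_EXTENSIONS.contains(ext);
    }

    /** Adds a URL unless it has a bad extension or was added before. */
    boolean add(String url, double score) {
        if (hasBadExtension(url) || !seen.add(url)) {
            return false;
        }
        queue.add(new ScoredUrl(url, score));
        return true;
    }

    /** Removes and returns the highest-scoring URL, or null if the frontier is empty. */
    String next() {
        ScoredUrl top = queue.poll();
        return top == null ? null : top.url;
    }
}
```

Swapping the priority queue for a first-in, first-out queue would give breadth-first order instead. This sketch also ignores a second score for an already-seen URL; a real best-first crawler might instead raise the priority of a URL that is discovered again.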