java.lang.Object
  +--spider.crawl.BasicCrawler
Performs breadth-first crawling. The frontier is treated as a FIFO queue: the next URL to fetch is always the one that was added to the frontier earliest. The crawler does not revisit a page that it has already visited.
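The breadth-first behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the source of BasicCrawler: the class name BfsCrawlSketch, the crawl helper, and the in-memory link map that stands in for real page fetching are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration only; not part of the spider.crawl package.
public class BfsCrawlSketch {

    /**
     * Visits pages breadth first, starting from the seeds, and returns the visit order.
     * The links map (page -> outgoing links) replaces real fetching in this sketch.
     */
    static List<String> crawl(List<String> seeds, Map<String, List<String>> links, long maxPages) {
        Queue<String> frontier = new ArrayDeque<>(seeds); // FIFO: the oldest URL is polled first
        Set<String> visited = new HashSet<>();
        List<String> order = new ArrayList<>();

        while (!frontier.isEmpty() && order.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already visited, so skip it
            }
            order.add(url);
            for (String out : links.getOrDefault(url, List.of())) {
                if (!visited.contains(out)) {
                    frontier.add(out); // newly found URLs go to the back of the queue
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("d", "a"),
            "c", List.of("d"),
            "d", List.of());
        System.out.println(crawl(List.of("a"), links, 10)); // prints [a, b, c, d]
    }
}
```

Because the frontier is a queue rather than a stack, every page one link away from a seed is visited before any page two links away.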
Constructor Summary
BasicCrawler(java.lang.String[] seeds, long maxPages, java.lang.String dir)
Constructs the crawler with the given seed URLs.
Method Summary
long | getMaxFrontier() | Returns the maxFrontier (maximum size of the frontier).
long | getMaxPages() | Returns the maxPages (maximum number of pages to be fetched).
int | getMaxThreads() | Returns the maxThreads (maximum number of threads).
java.lang.String | getStorageFile() | Returns the storageFile.
int | getTopN() | Returns the topN.
boolean | reStartCrawl() | Restarts the crawler from the last state of the history.
void | setFrontierAdd(boolean b) | Sets whether the frontier allows (true) or disallows (false) the addition of new URLs.
void | setMaxFrontier(int maxFrontier) | Sets the maxFrontier (maximum size of the frontier).
void | setMaxThreads(int maxThreads) | Sets the maxThreads (maximum number of threads).
void | setStatFile(java.lang.String statFile) | Sets the statFile.
void | setStorageFile(java.lang.String storageFile) | Sets the storageFile.
void | setTopN(int topN) | Sets the topN.
boolean | startCrawl() | Starts the crawl.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public BasicCrawler(java.lang.String[] seeds, long maxPages, java.lang.String dir)
Parameters:
seeds - URLs that are the starting points for the crawl
maxPages - maximum number of pages to be fetched
dir - the directory to store the results in (the directory is created if it does not exist)

Method Detail
public boolean startCrawl()

public long getMaxFrontier()

public void setMaxFrontier(int maxFrontier)
Parameters:
maxFrontier - The maxFrontier to set

public int getMaxThreads()

public void setMaxThreads(int maxThreads)
Parameters:
maxThreads - The maxThreads to set

public long getMaxPages()

public int getTopN()

public void setTopN(int topN)
Parameters:
topN - The topN to set

public java.lang.String getStorageFile()

public void setStorageFile(java.lang.String storageFile)
Parameters:
storageFile - The storageFile to set

public void setStatFile(java.lang.String statFile)
Parameters:
statFile - The statFile to set

public boolean reStartCrawl()

public void setFrontierAdd(boolean b)
Parameters:
b - true to allow the addition of new URLs to the frontier, false to disallow it
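
The interplay between a maximum frontier size and a switch that allows or disallows new URLs can be sketched as follows. This is a hypothetical stand-in, not the BasicCrawler implementation: the class BoundedFrontierSketch and its add, next, and size methods are invented for illustration, and the choice to silently reject URLs when the frontier is full is an assumption that this page does not confirm.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration only; not part of the spider.crawl package.
public class BoundedFrontierSketch {
    private final Queue<String> queue = new ArrayDeque<>();
    private final long maxFrontier;     // maximum number of queued URLs
    private boolean addAllowed = true;  // mirrors the idea behind setFrontierAdd(boolean)

    BoundedFrontierSketch(long maxFrontier) {
        this.maxFrontier = maxFrontier;
    }

    /** Allows (true) or disallows (false) the addition of new URLs. */
    void setFrontierAdd(boolean b) {
        addAllowed = b;
    }

    /** Returns true if the URL was queued, false if it was rejected (assumed policy). */
    boolean add(String url) {
        if (!addAllowed || queue.size() >= maxFrontier) {
            return false;
        }
        return queue.add(url);
    }

    /** Returns the oldest queued URL, or null if the frontier is empty. */
    String next() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        BoundedFrontierSketch frontier = new BoundedFrontierSketch(2);
        System.out.println(frontier.add("u1")); // prints true
        System.out.println(frontier.add("u2")); // prints true
        System.out.println(frontier.add("u3")); // prints false (frontier is full)
        System.out.println(frontier.next());    // prints u1
        frontier.setFrontierAdd(false);
        System.out.println(frontier.add("u4")); // prints false (additions disallowed)
    }
}
```

Disallowing additions while continuing to poll lets the remaining queue drain without growing, which is one plausible use for such a switch near the end of a crawl.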