spider.crawl
Class DOMCrawler
java.lang.Object
  |
  +--spider.crawl.BasicCrawler
        |
        +--spider.crawl.DOMCrawler
- Direct Known Subclasses:
- HubSeeker
public class DOMCrawler
extends BasicCrawler
A crawler that builds the context of a URL from the DOM tree representation of an HTML page.
- Author:
- Gautam Pant
- See Also:
- BasicCrawler
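The class description does not say how the context is extracted from the DOM tree. As a rough illustration only, and not the DOMCrawler implementation, the sketch below parses a small well-formed XHTML string with the JDK's XML parser, climbs a fixed number of levels from each <a> element, and treats the text under that ancestor as the link's context. The names LinkContextSketch, linkContexts, and levels are hypothetical. Real-world HTML is rarely well-formed XML, so an actual crawler would need an HTML-tolerant parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class LinkContextSketch {

    // Maps each href to the whitespace-normalized text under an ancestor of its <a> element.
    // "levels" (hypothetical) is how many parent steps to climb from the anchor.
    static Map<String, String> linkContexts(String xhtml, int levels) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Assumption: input must be well-formed XML; anything else is rejected.
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
        NodeList anchors = doc.getElementsByTagName("a");
        Map<String, String> contexts = new LinkedHashMap<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            Node node = anchor;
            // Climb toward the root, stopping early at the document element.
            for (int up = 0; up < levels && node.getParentNode() instanceof Element; up++) {
                node = node.getParentNode();
            }
            String text = node.getTextContent().trim().replaceAll("\\s+", " ");
            contexts.put(anchor.getAttribute("href"), text);
        }
        return contexts;
    }

    public static void main(String[] args) {
        // Made-up page with two links in separate blocks.
        String page = "<html><body>"
                + "<div><p>Papers on <a href=\"http://example.org/crawl\">focused crawling</a></p></div>"
                + "<div><p>Unrelated <a href=\"http://example.org/news\">news</a></p></div>"
                + "</body></html>";
        System.out.println(linkContexts(page, 1));
    }
}
```

With levels set to 0 the context is just the anchor text. Larger values pull in more of the surrounding page, which widens the context at the cost of mixing in text that may belong to other links.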
Constructor Summary
DOMCrawler(java.lang.String[] seeds, long maxPages, java.lang.String dir)
DOMCrawler(java.lang.String[] seeds, long maxPages, java.lang.String dir, int delta)
Method Summary
java.lang.String | getQuery()
    Returns the query.
java.util.Hashtable | getURLScores(java.util.Hashtable lc, double pageScore, java.lang.String parentURL)
void | setQuery(java.lang.String query)
    Sets the query.
Methods inherited from class spider.crawl.BasicCrawler
getMaxFrontier, getMaxPages, getMaxThreads, getStorageFile, getTopN, reStartCrawl, setFrontierAdd, setMaxFrontier, setMaxThreads, setStatFile, setStorageFile, setTopN, startCrawl
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DOMCrawler
public DOMCrawler(java.lang.String[] seeds,
long maxPages,
java.lang.String dir)
DOMCrawler
public DOMCrawler(java.lang.String[] seeds,
long maxPages,
java.lang.String dir,
int delta)
getURLScores
public java.util.Hashtable getURLScores(java.util.Hashtable lc,
double pageScore,
java.lang.String parentURL)
- Returns:
- A Hashtable that holds, for each URL, its score as the value
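Since the result is a table of per-URL scores, a caller will often want the URLs in score order. The sketch below shows one way to do that with a hand-built table standing in for a getURLScores result. It assumes the keys are URL strings and the values are Double scores, which the raw Hashtable in the signature does not guarantee. The class UrlScoreRanking, the method rankByScore, and the sample URLs and scores are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

public class UrlScoreRanking {

    // Returns the table's URLs ordered from highest to lowest score.
    // Assumption: keys are URL strings and values are Double scores.
    static List<String> rankByScore(Hashtable<String, Double> scores) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            urls.add(entry.getKey());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for a getURLScores result; the values are invented.
        Hashtable<String, Double> scores = new Hashtable<>();
        scores.put("http://example.org/a", 0.35);
        scores.put("http://example.org/b", 0.80);
        scores.put("http://example.org/c", 0.10);
        for (String url : rankByScore(scores)) {
            System.out.println(url + " " + scores.get(url));
        }
    }
}
```

Because getURLScores is declared with a raw java.util.Hashtable, real calling code would have to cast or check the key and value types before treating them this way.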
getQuery
public java.lang.String getQuery()
- Returns the query.
- Returns:
- The current query, as a String
setQuery
public void setQuery(java.lang.String query)
- Sets the query.
- Parameters:
- query - The query to set