spider.crawl
Class DOMCrawler

java.lang.Object
  |
  +--spider.crawl.BasicCrawler
        |
        +--spider.crawl.DOMCrawler
Direct Known Subclasses:
HubSeeker

public class DOMCrawler
extends BasicCrawler

A crawler that builds the context of a URL from the DOM tree representation of an HTML page.
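
The description above does not say how the context is derived, so the following is only an illustrative sketch of one way a DOM tree can supply context for a link, not this class's implementation. It assumes the page is well-formed XHTML (real-world HTML usually needs a tolerant parser first), and the names `LinkContextSketch`, `contextOf`, and `levelsUp` are hypothetical. It uses only the JDK's built-in DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class LinkContextSketch {

    // Returns the whitespace-normalized text of the ancestor that lies
    // `levelsUp` levels above the first <a> element whose href equals `url`,
    // or null if the page contains no such link.
    // Assumption: `xhtml` is well-formed XML; `levelsUp` is a made-up parameter.
    public static String contextOf(String xhtml, String url, int levelsUp) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Parser configuration, I/O, and well-formedness errors all end up here.
            throw new IllegalArgumentException("page could not be parsed as XML", e);
        }
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (!url.equals(a.getAttribute("href"))) {
                continue;
            }
            Node node = a;
            // Climb toward the root, stopping early at the document element.
            for (int up = 0; up < levelsUp && node.getParentNode() instanceof Element; up++) {
                node = node.getParentNode();
            }
            return node.getTextContent().trim().replaceAll("\\s+", " ");
        }
        return null;
    }

    public static void main(String[] args) {
        String page = "<html><body><div>"
                + "<p>Focused crawling with <a href=\"http://example.org/a\">topical crawlers</a></p>"
                + "<p>Unrelated text</p>"
                + "</div></body></html>";
        System.out.println(contextOf(page, "http://example.org/a", 0)); // anchor text only
        System.out.println(contextOf(page, "http://example.org/a", 1)); // enclosing paragraph
    }
}
```

Climbing further up the tree widens the context: level 0 yields only the anchor text, while level 1 yields the text of the enclosing paragraph.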

Author:
Gautam Pant
See Also:
BasicCrawler

Constructor Summary
DOMCrawler(java.lang.String[] seeds, long maxPages, java.lang.String dir)
           
DOMCrawler(java.lang.String[] seeds, long maxPages, java.lang.String dir, int delta)
           
 
Method Summary
 java.lang.String getQuery()
          Returns the query.
 java.util.Hashtable getURLScores(java.util.Hashtable lc, double pageScore, java.lang.String parentURL)
          Returns a Hashtable with a score as the value for each URL.
 void setQuery(java.lang.String query)
          Sets the query.
 
Methods inherited from class spider.crawl.BasicCrawler
getMaxFrontier, getMaxPages, getMaxThreads, getStorageFile, getTopN, reStartCrawl, setFrontierAdd, setMaxFrontier, setMaxThreads, setStatFile, setStorageFile, setTopN, startCrawl
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DOMCrawler

public DOMCrawler(java.lang.String[] seeds,
                  long maxPages,
                  java.lang.String dir)

DOMCrawler

public DOMCrawler(java.lang.String[] seeds,
                  long maxPages,
                  java.lang.String dir,
                  int delta)
Method Detail

getURLScores

public java.util.Hashtable getURLScores(java.util.Hashtable lc,
                                        double pageScore,
                                        java.lang.String parentURL)
Returns:
Hashtable - a table holding a score as the value for each URL
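
Because the method returns an untyped `java.util.Hashtable`, callers have to cast the entries themselves. The sketch below shows one way to read such a table, assuming the keys are URL strings and the values are numeric scores (the documentation only states that scores are the values). `UrlScoreReader` and `bestUrl` are hypothetical names, and the sample entries are made up for illustration.

```java
import java.util.Hashtable;
import java.util.Map;

public class UrlScoreReader {

    // Returns the key with the highest score, or null if the table is empty.
    // Assumption: keys are String URLs and values are Number scores.
    public static String bestUrl(Hashtable<?, ?> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<?, ?> entry : scores.entrySet()) {
            double score = ((Number) entry.getValue()).doubleValue();
            if (score > bestScore) {
                bestScore = score;
                best = (String) entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for a table returned by getURLScores; the values are invented.
        Hashtable<String, Double> scores = new Hashtable<>();
        scores.put("http://example.org/a", 0.25);
        scores.put("http://example.org/b", 0.75);
        System.out.println(bestUrl(scores)); // prints http://example.org/b
    }
}
```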

getQuery

public java.lang.String getQuery()
Returns the query.

Returns:
String - the query

setQuery

public void setQuery(java.lang.String query)
Sets the query.

Parameters:
query - The query to set