spider.util
Class XMLParser

java.lang.Object
  |
  +--spider.util.XMLParser

public class XMLParser
extends java.lang.Object

Intended to be used in conjunction with HTMLParser: the XML produced by HTMLParser can be used as input to XMLParser. Parsing is DOM-tree based.

Author:
Gautam Pant
See Also:
HTMLParser

Constructor Summary
XMLParser(java.io.File f)
          Constructs a parser for the given file.
XMLParser(java.lang.String text)
          Constructs a parser for the given text.
 
Method Summary
 java.lang.String getContents()
          Gets the contents of the document being parsed.
 org.w3c.dom.Document getDocument()
          Returns the root node of the DOM tree.
 java.util.Hashtable getLinkContext(int rel_depth)
          Returns the links with their contexts; the depth of the aggregation node used for each context is based on rel_depth.
 java.util.Hashtable getLinkContext(java.lang.String look)
          Returns the context of a given link at different levels of the DOM tree.
 java.util.Hashtable getLinkContextAdaptive(int w)
          Climbs up the DOM tree until it finds a context of appropriate size (w words).
 java.lang.String[] getLinks()
          Gets the links from the given XML (HTML).
 java.lang.String getText()
          Gets the text from the given XML (HTML).
 boolean startParser()
          Opens and parses the given XML document (file or string).
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XMLParser

public XMLParser(java.io.File f)
Constructs a parser for the given file.


XMLParser

public XMLParser(java.lang.String text)
Constructs a parser for the given text.

Method Detail

getContents

public java.lang.String getContents()
Gets the contents of the document being parsed.


startParser

public boolean startParser()
Opens the given XML document (file or string) and builds the DOM document object.

Returns:
true if parsing succeeded, false if it failed - boolean
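The parse step can be sketched with the JDK's built-in DOM parser (javax.xml.parsers). This is a minimal sketch, not the XMLParser source: the class name DomLoadSketch is hypothetical, and the sketch assumes the input is well-formed XML (for example, the output of HTMLParser). It mirrors the documented contract: a File constructor, a String constructor, a boolean startParser(), and a getDocument() accessor.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch of a File-or-String DOM loader (not the spider.util source). */
public class DomLoadSketch {
    private final File file;    // non-null when built from a file
    private final String text;  // non-null when built from a string
    private Document document;  // set by startParser() on success

    public DomLoadSketch(File f) {
        this.file = f;
        this.text = null;
    }

    public DomLoadSketch(String text) {
        this.file = null;
        this.text = text;
    }

    /** Parses the source into a DOM tree; returns true on success, false on failure. */
    public boolean startParser() {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            document = (file != null)
                    ? builder.parse(file)
                    : builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            document = null; // malformed input or unreadable file
            return false;
        }
    }

    /** Root of the DOM tree, or null if startParser() has not succeeded. */
    public Document getDocument() {
        return document;
    }
}
```

Returning a boolean instead of throwing keeps the caller's code simple, at the cost of discarding the parser's error message; the JDK parser still reports fatal errors to standard error by default.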

getLinks

public java.lang.String[] getLinks()
Gets the links from the given XML (HTML). startParser() must be called before calling this method.

Returns:
an array of URLs - String[]
See Also:
startParser()
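Link extraction over a DOM tree can be sketched as collecting the href attribute of every anchor element. This is a hypothetical illustration (LinkExtractSketch is not part of spider.util): it assumes the HTML-derived XML uses lowercase `a` elements with `href` attributes, and it returns each href exactly as written, without resolving relative URLs against a base URL.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch of link extraction from an HTML-derived DOM tree. */
public class LinkExtractSketch {
    /** Returns the href of every <a> element that has one, in document order. */
    public static String[] getLinks(Document doc) {
        NodeList anchors = doc.getElementsByTagName("a"); // assumes lowercase tag names
        List<String> links = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (a.hasAttribute("href")) {
                links.add(a.getAttribute("href")); // returned as written, not resolved
            }
        }
        return links.toArray(new String[0]);
    }

    /** Parses well-formed XML text into a DOM Document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

Anchors without an href (such as named anchors) are skipped, so only navigable links end up in the array.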

getText

public java.lang.String getText()
Gets the text from the given XML (HTML).

Returns:
the text in the document - String
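One way to extract the text is a depth-first walk that collects every non-blank text node. This sketch is an assumption about the behavior, not the XMLParser code, and the class name TextExtractSketch is hypothetical. Unlike the DOM method Node.getTextContent(), which concatenates text nodes with no separator, it joins the trimmed pieces with single spaces so that words in adjacent elements do not run together.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch of text extraction from a DOM tree. */
public class TextExtractSketch {
    /** Joins every non-blank text node under the root element, trimmed, with single spaces. */
    public static String getText(Document doc) {
        StringBuilder out = new StringBuilder();
        collect(doc.getDocumentElement(), out);
        return out.toString();
    }

    // Depth-first walk in document order over the subtree rooted at node.
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String t = node.getNodeValue().trim();
            if (!t.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(t);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    /** Parses well-formed XML text into a DOM Document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

For `<p>Hello <b>world</b></p>` this yields "Hello world"; text inside `title`, `script`, or `style` elements is included too, since the sketch does not filter by element name.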

getLinkContext

public java.util.Hashtable getLinkContext(int rel_depth)
Returns the links with their contexts; the depth of the aggregation node used for each context is based on rel_depth.
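A plausible reading of rel_depth is "how many levels to climb above the link before taking the text of that ancestor (the aggregation node) as context". The sketch below implements that reading; it is a hypothetical illustration (DepthContextSketch and its helpers are not part of spider.util), it assumes lowercase `a`/`href` markup, it stops climbing at the root element, and when the same href appears twice the later link's context overwrites the earlier one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Hashtable;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: link context taken from the ancestor relDepth levels above each link. */
public class DepthContextSketch {
    /** Maps each href to the text of its aggregation node; duplicate hrefs keep the last one. */
    public static Hashtable<String, String> getLinkContext(Document doc, int relDepth) {
        Hashtable<String, String> contexts = new Hashtable<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (a.hasAttribute("href")) {
                Node aggregation = ancestor(a, relDepth);
                contexts.put(a.getAttribute("href"), text(aggregation));
            }
        }
        return contexts;
    }

    /** Climbs up to `levels` parent elements, stopping at the root element. */
    static Node ancestor(Node node, int levels) {
        Node current = node;
        for (int i = 0; i < levels; i++) {
            Node parent = current.getParentNode();
            if (parent == null || parent.getNodeType() != Node.ELEMENT_NODE) {
                break; // current is already the root element
            }
            current = parent;
        }
        return current;
    }

    /** Non-blank text nodes under node, trimmed and joined by single spaces. */
    static String text(Node node) {
        StringBuilder out = new StringBuilder();
        appendText(node, out);
        return out.toString();
    }

    private static void appendText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String t = node.getNodeValue().trim();
            if (!t.isEmpty()) {
                if (out.length() > 0) out.append(' ');
                out.append(t);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            appendText(c, out);
        }
    }

    /** Parses well-formed XML text into a DOM Document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

With `<div><p>Read the <a href="u1">report</a> today</p><p>Other text</p></div>`, depth 0 gives "report", depth 1 gives "Read the report today", and depth 2 adds the sibling paragraph.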


getLinkContext

public java.util.Hashtable getLinkContext(java.lang.String look)
Returns the context of a given link at different levels of the DOM tree.
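The per-level variant can be sketched as recording the text of every ancestor on the path from the link up to the root. This is a hypothetical interpretation (LevelContextSketch is not part of spider.util): level 0 is the `a` element itself, only the first link whose href equals `look` is used, and an empty table means no such link was found.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Hashtable;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: context of one link at every level from the link up to the root. */
public class LevelContextSketch {
    /** Maps level (0 = the <a> element) to that ancestor's text; empty if no link matches. */
    public static Hashtable<Integer, String> getLinkContext(Document doc, String look) {
        Hashtable<Integer, String> levels = new Hashtable<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (a.hasAttribute("href") && a.getAttribute("href").equals(look)) {
                int level = 0;
                // Walk element ancestors only; the Document node above the root is skipped.
                for (Node n = a; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                        n = n.getParentNode()) {
                    levels.put(level, text(n));
                    level++;
                }
                break; // only the first matching link is used
            }
        }
        return levels;
    }

    /** Non-blank text nodes under node, trimmed and joined by single spaces. */
    static String text(Node node) {
        StringBuilder out = new StringBuilder();
        appendText(node, out);
        return out.toString();
    }

    private static void appendText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String t = node.getNodeValue().trim();
            if (!t.isEmpty()) {
                if (out.length() > 0) out.append(' ');
                out.append(t);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            appendText(c, out);
        }
    }

    /** Parses well-formed XML text into a DOM Document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

Keying the table by level lets a caller compare how the context grows as the aggregation node moves up the tree.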


getLinkContextAdaptive

public java.util.Hashtable getLinkContextAdaptive(int w)
Climbs up the DOM tree until it finds a context of appropriate size (w words).
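The adaptive strategy can be sketched as: start at the link and move to the parent element while the current node's text has fewer than w words. This assumes "appropriate size" means "at least w words", which the description does not state outright; the class name AdaptiveContextSketch and its helpers are hypothetical. If even the root element has fewer than w words, the root's text is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Hashtable;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: climb from each link until the context has at least w words. */
public class AdaptiveContextSketch {
    /** Maps each href to the text of the lowest node (link or ancestor) with >= w words. */
    public static Hashtable<String, String> getLinkContextAdaptive(Document doc, int w) {
        Hashtable<String, String> contexts = new Hashtable<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (!a.hasAttribute("href")) {
                continue;
            }
            Node node = a;
            String context = text(node);
            // Climb while the context is too short and a parent element exists.
            while (wordCount(context) < w
                    && node.getParentNode() != null
                    && node.getParentNode().getNodeType() == Node.ELEMENT_NODE) {
                node = node.getParentNode();
                context = text(node);
            }
            contexts.put(a.getAttribute("href"), context);
        }
        return contexts;
    }

    /** Number of whitespace-separated words in s. */
    static int wordCount(String s) {
        String t = s.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    /** Non-blank text nodes under node, trimmed and joined by single spaces. */
    static String text(Node node) {
        StringBuilder out = new StringBuilder();
        appendText(node, out);
        return out.toString();
    }

    private static void appendText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String t = node.getNodeValue().trim();
            if (!t.isEmpty()) {
                if (out.length() > 0) out.append(' ');
                out.append(t);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            appendText(c, out);
        }
    }

    /** Parses well-formed XML text into a DOM Document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

Compared with a fixed rel_depth, this adapts to the markup: a link buried in deep, sparse nesting climbs further than one sitting in a text-rich paragraph.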


getDocument

public org.w3c.dom.Document getDocument()
Returns the root node of the DOM tree. Callers may perform their own traversal over the DOM tree.
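As an example of a caller-defined traversal, the sketch below walks the element nodes of a Document and produces an indented outline. In real use the Document would come from getDocument() after startParser(); here it is built directly with the JDK parser so the example runs on its own. The class name OutlineSketch is hypothetical, and the traversal uses only standard org.w3c.dom methods.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical example of a caller-defined traversal over a DOM Document. */
public class OutlineSketch {
    /** Lists element names in document order, indented two spaces per level. */
    public static List<String> outline(Document doc) {
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, lines);
        return lines;
    }

    private static void walk(Node node, int depth, List<String> lines) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text, comments, and processing instructions
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        lines.add(line.append(node.getNodeName()).toString());
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, lines);
        }
    }

    /** Parses well-formed XML text into a DOM Document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```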