XMLParser

spider.util
Class XMLParser

java.lang.Object
  |
  +--spider.util.XMLParser

public class XMLParser
extends java.lang.Object

Intended for use with HTMLParser: the XML that HTMLParser produces can be passed to XMLParser as input. Parsing is DOM-tree based.

Author: Gautam Pant
See Also: HTMLParser

Constructor Summary

XMLParser(java.io.File f)
          Constructs a parser for the given file.

XMLParser(java.lang.String text)
          Constructs a parser for the given text.

Method Summary

java.lang.String getContents()
          Gets the contents of the document being parsed.

org.w3c.dom.Document getDocument()
          Returns the starting (root) node of the DOM tree.

java.util.Hashtable getLinkContext(int rel_depth)
          Provides each link with its context; the depth of the aggregation node is determined by rel_depth.

java.util.Hashtable getLinkContext(java.lang.String look)
          Provides the context of a given link at different levels in the DOM tree.

java.util.Hashtable getLinkContextAdaptive(int w)
          Climbs up the DOM tree until it finds a context of appropriate size (w words).

java.lang.String[] getLinks()
          Gets the links from the given XML (HTML).

java.lang.String getText()
          Gets the text from the given XML (HTML).

boolean startParser()
          Opens the given XML document (file or string) for parsing.

Methods inherited from class java.lang.Object

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

XMLParser

public XMLParser(java.io.File f)

Constructs a parser for the given file.

XMLParser

public XMLParser(java.lang.String text)

Constructs a parser for the given text.

Method Detail

getContents

public java.lang.String getContents()

Gets the contents of the document being parsed.

startParser

public boolean startParser()

Opens the given XML document (file or string) and parses it into the document object.

Returns: true on success, false on failure

getLinks

public java.lang.String[] getLinks()

Gets the links from the given XML (HTML). startParser() must be called before this method.

Returns: an array of URLs
See Also: startParser()

getText

public java.lang.String getText()

Gets the text from the given XML (HTML).

Returns: the text in the document

getLinkContext

public java.util.Hashtable getLinkContext(int rel_depth)

Provides each link with its context. The depth of the aggregation node, from which the context is taken, is determined by rel_depth.

getLinkContext

public java.util.Hashtable getLinkContext(java.lang.String look)

Provides the context of the given link at different levels in the DOM tree.
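
To make "context at different levels" concrete, here is a minimal sketch built on the standard JAXP and org.w3c.dom APIs rather than on XMLParser itself. The class name LinkLevelsSketch, the method contextLevels, and the choice of mapping a level number to the text under that ancestor are all assumptions made for illustration; the layout of the Hashtable that getLinkContext actually returns is not documented here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of spider.util.
final class LinkLevelsSketch {

    // Parses well-formed XML text with the standard JAXP DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    // Appends the text nodes under n, in document order, separated by single spaces.
    static void appendText(Node n, StringBuilder sb) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            String t = n.getNodeValue().trim().replaceAll("\\s+", " ");
            if (!t.isEmpty()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(t);
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            appendText(c, sb);
        }
    }

    static String text(Node n) {
        StringBuilder sb = new StringBuilder();
        appendText(n, sb);
        return sb.toString();
    }

    // Assumed mapping: level 0 is the <a> element, level 1 its parent, and so on
    // up to the root element. Returns an empty map if no <a> has the given href.
    static Map<Integer, String> contextLevels(String xml, String href) {
        Map<Integer, String> levels = new LinkedHashMap<>();
        NodeList anchors = parse(xml).getElementsByTagName("a"); // lowercase tags assumed
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (!href.equals(a.getAttribute("href"))) continue;
            int level = 0;
            for (Node n = a; n instanceof Element; n = n.getParentNode()) {
                levels.put(level++, text(n));
            }
            break; // only the first matching link is reported
        }
        return levels;
    }
}
```

For the input `<div><p>alpha beta <a href='x.html'>gamma</a></p><p>delta</p></div>`, this sketch yields "gamma" at level 0, "alpha beta gamma" at level 1, and "alpha beta gamma delta" at level 2. Picking a single entry from such a map is one way to read the fixed rel_depth variant of getLinkContext.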

getLinkContextAdaptive

public java.util.Hashtable getLinkContextAdaptive(int w)

Climbs up the DOM tree until it finds a context of appropriate size (w words).
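
The climbing strategy can be sketched as follows, again with the standard DOM API rather than XMLParser. This is one plausible reading, not the class's actual implementation: it assumes "appropriate sized" means the first ancestor whose text has at least w words, and that the result maps each link's href to that text. AdaptiveContextSketch and its method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of spider.util.
final class AdaptiveContextSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    // Appends the text nodes under n, in document order, separated by single spaces.
    static void appendText(Node n, StringBuilder sb) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            String t = n.getNodeValue().trim().replaceAll("\\s+", " ");
            if (!t.isEmpty()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(t);
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            appendText(c, sb);
        }
    }

    static String text(Node n) {
        StringBuilder sb = new StringBuilder();
        appendText(n, sb);
        return sb.toString();
    }

    // text() already collapses whitespace, so splitting on one space counts words.
    static int wordCount(String s) {
        return s.isEmpty() ? 0 : s.split(" ").length;
    }

    // For every <a>, climb towards the root until the subtree text has at least
    // w words (assumed criterion) or the root element is reached.
    static Map<String, String> adaptiveContexts(String xml, int w) {
        Map<String, String> out = new LinkedHashMap<>();
        NodeList anchors = parse(xml).getElementsByTagName("a"); // lowercase tags assumed
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            Node current = a;
            while (wordCount(text(current)) < w
                    && current.getParentNode() instanceof Element) {
                current = current.getParentNode();
            }
            out.put(a.getAttribute("href"), text(current));
        }
        return out;
    }
}
```

A small w keeps the context close to the anchor text, while a w larger than the document's word count simply returns the text of the whole document for every link.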

getDocument

public org.w3c.dom.Document getDocument()

Returns the starting (root) node of the DOM tree. Callers may use it to perform their own traversal of the tree.
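
A custom traversal over an org.w3c.dom.Document might look like the sketch below. To stay self-contained it builds the Document with the standard JAXP DocumentBuilder as a stand-in for the object getDocument() returns; DomTraversalSketch and its methods are hypothetical names, and the assumption that links appear as lowercase or uppercase `a` elements with an `href` attribute comes from ordinary HTML, not from this class's documentation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of spider.util.
final class DomTraversalSketch {

    // Stand-in for XMLParser.getDocument(): parses well-formed XML text.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    // Depth-first walk that records the href of every <a> element it meets.
    static void collectLinks(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;
            if (e.getTagName().equalsIgnoreCase("a") && e.hasAttribute("href")) {
                out.add(e.getAttribute("href"));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectLinks(children.item(i), out);
        }
    }

    // Returns the hrefs in document order.
    static List<String> links(String xml) {
        List<String> out = new ArrayList<>();
        collectLinks(parse(xml), out);
        return out;
    }
}
```

The same recursive shape works for any per-node processing: replace the body of the element check with whatever should happen at each node.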
