java.lang.Object
  |
  +--javax.swing.text.html.HTMLEditorKit.ParserCallback
        |
        +--spider.util.HTMLParser
Description: This class provides methods to parse an HTML page and convert it into XML.
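HTMLParser extends HTMLEditorKit.ParserCallback, so parsing is event-driven. A parser reads the page and calls handleStartTag, handleText and handleEndTag on the callback as it goes. The sketch below illustrates that model with a hypothetical EventLogger callback. It assumes javax.swing.text.html.parser.ParserDelegator as the driver, because this page does not say which parser htmlToXML actually uses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical callback (not part of spider.util) that records parse events in order. */
class EventLogger extends HTMLEditorKit.ParserCallback {
    final List<String> events = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attribs, int pos) {
        events.add("start:" + tag); // HTML.Tag.toString() yields the tag name, e.g. "a"
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        events.add("end:" + tag);
    }

    @Override
    public void handleText(char[] text, int pos) {
        events.add("text:" + new String(text));
    }

    /** Parses the HTML string with the Swing parser and returns the recorded events. */
    static List<String> log(String html) {
        EventLogger logger = new EventLogger();
        try {
            // ignoreCharSet = true: ignore any <meta> charset declaration
            // instead of throwing ChangedCharSetException
            new ParserDelegator().parse(new StringReader(html), logger, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return logger.events;
    }
}
```

The Swing parser may also report implied tags such as html and body that are not present in the input. These carry the IMPLIED attribute listed under Field Summary.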
Field Summary
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
Constructor Summary
HTMLParser()
HTMLParser(Stopper stop)
Constructor for when a stopper is provided; the stopper allows stop words to be removed.
Method Summary
void handleEndTag(javax.swing.text.html.HTML.Tag tag, int pos)
Handle the end tag.
void handleStartTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attribs, int pos)
Note the start of a tag and put the new state in the state stack.
void handleText(char[] text, int pos)
Handle text.
java.lang.String htmlToXML(java.lang.String html, java.lang.String url)
Converts the HTML into XML (a naive conversion). Currently all HTML tags are kept (some are corrected), but the only attribute stored is href.
boolean isStemmer()
Returns the stemmer setting (a boolean flag).
void setStemmer(boolean stemmer)
Sets the stemmer setting (a boolean flag).
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
flush, handleComment, handleEndOfLineString, handleError, handleSimpleTag
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail
public HTMLParser(Stopper stop)
public HTMLParser()
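This page does not document the Stopper type, so the sketch below substitutes a plain Set<String> of stop words. StopWordFilter is hypothetical. Its two constructors mirror HTMLParser() and HTMLParser(Stopper stop), and a null set means no stop-word removal. Splitting on non-letter characters and matching case-insensitively are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for Stopper: removes stop words from a chunk of text. */
class StopWordFilter {
    private final Set<String> stopWords; // null means "no stopper supplied"

    StopWordFilter() {
        this(null); // mirrors HTMLParser(): text passes through unfiltered
    }

    StopWordFilter(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Splits on runs of non-letters and drops words found (lower-cased) in the stop list. */
    String filter(char[] text) {
        List<String> kept = new ArrayList<>();
        for (String word : new String(text).split("[^\\p{L}]+")) {
            if (word.isEmpty()) continue; // a leading separator yields an empty first token
            if (stopWords != null && stopWords.contains(word.toLowerCase(Locale.ROOT))) continue;
            kept.add(word);
        }
        return String.join(" ", kept);
    }
}
```

A char[] argument is used so that the filter could be called directly from handleText(char[] text, int pos).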
Method Detail
public void handleStartTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attribs, int pos)
Overrides:
handleStartTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback
public void handleEndTag(javax.swing.text.html.HTML.Tag tag, int pos)
Overrides:
handleEndTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback
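According to the summary, handleStartTag notes the start of a tag and puts the new state in a state stack. The sketch below shows one possible form of that bookkeeping. TagPathTracker is a hypothetical class. On each end tag it pops back to the matching start tag, which is an assumed way of tolerating unbalanced markup and is not HTMLParser's documented behavior. Each text chunk is recorded together with the path of tags that enclose it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical callback that tracks currently open tags on a stack. */
class TagPathTracker extends HTMLEditorKit.ParserCallback {
    private final Deque<HTML.Tag> stack = new ArrayDeque<>();
    /** Each text chunk mapped to the slash-separated path of its enclosing tags. */
    final Map<String, String> textPaths = new LinkedHashMap<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attribs, int pos) {
        stack.push(tag); // the new state goes on top of the stack
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (!stack.contains(tag)) return; // ignore stray end tags
        while (stack.pop() != tag) {
            // discard tags left open inside the element being closed
        }
    }

    @Override
    public void handleText(char[] text, int pos) {
        StringBuilder path = new StringBuilder();
        // push() adds at the head, so descendingIterator() walks outermost to innermost
        Iterator<HTML.Tag> it = stack.descendingIterator();
        while (it.hasNext()) {
            path.append('/').append(it.next());
        }
        textPaths.put(new String(text), path.toString());
    }

    /** Parses the HTML with the Swing parser and returns text-to-path mappings. */
    static Map<String, String> paths(String html) {
        TagPathTracker t = new TagPathTracker();
        try {
            new ParserDelegator().parse(new StringReader(html), t, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return t.textPaths;
    }
}
```

Because HTML.Tag values are shared constants, the sketch compares them by identity (!=).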
public void handleText(char[] text, int pos)
Overrides:
handleText in class javax.swing.text.html.HTMLEditorKit.ParserCallback
public java.lang.String htmlToXML(java.lang.String html, java.lang.String url)
Parameters:
html - the HTML string (String)
url - the URL (String)
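htmlToXML is described as a naive conversion that keeps every HTML tag but stores only the href attribute. The sketch below applies that idea using the Swing parser. NaiveXmlConverter and its convert method are hypothetical names. The url argument is omitted because this page does not say how HTMLParser uses it, and escaping &, <, > and " in the output is an assumption about what a well-formed XML result requires.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical naive HTML-to-XML converter that keeps only href attributes. */
class NaiveXmlConverter extends HTMLEditorKit.ParserCallback {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attribs, int pos) {
        out.append('<').append(tag);
        Object href = attribs.getAttribute(HTML.Attribute.HREF);
        if (href != null) { // href is the only attribute carried over
            out.append(" href=\"").append(escape(href.toString())).append('"');
        }
        out.append('>');
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        out.append("</").append(tag).append('>');
    }

    @Override
    public void handleText(char[] text, int pos) {
        out.append(escape(new String(text))); // the parser has already decoded entities
    }

    // handleSimpleTag is not overridden (it is inherited on this page as well),
    // so empty elements such as <br> and <img> are dropped from the output.

    private static String escape(String s) {
        // '&' must be replaced first so the other entities are not double-escaped
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String convert(String html) {
        NaiveXmlConverter c = new NaiveXmlConverter();
        try {
            new ParserDelegator().parse(new StringReader(html), c, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return c.out.toString();
    }
}
```

For example, `NaiveXmlConverter.convert("<p><a href=\"menu.html\" class=\"nav\">menu</a></p>")` keeps href and drops class. The Swing parser also wraps the result in its implied html and body elements.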
public boolean isStemmer()
public void setStemmer(boolean stemmer)
Parameters:
stemmer - the stemmer to set