java.lang.Object
  |
  +--javax.swing.text.html.HTMLEditorKit.ParserCallback
        |
        +--spider.util.HTMLParser
Description: This class provides methods to parse an HTML page and convert it into XML format.
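Because HTMLParser extends HTMLEditorKit.ParserCallback, it does not read HTML itself: a Swing parser reads the markup and calls handleStartTag, handleText and handleEndTag as it goes. The minimal sketch below assumes the stock javax.swing.text.html.parser.ParserDelegator drives the callback (this page does not say which parser htmlToXML uses); the class EventRecorder is hypothetical and simply logs each event it receives.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical callback that just records the events the Swing HTML parser reports. */
class EventRecorder extends HTMLEditorKit.ParserCallback {
    private final List<String> events = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attribs, int pos) {
        events.add("start:" + tag); // HTML.Tag.toString() yields the tag name, e.g. "b"
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        events.add("end:" + tag);
    }

    @Override
    public void handleText(char[] text, int pos) {
        events.add("text:" + new String(text));
    }

    /** Parses the markup with ParserDelegator, which calls back into a fresh recorder. */
    static List<String> record(String html) {
        EventRecorder recorder = new EventRecorder();
        try {
            // true = ignore any charset declared inside the document (we read from a String)
            new ParserDelegator().parse(new StringReader(html), recorder, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recorder.events;
    }

    public static void main(String[] args) {
        record("<p>Hello <b>world</b></p>").forEach(System.out::println);
    }
}
```

The Swing parser may also report implied tags such as html and body that never appear in the input; these arrive with the inherited IMPLIED attribute set, so the recorded list is usually longer than the markup suggests.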
Field Summary
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
Constructor Summary
HTMLParser()
HTMLParser(Stopper stop)
    Constructor used when a stopper is provided; the stopper allows stop words to be removed.
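The Stopper class itself is not documented on this page, so the following is only a hypothetical stand-in showing what stop-word removal usually means: tokens that appear in a fixed stop list are dropped before the text is indexed. StopWordFilter, its filter method, the tokenizing rule and the example stop list are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for Stopper; the real Stopper API is not documented here. */
class StopWordFilter {
    private final Set<String> stopWords;

    StopWordFilter(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Splits on runs of non-letter/non-digit characters and drops stop words (case-insensitive). */
    List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty token if the text starts with a separator
            }
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        StopWordFilter filter = new StopWordFilter(Set.of("the", "a", "of", "and"));
        System.out.println(filter.filter("The structure of a web page")); // [structure, web, page]
    }
}
```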
Method Summary
void handleEndTag(javax.swing.text.html.HTML.Tag tag, int pos)
    Handles the end tag.
void handleStartTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attribs, int pos)
    Notes the start of a tag and pushes the new state onto the state stack.
void handleText(char[] text, int pos)
    Handles text.
java.lang.String htmlToXML(java.lang.String html, java.lang.String url)
    Converts the HTML into a (naive) XML format. Currently all HTML tags are kept (some are corrected), but href is the only attribute that is stored.
boolean isStemmer()
    Returns the stemmer flag.
void setStemmer(boolean stemmer)
    Sets the stemmer flag.
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
flush, handleComment, handleEndOfLineString, handleError, handleSimpleTag
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public HTMLParser(Stopper stop)
public HTMLParser()
Method Detail
public void handleStartTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet attribs,
int pos)
Overrides: handleStartTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback
public void handleEndTag(javax.swing.text.html.HTML.Tag tag,
int pos)
Overrides: handleEndTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback
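handleStartTag pushes state onto a stack, and the class summary says some tags are corrected. One common way to do that, sketched here as an assumption rather than this class's actual logic, is to close every still-open inner tag when an outer end tag arrives and to drop end tags that were never opened. TagStack is hypothetical and works on plain tag names rather than HTML.Tag objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical tag state stack that repairs mis-nested or missing end tags. */
class TagStack {
    private final Deque<String> open = new ArrayDeque<>(); // innermost open tag is at the head
    private final StringBuilder out = new StringBuilder();

    /** Start tag: emit it and push it so its end tag can be matched later. */
    void start(String tag) {
        out.append('<').append(tag).append('>');
        open.push(tag);
    }

    /** End tag: pop (and close) inner tags until the matching start tag is reached. */
    void end(String tag) {
        if (!open.contains(tag)) {
            return; // stray end tag with no matching start tag: drop it
        }
        String top;
        do {
            top = open.pop();
            out.append("</").append(top).append('>');
        } while (!top.equals(tag));
    }

    /** Closes whatever is still open at end of input and returns the well-formed result. */
    String finish() {
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TagStack stack = new TagStack();
        stack.start("p");
        stack.start("b");
        stack.end("p"); // the input never closed <b>
        System.out.println(stack.finish()); // <p><b></b></p>
    }
}
```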
public void handleText(char[] text,
int pos)
Overrides: handleText in class javax.swing.text.html.HTMLEditorKit.ParserCallback
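When the Swing parser resolves character entities, handleText receives plain characters (for example a literal & or <), so text has to be escaped again before it is written into XML or the output will not be well-formed. A minimal sketch, assuming the five predefined XML entities are used; XmlText.escape is a hypothetical helper, not part of this class's documented API.

```java
/** Hypothetical helper that escapes character data for inclusion in XML. */
final class XmlText {
    private XmlText() {
    }

    /** Replaces the five XML special characters with their predefined entities. */
    static String escape(char[] text) {
        StringBuilder sb = new StringBuilder(text.length);
        for (char c : text) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & c".toCharArray())); // a &lt; b &amp; c
    }
}
```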
public java.lang.String htmlToXML(java.lang.String html,
java.lang.String url)
Parameters: html - the HTML source to convert (a String); url - the URL of the page (a String)
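The following is a hedged sketch of a naive converter with the documented behaviour: every tag is kept, mis-nested end tags are repaired with a stack, text is escaped, and href is the only attribute copied. It assumes ParserDelegator as the parser and, because this page does not say what url is used for, records it on a synthetic root element; NaiveHtmlToXml, convert and the page element are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical naive HTML-to-XML converter; a sketch, not the real spider.util.HTMLParser. */
class NaiveHtmlToXml extends HTMLEditorKit.ParserCallback {
    private final StringBuilder xml = new StringBuilder();
    private final Deque<HTML.Tag> open = new ArrayDeque<>(); // tags started but not yet closed

    static String convert(String html, String url) {
        NaiveHtmlToXml cb = new NaiveHtmlToXml();
        // Assumption: the url is stored on a synthetic root element.
        cb.xml.append("<page url=\"").append(escape(url)).append("\">");
        try {
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        while (!cb.open.isEmpty()) {
            cb.closeTop(); // close anything the input left open
        }
        return cb.xml.append("</page>").toString();
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attribs, int pos) {
        xml.append('<').append(tag);
        // Copy only href; every other attribute is discarded.
        Object href = attribs.getAttribute(HTML.Attribute.HREF);
        if (href != null) {
            xml.append(" href=\"").append(escape(href.toString())).append('"');
        }
        xml.append('>');
        open.push(tag);
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (!open.contains(tag)) {
            return; // end tag with no matching start tag: drop it
        }
        HTML.Tag top;
        do {
            top = closeTop(); // also closes inner tags the page forgot to close
        } while (!top.equals(tag));
    }

    @Override
    public void handleText(char[] text, int pos) {
        xml.append(escape(new String(text)));
    }

    /** Pops the innermost open tag and writes its end tag. */
    private HTML.Tag closeTop() {
        HTML.Tag top = open.pop();
        xml.append("</").append(top).append('>');
        return top;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(convert(
                "<p class=\"intro\">See <a href=\"docs.html\" title=\"Docs\">the docs</a></p>",
                "http://example.com/"));
    }
}
```

Because the inherited handleSimpleTag is not overridden in this sketch, empty elements such as br or img are dropped, while implied html and body tags reported by the parser do appear in the output.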
public boolean isStemmer()
public void setStemmer(boolean stemmer)
Parameters: stemmer - the stemmer flag to set
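isStemmer and setStemmer only expose a boolean flag; the stemming algorithm itself is not documented here. The hypothetical sketch below shows how such a flag might gate stemming, using a deliberately naive suffix stripper in place of an established algorithm such as Porter's. TermNormalizer and its suffix rules are assumptions.

```java
import java.util.Locale;

/** Hypothetical sketch: a boolean flag that toggles a very naive suffix-stripping stemmer. */
class TermNormalizer {
    private boolean stemmer;

    boolean isStemmer() {
        return stemmer;
    }

    void setStemmer(boolean stemmer) {
        this.stemmer = stemmer;
    }

    /** Lower-cases a token and, when the flag is set, strips one common English suffix. */
    String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (!stemmer) {
            return t;
        }
        // Naive rules (not Porter): the length checks keep very short words intact.
        if (t.endsWith("ing") && t.length() > 5) return t.substring(0, t.length() - 3);
        if (t.endsWith("ed") && t.length() > 4) return t.substring(0, t.length() - 2);
        if (t.endsWith("s") && !t.endsWith("ss") && t.length() > 3) return t.substring(0, t.length() - 1);
        return t;
    }

    public static void main(String[] args) {
        TermNormalizer n = new TermNormalizer();
        System.out.println(n.normalize("Jumping")); // jumping
        n.setStemmer(true);
        System.out.println(n.normalize("Jumping")); // jump
    }
}
```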