Lexical Architecture
Primary lexical scanner is a robust HTML lexer
- Custom lexer for TREC-style document formats
- Dictionary-driven phrase recognition is a clickable option
- Dictionaries: WordNet, the Moby database, and a local instance generated from bibliographic citation keywords
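
Dictionary-driven phrase recognition could be sketched roughly as below. This is only an illustration, assuming a greedy longest-match strategy over already-tokenized text; the function name `recognize_phrases`, the `max_len` parameter, and the tiny `PHRASES` set are hypothetical stand-ins, not the system's actual code or dictionary contents.

```python
def recognize_phrases(tokens, phrases, max_len=4):
    """Merge runs of tokens that appear in a phrase dictionary.

    Assumption: greedy longest-match, case-insensitive lookup.
    `phrases` is a set of lowercase token tuples.
    """
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in phrases:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                matched = True
                break
        if not matched:
            # No dictionary phrase starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical dictionary entries, for illustration only.
PHRASES = {("information", "retrieval"), ("part", "of", "speech")}

result = recognize_phrases(
    ["adaptive", "information", "retrieval", "systems"], PHRASES
)
```

In a real setting the phrase set would be loaded from the dictionaries listed above rather than written inline.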
Alternative lexer implements Brill’s rule-driven part-of-speech tagger
- Used for one of the QA runs
- Possible to use for adaptive filtering
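
The general idea behind a Brill-style rule-driven tagger can be illustrated with a minimal sketch: assign each word its most likely tag from a lexicon, then apply an ordered list of contextual transformation rules. The `Rule` type, the toy `LEXICON`, and the single rule below are hypothetical and chosen for illustration; a real tagger learns its rules from a corpus and supports more context templates than the "previous tag" condition assumed here.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    from_tag: str  # tag to be rewritten
    to_tag: str    # replacement tag
    prev_tag: str  # rule fires when the preceding token carries this tag


def brill_tag(tokens, lexicon, rules, default="NN"):
    """Tag tokens with an initial lexicon guess, then apply rules in order."""
    # Step 1: initial annotation with each word's most likely tag.
    tags = [lexicon.get(t.lower(), default) for t in tokens]
    # Step 2: apply each transformation rule, scanning left to right.
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return list(zip(tokens, tags))


# Hypothetical toy lexicon and rule set, for illustration only.
LEXICON = {"she": "PRP", "wants": "VBZ", "to": "TO", "run": "NN"}
RULES = [Rule(from_tag="NN", to_tag="VB", prev_tag="TO")]

tagged = brill_tag(["She", "wants", "to", "run"], LEXICON, RULES)
```

Here "run" starts as a noun from the lexicon and is retagged as a verb because it follows "to".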