JavaCC
Spring Application Framework
Apache Struts
Apache MyFaces
Snowball stemmer
PyParsing
SQLite
WordNet
NumPy
Rabin Hash Function
Apache Lucene
Lucene.Net
LINQ to Lucene
Explanation
HtmlStripper.java
config.xml
XMLParser.java
HTML parser
Some partly finished tokenizers exist on the web that you can adapt to your needs. You do not need approval to use any of these.