I was using the spambayes package. After poking around a bit, it seems that they (somewhat reasonably) don't tokenize the word stream themselves, but require you to provide the tokenized version. Since I was handing it raw strings, it was classifying based on letters rather than words, which predictably gave erratic results.
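To illustrate the difference, here is a minimal sketch in plain Python. It does not call spambayes at all. The `tokenize` helper and the sample message are hypothetical, made up for the example. It only shows what a consumer that iterates over its input would see in each case.

```python
import re


def tokenize(text):
    """Hypothetical helper: split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


message = "Cheap pills now"

# Iterating over a raw string yields one character at a time, so a
# classifier that loops over its input would treat letters as tokens.
char_tokens = list(message)

# Tokenizing first yields whole words, which is what you want to feed in.
word_tokens = tokenize(message)

print(char_tokens[:5])
print(word_tokens)
```

A regex split like this is only one possible choice of tokenizer. Anything that turns the message into an iterable of words before it reaches the classifier should avoid the letter-level behaviour.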