(Untitled)

Mar 26, 2007 09:57




Comments 4

ringzero March 26 2007, 14:22:43 UTC
What's your data for the classifier? The words in each item as unigrams (with an independence assumption)?


alfedenzo March 26 2007, 16:04:03 UTC
I was using the spambayes package. After poking around a bit, it seems that they (somewhat reasonably) don't tokenize the input themselves, but require you to provide an already-tokenized word stream. Since I hadn't tokenized my input, it was classifying based on letters rather than words, giving predictably erratic results.
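
To illustrate the mix-up, here is a minimal sketch in plain Python. It does not use the spambayes API; `count_tokens` is a made-up stand-in for any code that simply iterates over whatever word stream it is given, and the sample text is invented.

```python
from collections import Counter


def count_tokens(wordstream):
    """Hypothetical stand-in for a classifier's feature counting.

    It iterates over its argument and counts each item, without
    checking whether the items are words or single characters.
    """
    return Counter(wordstream)


text = "cheap pills cheap"

# Passing the raw string: iterating a str yields one character at a time,
# so the "features" end up being letters (and spaces).
by_letter = count_tokens(text)

# Passing a tokenized list: iterating yields whole words.
by_word = count_tokens(text.split())

print(by_letter.most_common(3))
print(by_word.most_common(3))
```

Splitting on whitespace is only the crudest possible tokenizer, but it is enough to show the difference between the two inputs.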


infohigh March 26 2007, 16:50:58 UTC
lolol :)


daniel_ream March 27 2007, 00:35:38 UTC
There's a Da Vinci Code-esque plot in here somewhere.


