Comments | alfedenzo: (no subject)

alfedenzo

(Untitled)

Mar 26, 2007 09:57


Comments 4

ringzero March 26 2007, 14:22:43 UTC

What's your data for the classifier? The words in the item as a unigram (with an independence assumption)?

alfedenzo March 26 2007, 16:04:03 UTC

I was using the spambayes package. After poking around a bit, it seems that it (somewhat reasonably) doesn't tokenize the word stream itself but requires you to pass in the already-tokenized version. As a result, it was classifying based on letters rather than words, with predictably unpredictable results.
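A minimal sketch of the pitfall, assuming only plain Python. In Python a `str` is itself iterable, so a function that expects a stream of tokens will silently treat each character as a token when handed a raw string. The `tokenize` and `feature_counts` functions below are hypothetical stand-ins, not the spambayes API; the regex tokenizer is a deliberately simple assumption (check the package's own documentation for the tokenizer it provides).

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical minimal tokenizer: lowercase word unigrams."""
    return re.findall(r"[a-z0-9']+", text.lower())


def feature_counts(wordstream):
    """Hypothetical stand-in for a classifier's training step:
    counts whatever items the iterable yields as features."""
    return Counter(wordstream)


item = "Free money, free offer"

# Pitfall: passing the raw string iterates over characters,
# so the "features" are letters, spaces, and punctuation.
as_letters = feature_counts(item)

# Fix: tokenize first and pass the list of word unigrams.
as_words = feature_counts(tokenize(item))
```

With the fix, `as_words` counts `free` twice and `money` and `offer` once each, which is the unigram representation ringzero's question assumes; `as_letters` instead counts individual characters such as `e` and spaces.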

infohigh March 26 2007, 16:50:58 UTC

lolol :)

daniel_ream March 27 2007, 00:35:38 UTC

There's a Da Vinci Code-esque plot in here somewhere.