Found a decent corpus I can use to train a tagger: the Open American National Corpus, which is a free download, covers multiple genres, and runs to 15 million tokens. It includes conversations, both face-to-face and on the phone, which suits my problem much better than WSJ would. It's easy to read, nicely structured, and already tagged. The bad news is, it