When you scan the document, you can choose whether the PDF is searchable or not. Choosing searchable makes a really big difference in file size. Non-searchable is much smaller, so if that's okay for your purpose, it should work.
It depends on what you're using to create the PDFs. If you're scanning to an image file and then making a PDF out of that image, the PDF doesn't contain text, which is tiny; it's just a wrapper around an image file, which is not. If you can get the original journal articles in some form other than big honking images and run those through Distiller, you'll get much smaller files. If you turn down the resolution on the scanner, or get it to output something more compressed (like JPEGs), you'll also get smaller files.
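To put rough numbers on the resolution point, here is a minimal sketch of the arithmetic for the uncompressed size of one scanned page. The function name and the US Letter page size (8.5 x 11 in) are my own assumptions for illustration. Real scanner output will be smaller once it's compressed, but the proportions hold: halving the resolution cuts the pixel count to a quarter.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size, in bytes, of one scanned page (hypothetical helper)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (600, 300, 150):
    for label, bpp in (("24-bit color", 24), ("8-bit gray", 8), ("1-bit b/w", 1)):
        size = raw_scan_bytes(8.5, 11, dpi, bpp)
        print(f"{dpi} dpi, {label}: {size / 1_000_000:.1f} MB")
```

So for plain black-and-white journal text, dropping from color to 1-bit and from 600 to 300 dpi shrinks the raw image by a factor of about 96 before any compression is applied.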
Here you go. I'm assuming that you have Adobe Acrobat and can scan directly into the Acrobat application. Once you do that, Acrobat can run text recognition and should be able to convert the scanned pages to real text instead of holding them as images, and then you win.
What everyone else says is excellent, but I haven't seen anyone note this yet: if you do scan as an image and then run character recognition (OCR), it's absolutely imperative that you then proofread the results closely. OCR has improved greatly in the past few years, but errors still happen, especially when working from scanned paper.
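If you have a lot of pages to proof, a small script can at least point you at likely trouble spots first. This is only a sketch of one heuristic I'm assuming is useful (flagging words that mix letters and digits, a common OCR slip such as "0" for "o"); the function name is made up, it will also flag legitimate terms like "H2O", and it is no substitute for actually reading the text against the paper.

```python
import re

# Matches a word that contains at least one letter and at least one digit.
MIXED = re.compile(r"(?=.*[A-Za-z])(?=.*\d)")


def suspicious_tokens(text):
    """Return (line number, word) pairs worth a second look (hypothetical helper)."""
    flagged = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Drop surrounding punctuation so "2019." is treated as "2019".
            word = token.strip(".,;:!?()[]\"'")
            if word and MIXED.match(word):
                flagged.append((line_no, word))
    return flagged


sample = "The m0del was trained\non 1024 samples in 2019."
for line_no, word in suspicious_tokens(sample):
    print(f"line {line_no}: {word}")
```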
If by some chance you can get the articles electronically and then create PDFs from them, you stand a much higher chance of getting a small file without errors.
document -> scanned to image -> character recognition application -> text -> convert to pdf
Instructions are here.
If you don't have Acrobat available, it's time to use the academic discount. :)