I bring to you the solution to a problem you never knew you might have! (Or perhaps never even knew existed...!)
vixen and I were settling in to watch a few episodes of anime when, much to my chagrin, the audio sync kept drifting >.>. If you remember, we use mediatomb + mencoder to transcode all sorts of stuff and stream it to our ps3. This bothers me. To no end. Even more so than stuttery video.
So anyway, I dug and I dug and I dug and finally came up with a solution that worked (and also fixed two other things that had been bothering me). Turns out the mkvs we were playing had aac audio tracks embedded that were recorded at 44.1khz, while the output we're making is ac3 at 48khz. However, mencoder doesn't automatically resample the audio to match... whoops. So the audio kept drifting.
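If you want to check whether a given mkv has the same mismatch, mplayer's -identify output will tell you the source audio's sample rate. A quick sketch (the file name here is just a placeholder):

mplayer -identify -frames 0 -vo null -ao null "episode01.mkv" 2>/dev/null \
  | grep ID_AUDIO_RATE
# a troublesome file prints something like: ID_AUDIO_RATE=44100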
Here is the updated mencoder command!
mencoder "$input" -oac lavc -ovc lavc \
-of mpeg -ofps 30 -mc 0 \
-lavcopts vcodec=mpeg2video:vbitrate=6000:acodec=ac3:abitrate=448 \
-vf scale=704:384,harddup,softskip,hqdn3d \
-af lavcresample=48000 \
-alang "$alang" -slang "$slang" \
-mpegopts muxrate=36000 \
-font 'augie' \
-subfont-text-scale 2.5 -subpos 95 -subwidth 90 \
-o "$output" &> /dev/null
-af lavcresample=48000 - audio filter to resample the input up to 48khz. Also included is a fix to the evil, evil problem of freezing video caused by the mpeg mux rate falling too low (buffer underflow). Basically we kludge around it with -mpegopts muxrate=36000, cranking the mux rate up to 36000; in theory that's enough headroom to prevent buffer underflow, and so far it has worked for every example of freezing video I've found in our collection.
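For completeness, the $input, $output, $alang and $slang bits in the command come from the wrapper script around it. The values below are made-up placeholders just to show the shape of it; in practice they're filled in by whatever script or transcoding profile is calling mencoder:

input="/path/to/episode01.mkv"    # source mkv (one of the ones with 44.1khz aac audio)
output="/path/to/episode01.mpg"   # where the transcoded mpeg ends up
alang="jpn"                       # preferred audio language track
slang="eng"                       # preferred subtitle language track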
So you might be wondering what this frequency stuff is about. Well, sound is a continuous wave, but we need to somehow save it into a file for use with computers. The solution is to take samples (measure the sound) at regular intervals. This, however, brings about other problems: if you don't sample often enough you run into an effect called 'aliasing', in which sounds with frequencies higher than half your sampling rate end up sounding like lower frequencies. A couple of smart guys called Shannon and Nyquist came up with a nifty theorem saying you need to sample at at least twice the maximum frequency you want to capture to avoid aliasing.
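To make that concrete, here's a toy calculation (plain awk, nothing to do with the actual transcoding setup) using the standard folding formula to show how a tone above the Nyquist limit gets mirrored down to a lower frequency:

awk 'BEGIN {
    fs = 44100;                  # sample rate in hz (cd audio)
    f  = 30000;                  # tone we try to capture, above fs/2
    n  = int(f / fs + 0.5);      # nearest whole multiple of the sample rate
    alias = f - fs * n; if (alias < 0) alias = -alias;
    printf "nyquist limit: %d hz, a %d hz tone aliases down to %d hz\n", fs/2, f, alias
}'
# prints: nyquist limit: 22050 hz, a 30000 hz tone aliases down to 14100 hz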
Since humans can only hear between about 20hz and 20khz (people with really good hearing, at least), you'd need to sample at at least 40khz to make sure you don't get any aliasing. Now, for the life of me, I used to know why 44.1khz was the frequency cds were recorded at, but I cannot remember anymore. 48khz is for high quality audio... again, I used to know why this was, but well, I can't remember anymore. The main thing to remember is that a higher sampling rate (generally) means better sound reproduction (and larger file sizes).
Now the problem here is that we're taking something sampled at 44.1khz and playing it back as if it were 48khz (about 1.09 times the speed). That causes a small frequency shift. You can definitely hear the shift if you play the two side by side, but otherwise it's hard to notice; it's why you can get away with speeding up or slowing down audio when changing frame rates, as we do above when we take most of the sources from ~23fps to ~30fps. The sync error, on the other hand, keeps accumulating, so over time the drift becomes more and more pronounced.
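Rough numbers, if you're curious (another toy awk calculation; the semitones line just uses the usual 12*log2(ratio) conversion for pitch):

awk 'BEGIN {
    ratio = 48000 / 44100;                  # ~1.088x playback speed
    semitones = 12 * log(ratio) / log(2);   # how sharp the audio ends up sounding
    printf "speed ratio: %.3f (roughly %.1f semitones sharp)\n", ratio, semitones
}'
# prints: speed ratio: 1.088 (roughly 1.5 semitones sharp)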
And then there's another confounding factor: the audio is encoded at a variable bitrate, so some sections skew the audio sync a lot, whereas others not so much.
Anyway, I must put a disclaimer in here that I'm not a sound theory person (I do however find it VERY interesting, like codecs in general), so take everything I say with a grain of salt and don't be citing me.
I've pulled most of what I couldn't remember (and some stuff I never knew!) from
http://en.wikipedia.org/wiki/Sampling_rate