That's pretty weird. My first thought was that the compiler could be evaluating all the constants at compile time in a higher precision, but -O1 and -O3 should both do that. Does the discrepancy go away when you use double precision?
I examined the transient response today. It is correct for both levels of optimisation!
Strangely, when we keep iterating on the same input, after the transient output is long gone and we're into the 'settle onto the input value' stage, the numbers are still identical at about 3072 iterations; then, around 4096 iterations, the -O3 numbers start to be VERY slightly larger (and actually, probably MORE accurate than -O1).
Switching to double DOES make the problems go away! I think what is happening is that some values in this application can be treated like constants, while in an ACTUAL program they couldn't, since they might change. Regardless, the -O3 answers are closer to 2 and thus are MORE accurate, so the "constants" produced (like dt = 1/blah) are probably more accurate than with -O1.
Good news, I suppose. I just can't understand how -O3 is giving me almost 25% speedup (40 seconds -> 30 seconds) without 'cheating' somehow, but the output does seem quite accurate...