UPDATE: It's been pointed out that the current metric (sum of squares of unresponsive periods, divided by 1000) is used in Talos and has had a fair bit of thought put into it. I was curious what not squaring the results would do, but I wouldn't go with another metric without more careful thought.
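For concreteness, here is a minimal sketch of that metric as I read the description above; the function name and the assumption that periods are measured in milliseconds are mine, not anything from the Talos source:

    def responsiveness_score(unresponsive_periods_ms):
        """Sum of squares of the unresponsive periods, divided by 1000."""
        return sum(p ** 2 for p in unresponsive_periods_ms) / 1000.0

    # Squaring penalizes one long hang more than several short hangs of
    # the same total duration; without the square these would be equal:
    print(responsiveness_score([50, 50]))  # 5.0
    print(responsiveness_score([100]))     # 10.0

That weighting is presumably the point of squaring: a single 100 ms hang scores twice as high as two 50 ms hangs.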
UPDATE 2: It has also been pointed out that peptest …
Comments (4)

- Jesse Ruderman:
The severe outliers are probably caused by random things on the test machine anyway, not the code being tested. Of course, that's true until it isn't, and then you might miss something... But if you yell fire every couple of days for a while, you're definitely going to miss the first real fire.
Either way, you can report the full results separately.
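If one did want to gate on the bulk of the results while still reporting everything, a sketch of that separation might look like the following; the median/MAD cutoff and the function name are illustrative choices of mine, not anything from the actual harness:

    import statistics

    def split_outliers(scores, k=5.0):
        """Split scores into typical values and severe outliers using a
        median/MAD cutoff; the cutoff k is purely illustrative."""
        med = statistics.median(scores)
        mad = statistics.median(abs(s - med) for s in scores) or 1e-9
        typical = [s for s in scores if abs(s - med) / mad <= k]
        outliers = [s for s in scores if abs(s - med) / mad > k]
        return typical, outliers

Pass/fail could then look only at the typical values, while the outliers still get logged in the full report.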
Along the lines you suggested, we've decided to look at trends over time rather than make each individual result a pass/fail. Running the tests multiple times against every build is a good idea, but it will, of course, increase load on our tester slaves. I think, since we have so many commits, we should be good just calculating averages over some period (1, 2, or 7 days) and identifying when the average changes significantly, as in the sketch below. More experimentation to do!
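A minimal sketch of that averaging-and-comparison idea; the window size, the 10% threshold, and the function name are all placeholder assumptions on my part:

    def flag_shifts(daily_averages, window=7, rel_threshold=0.10):
        """Flag points where the mean over one window differs from the
        mean over the preceding window by more than rel_threshold."""
        flags = []
        for i in range(window, len(daily_averages) - window + 1):
            prev = sum(daily_averages[i - window:i]) / window
            curr = sum(daily_averages[i:i + window]) / window
            if prev > 0 and abs(curr - prev) / prev > rel_threshold:
                flags.append((i, prev, curr))
        return flags

What counts as a "significant" change is exactly the part that needs the experimentation mentioned above.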