Mozilla A-Team: Peptest results, an exercise in statistical analysis

Mar 04, 2012 14:45

UPDATE: It's been pointed out that the current metric (sum of squares of unresponsive periods, divided by 1000) is used in Talos and has had a fair bit of thought put into it. I was curious what not squaring the results would do, but I wouldn't go with another metric without more careful thought.
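For reference, the two variants being compared can be sketched roughly as follows. This is only an illustration of the metric as described above (sum of squares of unresponsive periods, divided by 1000), not peptest's or Talos's actual code. The function name, the `square` flag, the sample values, and the assumption that periods are measured in milliseconds are all hypothetical.

```python
def responsiveness_metric(unresponsive_periods, square=True):
    """Sum the unresponsive periods (squared by default) and divide by 1000.

    `unresponsive_periods` is assumed to be a list of durations in
    milliseconds; the unit is an assumption made for this sketch.
    """
    total = sum(p * p if square else p for p in unresponsive_periods)
    return total / 1000.0


# Hypothetical sample data: three unresponsive periods.
periods = [60, 120, 400]

squared = responsiveness_metric(periods)               # current metric
linear = responsiveness_metric(periods, square=False)  # the unsquared variant
```

Squaring weights the single 400 ms pause far more heavily than the two shorter ones combined, which is the main behavioural difference between the two variants.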

UPDATE 2: It has also been pointed out that peptest ( Read more... )

peptest, mozilla

Comments 4

anonymous March 4 2012, 22:10:42 UTC
This is a performance test suite. Its output should go to a graph and stats bot, like the other performance test suite, rather than going green or orange immediately.

- Jesse Ruderman

cloquewerk March 4 2012, 22:27:08 UTC
Definitely considering this avenue.

slajoie March 6 2012, 13:36:22 UTC
If frequent outliers are causing problems, why not run the test a bunch of times and look at the median value to decide if there has been a regression?

The severe outliers are probably caused by random things on the test machine anyway, not the code being tested. Of course that's true until it isn't and then you might miss something... But if you yell fire every couple days for a while, you're definitely going to miss the first real fire.

Either way you can report the full results separately.
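A rough sketch of the median idea, for illustration only: the function name, the 1.5x threshold, and the sample numbers are made up here and are not anything peptest does.

```python
from statistics import median


def is_regression(baseline_runs, candidate_runs, threshold=1.5):
    """Flag a regression when the candidate median exceeds the baseline
    median by more than `threshold` times. The threshold is an arbitrary
    value chosen for this sketch."""
    return median(candidate_runs) > threshold * median(baseline_runs)


# Hypothetical metric values from five runs each; both sets contain one
# severe outlier of the kind a noisy test machine might produce.
baseline = [10.2, 11.0, 9.8, 250.0, 10.5]
candidate = [10.9, 10.1, 11.3, 10.4, 300.0]

flagged = is_regression(baseline, candidate)
```

With these numbers the outliers do not move either median much, so nothing is flagged, whereas a comparison of single runs could easily have tripped on the 300.0.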

cloquewerk March 15 2012, 21:11:56 UTC
Heya, thanks for your comment. :)

Along the lines you suggested, we've decided to look at trends over time rather than make each individual result a pass/fail. Running the tests multiple times against every build is a good idea but will, of course, increase load on our tester slaves. I think, since we have so many commits, we should be good just calculating averages over some period (1 or 2 or 7 days) and identifying when that changes significantly. More experimentation to do!
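Something like the following is one way the windowed-average comparison could look. It is a minimal sketch under assumptions, not what we have implemented: the function name, the window expressed as a count of results rather than days, and the "k standard deviations" rule for what counts as a significant change are all placeholders.

```python
from statistics import mean, stdev


def window_shifted(results, window, k=3.0):
    """Compare the mean of the latest `window` results with the mean of
    the window before it, and report a shift when they differ by more
    than `k` sample standard deviations of the earlier window.

    `results` is assumed to be a chronologically ordered list of metric
    values, one per test run.
    """
    if window < 2 or len(results) < 2 * window:
        raise ValueError("need at least two full windows of two or more results")
    previous = results[-2 * window:-window]
    recent = results[-window:]
    return abs(mean(recent) - mean(previous)) > k * stdev(previous)


# Hypothetical data: a stable series, and one whose recent runs jump.
stable = [10, 11, 9, 10, 11, 9, 10, 11, 9, 10]
jumped = [10, 11, 9, 10, 11, 20, 21, 19, 20, 21]

stable_flag = window_shifted(stable, window=5)
jumped_flag = window_shifted(jumped, window=5)
```

A single bad run inside a window only nudges the average, so this should be much less noisy than per-result pass/fail, at the cost of noticing a real regression a little later.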
