UPDATE: It's been pointed out that the current metric (sum of squares of unresponsive periods, divided by 1000) is used in Talos and has had a fair bit of thought put into it. I was curious what not squaring the results would do, but I wouldn't go with another metric without more careful thought.
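For concreteness, here is a minimal sketch of that metric as I read the description above; the function name and the assumption that periods are measured in milliseconds are mine, not anything from the Talos source:

    def responsiveness_score(unresponsive_periods_ms):
        """Sum of squares of the unresponsive periods, divided by 1000."""
        return sum(p ** 2 for p in unresponsive_periods_ms) / 1000.0

    # Squaring penalizes one long hang more than several short hangs of
    # the same total duration; without the square these would be equal:
    print(responsiveness_score([50, 50]))  # 5.0
    print(responsiveness_score([100]))     # 10.0

That weighting is presumably the point of squaring: a single 100 ms hang scores twice as high as two 50 ms hangs.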
UPDATE 2: It has also been pointed out that peptest …
Comments (4)

- Jesse Ruderman:
The severe outliers are probably caused by random things on the test machine anyway, not the code being tested. Of course, that's true until it isn't, and then you might miss something... But if you yell fire every couple of days for a while, you're definitely going to miss the first real fire.
Either way, you can report the full results separately.
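If one did want to gate on the bulk of the results while still reporting everything, a sketch of that separation might look like the following; the median/MAD cutoff and the function name are illustrative choices of mine, not anything from the actual harness:

    import statistics

    def split_outliers(scores, k=5.0):
        """Split scores into typical values and severe outliers using a
        median/MAD cutoff; the cutoff k is purely illustrative."""
        med = statistics.median(scores)
        mad = statistics.median(abs(s - med) for s in scores) or 1e-9
        typical = [s for s in scores if abs(s - med) / mad <= k]
        outliers = [s for s in scores if abs(s - med) / mad > k]
        return typical, outliers

Pass/fail could then look only at the typical values, while the outliers still get logged in the full report.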
Along the lines you suggested, we've decided to look at trends over time rather than make each individual result a pass/fail. Running the tests multiple times against every build is a good idea, but it will, of course, increase load on our tester slaves. I think, since we have so many commits, we should be good just calculating averages over some period (1, 2, or 7 days) and identifying when the average changes significantly, as in the sketch below. More experimentation to do!
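A minimal sketch of that averaging-and-comparison idea; the window size, the 10% threshold, and the function name are all placeholder assumptions on my part:

    def flag_shifts(daily_averages, window=7, rel_threshold=0.10):
        """Flag points where the mean over one window differs from the
        mean over the preceding window by more than rel_threshold."""
        flags = []
        for i in range(window, len(daily_averages) - window + 1):
            prev = sum(daily_averages[i - window:i]) / window
            curr = sum(daily_averages[i:i + window]) / window
            if prev > 0 and abs(curr - prev) / prev > rel_threshold:
                flags.append((i, prev, curr))
        return flags

What counts as a "significant" change is exactly the part that needs the experimentation mentioned above.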