What If ...? #6 - Airing Dirty Laundry (Scoring and Grading)

Dec 08, 2009 08:42

What if I didn't have a single clean subject today and just wanted to show ways I've been an idiot over the years while bringing up potential issues with grading and scoring? You might get an essay like this.



At many puzzle events, the returning of results can act like a "punch to the gut". You'll never get more points than you expect (unless somehow there are "free guess" puzzles in a round), but you can certainly get fewer. For many years, I've argued with nickbaxter for the removal (or modification) of the 5-point penalty on the USPC for incorrect submissions, since in almost all cases the answers came not from people trying to guess a solution, but simply from people who had made and not caught a mistake. At a WSC/WPC, this would get you 0. The -5, in addition to the lost points, is the punch to the gut here on a single mistake. The only -5 point penalty I've received in my championship years was for a mistake on the counting puzzle in 2007, and even I agree that on a potentially guessable counting puzzle, some number of penalty points (tied to expectation of correct answer, but not 5) is appropriate. Wei-Hwa similarly made a mistake that year on a 6-option "guessable" puzzle, so the penalties we each received did not change the 5-point gap between us (we both finished the test, but with different puzzles wrong); they simply brought us both closer to 3rd place.
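One way to read "tied to expectation of correct answer" is a penalty sized so that a blind guess is worth zero points on average. The sketch below works that out in Python. The function names, the 10-point puzzle value, and the break-even interpretation itself are assumptions for illustration, not anything the USPC actually uses.

```python
from fractions import Fraction


def break_even_penalty(puzzle_points, num_options):
    """Penalty that makes a uniformly random guess worth zero in expectation.

    Solves (1/n) * points - ((n - 1)/n) * penalty = 0 for the penalty.
    """
    if num_options < 2:
        raise ValueError("a guessable puzzle needs at least 2 options")
    return Fraction(puzzle_points, num_options - 1)


def expected_guess_value(puzzle_points, num_options, penalty):
    """Average points earned by guessing uniformly among num_options answers."""
    p_correct = Fraction(1, num_options)
    return p_correct * puzzle_points - (1 - p_correct) * penalty


# Hypothetical 10-point puzzle with 6 possible answers.
print(break_even_penalty(10, 6))       # 2
print(expected_guess_value(10, 6, 2))  # 0
print(expected_guess_value(10, 6, 5))  # -5/2
```

Under these assumed numbers, a flat 5-point penalty is well past break-even, while a puzzle with an open-ended answer (where guessing is hopeless anyway) needs no penalty at all to discourage guessing.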

While my error-checking works great at home, when I'm somewhat rested and healthy, it has failed me tremendously over the years in overseas competitions. I certainly do not enjoy traveling and cannot sleep well in foreign beds (and I'm often 8+ hours more jet-lagged than the average European contestant at a WPC). Europe also seems to have a cultural connection with smoking, certainly in and around the hotels where we stay. If I'm not sick and coughing when I arrive at a WSC/WPC, I certainly am before I leave.

What are the punches to the gut I receive at WPC/WSCs? At these events there are often rounds which are poorly timed, where a majority of the top ten are done with 10-20 minutes on the clock, but absolute accuracy over 10+ puzzles is required for position and time bonuses. Year after year I feel I've finished a round 1st or near 1st, only to get back a solid punch to the gut and a huge loss of points when the papers come back. Here are some of my greatest hits (in several senses of that word):

2007 Prague WSC:



Round 1 - blank cell (10 puzzle points + 9 time bonus (3 minutes)) - 135 instead of 154



Round 2 - transposition (10 puzzle points + 24 time bonus (8 of 30 minutes)) - 215 instead of 249
(Ironically on this one, I checked the rows of every puzzle in the round to spend 6 points after finishing with ~10 minutes on the clock. My error was in the columns.)

Result: Expecting to be near if not the leader after 2 rounds, these mistakes (in back to back starting rounds) left me 45 points and many places behind. A masterful, and clean, 3rd round performance brought me back to 1st where I'd finish, both in qualification and after a separate playoff.

2007 Brazil: The year of the missing segments. I'd make many dumb errors here, in several rounds I thought I'd finished. The most painful were the back to back returns of Rounds 1 and 2, where I finished each in 1st place.



Round 1 - missing line (4 puzzle points + 30 position points + 15 time points (3 minutes of 30)) - 196 instead of 245. (Note: you can see the first judge marked this OK, then a second judge or a second look made it NO!)



Round 2 - missing line (3 puzzle points + 80 position points + 175 time points (35 minutes of 90)) - 447 instead of 705. (Note: read below for why this shows the NO! crossed out to OK.)

Result: Expecting to be the leader after 2 rounds, these mistakes left me in 10th and only marginally ahead of the next 8 solvers who would also be competing for playoff spots after the end of Day 1. A humiliating Bills loss on MNF, coupled with the Yankees elimination in the ALDS, made the whole day one of my worst five in memory. During the morning of Day 2, the Round 2 mistake was protested, all the points were returned, and I ended up with a >100 point lead after qualification. The "mistake" and regrading is still controversial, and is shown here publicly for the first time.

2008 Belarus -
(A full gut-punch point-wise, but only half of it due to the stupid error I show here as I also had an error on a clouds puzzle in this part that I thought I had checked. Still, to think I broke the only sudoku in the whole competition....)



Crypted Part - 1 instead of 6 in sudoku - 24 (+22 + 4) puzzle points + ~35 placement points (2 of 8 finishers) + 67 time points (16.75 minutes of 60) - 150 instead of 302.

Result: After being tied with many others after day 1 and the morning of day 2, the afternoon rounds of Day 2 would decide the final order and I expected to be in the top 2. Instead, my inability to correctly solve a sudoku would cost a sizeable number of points, and drop me into the middle of the pack entering the playoffs. I recovered no ground in the last round either. More stupid errors during the playoffs would keep me from reaching the grand-final.

2009 Zilina -



Team Round - Relay puzzle #2 - repeated "3" in a box instead of a 9. - 9 points + remaining 36 points for team in round + 5 points (placement bonus) = 9 points in round instead of 59.

Result: With no individual results in the team results (for an unknown reason), with most team rounds not working, this was anyone's "title". We finished only 27 points behind the champion Slovakians (but 27 points behind was also in 8th place). My boneheaded error cost us 50 points in the round. You do the math. Ugh! Hard hard punch to the gut, since it affected not just me this time.

2009 Antalya -



Individual Round - Tapa - 25 puzzle points + 20 position points + 33 time bonus points (11 minutes) - 175 instead of 253.



Individual Round - Matchmaker - 65 puzzle points + 45 time bonus points (15 minutes) - 250 instead of 360.

Result: Holding onto a separated 2nd place position, and hoping to catch up to Ulrich after Day 1, these lost points instantly dropped me back into the pack in back to back rounds to start Day 2. Further sloppy solving led to an unsatisfying 7th place finish.

So, after all of these "huge point" "mistakes", it is clear that my claim last week that "points" in the early rounds is the best measure of performance is not as crystal clear as I make it out to be (but I still contend it is the best). I'm sure other solvers will have their own stories of fails like these, just perhaps not at this magnitude. Accounting for this kind of mistake may be THE ONE thing playoffs can help get right. Intermediate judging of puzzles can (with penalty) let a solver who is bat-crazy from fatigue fix an obvious mistake at a time it matters most. Not all playoffs do intermediate judging. Zilina could have, and should have. If no one was going to win a way into the final by solving 4 puzzles, having a clear time defined for when certain solvers got to 3 out of 4 correct is infinitely better than the "when you gave up" "tactic" that was used instead. I would have demonstrated my 3 solutions with 16 minutes left. I would have moved on, and who knows what happens from there. The USSC could have some means (since people say done and then try to write again every year by accident) to declare a time, but let a solver keep checking to possibly amend a solution (possibly with penalty) given the 4' by 4' format makes visualization especially tough. It did not this year, which leads to another great punch to the gut:



Result: Expected $10,000, iPod Touch, (small) trophy, spot on US team as US Sudoku Champion. Instead $4,000, and a sudden intense curiosity about how the third place finisher "Eugene" was doing, since he should have had 12 more minutes to complete his grid. So my mistake helped me catch a cheat, but that hardly makes up for knowing I had about as long to catch my error as it took me to solve the puzzle, when my error is not that hard to spot (and correct).

And not all mistakes are necessarily punches to the gut. Many are lesser hits in rounds that no one finished (and therefore just for points themselves). Most of the rounds in Eger and Borovets had this kind of quality, aside from the small 5-6 puzzle focused rounds that did have time/placement bonus. I certainly made my share of stupid errors there, just not for 100+ points. Some mistakes I've made in the past are quite humorous (and in general show I just can't do math).

I like these in particular:



Belarus - Crypted Square from sprint round - the only puzzle I had wrong in completing this round (albeit I only lost ~16 points for the mistake.) The middle number is hardly the average of the adjacent pairs.



Antalya - my inability to get 48 in this top product is shocking. There goes 17 points worth another good chuckle.

Also, no presentation of punches to the gut would be complete without some additional fun photography of judging from this year in puzzles. Specifically, I'd like to point out that grading is a really tough business. Here are two solutions to a non-sudoku that was secretly a sudoku in Zilina.



I've transposed R35C46 and replaced an 8 with a 5 in row 5. It's very correctable but looks very wrong.



The other Tom has written a wrong digit somewhere in his grid, but got credit. (Hint, it's in R5C1.) Judges are not infallible.

Judges can even make errors of subtraction, as witnessed in the grading of the Finals:


Whole countries, focused on "tactics" in explaining my loss, can demonstrate in their own ways their own problem with tactics, such as seen on a building near the competition hall in Zilina (and my favorite travel photo of 2009):


Then there are less clear errors on indecipherable papers (top is my Semi-Final #3, bottom is Salih Alan's Semi-Final #3):





The "infamous" Salih Alan missing paper, shown as the second example above with an incorrect grid #3 as initially marked, led to a later WPF decision to award him ninth place and move me to 10th when his paper was discovered and regraded after the finals themselves in Zilina. Looking at Salih's grid, if you want to find the correct digit in R8C3, it is there. But it is not jumping off the paper (nor in some other spots like R3C1). I'd argue that in the same way my "single blank cell" grid above from Prague is effectively correct minus one cell, at best Salih's paper is effectively correct minus one cell, but it is easy to see how judges could view this grid as either correct or incorrect and certainly as confusing.

Six months later, Salih would again get coaches huddled around for a question on one of his sudoku papers (I sadly have no picture of it, as I was awaiting the start of the finals at the time and not taking photos). Salih had turned in a grid with a blank cell in the semifinals of the Sudoku Cup in Antalya, but there was a single notation, similar to that used in other paired cells to indicate either/or choices, that was actually the correct digit for the cell. If you squinted, a 5 was there, but again it was not in the same style as his other numbers and was up for debate. Intermediate grading and resubmission with a 1-minute penalty would have clarified Salih's case here (and let him advance anyway), which is why Byron's lack of protest is probably the wise choice after he took 4-5 more minutes to finish the puzzle cleanly.

So, I've shown a lot of different things up above, and certainly that I cannot compete cleanly after a lot of travel and that judging is hard; here are some potential what ifs:

What if the USPC eliminated its 5-point penalty? Well, my scores would most often not change but many of my friends would have 5 or 10 more than they ended up with. I'd probably try to take some credit for improving the overall skill level of the US for one year, even if it was an accounting change.

What if, instead of grading an entire grid of sudoku, judges were expected to grade two rows or two columns? This would make grading papers much faster (even at the cost of some "errors" in grids). What if solvers were further asked to transcribe their numbers for that row or column onto the bottom of the page to further simplify this process? This would then also clean up the situation for solvers like Salih who leave a mess on the paper to consider. It would get around international inconsistencies in 1's and 7's as well. It would eliminate blank cell errors, but not necessarily transpositions (although you can check a row/column more easily than a whole puzzle for this kind of constraint failure). Not all papers will be as easy to grade as "Eugene's Round Three", as the scars of solving are often borne by the solution, and solution transcription is an interesting consideration.
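To make the transcription idea concrete, here is a minimal Python sketch of what a judge's check could look like if only two transcribed lines were graded. The line labels ("R3", "C7"), the digit strings, and both function names are hypothetical; no competition has specified a format like this.

```python
def is_complete_line(line):
    """True if the transcribed line uses each digit 1-9 exactly once.

    Catches blank cells and repeated digits without needing the answer key.
    """
    return sorted(line) == list("123456789")


def grade_transcription(submitted, answer_key):
    """True only if every graded line matches the answer key exactly.

    submitted and answer_key map a line label to a 9-character digit string.
    A missing label counts as wrong.
    """
    return all(
        submitted.get(label, "").replace(" ", "") == expected
        for label, expected in answer_key.items()
    )


# Hypothetical key: row 3 and column 7 of some puzzle.
key = {"R3": "534678912", "C7": "349152786"}

clean = {"R3": "534678912", "C7": "349152786"}
transposed = {"R3": "534687912", "C7": "349152786"}  # 7 and 8 swapped in row 3
blank = {"R3": "53467 912", "C7": "349152786"}       # one cell left empty

print(grade_transcription(clean, key))       # True
print(grade_transcription(transposed, key))  # False
print(is_complete_line(transposed["R3"]))    # True
print(is_complete_line(blank["R3"]))         # False
```

The transposed row still passes the completeness check because it contains every digit once, so it is only caught by comparison with the key. An error anywhere outside the two graded lines would not be caught at all, which is the accepted cost of the faster grading.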

The OAPC round at Antalya was the most compelling to me along this line of thought as you entered strings before submission as you would online, and only your strings were graded. I believe the graders were simply exhausted which delayed this particular round, and have no reason to believe this round was hard to grade except to sum up the final score. I probably would have collected the books themselves to be able to deal internally with the discovery of a broken puzzle, or to assist regrades/confirm cheating, but simplifying the marking of papers is most likely a good thing since graders are under high time-stress and are as fallible as the rest of us.

What if solvers' papers were scanned, anonymized, and the community could see/grade them? It's an odd set-up, but I bet a couple people who want to see the WSC/WPC puzzles could be turned into graders with such a "bribe". It's probably unnecessary, but I like wild speculation. Scanning does limit the ability to "cheat" on a regrade, which is one reason we copy some number of tests before handing them back in any college: to catch people who might do this by editing their papers. Seeing papers would clarify the Eugene situation immediately, but also clarify how dirty some solvers' grids can be to judge, leading to recommendations for other strategies/requirements to improve competitions.

What if rounds with time bonuses but 10+ puzzles did not use absolute accuracy, except for placement bonus, but instead used 2-minute penalties for each "bonehead" error you are caught on? The American Crossword Puzzle Tournament uses this rule: "A bonus of 25 points for each full minute you finished ahead of the suggested solution time - BUT reduced by 25 points for each missing or incorrect letter (but not beyond the point the bonus returns to zero)". So, a bonus there is not taken to zero, just reduced (there is a bonus for completeness of a puzzle separate from the time bonus). And that's on 1 puzzle in a round. Rio's 35-minute time bonus loss (before it was regraded) removed >60x the value of the puzzle that the borderline "mistake" was made on, and 35 minutes is much longer than the time it would have taken to correct said "mistake" had it been found. Position bonuses should be based on "adjusted finish time" with penalties, much as in many other sports, where the total time of a run allows for a small mistake without scrubbing the entire thing. Round scores will drop, but round bonuses should not be summarily erased, as the relative time to turn in 19 correct and 1 "incorrect" puzzle is a meaningful signal in the ranking as well.
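As a rough sketch of the difference, the Python below compares an all-or-nothing time bonus with a bonus reduced by a fixed time penalty per error. The function names and the default 2-minute penalty are illustrative assumptions; the rate of 5 points per minute comes from the Rio Round 2 figures earlier in this post (175 time points for 35 minutes).

```python
def all_or_nothing_bonus(minutes_early, points_per_minute, errors):
    """Time bonus under absolute accuracy: any error forfeits the whole bonus."""
    if errors > 0:
        return 0
    return minutes_early * points_per_minute


def reduced_bonus(minutes_early, points_per_minute, errors, penalty_minutes=2):
    """Time bonus cut by a fixed time penalty per error, never below zero.

    penalty_minutes=2 mirrors the 2-minute idea in the text; the exact
    value is an assumption for illustration.
    """
    adjusted_minutes = max(0, minutes_early - penalty_minutes * errors)
    return adjusted_minutes * points_per_minute


# Rio Round 2 as described above: 35 minutes early at 5 points per minute, 1 error.
print(all_or_nothing_bonus(35, 5, 1))  # 0
print(reduced_bonus(35, 5, 1))         # 165
```

With these numbers a single error costs 10 time points instead of 175, and the same `adjusted_minutes` value could serve as the "adjusted finish time" used to order solvers for position bonuses.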

What if playoffs consistently performed intermediate grading (with time penalties) of puzzles? Well, in general I'd again be doing better, and I've been receiving this benefit a lot recently. In Antalya, on 3 occasions out of ~12 turned-in puzzles I had stupid, correctable errors on otherwise solved puzzles. In Goa, on 1 occasion out of 4 semi-final sudoku I was also caught this way (an 8 instead of a 9 in one cell). In Rio, I was caught for a wrong answer on 2 occasions (but only once correctly). The first time, the fences puzzles wanted 3 loops and not 1. I fixed it in 10 seconds. The other time, I was marked wrong because I used a different notation than expected on a Museum puzzle, where I consider one solved when I've drawn in all paths that connect rooms, whether I've marked walls or not. I didn't storm out of the room or protest this lost minute and a half, but intermediate grading is not infallible either. So, I've benefitted from it a lot already, I guess. In Prague and Goa, where there was no such system in place in the finals, I simply spent 30 seconds on each puzzle checking. I found an error in Prague, but not in Goa, so it might be a wash which system to use, although I've never caught other solvers checking as much or as often as I end up doing at sudoku tournaments. In one-puzzle finals (like the USSC), I've never checked, and you can see above what that leads to.

What if more puzzle competitions were done without me being 10 hours+ jet-lagged? Then I might have taken fewer punches to the gut and would have more fun with competitions. Instead, I'm left with the memories of the many many points I've lost every year, the titles I've cost myself and my team. I suspect that my ability to catch errors is heavily compromised by fatigue, but I still feel much better at eliminating them with the USPC format as I check my puzzles, first to confirm constraints, then check the extraction of an answer string, then to type that string itself. I slow myself down to make sure there are no errors before I "retire" a page of the test to never look at again. I enjoyed the OAPC (and was successful on them as well) which suggests I am a better solver than I am when I have traveled to Eastern Europe and start to cough and not sleep again. I hope to see further "league" tournaments, that run all year, with online tests - in part because they are cheap to compete in and can grow the community of solvers in countries that do not always have the funds to compete, or that have too many good solvers to get everyone on a team.

What if, after airing so many complaints I've had with organizers, and showing the most painful moments I've had in competitions in memory, I can eventually work through the pain and illogic of my recent performances? Who knows....

competition, wsc, sudoku, what if, wpc, ussc
