Work Log: Collision system crash mystery

Jul 05, 2009 14:06

At our startup, we're pushing hard for a new build, but we're finding that it crashes in several games. The crash typically happens when an actor is being destroyed as the result of a collision and the actor had been resized at some point beforehand.

Some programmers possess a certain gracefulness of thought and design. Both engineering founders of my startup have this gracefulness: they build elegant foundations and far-reaching solutions that solve multiple problems. My new focus on design work came out of this recognition. However, this particular problem is where some of my own strengths come into play: casting a wide net for information, sifting through large amounts of data, discarding what is irrelevant, and homing in on suspicious facts.

The new build contains many collision enhancements, and the collision refactoring produced a lot of bugs. This particular bug crashed when the physics library instantiated a new shape. That led me to look through the code for every place where we create new physics objects, and to understand what was happening at the moment of instantiation. You can't place breakpoints in Lua, so I littered the Lua code with log statements showing when actors were created and deleted, and when actors created or deleted their collision shapes.
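The logging approach boils down to recording when control enters and leaves each function of interest. The original instrumentation was hand-written Lua log statements; the sketch below is a Python analogue that wraps functions instead. All function names are hypothetical, and the "crash" is simulated with an exception. The telltale pattern is a function that logs "enter" but never logs "exit".

```python
import functools

# Collected trace lines; a real build would write these to the console or a log file.
trace_log = []


def traced(fn):
    """Record entry to and exit from fn.

    If fn never returns (it crashes or raises), only the 'enter' line is recorded,
    which points at the function where things went wrong.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        trace_log.append(f"enter {fn.__name__}")
        result = fn(*args, **kwargs)
        trace_log.append(f"exit {fn.__name__}")
        return result
    return wrapper


@traced
def create_actor(name):
    return {"name": name}


@traced
def rebuild_collision_shape(actor):
    # Hypothetical stand-in for the physics call that crashed.
    raise RuntimeError("simulated crash while rebuilding the shape")


actor = create_actor("actor_1")
try:
    rebuild_collision_shape(actor)
except RuntimeError:
    pass

print(trace_log)
# ['enter create_actor', 'exit create_actor', 'enter rebuild_collision_shape']
```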

It turns out that the build was crashing when actors were updating their collision shapes. Our physics library doesn't allow collision shapes to be resized; if an actor changes size, its shape has to be deleted and rebuilt. The log statements showed control entering the functions that handled collision shape resizing, but never exiting them.
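Because the physics library can't resize a shape in place, "resizing" really means destroy-then-create. The sketch below models that under stated assumptions: `PhysicsWorld`, `BoxShape`, and `resize_collision_shape` are hypothetical names, not the engine's or the physics library's actual API. The shape is a frozen dataclass so its size genuinely cannot change after creation.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)  # eq=False keeps identity-based hashing
class BoxShape:
    """Stand-in for a physics-library shape; frozen, so its size cannot change."""
    width: float
    height: float


class PhysicsWorld:
    """Hypothetical owner of every live shape."""

    def __init__(self):
        self.shapes = set()

    def create_box(self, width, height):
        shape = BoxShape(width, height)
        self.shapes.add(shape)
        return shape

    def destroy_shape(self, shape):
        self.shapes.remove(shape)


class Actor:
    def __init__(self):
        self.shape = None


def resize_collision_shape(world, actor, width, height):
    """Shapes cannot be resized in place, so delete the old shape and build a new one."""
    if actor.shape is not None:
        world.destroy_shape(actor.shape)
    actor.shape = world.create_box(width, height)
    return actor.shape


world = PhysicsWorld()
actor = Actor()
first = resize_collision_shape(world, actor, 1.0, 1.0)
second = resize_collision_shape(world, actor, 2.0, 3.0)
```

Every resize therefore goes through a delete and a create, which is why resizing behaviors exercise shape instantiation far more often than static actors do.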

Actors only resize (and therefore rebuild) their collision shape if they have a behavior that resizes the actor. In this case, every actor that caused the crash had some growth or resizing going on. To test this, I removed the resizing behavior and found that the crashes went away. OK, so resizing, which forces the collision shape to be rebuilt, was somehow at the bottom of this.

What was curious was that some of the actors being destroyed were calling on functionality that should not have been there. Actors rely on the Strategy design pattern to provide both active and passive behavior, and collision shape functionality is part of an actor's passive behavior. Some actors don't need collision information at all; for others, the collision information has already been detached from the actor by the time the actor is deleted. In the instances where the crash occurred, actors were trying to update collision shape information that they did not have.
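The Strategy arrangement described above can be sketched as an actor holding a list of pluggable passive behaviors, with collision being just one of them. This is a minimal illustration assuming a single `on_resize` hook; `PassiveBehavior`, `CollisionBehavior`, and the method names are hypothetical. An actor without the collision strategy should never touch collision state, which is exactly why the crashing actors were suspicious.

```python
from typing import Protocol


class PassiveBehavior(Protocol):
    """Strategy interface: a passive behavior reacts when its actor changes."""

    def on_resize(self, actor: "Actor") -> None: ...


class CollisionBehavior:
    """Hypothetical passive strategy that keeps a collision shape matched to actor size."""

    def __init__(self):
        self.shape_size = None

    def on_resize(self, actor):
        # Stand-in for deleting and rebuilding the physics shape.
        self.shape_size = (actor.width, actor.height)


class Actor:
    def __init__(self, behaviors=()):
        self.width, self.height = 1.0, 1.0
        self.behaviors = list(behaviors)

    def resize(self, width, height):
        self.width, self.height = width, height
        # Only strategies attached to this actor run; an actor without a
        # CollisionBehavior never touches collision state.
        for behavior in self.behaviors:
            behavior.on_resize(self)


collision = CollisionBehavior()
solid = Actor([collision])
ghost = Actor()  # no collision strategy attached
solid.resize(2.0, 2.0)
ghost.resize(3.0, 3.0)
print(collision.shape_size)  # (2.0, 2.0)
```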

In addition to log statements tracing control flow, I started printing out pointer addresses: where were new collision functionality objects being instantiated and deleted? It turned out that when an actor was being destroyed, a previously destroyed collision shape object was updating its shape. When actors are destroyed, they aren't deleted; instead they're shoved into a resource pool that we draw from when we need new actors. So a destroyed actor's collision shape object was still responding to actor destruction events.
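Pooling is why address logging helps: a "new" actor can be the very same object as a destroyed one. The sketch below is a hypothetical `ActorPool`, not the engine's actual pool; it logs `id()` (CPython's object address) the way the original logs printed pointer addresses. Note that `destroy` does nothing beyond parking the actor, so anything the actor or its components registered elsewhere stays registered unless it is explicitly cleaned up.

```python
class Actor:
    def __init__(self):
        self.name = None
        self.alive = False


class ActorPool:
    """Hypothetical pool: destroyed actors are parked for reuse instead of deleted."""

    def __init__(self):
        self._free = []
        self.log = []

    def acquire(self, name):
        actor = self._free.pop() if self._free else Actor()
        actor.name, actor.alive = name, True
        # Logging the object's address shows when a "new" actor is a recycled one.
        self.log.append(f"acquire {name} at {hex(id(actor))}")
        return actor

    def destroy(self, actor):
        actor.alive = False
        self.log.append(f"destroy {actor.name} at {hex(id(actor))}")
        self._free.append(actor)


pool = ActorPool()
a = pool.acquire("actor_1")
pool.destroy(a)
b = pool.acquire("actor_2")
print(b is a)  # True: the same object was handed out again
```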

The event observation and unobservation code looked correct in both the actor and collision shape classes. When I dug into the event center code, however, I saw that objects trying to unobserve were not actually being removed, so actors sitting in the actor pool were still responding to current events. A simple one-line fix made everything right again. And then I had to remove all the log statements I had littered through the code.
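The log doesn't say what the one-line bug in the event center was, so the sketch below only illustrates one plausible kind: an `unobserve` that removes the callback from a temporary copy of the observer list, leaving the real registration in place. `EventCenter` and its methods are hypothetical names. The version shown is the corrected one, with the hypothetical buggy line kept as a comment.

```python
from collections import defaultdict


class EventCenter:
    """Minimal observer registry keyed by event name (hypothetical sketch)."""

    def __init__(self):
        self._observers = defaultdict(list)

    def observe(self, event, callback):
        self._observers[event].append(callback)

    def unobserve(self, event, callback):
        observers = self._observers[event]
        # Hypothetical one-line bug: `observers = list(self._observers[event])`
        # would remove the callback from a throwaway copy, leaving it registered.
        if callback in observers:
            observers.remove(callback)

    def post(self, event, *args):
        # Iterate over a snapshot so callbacks may unobserve during dispatch.
        for callback in list(self._observers[event]):
            callback(*args)


received = []
center = EventCenter()
handler = received.append
center.observe("actor_destroyed", handler)
center.post("actor_destroyed", "actor_1")
center.unobserve("actor_destroyed", handler)
center.post("actor_destroyed", "actor_2")
print(received)  # ['actor_1']
```

With the buggy line, the second `post` would also reach `handler`, which mirrors pooled actors reacting to events long after they were destroyed.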
