I'm posting this so that I can print it out and tape it to my forehead for future reference.
When trying to determine the root cause of strange behavior in your code, you sometimes have to use a process of elimination. It's often best to check the easiest possibilities first, regardless of how likely you think any given cause is. My code was behaving in pseudo-consistent ways: something would fail, repeatedly. I would add code to try to catch the error before things went haywire. I was convinced I had a memory overwrite or a stack overflow. But as soon as I added the code to catch the problem, it would start working, or stepping through the code would give a different result than running at full speed.
Here is a list of things to check while you're scratching your head (in this order):
- Fuses. (This turned out to be my problem.) I had my fuses set for an external RC oscillator instead of a crystal. Fuses are so easy to check that it should be the first thing you do. Scrutinize each setting, and read the datasheet for each one.
- Proper capacitors on the external crystal.
- Breadboard noise. I rebuilt my breadboard and it started working better, but not perfectly, so I assumed noise and a memory clobber were causing the issues.
- Proper JTAG connection. (JTAG seems more sensitive to noise than SPI/debugWIRE debugging.) To eliminate this as a possible cause, program the flash with the "device programming tool" in AVR Studio, using the generated hex file, and use the verify feature. Write some diagnostic code to print to a serial port, then disconnect the JTAG. If your problems go away, it may have been the tool.
- Memory stomp. These are pretty common, and can be hard to pinpoint in pointer-arithmetic-heavy code. Stack overflows are particularly nasty and difficult to detect. Set breakpoints at your "deepest" code paths and look at the SP register in the debugger. Comment/stub out large local variables, and eliminate recursion.
- Compiler bugs.
The last one is actually pretty rare, but it does happen. Odds are you will have found the problem before you get to that stage.
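For the fuse check at the top of the list, it can help to decode the clock-select bits from the low fuse byte you read back, rather than trusting the checkboxes in the programming dialog. Here is a rough sketch of that idea. Big assumption: it uses the CKSEL3..0 encoding as I understand it from the ATmega16/32 datasheets (bits 3..0 of the low fuse). Other AVR parts encode these bits differently, so treat the table as a placeholder and check it against the datasheet for your chip. The names `decode_cksel` and `clock_source_name` are just made up for this example.

```c
#include <stdint.h>

/* Clock sources selectable through the CKSEL fuse bits. */
typedef enum {
    CLK_EXTERNAL_CLOCK,
    CLK_INTERNAL_RC,
    CLK_EXTERNAL_RC,
    CLK_LOW_FREQ_CRYSTAL,
    CLK_CRYSTAL
} clock_source_t;

/*
 * Decode CKSEL3..0 from a low fuse byte.
 * ASSUMPTION: ATmega16/32-style encoding, CKSEL in bits 3..0:
 *   0000        external clock
 *   0001..0100  calibrated internal RC oscillator
 *   0101..1000  external RC oscillator
 *   1001        external low-frequency crystal
 *   1010..1111  external crystal / ceramic resonator
 * Verify this table against the datasheet for your device.
 */
clock_source_t decode_cksel(uint8_t low_fuse)
{
    uint8_t cksel = low_fuse & 0x0F;

    if (cksel == 0x0) return CLK_EXTERNAL_CLOCK;
    if (cksel <= 0x4) return CLK_INTERNAL_RC;
    if (cksel <= 0x8) return CLK_EXTERNAL_RC;
    if (cksel == 0x9) return CLK_LOW_FREQ_CRYSTAL;
    return CLK_CRYSTAL;
}

/* Human-readable name, handy for printing from a host-side script. */
const char *clock_source_name(clock_source_t src)
{
    switch (src) {
    case CLK_EXTERNAL_CLOCK:   return "external clock";
    case CLK_INTERNAL_RC:      return "calibrated internal RC";
    case CLK_EXTERNAL_RC:      return "external RC oscillator";
    case CLK_LOW_FREQ_CRYSTAL: return "low-frequency crystal";
    case CLK_CRYSTAL:          return "crystal / ceramic resonator";
    }
    return "unknown";
}
```

Under that assumed encoding, a low fuse byte ending in 0x5 through 0x8 would decode as external RC, which is the mistake I made, while one ending in 0xA through 0xF would decode as a crystal.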
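For the stack overflow item, watching SP at breakpoints only tells you about the paths you thought to check. A different technique (not what I did at the time) is "stack painting": fill the unused RAM with a known byte at startup, then later count how much of the pattern is still intact. Below is a minimal sketch of the idea written against a plain buffer so it can be tried on a PC. The names `stack_paint`, `stack_unused_bytes` and the 0xC5 pattern are arbitrary choices for illustration. On a real target the region would be the RAM between the end of your static data and the current stack pointer, and how you get those addresses depends on your toolchain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary fill pattern; anything unlikely to appear by accident works. */
#define STACK_PAINT 0xC5u

/* Fill the (currently unused) stack region with the paint pattern. */
void stack_paint(uint8_t *region, size_t size)
{
    memset(region, STACK_PAINT, size);
}

/*
 * Count how many bytes at the low end of the region still hold the pattern.
 * ASSUMPTION: the stack grows downward (as on AVR), so it consumes the
 * region from the high addresses first and untouched paint is at the bottom.
 * A result of 0 means the stack reached, or ran past, the bottom of the region.
 */
size_t stack_unused_bytes(const uint8_t *region, size_t size)
{
    size_t n = 0;

    while (n < size && region[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

Call `stack_paint` once early in startup, let the program run through its worst cases, then read `stack_unused_bytes` from the debugger or print it over serial. If the number is zero or close to it, the stack is a prime suspect.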