Diagnosing "Serious Errors" with Windows Memory Diagnostic
We use fast user switching in Windows XP at my house so I don't have to close all of my running apps just to let another family member check their email. One of my greatest annoyances, which until recently had been very rare, is sitting down to click my username and not seeing the count of running programs listed underneath it - a sure sign that the computer has rebooted on its own.
Once you log on after an "unanticipated" reboot, XP tells you that the system has recovered from a "serious error" - i.e. a BSOD happened and Windows rebooted as configured. You can happily send the error report off to Microsoft Windows Crash Analysis and they will give you some info on what might be wrong, but in some cases the reported cause keeps changing from one reboot to the next. In my case, I would wake up to find several reboots had happened, and the diagnosed causes would range from an IE plugin (I don't even use IE) to an error in an unknown driver. I took this as a sign that there was actually some kind of hardware problem - most likely a memory issue.
I burned some self-booting memory testing ISO images to a few CDs (namely OCZ's modified memtest86 and Microsoft's Windows Memory Diagnostic). I started with WMD, running the extended tests, and saw hundreds of errors almost immediately on the first run of one of the tests. So now that I had found these memory errors, what was the next step? Microsoft suggests in their documentation that their tests will try to determine which memory bank caused the errors if possible, and that you should then swap your DIMMs around to see if you get the errors in the same bank. Errors that stay in the same bank would indicate a bad bank. If the errors move to the other bank, that would indicate a bad memory module. So where were my errors happening? WMD was unable to determine that, so I was left on my own.
Luckily, WMD does show you the hexadecimal memory addresses as the errors occur. Since I knew I had two 1GB modules installed, my addressable memory space was something like 2 * 1024 (MB in a GB) * 1024 (KB in a MB) * 1024 (bytes in a KB) = 2147483648 decimal bytes or 0x80000000 hex bytes. Since I was using two DIMM modules, slot 0 would map to addresses below 0x40000000 and anything at or above that would be in slot 1. Armed with this information, I could tell that all of the errors being reported by WMD could be attributed to slot 1 (not sure why WMD itself could not make this determination).
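If you would rather not do the hex math in your head, here is a rough Python sketch of the same arithmetic. It makes the same assumption I did: the two 1GB modules are mapped one after the other with no interleaving, which may not hold on every board. The function name and the sample addresses are made up for illustration; they are not addresses WMD actually reported.

```python
# Sketch only: assumes two 1GB modules mapped back to back (no interleaving).
MODULE_BYTES = 1024 * 1024 * 1024  # bytes in one 1GB module
NUM_MODULES = 2


def slot_for_address(addr, module_bytes=MODULE_BYTES, num_modules=NUM_MODULES):
    """Return the slot index that a physical address would fall in."""
    total = module_bytes * num_modules
    if not 0 <= addr < total:
        raise ValueError("address is outside installed memory")
    # Integer division by the module size gives the slot number.
    return addr // module_bytes


if __name__ == "__main__":
    print(hex(MODULE_BYTES * NUM_MODULES))  # total addressable bytes in hex
    # Hypothetical failing addresses, one on each side of the boundary.
    for addr in (0x3FFFFFFF, 0x40000000, 0x5A3C1F00):
        print(hex(addr), "-> slot", slot_for_address(addr))
```

Anything below 0x40000000 lands in slot 0 and everything from 0x40000000 up to 0x7FFFFFFF lands in slot 1, which matches the boundary worked out above.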
The next step was to swap the modules and see if the errors jumped to slot 0. I tried this and the errors stayed in slot 1. I then moved one of the modules to slot 2 (my mobo has three slots: slot 0 is DIMMA, and slots 1 and 2 are designated DIMMB1 and DIMMB2). Slots 1 and 2 are grouped together, apart from slot 0, so I was not surprised that slot 2 also showed many errors.
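The swap test really comes down to one comparison, so here is a tiny Python sketch of the rule from Microsoft's documentation as I understood it. The function name and the return strings are my own invention, and it only covers the simple case where all of the errors land in a single slot.

```python
def diagnose_swap(slot_before, slot_after):
    """Interpret a module swap test.

    slot_before: slot where errors showed up before swapping the modules
    slot_after:  slot where errors showed up after swapping the modules
    """
    if slot_before == slot_after:
        # Errors stayed put even though a different module is now there.
        return "slot"
    # Errors followed the module to its new slot.
    return "module"


if __name__ == "__main__":
    # My case: errors in slot 1 both before and after the swap.
    print(diagnose_swap(1, 1))
```

In my case the errors were in slot 1 both times, so the finger points at the slot (and therefore the board) rather than at either module.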
Looks like it's time for a new motherboard. For the time being, I am left with two perfectly good 1GB memory modules but only one reliable DIMM slot. Newegg only sells a handful of AMD Socket A mobos these days (the defective board in question is a Biostar M7NCD Pro) since Athlon XP processors are getting up there in age, but I found a cheap replacement: a used Albatron KX400-8X for $20 from a seller on Anandtech's Sale/Trade Forum, which should arrive later this week.
