I have been working with a client and Microsoft on a very difficult issue with their Exchange 2003 system.  A few months ago, a particular store started logging Event ID 623 errors from source ESE, the Extensible (or Exchange) Storage Engine.  Since this error was coming up on a server that was in the process of being decommissioned, the suggestion to “move the users to a new store” was easy to act on.

But the problem came back 22 days later on one of the two stores that the users were moved to, so we knew something else must be up.  I’ll cut to the chase: Microsoft is now quite sure what is happening, just not who is causing it or why.

What’s frustrating is that the tools that could be used to look deeper into this problem aren’t available to me as a technician outside of Microsoft.  All I’ve been able to do for my client is set up triggers to cause “Exchange store.exe dumps”, which are essentially process freezes followed by private memory dumps to disk.  The good thing is that the end users don’t notice, nor does the Windows 2003 Cluster service.  Also, our Microsoft support team has been great at sharing information with us.

But the fact remains that there is nothing at all I can do to fix this problem myself.  I can run a debugger against the process, but not at the level of detail Microsoft’s own debug programs provide, because the information needed to do so isn’t published.  That is true despite my having a very deep understanding (for an outside consultant who just reads voraciously) of how the ESE handles the EDB, STM, and LOG files.  This inability to better serve my customers frustrates me to no end, whether Microsoft’s technicians are fantastic or not (there have been other times…).

So, while I wait for them to get back to me on yet another dump, generated in the hunt for a very elusive fSearch() operation against one of my client’s many Exchange 2003 stores, I sit on my hands in anticipation, wishing I could do more.