Windows 7 - loader lock deadlock in MiniDumpWriteDump

Asked By jfire on 27-Apr-07 06:22 PM
Our application uses MiniDumpWriteDump to capture the stack when throwing
unexpected C++ exceptions. We are seeing that in some situations, this causes
the application to deadlock. What MiniDumpWriteDump appears to do is:

1. Suspend all threads in the process except the calling thread.
2. Attempt to load psapi.dll.

This is the wrong order, as step 1 might have suspended threads holding the
DLL loader lock, in which case step 2 would deadlock. In fact, this is what
happens to our application.

This is using dbghelp.dll version 6.6.7.5.

Here's the stack trace of the thread calling MiniDumpWriteDump:

ntdll!ZwWaitForSingleObject+0x15
ntdll!RtlpWaitOnCriticalSection+0x19c
ntdll!RtlEnterCriticalSection+0xa8
ntdll!LdrLockLoaderLock+0xe4
ntdll!LdrLoadDll+0xc9
kernel32!LoadLibraryExW+0x1b2
kernel32!LoadLibraryExA+0x1f
kernel32!LoadLibraryA+0xaf
dbghelp!Win32LiveSystemProvider::LoadPsApi+0x23
dbghelp!Win32LiveSystemProvider::EnumModules+0x19e
dbghelp!NtWin32LiveSystemProvider::EnumModules+0x1f
dbghelp!GenGetProcessInfo+0x328
dbghelp!MiniDumpProvideDump+0x1ec
dbghelp!MiniDumpWriteDump+0x1c1

Here's the stack trace of the thread that's been suspended while holding the
loader lock:

crypt32!_SEH_prolog+0x5
crypt32!_DllMainCRTStartup+0x52
ntdll!LdrpCallInitRoutine+0x14
ntdll!LdrShutdownThread+0xd2
kernel32!ExitThread+0x2f
kernel32!BaseThreadStart+0x34

Can a Microsoft rep confirm this as a bug in dbghelp.dll version 6.6.7.5,
and if possible offer a workaround?




Skywing [MVP] replied on 27-Apr-07 10:06 PM
You should really be calling MiniDumpWriteDump from a different process
entirely.  Using it on the current process has always been unreliable
(especially if you are dealing with heap corruption or the like).  The usual
route I take is to have a separate watchdog process that is signaled by an
exception handler in the faulting process.  The watchdog process then does
the minidump writing, from a known-good process instead of a crashed (or
potentially otherwise damaged one).

I suppose you could try and make a case for C++ exceptions being not
necessarily indicative of some sort of catastrophic corruption (a-la heap
corruption), depending on your application, but it is still better to just
an isolated process in a known-good state.

--
Ken Johnson (Skywing)
Windows SDK MVP
http://www.nynaeve.net
jfire replied on 27-Apr-07 11:30 PM
I agree with you in the case of non-C++ exceptions where at the point where
the exception filter or handler is invoked the process is no longer in a
well-defined state; in fact I'm in the process of modifying our application
so that non-C++ exceptions go straight to Dr. Watson. (In fact it was in part
one of your blog posts that convinced me to do so -- thanks for that!)

But C++ exceptions are another matter. They have completely well-defined
behavior, and modulo this loader-lock bug it seems eminently reasonable to
use MiniDumpWriteDump in-process to capture the stack. Doing this out of
process adds complexity, overhead, and failure modes, particularly when you
consider that MiniDumpWriteDump is not the end of the line for C++
exceptions: after the application writes the dump we want to unwind the stack
(so that RAII'd resources are released) and most likely do additional
logging, error reporting, and recovery.
Skywing [MVP] replied on 28-Apr-07 01:00 AM
Yeah, C++ exceptions can be a gray area there; I agree.

Even if you preload psapi.dll, NTDLL will still try to grab the loader lock
to increment the loader data table entry reference count corresponding to
that module, so that wouldn't really help here.  Definitely an unpleasant
problem.  In fact, you're probably likely to run into other deadlocks too
( such as if a suspended thread owned the process heap lock ) even if the
loader lock doesn't come around to bite you.

What minidump options were you using when you made the dump, and did you
supply any exception information (if so, marked them as client pointers or
not?) or a callback routine to MiniDumpWriteDump?  (would be good to know
for duplicating the condition - this is something I haven't run into
personally, though it does look like an interesting problem)

--
Ken Johnson (Skywing)
Windows SDK MVP
http://www.nynaeve.net
Ivan Brugiolo [MSFT] replied on 28-Apr-07 09:56 PM
The default UnhadledExceptionFilter refuses doing any reasonably
complex operation in the case the LoadeLock is not being held
by the current thread.
This is done to avoid the problem you have exposed below.
It's not a bug in dbgeng.dll. It's a design constrain that prevents that
operation.

--
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
jfire replied on 30-Apr-07 01:16 PM
Ivan,

Your reply was a bit cryptic to me. Are you saying that MiniDumpWriteDump is
not safe to call in-process in any circumstances? This is contrary to what
the API documentation and numerious MSDN examples indicate.

BTW, if this is a bug, it is in dbghelp.dll, not dbgeng.dll.
Ivan Brugiolo [MSFT] replied on 30-Apr-07 01:41 PM
I'm saying that there is no system components that
calls MiniDumpWriteDump in-process, and, the system components
that triggers the faultrep.dll / DrWatson / UnhandledExceptionFitler
have all been written with care to not do unreasonable un-necessary work
in process, and, to not do any work at all if the loader lock is being held.

--
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
jfire replied on 04-May-07 01:55 PM
Ok, that's an interesting data point, but it doesn't really answer any of my
questions. To restate them:

Is it intended that MiniDumpWriteDump be safe to call in-process in some
circumstances? Specifically, is it safe to call within an __except filter
expression when the exception is a C++ exception? If so, is this loader lock
deadlock considered a bug, and if so is there a workaround? If not, why does
MSDN documentation and example code imply that calling MiniDumpWriteDump
in-process _is_ safe (see e.g. the second paragraph in "Remarks" at
http://msdn2.microsoft.com/en-us/library/ms680360.aspx)?
Ivan Brugiolo [MSFT] replied on 04-May-07 02:44 PM
It's safe to call MiniDumpWriteDump if the state of the process
is not too compromised and if you don't have the loader lock.
The `holding the loader lock` is deterministic, while `not too compromised`
is a little big vague, but, if you have a major heap corruption coing on,
some stack corruption, or something along those lines, then,
it's hard to express those conditions in a verifiable way.
An exception-filter and/or an exception-handler may be called
while the loader lock is held (for example, from _DllMainCRTStartup /
DllMain).

--
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
jfire replied on 04-May-07 02:59 PM
Thanks Ivan, that answers my question. For my purposes, I am willing to make
the assumption that "doing C++ exception handling" is a situation in which
the application is "not too compromised", so that part is taken care of. For
the other part, how would I determine if the application does or does not
hold the loader lock?
jfire replied on 04-May-07 03:45 PM
After thinking about this a bit more, I realized that even if I could check
if the loader lock was held before calling MiniDumpWriteDump, that wouldn't
be sufficient. After the check but before the MiniDumpWriteDump suspends the
other threads, one of those other threads (possibly one I did not create and
have no control over) could take the loader lock, then be suspended by
MiniDumpWriteDump. Classic race condition.

In other words, the "you don't have the loader lock" condition is not quite
correct. It seems to me that one would either have to (a) actually take and
hold the loader lock for the duration of the call to MiniDumpWriteDump or (b)
suspend all threads but the current thread, then check if the process holds
the loader lock, then if not, call MiniDumpWriteDump, then resume the threads.

Ivan, are either of these the intended safe usages of MiniDumpWriteDump
in-process? In my mind they look like complex workarounds for functionality
that should already be in MiniDumpWriteDump.
Ivan Brugiolo [MSFT] replied on 04-May-07 04:14 PM
The condition should be re-phrased as:
MiniDumpWriteDump is safe to use if noboy holds
the loader lock at the same time the call is in progress.
That's the reason why the WER infrastructure in Vista was
carefully desinged to be all out-of-proc, and, in WinXp/Win2003
the code calling faultrep.dll attempted to take all the possible
preacutions that was possible to validate.
MniDumpWriteDump is not designed to be safe from usafe callers.

--
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
jfire replied on 04-May-07 04:35 PM
But how could my application satisfy that condition? Even if it never
explicitly creates threads, various Windows APIs it uses might, and I have to
assume those threads could potentially hold or take the loader lock before or
during the call to MiniDumpWriteDump.

If there is no way for the application to guarantee that condition, what you
are saying is equivalent to saying "MiniDumpWriteDump is never safe to call
in-process", and the API documentation needs to reflect that.