mje Posted September 17, 2010

I had a deadlock issue that has been dogging me for almost two weeks now, and I finally understand what's been happening. I figured I'd share the experience because, to me at least, it seems to be caused by such an esoteric detail that I'm surprised I was even able to track it down. I'm hoping that if at least one other person learns something, the time I spent on this will be somewhat redeemed. So if you will, it's story time.

Please examine this little bit of code. The only non-stock VI is a simple read accessor for an object; everything else is DVR, notifier, or error related. It's LV2010 code. (Ignore the breakpoint, please.) This little bit of logic has proven to be the bane of my existence for some time now.

I'll explain the logic briefly. This VI is meant to stop an asynchronous task that the PumpRef DVR encapsulates. The VI obtains a notifier reference and checks for an existing notification via a 0 ms timeout.

False case: If we don't time out, a previous notification exists and the task has already stopped (this is the case shown in the screenshot), so no operation on the DVR is performed.

True case: If we do time out, the asynchronous task might still be running (yes, might; just trust me on this, I said it's story time, not thesis time). So we send a signal to the asynchronous task to tell it to stop. This isn't shown in the screenshot; it's in the True case of the case structure.

We then release the lock on the DVR and block on the same notifier. One of two things should happen:

1) If the False case above fired, we'll pass right through the wait, since a notification already existed.
2) If the True case fired, we'll block until the asynchronous task returns, and then we're off to the races, because the last thing the task does is signal the notifier.

Now there's a huge problem with this.
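Since the diagram itself is a LabVIEW screenshot, here's a rough Python model of the same logic, purely as an illustration. It assumes a notifier behaves the way the caveat quoted below describes: it keeps only the latest message, and each Wait node returns a given message only once while other nodes still see it. All names (`Notifier`, `SingleElementQueue`, `Pump`, `stop_with_notifier`, `stop_with_seq`) are made up for the model, `Pump` is a hypothetical stand-in for the task behind PumpRef, and the finite timeout on the second wait stands in for the VI's wait-forever so a hang shows up as a `False` return instead of an actual hang. The `stop_with_seq` variant models the single element queue with previews.

```python
import threading


class Notifier:
    """Rough model of a LabVIEW notifier: it keeps only the latest message,
    and remembers, per Wait node, the last message that node returned."""

    def __init__(self):
        self._cond = threading.Condition()
        self._message = None
        self._serial = 0   # incremented by every send
        self._seen = {}    # wait-node id -> serial that node last returned

    def send(self, message):
        with self._cond:
            self._message = message
            self._serial += 1
            self._cond.notify_all()

    def wait(self, node_id, timeout):
        """One Wait on Notification node. Returns (message, timed_out).
        A node returns a given message only once; other nodes still see it."""
        with self._cond:
            seen = self._seen.get(node_id, 0)
            if not self._cond.wait_for(lambda: self._serial > seen, timeout=timeout):
                return None, True
            self._seen[node_id] = self._serial
            return self._message, False


class SingleElementQueue:
    """Rough model of a single element queue that is only ever previewed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def enqueue(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def preview(self, timeout):
        """Return (item, timed_out) without removing the item."""
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout=timeout):
                return None, True
            return self._items[0], False


class Pump:
    """Hypothetical stand-in for the async task. Signalling completion is
    the last thing the task does."""

    def __init__(self):
        self.done_notifier = Notifier()
        self.done_queue = SingleElementQueue()
        self._stop = threading.Event()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        self._stop.wait()                   # "work" until asked to stop
        self.done_notifier.send("stopped")
        self.done_queue.enqueue("stopped")

    def request_stop(self):
        self._stop.set()


def stop_with_notifier(pump, timeout=1.0):
    # First Wait node: a 0 ms timeout just checks for an existing notification.
    _, timed_out = pump.done_notifier.wait("first_wait", timeout=0)
    if timed_out:
        pump.request_stop()  # True case: the task may still be running
    # Second Wait node: the VI waits forever here; the finite timeout lets
    # the model report a hang as False instead of actually hanging.
    _, timed_out = pump.done_notifier.wait("second_wait", timeout=timeout)
    return not timed_out


def stop_with_seq(pump, timeout=1.0):
    # Same logic, but a preview never consumes the element, so both checks
    # see it no matter how many times this function runs.
    _, timed_out = pump.done_queue.preview(timeout=0)
    if timed_out:
        pump.request_stop()
    _, timed_out = pump.done_queue.preview(timeout=timeout)
    return not timed_out
```

With these definitions, calling `stop_with_notifier` twice on the same pump returns `True` and then `False` (the modeled deadlock), while `stop_with_seq` returns `True` both times.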
The logic above is sound, but there's a very important implementation caveat about using the Wait on Notification primitive:

"This function does not remove the message from the notifier. Although a specific instance of the function returns a message only once, other instances of the function or to the Wait on Notification from Multiple function repeat the message until you call the Send Notification function with a new message."

It's the part about a specific instance returning a message only once that bit me in the behind. The logic of the framework I'm working on is a little more complicated than the simple case I outlined here (big surprise, huh?), and it turns out that the VI will sometimes be called twice in succession. Well, guess what: in that case the logic works like this:

First call, first Wait on Notification primitive: Times out; the asynchronous task is running. A signal is sent (not shown, True case of the structure), and the task starts its shutdown sequence.
First call, second Wait: Blocks until the asynchronous task returns and signals the notifier, and the VI ultimately returns.
Second call, first Wait: Returns the notification, because this particular instance of the prim has never seen it before. Since it didn't time out, no stop signal is sent.
Second call, second Wait: Deadlock.

Why a deadlock? Because the second instance of the Wait prim already received the notification during the first call to the VI, so it won't return that message again. And since the task has already stopped, nothing will ever send a new one. It will never return.

The solution, you ask? Use a single element queue, and do queue previews. A preview returns the element without removing it from the queue, so every preview, wherever it sits on the diagram, sees the message.

The lesson: Be very careful when reusing notifier primitives if you're expecting to be able to receive old notifications.

Bonus points: Reentrancy does not seem to affect primitive reuse. The VI above is in fact reentrant, with multiple clones flying around. If clone 1 of the VI fires first, clone 2 will still deadlock. I did not expect this (queue up another few days of debugging)...

So cheers, and thanks for paying attention if you kept reading this far.

-Michael