Lessons learned: Reusing Notifier Primitives

mje · September 17, 2010

I had a deadlock issue that dogged me for almost two weeks, and I finally understand what's been happening. I figured I'd share the experience because, to me at least, it seems to be caused by such an esoteric detail that I'm surprised I was even able to track it down. I'm hoping that if at least one other person learns something, the time I spent on this will be somewhat redeemed. So if you will, it's story time.

Please examine this little bit of code. The only non-stock VI is a simple read accessor for an object. Everything else is DVR, notifier, or error related. It's LV2010 code.

(Ignore the breakpoint please).

This little bit of logic has proven to be the bane of my existence for some time now.

I'll explain the logic briefly: This VI is meant to stop an asynchronous task that the PumpRef DVR encapsulates. The VI obtains a notifier reference, and checks for an existing notification via a 0 ms timeout.

False case: If we don't time out, that means a previous notification exists and the task has already stopped (this is the case shown in the screenshot), and no operation on the DVR is performed.

True case: If we do time out, this means the asynchronous task might still be running (yes might, just trust me on this, I said it's story time, not thesis time). So we send a signal to the asynchronous task to tell it to stop. This is not shown in the screenshot; it's in the True case of the case structure.

We then release the lock on the DVR, and block on the same notifier. One of two things should happen:

1) If the false case above fired, we'll just pass right through the wait since a notification already existed.

2) If the true case fired, at this point we'll block until the asynchronous task returns, then we'll be off to the races because the last thing the task does is signal the notifier.

Now there's a huge problem with this. The logic above is sound, but there's a very important implementation caveat about using the Wait on Notification primitive:

This function does not remove the message from the notifier. Although a specific instance of the function returns a message only once, other instances of the function or to the Wait on Notification from Multiple function repeat the message until you call the Send Notification function with a new message.

Emphasis added, on "a specific instance of the function returns a message only once". It's that part that bit me in the behind. The logic of the framework I'm working on is a little more complicated than the simple case I outlined here (big surprise, huh?) and it turns out that the VI will sometimes be called twice in succession. Well, guess what, in that case the logic works like this:

First call, first Wait on Notification primitive: Timeout, the asynchronous task is running. A signal is sent (not shown, True case of the structure), and it starts the shutdown sequence.

First call, second Wait: Blocks, eventually the asynchronous task returns, signalling the notifier, and the VI ultimately returns.

Second call, first Wait: Returns notification, this particular instance of the prim has never seen the notification before.

Second call, second Wait: Deadlock.

Why a deadlock? Because the second instance of the Wait prim has already received the notification in the first call to the VI. It will never return.
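The four steps above can be reproduced in a rough Python model. This is only a sketch under stated assumptions, not LabVIEW's actual implementation: `Notifier`, `WaitNode`, and `StopVI` are hypothetical names, each `WaitNode` is assumed to remember the sequence number of the last message it returned, and a finite timeout stands in for the infinite wait so the "deadlock" shows up as a timeout.

```python
import threading


class Notifier:
    """Toy latching notifier: keeps only the latest message plus a sequence number."""

    def __init__(self):
        self.cond = threading.Condition()
        self.seq = 0          # incremented on every send
        self.msg = None

    def send(self, msg):
        with self.cond:
            self.seq += 1
            self.msg = msg
            self.cond.notify_all()


class WaitNode:
    """Models ONE Wait on Notification node (assumption: it remembers what it last returned)."""

    def __init__(self):
        self.last_seen = 0

    def wait(self, notifier, timeout):
        """Return (timed_out, message). A message is returned by this node only once."""
        with notifier.cond:
            fresh = notifier.cond.wait_for(
                lambda: notifier.seq > self.last_seen, timeout
            )
            if not fresh:
                return True, None
            self.last_seen = notifier.seq
            return False, notifier.msg


class StopVI:
    """One clone of the stop VI: its two wait nodes keep their state between calls."""

    def __init__(self):
        self.first_wait = WaitNode()
        self.second_wait = WaitNode()

    def call(self, notifier, final_timeout):
        timed_out, _ = self.first_wait.wait(notifier, 0)
        if timed_out:
            # True case: ask the task to stop. The simulated task signals
            # the notifier shortly afterwards, as its last action.
            threading.Timer(0.05, notifier.send, args=("stopped",)).start()
        timed_out, _ = self.second_wait.wait(notifier, final_timeout)
        return not timed_out   # False means "would have blocked forever"


def demo():
    notifier = Notifier()
    vi = StopVI()
    first = vi.call(notifier, final_timeout=2.0)    # task stops, second wait returns
    second = vi.call(notifier, final_timeout=0.2)   # second wait has already seen it
    return first, second


if __name__ == "__main__":
    print(demo())
```

In this model the first call completes and the second call times out on its second wait, which corresponds to the hang described above when the real wait has no timeout.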

The solution you ask? Use a single element queue, and do queue previews.
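To illustrate why previewing a single element queue avoids the problem, here is a minimal Python sketch. It is an illustration rather than the LabVIEW queue API: `SingleElementQueue`, `preview`, and `stop_task` are hypothetical names, and `enqueue` simply overwrites the slot, which is an assumption made to keep the example short. The point is that a preview is stateless: it returns the element without removing it and without remembering who has already looked.

```python
import threading


class SingleElementQueue:
    """Toy one-slot queue with a non-destructive preview."""

    def __init__(self):
        self._cond = threading.Condition()
        self._slot = []   # holds at most one element

    def enqueue(self, item):
        # Assumption for this sketch: a new element replaces the old one.
        with self._cond:
            self._slot[:] = [item]
            self._cond.notify_all()

    def preview(self, timeout):
        """Return (timed_out, element) without removing the element."""
        with self._cond:
            if not self._cond.wait_for(lambda: bool(self._slot), timeout):
                return True, None
            return False, self._slot[0]


def stop_task(queue, final_timeout):
    """Same shape as the stop VI, but with previews instead of notifier waits."""
    timed_out, _ = queue.preview(0)
    if timed_out:
        # Task may still be running: ask it to stop. The simulated task
        # enqueues its "stopped" element shortly afterwards.
        threading.Timer(0.05, queue.enqueue, args=("stopped",)).start()
    timed_out, _ = queue.preview(final_timeout)
    return not timed_out


def demo():
    queue = SingleElementQueue()
    first = stop_task(queue, final_timeout=2.0)
    second = stop_task(queue, final_timeout=0.2)   # element is still there
    return first, second


if __name__ == "__main__":
    print(demo())
```

Both calls complete here, because the element stays in the queue and every preview sees it regardless of which call site previewed it before.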

The lesson: Be very careful when reusing notifier primitives if you're expecting to be able to receive old notifications.

Bonus points: Reentrancy does not seem to affect primitive reuse. The VI above is in fact reentrant with multiple clones flying around. If clone 1 of the VI fires first, clone 2 will still deadlock. I did not expect this (queue up another few days of debugging)...

So cheers, and thanks for paying attention if you kept reading this far.

-Michael

ShaunR · September 17, 2010

Gnarly one.

By the way, you don't get this problem if you acquire a reference before waiting.

mje · September 17, 2010

True, that's another way to solve it. I dislike named objects though since their scope becomes global, so I pretty much treat them as global variables (I don't use them).

mje · September 17, 2010

I also noticed a problem with the last paragraph of my original post (the "Bonus points" one). It seems the problem only crops up when consecutive calls are made to the same clone. Either way, reentrancy made for some nice and random behavior depending on how the clones had previously been used.

ShaunR · September 17, 2010

I'm not over-enthusiastic about notifiers (after my initial enthusiasm in seeing the potential). They could have been fantastic for my everyday use but for one caveat. You actually have to be waiting for it to register the notification. You can't (for example) send a notification and when the wait executes, it continues and removes it from the notifier. It will just wait. This behaviour is no good for asynchronous systems since you cannot guarantee that a notifier will be waiting when you send the message. Wait with history doesn't cut it either since the wait will execute multiple times. So you end up synchronising the systems manually to ensure the wait is always executed first, which defeats the object.

ned · September 17, 2010

I don't understand what you're saying here... you can send a notification when no one is waiting on it, and it will be seen by the next Wait on Notification unless the "Ignore Previous" input is TRUE.

jdunham · September 22, 2010

Yeah, ditto. We use notifiers all over the place and we don't have synchronization problems. They are awesome.
