Various Rendezvous Bugs


I ran into this deadlock condition the first time I used SEQs (single-element queues). My approach is to use the error cluster and the Select primitive to determine whether I will enqueue the original state (which comes directly from the dequeue) or the new state information. However, the Enqueue VI's error in terminal is left unwired to ensure that something is always in the queue in either case.

~Dan

Sorry the image didn't upload from work.

Edited by Dan DeFriese
Link to comment

Here's my approach to #1. Basically, my rationale is that if the subVI returns an error, its output can't be trusted and I don't want it to do any more damage (i.e., stop the bleeding). Since I dequeue outside the error bypass, I know that the queue reference is valid, so I just re-enqueue the existing element.
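
LabVIEW diagrams don't translate directly to text, but the pattern described here can be sketched in Python as a rough analogy. A single-element queue is emptied to take the lock, and something is always put back: the new state on success, the original state on error. The names `make_seq`, `update_shared`, and `risky_step` are hypothetical and only serve the illustration.

```python
import queue

def make_seq(initial):
    """Create a single-element queue (SEQ) holding the shared state."""
    seq = queue.Queue(maxsize=1)
    seq.put(initial)
    return seq

def update_shared(seq, risky_step):
    """Dequeue the state, try to update it, and always re-enqueue something."""
    state = seq.get()          # blocks with no timeout: this is the lock
    new_state = state          # default: keep the element that was dequeued
    try:
        new_state = risky_step(state)
    finally:
        # Runs on success and on error, so the queue is never left empty.
        # On error new_state is still the original element.
        seq.put(new_state)
    return new_state
```

If `risky_step` raises, the error still propagates to the caller, but only after the original element is back in the queue, so other callers do not deadlock on their dequeue.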

Thanks for the tip in #2

Of course, now that I've re-read your post over there I realize that you're describing a real bug in #3. Good luck!

~Dan

post-3463-127198582646_thumb.png

post-3463-127198583622_thumb.png

Link to comment

I got some CAR numbers from NI:

CAR 222274, 222275 and 222276 respectively

I guess I don't see how #1 and #2 could be considered bugs. The "...runs normally only if no error occurred before the VI or function runs..." is right there in the manual.

P.S. Has anybody considered implementing SEQ with Preview Queue Element and Lossy Enqueue Element instead of dequeue and enqueue?

Edited by Dan DeFriese
Link to comment

P.S. Has anybody considered implementing SEQ with Preview Queue Element and Lossy Enqueue Element instead of dequeue and enqueue?

How's that going to work? The whole point of a SEQ is that all dequeues have no timeout, thus locking the access to the shared resource. Previewing doesn't do that (and it creates a copy of the data), so if you simply want a globally available piece of data, you can use a notifier (preview it to avoid losing old values) or even a global. Incidentally, in 2009 you can simply use a DVR instead of a SEQ.
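
To make the difference concrete, here is a small Python sketch (an analogy, not LabVIEW code). It uses `collections.deque(maxlen=1)` as a stand-in for preview plus lossy enqueue, and `queue.Queue(maxsize=1)` as a stand-in for a SEQ. The two "tasks" are interleaved by hand so the result is deterministic, and the function names are made up for the example.

```python
from collections import deque
import queue

def preview_then_lossy_enqueue():
    """Two interleaved read-modify-write cycles using preview + lossy enqueue."""
    q = deque([0], maxlen=1)   # maxlen=1: append() silently drops the old element
    a = q[0]                   # task A previews (copies) the value
    b = q[0]                   # task B previews the same value; nothing stops it
    q.append(a + 1)            # A writes back
    q.append(b + 1)            # B overwrites A's update
    return q[0]

def dequeue_then_enqueue():
    """The same two cycles with a SEQ: B cannot read until A puts the value back."""
    q = queue.Queue(maxsize=1)
    q.put(0)
    a = q.get()                    # A takes the element; the queue is now empty
    b_could_read = not q.empty()   # a dequeue by B would block at this point
    q.put(a + 1)                   # A releases the "lock"
    b = q.get()                    # only now can B read, and it sees A's update
    q.put(b + 1)
    return q.get(), b_could_read
```

The first function ends with 1 (one increment is lost), while the second ends with 2 and reports that B had nothing to read while A held the element.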

Link to comment

The whole point of a SEQ is that all dequeues have no timeout, thus locking the access to the shared resource.

Exactly... that's why #1 is not a bug.

Incidentally, in 2009 you can simply use a DVR instead of a SEQ.

Thanks, I'm using 8.6. Our team will be moving to LV2010 when available. However, I don't want to find that I have to rework everything I did for the last 2.5 years because the implementation of the queue primitives has changed.

Link to comment

I guess I don't see how #1 and #2 could be considered bugs. The "...runs normally only if no error occurred before the VI or function runs..." is right there in the manual...

Exactly... that's why #1 is not a bug.

It's a Wait at Rendezvous bug, not a SEQ bug. Wait at Rendezvous, a VI shipped with LabVIEW, does not always run normally in cases where no error occurred before the VI runs. Internally a rendezvous works using notifiers on a queue. Every time WaR executes, it obtains a new notifier and immediately waits on that notifier. When the number of notifiers meets or exceeds the rendezvous size, dummy data is enqueued on all the notifiers, releasing them to continue execution. If one of those internal notifiers happens to be closed, Release Waiting Procs throws an error, which in turn prevents the notifiers from being put back on the queue, which then prevents all of the other rendezvous VIs from being able to dequeue the notifiers.
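
A toy Python model of the mechanism described above may help. It is only a sketch built from this description, not NI's actual implementation: `threading.Event` stands in for a notifier, a one-element `queue.Queue` holds the list of waiting notifiers, and `ClosedNotifier` is a made-up stand-in for a notifier whose reference has gone invalid.

```python
import queue
import threading

class Rendezvous:
    """Toy model: the list of waiting notifiers lives in a one-element queue."""
    def __init__(self, size):
        self.size = size
        self._waiting = queue.Queue(maxsize=1)
        self._waiting.put([])

    def wait(self, timeout=None):
        mine = threading.Event()          # obtain a fresh notifier
        waiting = self._waiting.get()     # dequeue the list (no timeout)
        waiting.append(mine)
        if len(waiting) >= self.size:
            for notifier in waiting:      # release everybody, including us
                notifier.set()            # raises if a notifier is invalid
            waiting = []
        self._waiting.put(waiting)        # skipped when the loop above raises
        return mine.wait(timeout)

class ClosedNotifier:
    """Stand-in for an internal notifier whose reference has been closed."""
    def set(self):
        raise RuntimeError("notifier reference is invalid")

def demo_stuck_rendezvous():
    rv = Rendezvous(size=2)
    waiting = rv._waiting.get()
    waiting.append(ClosedNotifier())      # simulate a stale entry in the list
    rv._waiting.put(waiting)
    try:
        rv.wait(timeout=0.1)              # this call tries to release everyone
    except RuntimeError:
        pass
    return rv._waiting.empty()            # True: later wait() calls block on get()
```

In the model, one failed release leaves the internal queue empty, so every later call to `wait()` hangs at the dequeue, which matches the hang described in the post.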

I haven't yet figured out how Jim was able to invalidate one of the internal notifiers...

Link to comment

Here are my thoughts on why the internal notifier queues are going invalid.

1) The internal notifier queues are created inside WaR

1.1) WaR is reentrant.

2) WaR is called inside a reentrant Dynamic Dispatch Method VI

2.1) Reentrant Dynamic Dispatch VIs must use the Share clones between instances setting.

3) the reentrant Dynamic Dispatch Method VI is called inside several asynchronously running top-level VIs.

The problem is basically that specific instances of WaR are being called inside the call chains of several different top-level VIs. So, there's no way to guarantee which call chain the cached internal notifier queue was created within (thus binding the queue's lifetime to the lifetime of that top-level VI).

Bottom line: never, ever cache a reference inside the shift register of a reentrant VI which is the creator of the reference.

There is no way to guarantee that such a VI won't be called by a reentrant Dynamic Dispatch Method VI from multiple asynchronously running top-level VIs.
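
The lifetime problem can be modeled in a few lines of Python. This is a simplified sketch with hypothetical class names (`Ref`, `TopLevelVI`, `SharedClone`), assuming only what is stated above: a reference dies with the top-level VI whose call chain created it, and a shared clone keeps whatever reference it cached first.

```python
class Ref:
    """A reference that is valid only while its creating top-level VI runs."""
    def __init__(self, owner):
        self.owner = owner
        self.valid = True

class TopLevelVI:
    def __init__(self, name):
        self.name = name
        self._refs = []

    def create_ref(self):
        ref = Ref(self.name)
        self._refs.append(ref)
        return ref

    def stop(self):
        # References created in this call chain are released when it stops.
        for ref in self._refs:
            ref.valid = False

class SharedClone:
    """One clone instance, reachable from several call chains."""
    def __init__(self):
        self._cached = None    # plays the role of the uninitialized shift register

    def call(self, caller):
        if self._cached is None:
            # Owned by whichever top-level VI happened to get here first.
            self._cached = caller.create_ref()
        return self._cached

def demo():
    clone = SharedClone()
    a, b = TopLevelVI("A"), TopLevelVI("B")
    clone.call(a)              # A's call chain creates and caches the reference
    a.stop()                   # A finishes; its references go invalid
    ref = clone.call(b)        # B gets the stale cached reference
    return ref.owner, ref.valid
```

Here `demo()` returns `("A", False)`: the second caller is handed a reference that belonged to a top-level VI that has already stopped.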

Now, in defense of whoever designed the refactored (with queues) Rendezvous, Dynamic Dispatch Method VIs didn't exist back in the 8.0 days (which is about the time when the Rendezvous, Semaphores, etc. were refactored with Queues, if memory serves me right).

Link to comment

Thanks... Now, where is that Foot-In-Mouth icon. I guess this one will have to do: book.gif. I should have paid more attention to the title of the thread. Now I see that he was in fact discussing the Rendezvous API and suggesting changes to it, NOT the Queue and Notifier primitives. Obviously, it was changes to the latter that had me concerned.

Again, thanks Daklu for clarifying the point of the discussion!

~Dan

Link to comment

Now, in defense of whoever designed the refactored (with queues) Rendezvous, Dynamic Dispatch Method VIs didn't exist back in the 8.0 days (which is about the time when the Rendezvous, Semaphores, etc. were refactored with Queues, if memory serves me right).

Actually, I'm pretty sure it was only in 8.6, so that excuse can't be used.

Link to comment

Thanks... Now, where is that Foot-In-Mouth icon.

If you happen to find one let me know. I need it quite often.

Bottom line: never, ever cache a reference inside the shift register of a reentrant VI which is the creator of the reference.

At the risk of being struck down by lightning, I don't think I agree with this. If the reference is never used anywhere else, where should it be cached? As long as the vi is a preallocated clone (WaR is) there shouldn't be an issue. Putting WaR inside a shared reentrant vi is (I think) essentially making that instance a shared clone, which is violating the (undocumented) terms under which WaR is expecting to run. What happens if you make your shared clone dynamic dispatch vi a preallocated clone?

I'm not saying these aren't bugs. Undoubtedly WaR is leaking implementation details which is never a good thing (though often unavoidable.) I don't know that there's a suitable fix NI can implement. Given that WaR is a preallocated clone, it seems to me the purpose of the shift register is to avoid excessive overhead in loops. I ran some timing tests of the current implementation against your proposed fix where WaR creates and destroys the internal notifier queue with every call. The current implementation averaged ~10us per iteration; without the SR it averaged ~32us per iteration. Not a big deal in most desktop apps. Could be a problem in RT apps.

post-7603-12722371985_thumb.png

This does highlight an issue I think NI has not adequately addressed. Developers need a multi-layered API to create solutions to the problems we encounter. LabVIEW's high-level API is okay if your needs are very simple and specific. Its low-level API is pretty good. Its mid-level API, the stuff that would really help promote better software engineering, is sorely lacking.

Link to comment

At the risk of being struck down by lightning, I don't think I agree with this. If the reference is never used anywhere else, where should it be cached?

I would argue that it shouldn't be cached, by default.

As long as the vi is a preallocated clone (WaR is) there shouldn't be an issue.

Almost. Only as long as there are no dynamic calls in the call chain -- there is the possibility that dynamic calls using VI Server CBR nodes could cause the same WaR bug to appear.

Putting WaR inside a shared reentrant vi is (I think) essentially making that instance a shared clone, which is violating the (undocumented) terms under which WaR is expecting to run.

The difference between a bug and a feature can be debated without end. I don't want this performance "feature", because it causes a "bug" in my code. Since this constraint is not published, I feel it's a bug. :)

What happens if you make your shared clone dynamic dispatch vi a preallocated clone?

Dynamic dispatch VIs cannot be set to preallocated reentrancy -- it will cause them to be broken.

I'm not saying these aren't bugs. Undoubtedly WaR is leaking implementation details which is never a good thing (though often unavoidable.) I don't know that there's a suitable fix NI can implement.

Given that WaR is a preallocated clone, it seems to me the purpose of the shift register is to avoid excessive overhead in loops. I ran some timing tests of the current implementation against your proposed fix where WaR creates and destroys the internal notifier queue with every call. The current implementation averaged ~10us per iteration; without the SR it averaged ~32us per iteration. Not a big deal in most desktop apps. Could be a problem in RT apps.

I would argue that if there's a use case for a performance improvement whose implementation has constraints on modes of use, that this be made an optional, Boolean argument (named something like: reuse messaging queues for minor performance boost).
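
As a rough illustration of that suggestion, here is a Python sketch in which caching is opt-in and the default creates and destroys the queue on every call. `MessagingQueue`, `WaitStep`, and the parameter name `reuse_messaging_queue` are hypothetical; nothing here reflects an existing LabVIEW API.

```python
class MessagingQueue:
    """Stand-in for the internal notifier queue; counts how many were created."""
    created = 0

    def __init__(self):
        MessagingQueue.created += 1
        self.open = True

    def close(self):
        self.open = False

class WaitStep:
    def __init__(self):
        self._cached = None

    def run(self, reuse_messaging_queue=False):
        if not reuse_messaging_queue:
            q = MessagingQueue()       # safe default: create ...
            try:
                return self._do_wait(q)
            finally:
                q.close()              # ... and destroy on every call
        # Opt-in fast path: reuse the cached queue, recreating it if it was closed.
        if self._cached is None or not self._cached.open:
            self._cached = MessagingQueue()
        return self._do_wait(self._cached)

    def _do_wait(self, q):
        return q.open                  # placeholder for the real work
```

With this shape, callers who can live with the constraints ask for the faster path explicitly, and everyone else gets the slower but unconstrained behavior.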

Cheers,

Link to comment

See, where would I be without more experienced developers telling me why I'm wrong? Thanks for not sending the lightning.

Dynamic dispatch VIs cannot be set to preallocated reentrancy -- it will cause them to be broken.

Pfft... minor implementation detail. ;) Now where's that foot-in-mouth icon I was looking for...

I would argue that if there's a use case for a performance improvement whose implementation has constraints on modes of use, that this be made an optional, Boolean argument (named something like: reuse messaging queues for minor performance boost).

Yep, I agree. My entire post was based on the premise that your solution was the more restrictive case. Take away my workaround and I've got nothing.

I don't like the boolean argument though. I'd prefer something like a bridge pattern that allows more extensibility.

Link to comment
  • 3 months later...

Here is another issue I ran into, caused by the "Not A Number/Path/Refnum" function in LabVIEW 2010. I'm not sure whether this is a bug.

In the past, the rendezvous code was implemented using a CIN, so the reference wasn't actually a LabVIEW reference (which is why the palette has the NaR? VI).

The rendezvous VIs have been refactored internally, but I guess this was kept for backward compatibility (although that doesn't make much sense, since using NaN? on an RV ref would have always resulted in T).

So, strictly speaking, I would say this is not a bug. Ideally, the primitive should have broken the wire, but I'm guessing it still recognizes it as a reference.

Link to comment

Thanks for your hint. I agree with "Ideally, the primitive should have broken the wire...".

Unfortunately, we can't do that either. Those types were written long ago and they use the ancient "this is a datalog refnum of an enum type" trick. So, properly, this is a type that the Not A Refnum primitive can accept, and it is possible that you might actually have a datalog refnum of the type and it might actually be a valid open file.

We got rid of the CIN but we kept the API identical. All of it.

By the way, Wolfram, congratulations on being only the 6th user I've ever heard of using the Rendezvous VIs in the 10 years I've been working on LabVIEW.

Link to comment
