
Notifier signals missed



I have some problems with notifiers. The more communication I have through the notifiers, the more notifications seem to be missed, although I see no reason they really should be. The misses disappear if the program execution is slowed down.

I attach the following LabVIEW 8.0 project that demonstrates the problem. Open the project and run the file "Test Scalability.vi". When run on my computer, it completes some iterations and then freezes because it fails to catch a notification, or a notification is not properly sent. The reason it freezes is not completely clear to me. The problem may be in the notification system or even in the scheduler, or there could be a bug in my code that I fail to find.

EDIT: I forgot to mention, I tested and the behaviour is still present in LabVIEW 8.20.

Download File:post-4014-1159808271.zip


Hi Jimi,

I think the problem is that you fire the start notifier too soon after loading the reentrant instance.

This forces some of the launched processes to hang, and in the end the loop (in step 3) hangs, since it expects all the notifications in a certain order.

I tried to put a delay before the start notifier, and then it worked.

You could just add a response notifier to be sent back after a process has started, and wait for this in your "Open Instance.vi".
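Since a block diagram doesn't paste well into a forum post, here is a rough textual sketch of that handshake in Python. The queues here are lossless, unlike notifiers, so the transport itself can't drop messages; the names are mine and the point is only the ordering: the launcher waits for the ACK before firing the start signal.

```python
import queue
import threading

def process(start_q, ack_q, done_q):
    # Analogue of one launched reentrant instance: report readiness
    # first, then block until the start message arrives.
    ack_q.put("ready")
    start_q.get()
    done_q.put("done")

start_q, ack_q, done_q = queue.Queue(), queue.Queue(), queue.Queue()
worker = threading.Thread(target=process, args=(start_q, ack_q, done_q))
worker.start()

assert ack_q.get() == "ready"   # wait for the ACK...
start_q.put("start")            # ...only then fire the start signal
worker.join()
result = done_q.get()
```

With the ACK in place, the start message can never be fired before the process is ready to receive it.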

/J


JFM, thanks for your comments. I don't really understand why it should behave the way you describe. I can first create a notification, then open a VI reference, then pass the notifier to this VI and let it wait for the notification. I tested it and it works (see the attachment below). Therefore it shouldn't matter if I send the notification before all the reentrant VI instances are ready to wait for it; they should be able to catch the notification later on as they become ready. After all, I don't destroy any of the notifiers until the very end, so the notifiers should stay present as instance VIs come and go.

About the step 3 loop: it doesn't require the notifications to arrive in a certain order, as long as all of the return notifications arrive. It goes through them in a certain order, but if all notifications eventually arrive, no matter in which order, the loop should pass.

Download File:post-4014-1159812121.zip


I meant that the loop halts when it comes to a missing notification, sorry for that.

Maybe you are right that it is a bug, but I have experienced this behaviour before with queues/notifiers/occurrences. So it seems like good practice to always get a confirmation that a process has started and is ready for operation.

/J


Jimi,

I tried to add an acknowledge notifier that is sent back to the creator VI. This helps a bit; the program does not hang as often as before.

I then added a small delay in the creation loop (10~20ms), and together with the ACK-notifier the program then finished all 100 iterations.

Maybe I can accept that you need to get an ACK before proceeding to the next step, but a delay? If the delay is too small, the program still hangs.

Does this mean that a notifier cannot be created too close to the previous notifier?

Or is it the "prepare for reentrant run" that cannot be called at this rate?

/J

Sounds like a bug to me. I wonder if this bug also exists for queues and other inter-thread communication nodes.

It seems to occur only for Notifiers. I have an 8.20 program that initializes 2 queues and 4 notifiers, and in spite of 100+ ms waits, it occasionally misses an initial notifier (which starts the TCP loops).

Neville.


Take a look at

<labview>\vi.lib\Utility\Wait for All Notifcations.llb\Wait for All Notifications.vi

This shows a correct implementation for how to wait for an entire set of notifications to fire. I'm not sure what is wrong in your code, but maybe by code comparison you can find it. I'm still hunting around, but I just started looking at this and it may take a while to untangle.


I think I have used the Notifiers correctly. Just to help with your troubleshooting, I will describe my scenario:

1. Initialize 2 named Notifiers.

2. Pass each Notifier ref to 2 parallel VIs: PROCESS & TCP (both VIs & their respective subVIs reentrant).

3. TCP immediately waits for a notification.

4. When PROCESS has finished some initialization steps, it fires a notification to TCP to start up.

5. Steps 3 & 4 are repeated for another set of PROCESS & TCP.

The Notifiers are only used once at startup, never again.

Internal to TCP there are two parallel loops that use an unnamed notifier to pass TCP session data. These seem never to miss (otherwise TCP would have failed often, but it seemed quite robust).

What I found was that one of the TCP VIs would wait on its notification forever (a missed notification) at startup. It wasn't consistent which one.

I added logic in the PROCESS VI to check the notification status and, if a non-zero number of waiters was reported, to fire the notification again with up to 3 retries; but this almost ALWAYS reported #waiting on notifiers as 0 (indicating the notification was either read, or was fired before the VI was waiting for it).

I also added about 150 ms of delay BEFORE firing the notifiers, but it would still miss them from time to time.

I just worked around this by removing the notifiers and using single-element queues instead.
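For what it's worth, here is a minimal Python sketch of that single-element-queue workaround, emulating a lossy enqueue. This is an analogy, not LabVIEW API; it assumes a single sender, since the discard-then-put steps are not atomic.

```python
import queue

def lossy_send(q, msg):
    # Rough analogue of a lossy enqueue on a single-element queue:
    # discard any unread element so the sender never blocks.
    # (Assumes a single sender; the two steps are not atomic.)
    try:
        q.get_nowait()
    except queue.Empty:
        pass
    q.put_nowait(msg)

q = queue.Queue(maxsize=1)
lossy_send(q, "first")
lossy_send(q, "second")   # overwrites the unread "first"
latest = q.get()          # the reader always gets the newest message
```

Unlike a Wait on Notification node, the reader here keeps no sequence-number memory, so a message sent before the reader starts waiting is simply delivered, never judged stale.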

Sorry, as you can see, the code is fairly complex, and difficult for me to offer a screenshot or a set of VI's.

Neville.

It's going to take me a while to write this up... I'll post it in the other thread later today or tomorrow.

I eagerly await the explanation. From reading the notifier help files, I got the impression that my example should have worked. Perhaps, if you decide to stick with the current implementation, you could explain the limits in the LabVIEW help in more detail. I'm happy I can help improve LabVIEW.

EDIT: I assume that the problem is related to this sentence: "If ignore previous is FALSE, each instance of the Wait on Notification function waits if the message in the notifier has a time stamp for the same time that the instance of the function last executed. If the message is new, then the function returns the message."

I only have one instance of the notifier wait function in the loop. As a notification is caught, the time stamp is reset to the new time value. If there was a notification sent earlier on another notifier ref, it cannot be caught any more. Am I right?


This is not a bug. :D After analyzing the full test suite, I can say definitively that it is not behavior that should change. :D:D The explanation is not simple, so it may be worth expanding the documentation to talk about this case explicitly.

Short answers:

0) This does not affect queues at all.

1) When waiting for multiple notifiers, use the Wait for Multiple Notifications primitive. It exists to handle a lot of the complexity that I'm about to talk about.

2) There's an example program showing how to wait for ALL of a set of multiple notifiers at "<labview>\examples\general\WaitForAll.llb"

Long answer:

It is very easy to think that this is a bug. I was tempted to agree until I watched what was happening in the actual assembly code and I realized that the Wait for Notification nodes were correctly ignoring some notifications.

Terminology:

Notifier: a mailbox, a place in memory where messages are stored.

Notifier Refnum: a reference to one of these mailboxes, used on the block diagram to access the mailbox.

Notification: a message sent to a notifier.

Node: an icon on the block diagram; in this explanation, we're interested mostly in the nodes that represent functions for operating on the notifiers.

W4N: shorthand for the "Wait for Notification" node.

W4MN: shorthand for the "Wait for Multiple Notifications" node.

The situation in the posted example (leaving out all the reentrant subVI stuff):

1) N notifiers are created.

2) A loop iterates over the list of the N notifiers, doing a W4N on each one.

3) In another section of code, Send Notification is called for each of the N in a random order.

Expected result: The listener loop checks the first notifier. If it already has a notification available, the node returns the posted notification. If the notifier does not yet have a notification, the node sleeps until the notification is sent. Then the loop repeats, proceeding to the second refnum, and so on. Whether a notification was posted to a given notifier in the past, or we wait for it to post, shouldn't affect any other notification. So we expect the loop to finish after all N notifications have arrived.

Observed result: The loop hangs. The W4N node doesn't return for one of the notifiers, as if that notifier did not have a notification. We know it does have one, but the node doesn't seem to catch the message... almost as if the "ignore previous?" terminal had been wired with TRUE.

The explanation:

Each notification -- each message -- has a sequence number associated with it. Each W4N and W4MN node has a memory of the sequence number of the latest notification that it returned. In the For Loop, we're testing N notifiers. Remember that these notifiers are getting notifications in a random order. So the node always returns the message for the first notifier. But the wait on the second notifier will return only 50% of the time -- depending randomly on whether it got its notification before or after the first notifier. If the second notifier got its notification before the first notifier, then the node will consider the notification in the second notifier to be stale -- that notification has a sequence number prior to the sequence number recorded for this node.

The sequence number behavior is key to how the node knows not to return values that it has already seen. A given node does not record "sequence number for each refnum". Storing such information would be an expensive penalty to the performance and thus would be of negative value in most cases (the vast majority of the time the notifier nodes are used with either the same refnum every time or with cascading refnums that are accessed in a particular order or where lossy transmission is acceptable). In the cases where you need to hear every message from a set of notifiers, that's what the W4MN node is for -- it records the last message it heard, but when it listens again, it returns all the new messages.
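To make the mechanism concrete, here is a toy Python model of this behavior. It is my own simplification, not NI's implementation: one global send counter, and a single last-seen sequence number per wait node rather than per refnum.

```python
import itertools

_send_counter = itertools.count(1)   # one global sequence number for all sends

class Notifier:
    def __init__(self):
        self.msg, self.seq = None, 0
    def send(self, msg):
        self.msg, self.seq = msg, next(_send_counter)

class WaitNode:
    # Models ONE W4N node: a single last-seen sequence number,
    # not one per refnum.
    def __init__(self):
        self.last_seen = 0
    def wait(self, notifier):
        if notifier.seq > self.last_seen:
            self.last_seen = notifier.seq
            return notifier.msg
        return None   # message looks stale; the real node would block

a, b = Notifier(), Notifier()
b.send("B")              # b happens to be notified first...
a.send("A")              # ...then a
node = WaitNode()
first = node.wait(a)     # "A": newest message, returned
second = node.wait(b)    # None: b's message predates a's, judged stale
```

Run in the other order (wait on b first, then a), the same code returns both messages, which is exactly the 50% randomness described above.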

In the particular example, the subVIs doing the posting of notifications are in a bunch of cloned copies of a reentrant VI. Each reentrant subVI waits for a single "start" notification, and then sends its "done" notification in response. The VIs are cloned using Open VI Ref and then told to run using the Run VI method. If we put a "Wait Milliseconds" prim wired with 50 after the Run VI method, the bug appears to go away. This makes the situation seem like a classic race condition bug -- slight delay changes behavior. In this case, adding the delay has a very important effect -- it makes sure that each reentrant subVI goes to sleep on the start notification IN ORDER. So that they wake up in order. So they send their done notifications in order. In other words, adding the delay removes the randomness of the notification order, and so our loop completes successfully.

And that's why this isn't a bug. ;)


Thanks for the comprehensive explanation. As you noticed, I also guessed the correct answer just before your reply. Well, you helped me guess.

And that's why this isn't a bug. ;)

Well, here is something that may make you consider it a bug. The problem can be avoided in this case by using W4MN. But what if I embed a W4N node in a subVI to encapsulate some behaviour? Then this subVI always has only one node, no matter where it's called from. As a result, notifications may be missed when the subVI taking care of catching notifications is used in multiple places in the application. I think W4MN doesn't solve this issue either, as the notifications do not come from a single source and therefore cannot be passed as an array.

Well, you may say that you can avoid the problem if you use a reentrant VI. This is true if you can make all your VIs reentrant. But if you have even one non-reentrant subVI in your nested subVI hierarchy, you end up having the notifier problem again.

Well, this has an implication for LabVOOP. If you decide to use OOP to hide your implementation of a system that uses notifiers, you are very tempted to use dynamically dispatched class methods. If you do, the notifier problem can cause trouble again, as dynamically dispatched methods cannot be reentrant. Therefore you end up having only one W4N or W4MN node shared by every call site in your code. This single node then waits for possibly all the encapsulated notifications in your application. As these notifications can indeed arrive in random order, you are very likely to get a deadlock.
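Extending the earlier toy model (again, an illustrative sketch of my own, not LabVIEW's implementation), the non-reentrant subVI amounts to one piece of wait-node state shared by every call site, so a newer message on one module's notifier makes the other module's earlier message look stale:

```python
import itertools

_send_counter = itertools.count(1)

class Notifier:
    def __init__(self):
        self.msg, self.seq = None, 0
    def send(self, msg):
        self.msg, self.seq = msg, next(_send_counter)

# The state of the single W4N node inside a non-reentrant subVI (or a
# dynamically dispatched method): every caller in the program shares it.
_shared_last_seen = 0

def wait_in_shared_subvi(notifier):
    global _shared_last_seen
    if notifier.seq > _shared_last_seen:
        _shared_last_seen = notifier.seq
        return notifier.msg
    return None   # this caller would hang

mod_a, mod_b = Notifier(), Notifier()   # two unrelated program modules
mod_b.send("b-ready")
mod_a.send("a-ready")
got_a = wait_in_shared_subvi(mod_a)     # module A wakes up fine
got_b = wait_in_shared_subvi(mod_b)     # module B never wakes up
```

The two modules share nothing on their own diagrams; the coupling comes entirely from the hidden shared node.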

Did I explain myself clearly? I suppose you must agree that there really is a bug, or at least an unwanted feature. If I wasn't clear enough, I can provide an example.

jimi


Aristos,

what if we have a reentrant VI that acts on received notifications, loaded N times using VI Server (a separate notifier for each instance)?

In this process the W4N node is encapsulated in a non-reentrant VI.

Could this scenario also hang because the notifications need to arrive in the correct order?

The question is really whether the W4N node shares the sequence-number memory between the processes.

Do you see other scenarios where we could have this behaviour?

/J

If I wasn't clear enough, I can provide an example.

Haa... gotcha. Here is my example: a class that tries to use a notifier for whatever reason, perhaps to communicate between different parts of the program. Two different notifiers are created. Only two. It hangs every time. W4MN doesn't help. This is definitely expected behaviour from what you, Aristos, explained, but I would like to call it a bug. You are responsible for both notifiers and LVOOP; they should work together. They do not at the moment, as you can immediately see.

post-4014-1160508902.png?width=400

And here is the project file as well.

Download File:post-4014-1160508999.zip

EDIT: I tested that if I embed the W4N or W4MN call in a reentrant VI, everything works OK. However, if I embed this reentrant VI into a dynamic method, we end up with a hang again, as expected. So if you have even one non-reentrant VI in your call chain to W4MN or W4N, then you have a good chance of failing to catch a notification. Bug or not? I modified my project to present this behaviour. The modified project is below.

Download File:post-4014-1160510413.zip

EDIT 2: If you at NI want to keep the performance but also want the flexibility of allowing notifiers in subVIs, loops and possibly other unsafe structures, then I have the following suggestion. Modify the implementation so that the code is examined at compile time. If the current usage is absolutely safe, i.e. there is only one notifier originating from the current block diagram, then use the present implementation of W4N and W4MN. On the other hand, if the usage of these nodes is not provably safe, for example if the notifier could originate from a front panel control or arrive by any other route such that the same node can receive multiple different notifier references, then at compile time use an implementation that stores the time stamp for each combination of notifier reference and notifier node.
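As a sketch of what that per-reference bookkeeping might look like (purely hypothetical; the names and the toy model are mine, not NI's code): the wait node keeps a dictionary keyed by notifier reference instead of a single number, trading a lookup per wait for correctness with interleaved refnums.

```python
import itertools

_send_counter = itertools.count(1)

class Notifier:
    def __init__(self):
        self.msg, self.seq = None, 0
    def send(self, msg):
        self.msg, self.seq = msg, next(_send_counter)

class SafeWaitNode:
    # Hypothetical "safe mode": last-seen sequence number kept PER
    # notifier reference, at the cost of a lookup on every wait.
    def __init__(self):
        self.last_seen = {}   # notifier id -> last sequence number seen
    def wait(self, notifier):
        if notifier.seq > self.last_seen.get(id(notifier), 0):
            self.last_seen[id(notifier)] = notifier.seq
            return notifier.msg
        return None

a, b = Notifier(), Notifier()
b.send("B")               # sent before a's message...
a.send("A")
node = SafeWaitNode()
got_a = node.wait(a)
got_b = node.wait(b)      # ...yet no longer judged stale
```

With the single shared counter replaced by per-reference memory, the order in which the two notifiers fired no longer matters.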

So if you have even one non-reentrant VI in your call chain to W4MN or W4N, then you have a good chance of failing to catch a notification.

Exactly my point.

If a W4N node does not keep per-reference memory, we will see this in many places.

I experienced this in LV 7.1.1, but did not have time to track down the bug (I actually implemented a LV2-style global that acted as a notifier to solve it).

With Aristos' explanation it makes sense, but it surely must be a bug if one notifier can prevent another from waking up, at completely different locations on the block diagram.

/J


I forgot to mention in my solution post yesterday that the correct synchronization tool to use in this case is the Rendezvous. Create a Rendezvous of size N and let each reentrant clone post that it has reached the Rendezvous, then proceed past that point. If you need to transmit data in this case, you can collect the data together in a common array (protected with a Semaphore or a functional global).
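In textual form, a Rendezvous of size N behaves much like a barrier. A rough Python analogue (names are mine; `threading.Barrier` stands in for the Rendezvous, a lock for the Semaphore/functional-global protection) of N clones collecting data into a shared array and then meeting at the rendezvous:

```python
import threading

N = 4
rendezvous = threading.Barrier(N)   # analogue of a Rendezvous of size N
results = []                        # common array for the clones' data
results_lock = threading.Lock()     # Semaphore/functional-global analogue

def clone(i):
    # ...per-clone initialization would happen here...
    with results_lock:              # protect the shared array
        results.append(i)
    rendezvous.wait()               # block until all N clones have arrived

threads = [threading.Thread(target=clone, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No clone can pass the rendezvous until all N have arrived, which sidesteps any dependence on the order in which they start up.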

can you please comment on the post from Jimi, the post where two parallel pieces of code hang due to this notifier behaviour.

Do you really argue that this is not a bug?

If you do, wouldn't that be like saying that W4N nodes should not be put in non-reentrant VIs?

Sorry... took me a while to get back to this post.

I really do say this is not a bug. The notifiers were not designed to work in this case, in fact, they were explicitly designed against this case for performance reasons -- the bookkeeping of which message was seen last for each individual refnum is a hit that the vast majority of apps don't worry about. In the 5 versions of LV released so far with the notifier prims (not to mention several prior versions with the VIs-around-CIN-node), this is the first time that anyone has complained (to my knowledge) about the lack of support for this case. As for the question of whether or not the Wait for Notification behavior should change, I'd have to say "no" if for no other reason than we'd be breaking behavior going back multiple versions of LV.

As for the question of whether new prims could be added (a "Wait for Notification with Fine-Grain Memory"), it probably could be done, but I'd worry about the confusion of having two Waits in the palette... it'll require delicacy. I doubt that this will become a priority anytime soon, especially since other synchronization tools exist which I think can be hacked to do the work. I'll put it on the back burner of suggestions (which is actually a good place to be... there are good suggestions that have actually fallen off the back of the stove, rolled across the kitchen floor, out the cat door and are sitting in the garden -- LV gets a lot of good suggestions each day!)


Which of these sync tools is best for the task? It needs to be able to handle multiple different messages, and it needs to be able to send a single message to multiple places.

It's weird that nobody has complained about this before. It only shows that LV is not used much for large-scale application development.

I think that the reason you haven't seen many complaints is that it is hard to find out what happened.

Jimi's first post used a W4N node in a loop (it should have been a single W4MN node instead, I know...), but it brought the issue to the surface. As for the explanation, you (and you are the "Perfect Queue" :D ) had to dig deep down to really understand what was happening; no other power user could come up with an explanation.

Maybe we could do a poll and ask LabVIEW developers how they think a W4N node placed in a non-reentrant VI works when called from two different places? I think the answer is obvious: since we specify a notifier by a reference, we also expect the two W4N calls to be separated. Changing the behaviour to one where the references are separated shouldn't have any impact on older projects. At most, programs would be using the correct timeout values and getting the expected behaviour, right?

Regarding performance, I feel that the notifiers lag behind the queue operations; e.g. using Preview Queue Element instead of a W4N node (timeout = 0) is much faster. So maybe a speed bump is needed anyway?

Just my 2c

/J


Oh, no need for a poll. We can all guess how they'd reply -- the same way we all replied. Yeah, I had to go drilling into this because I forgot the rules of a single node with different refnums. The current behavior is actually documented in the online help (not in detail, but it is there). It's just been a while since I looked into it (refactoring the notifiers was the very first project I had when starting at NI years ago).

This isn't so much a reentrant/non-reentrant issue. It's how does a single node that gets different refnums at different times behave. Jimi's original demo was a problem of a Wait for Notification inside a For Loop. Nothing about reentrancy there.

The current node always looks to the future -- this can actually be used to ensure that processes fire in a forward looking order. What looks like a bug in Jimi's two demos (either because each notifier only fires once or because two notifiers have dependencies upon each other) becomes a feature when notifiers are firing independently multiple times. I've had theories before that "If I change XYZ no older code will notice." I no longer maintain such fantasies, and changing the runtime behavior of the notifiers sets off big flashing alarm bells in my head. I'd rather try to add a new node or a terminal for specifying behavior on the current node.

Regarding performance, I feel that the notifiers lag behind the queue operations; e.g. using Preview Queue Element instead of a W4N node (timeout = 0) is much faster. So maybe a speed bump is needed anyway?

Considering that they are exactly the same code, I'd be surprised if there were any speed difference. The notifiers are implemented as a single-element queue that simply overwrites its first element instead of locking on enqueue. It isn't like the code is cloned or anything -- they are literally running the same code.

This isn't so much a reentrant/non-reentrant issue. It's how does a single node that gets different refnums at different times behave.

What I'm trying to say is that when you put a W4N node in a non-reentrant VI, all instances of that non-reentrant VI could hang because notifiers are fired in the wrong order or too close together in time. So we would have a system that looks exactly the same on the block diagram but behaves completely differently, depending on the reentrancy setting of the encapsulating VI.

This gets extremely difficult to debug, since the conflicting VI can be used in many places and be deeply nested.

I'd rather try to add a new node or a terminal for specifying behavior on the current node.

Agreed. I was just arguing that the main change would be that code that hangs in the current implementation would no longer do so. I do think that the change would have minor impact on current usage, but better safe than sorry...

A new terminal would be fine, just don't add it as a right-click-to-configure feature.

Considering that they are exactly the same code, I'd be surprised if there were any speed difference.

It's been a while since I benchmarked them, and you are absolutely right :worship: . Notifiers are as fast as queues. I did notice that if I set the timeout to 0 and there is no data available, the call takes longer than when data is available. This applies to both queues and notifiers.


This discussion is important, as it relates to very important general issues, namely code reuse and LabVIEW's compile-time safety mechanisms.

I think there is an issue in this discussion that nobody has mentioned yet. During development, the developer doesn't know in which circumstances his/her module will be used. Only in projects with a single developer are all the code reuse situations somehow predictable. If the developer writes code that may be used by others as part of some software, notifiers are risky because of this possible program-hang feature. So even though the notifiers function well in the case where the developer uses them, code reuse may lead to a mysterious program hang later on. Especially if one writes a library that may be used by third parties totally independent of the developer, one should not use notifiers, because the library may mysteriously hang.

Then there is a second issue not yet discussed here. It's more general and not related to notifiers directly. I think one of the best features of LabVIEW is strong typing. I am not a fan of variants; I'd prefer polymorphic types instead. Strong typing guarantees that it is hard to create bugs, as one cannot connect a wire of the wrong type to a node. With the W4N node it is effectively possible to connect an incompatible wire, as the developer gets no warning when he connects different notifiers to the same node. As a general issue, I think LabVIEW should be designed so that no node can accept wires of an incompatible type. I would appreciate it if NI would try to avoid this kind of primitive, which can easily be used incorrectly. It's not even a matter of documentation, as most users don't read the documentation very well. The primitives should be intuitive and safe. I do not say that this safety should always override performance; there could be a right-click menu option that changes the behaviour of the node. The default behaviour, however, should always be the safe one, and the non-safe high-performance behaviour should be the non-default one.

What I suggest is not adding a new primitive to the language. I suggest modifying the existing primitive so that it has two possible compile-time modes: the present mode and a new safe mode. When upgrading VIs to a newer version of LabVIEW, the existing nodes would transform into non-safe-mode nodes. The default behaviour in LabVIEW 9.0, however, would be the new safe mode. There could be a red 8.2 watermark on the non-safe node to indicate the difference in behaviour and to encourage developers to check the help for the differences between the safe and non-safe modes.

-jimi-

