Producer/consumer and processing loads

Gary Rubin · October 13, 2009

Gurus,

I'm hoping someone here has some sage advice. Here's the situation:

Running on a Core2 Duo (i.e. dual core processor)

VI A dynamically launches VIs B and C.
VI B runs in a continuous loop and manages a DMA from a 3rd-party DSP board. It puts a subset of the data into Named Queue 1 and all of the data into a LV2 global.
VI A reads the data from the LV2 global, processes it, and puts the results in Named Queue 2 in Loop1. Loop2 flushes Named Queue 2 and transmits the contents via TCP/IP on physical port 1.
VI C runs in a continuous loop, flushes Named Queue 1 and transmits the contents via TCP/IP on physical port 2.
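For readers more at home in text languages, the dataflow just described can be sketched roughly as below. This is a minimal Python analogy, not LabVIEW code. Threads stand in for the VIs' loops, `queue.Queue` stands in for the named queues, and a lock-guarded class (`LV2Global`, a hypothetical name) stands in for the LV2 global. The block contents, the "processing" (a sum) and the sentinel-based shutdown are assumptions added so the sketch terminates. Python threads have no execution systems or priorities, so this models only who passes data to whom, not the timing problem.

```python
import queue
import threading
import time

class LV2Global:
    """Hypothetical stand-in for the LV2 global: one shared buffer behind a
    lock, so every put/get call is serialized like a non-reentrant VI."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = []

    def put(self, block):
        with self._lock:
            self._data.append(block)

    def get(self):
        with self._lock:
            data, self._data = self._data, []
        return data

def vi_b(n_blocks, q1, lv2, b_done):
    # Producer; each "DMA" block is just a list of ints here (assumption).
    for i in range(n_blocks):
        block = list(range(i * 4, i * 4 + 4))
        q1.put(block[:2])   # subset of the data -> Named Queue 1
        lv2.put(block)      # all of the data -> LV2 global
    q1.put(None)            # sentinel so VI C can stop
    b_done.set()

def vi_c(q1, sent):
    # Would write each item to TCP port 2; here it just records it.
    while (item := q1.get()) is not None:
        sent.append(item)

def vi_a_loop1(lv2, q2, b_done):
    # Reads the global, "processes" each block, enqueues results on Named Queue 2.
    while True:
        finished = b_done.is_set()   # checked before draining, so nothing is missed
        for block in lv2.get():
            q2.put(sum(block))       # placeholder for the real processing
        if finished:
            break
        time.sleep(0.001)
    q2.put(None)

def vi_a_loop2(q2, sent):
    # Would write each result to TCP port 1; here it just records it.
    while (item := q2.get()) is not None:
        sent.append(item)

def run(n_blocks=5):
    q1, q2 = queue.Queue(), queue.Queue()
    lv2, b_done = LV2Global(), threading.Event()
    sent_c, sent_a = [], []
    threads = [
        threading.Thread(target=vi_b, args=(n_blocks, q1, lv2, b_done)),
        threading.Thread(target=vi_c, args=(q1, sent_c)),
        threading.Thread(target=vi_a_loop1, args=(lv2, q2, b_done)),
        threading.Thread(target=vi_a_loop2, args=(q2, sent_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent_c, sent_a
```

Here `run(3)` returns `([[0, 1], [4, 5], [8, 9]], [6, 22, 38])`: the subsets VI C forwards, and the results VI A's Loop2 would transmit.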

I am monitoring the intervals between VI C's outputs. Ideally, VI C should be putting out data every 15ms, regardless of what VI A is doing. In order to try to ensure this, I have done the following:

All shared VIs are reentrant
VIs A, B, and C are all assigned to different Execution Systems
VIs B and C are run without opening their front panels.

I've observed that when I run things with a typical processing load, the time delta between VI C's outputs gets very noisy, with spikes up in the 100s of ms. I can confirm that the issue is not related to the input traffic by disabling the processing stage of VI A. When I do this, I do see VI C's output every 15ms +/- a couple ms. I see the same thing with the processing enabled if I give it a very small processing load (leading to a very low output load).
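One way to quantify "noisy" here is to log a timestamp each time VI C writes and then look at the differences. The sketch below is a hypothetical Python helper (the names `output_intervals` and `jitter_report` are invented for illustration). It assumes millisecond timestamps and treats "a couple ms" as a ±3 ms tolerance around the nominal 15 ms.

```python
def output_intervals(timestamps_ms):
    """Differences between consecutive output timestamps, in ms."""
    return [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]

def jitter_report(timestamps_ms, nominal_ms=15.0, tolerance_ms=3.0):
    """Summarize output intervals and count those outside nominal +/- tolerance.

    Assumes at least two timestamps.
    """
    deltas = output_intervals(timestamps_ms)
    out_of_tolerance = [d for d in deltas if abs(d - nominal_ms) > tolerance_ms]
    return {
        "intervals": len(deltas),
        "min_ms": min(deltas),
        "max_ms": max(deltas),
        "out_of_tolerance": len(out_of_tolerance),
    }

# Illustrative, made-up timestamps with one long gap:
print(jitter_report([0, 15, 31, 45, 160, 175]))
# {'intervals': 5, 'min_ms': 14, 'max_ms': 115, 'out_of_tolerance': 1}
```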

To me, that points to two possible causes: 1) VI A's processing is monopolizing the system, preventing VI C's loop from running as often as I would like it to. 2) The fact that VI C and VI A are both using TCP/IP writes, although to different ports, is causing some sort of blocking.

Slowing down Loop1 in VI A considerably is not an option.

Any thoughts? Theoretically, what's going on in VI A should not affect the timing of dataflow between VIs B and C, but that's what I'm seeing. Does anyone have any tricks they care to share?

Thanks,

Gary

Grampa_of_Oliva_n_Eden · October 13, 2009

Quick check to ensure the LV2 is not a bottleneck: get rid of the call to the LV2 and replace it with data of the same size and type. If the jitter goes down, the LV2 may be the bottleneck. NOTE: If you set your LV2 to subroutine priority, you will pick up a new call option, "Skip if Busy", that is very helpful in preventing one thread from hanging while waiting on an LV2.

Ben
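Here is a rough text-language analogy of that call option. It assumes the LV2 global behaves like shared state behind a single lock. A normal call waits for the lock. A skip-if-busy call tries once, and if another caller holds the lock, it returns a default value instead of the data, roughly mirroring LabVIEW handing back default output values for a skipped call. `FunctionalGlobal` and `get_skip_if_busy` are hypothetical names for this Python sketch.

```python
import threading

class FunctionalGlobal:
    """Hypothetical LV2-style global: shared state behind one lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def put(self, value):
        with self._lock:          # normal call: wait until not busy
            self._value = value

    def get(self):
        with self._lock:
            return self._value

    def get_skip_if_busy(self, default=None):
        # Analogous to "Skip if Busy": do not wait. If another caller
        # currently holds the lock, return the default instead of the data.
        if not self._lock.acquire(blocking=False):
            return default
        try:
            return self._value
        finally:
            self._lock.release()
```

This also illustrates the point made in the reply below. A consumer polling with `get_skip_if_busy` never blocks, but on any iteration where the producer is mid-`put` it simply gets the default and has nothing to do.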

Gary Rubin · October 13, 2009

Thanks Ben, I was not aware of the "skip if busy". That's pretty cool. I don't think it would do much in this case though.

The producer calls an LV2 in a loop, using a "put" state. The consumer calls the LV2 in a loop, using a "get" state. Normally, the consumer waits on the put state to finish running, and the producer waits on the get state to finish. If I were to use the Skip If Busy in the consumer, it wouldn't wait, but it also wouldn't get any data. Because it hasn't gotten any data, there's nothing for the consumer to do, so the loop iterates again and again, until the LV2 is no longer busy, right?

I guess I could see how this could have benefit when using the LV2 for asynchronously passing data into a loop that's busy doing something else that doesn't necessarily require that data every time, but for my case, I don't see it helping.

I will try what you suggested, but I'm pretty sure that the input data passing isn't the problem; I still get good behavior if I pass data in but don't process it.

Gary

ShaunR · October 13, 2009

What execution system is the LV2 global assigned to (same as caller???). There's probably a lot of context switching going on, since you cannot encapsulate the global in a single execution system. You basically have a one-to-many architecture, and I would partition it slightly differently to take advantage of the execution systems.

VI A dynamically launches VI's B and C.
VI B runs in a continuous loop and manages a DMA from a 3rd-party DSP board. It puts ALL of the data into Named Queue 1 and Named Queue 2.
VI C runs in a continuous loop, flushes Named Queue 1, extracts the bits it needs, then transmits the contents via TCP/IP on physical port 2.
VI A reads the data from Named Queue 2, extracts the bits it needs, processes it, then flushes Named Queue 2 and transmits the contents via TCP/IP on physical port 1.

VI B would run in (say) "Data Acquisition" at "High" priority.

VI C would run in (say) "Other 1" at "Above Normal" priority.

VI A would run in (say) "Other 2" at "Normal" priority.

This way you can give your VIs hierarchical priorities to determine their responsiveness under loading. If VI B is lightly loaded, you could also have it extract the bits and put only what A and C require on the queues (simplifying A and C and reducing memory requirements at the expense of speed). The way described just makes VI B simple and very fast, and context switching won't be an issue.
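The proposed partitioning can be sketched in Python as a one-producer, two-queue fan-out. This is an analogy under stated assumptions. The producer puts every block on both queues, and each consumer extracts its own part: the first two samples for VI C and the remaining samples for VI A, both invented for illustration. `None` serves as a stop sentinel. The execution-system priorities suggested above have no counterpart in Python threads, so they are not modeled.

```python
import queue
import threading

def producer(blocks, q_for_c, q_for_a):
    # VI B: put ALL of each block on both queues; no extraction here.
    for block in blocks:
        q_for_c.put(block)
        q_for_a.put(block)
    q_for_c.put(None)   # sentinels so each consumer can stop
    q_for_a.put(None)

def consumer_c(q, out):
    # VI C: extract the subset it needs (first two samples, an assumption).
    while (block := q.get()) is not None:
        out.append(block[:2])

def consumer_a(q, out):
    # VI A: extract its part and process it (sum is a placeholder).
    while (block := q.get()) is not None:
        out.append(sum(block[2:]))

def run(blocks):
    q1, q2 = queue.Queue(), queue.Queue()
    out_c, out_a = [], []
    threads = [
        threading.Thread(target=producer, args=(blocks, q1, q2)),
        threading.Thread(target=consumer_c, args=(q1, out_c)),
        threading.Thread(target=consumer_a, args=(q2, out_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out_c, out_a
```

The same block object goes on both queues. That is safe here only because neither consumer modifies it: slicing creates new lists.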

Edited October 13, 2009 by ShaunR

Gary Rubin · October 13, 2009

Thanks Shaun,

The LV2 is set at subroutine priority and is therefore same as caller. I see how that would lead to context switching, but what does that actually mean? More overhead associated with calls to the LV2, so each caller spends a little bit more time waiting for it to become available?

I can certainly try replacing the LV2 with a queue; that's a pretty quick edit. I feel like we tried that a while back, but that was before VI C existed.

The LV2 is set up as a lossy buffer, with an indicator telling us when it starts to drop data. I'd have to use a lossy queue, with another single element queue to pass the overflow status.
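A lossy buffer with an overflow indicator might look something like the sketch below. It assumes the drop-oldest behaviour of a lossy enqueue (as with LabVIEW's Lossy Enqueue Element). It also folds the overflow status into a latched flag rather than a second single-element queue. `LossyQueue`, `enqueue`, `dequeue_all` and `clear_overflow` are hypothetical names.

```python
import threading
from collections import deque

class LossyQueue:
    """Hypothetical lossy buffer: when full, enqueueing drops the OLDEST
    element and reports it. A latched flag plays the role of the separate
    overflow-status indicator."""
    def __init__(self, max_size):
        self._items = deque()
        self._max = max_size
        self._lock = threading.Lock()
        self.overflowed = False   # stays True until clear_overflow()

    def enqueue(self, item):
        """Add item; return True if an old element had to be dropped."""
        with self._lock:
            dropped = len(self._items) >= self._max
            if dropped:
                self._items.popleft()   # discard the oldest element
                self.overflowed = True
            self._items.append(item)
            return dropped

    def dequeue_all(self):
        """Flush: return and remove everything currently buffered."""
        with self._lock:
            items = list(self._items)
            self._items.clear()
            return items

    def clear_overflow(self):
        with self._lock:
            self.overflowed = False
```

For example, with `LossyQueue(3)`, enqueueing 1, 2, 3, 4 reports a drop only on the fourth call, and a flush then returns `[2, 3, 4]` with the overflow flag still set.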

ShaunR · October 13, 2009

It's not so much waiting for it to become available; it's more to do with the CPU having to save and restore state information when switching the global from one context to another. Google "context switch"; it's a big subject. But suffice to say, the fewer, the better.

With what I have described above, you will never lose data. The downside is that if VI B produces faster than you consume, your queues will grow until you run out of memory. If this is a possibility (and undesirable), all you need to do is use fixed-length queues and "pause" VI B from populating one or both of them when they are full, resuming when A or C have caught up. Or (as you rightly say) use a lossy queue. The choice really comes down to whether you require sequential losses or random losses. But the above will let you change how you manage your processes with minimum effort and run most efficiently.
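The two ways of bounding the queues mentioned above can be contrasted in a short Python sketch using a fixed-size `queue.Queue`. "pause" blocks the producer until a consumer makes room: nothing is lost, but the producer stalls. "drop_newest" never blocks and discards the incoming item when the queue is full. The `offer` helper and the policy names are hypothetical. A lossy queue that drops the oldest element instead would be a third option.

```python
import queue

def offer(q, item, policy):
    """Hypothetical enqueue helper for a fixed-length queue.

    policy == "pause":       block until a consumer makes room
                             (no data lost, but the producer stalls).
    policy == "drop_newest": never block; discard the new item if full.
    Returns True if the item was enqueued.
    """
    if policy == "pause":
        q.put(item)              # blocks while the queue is full
        return True
    if policy == "drop_newest":
        try:
            q.put_nowait(item)
            return True
        except queue.Full:
            return False         # item discarded, producer keeps running
    raise ValueError(f"unknown policy: {policy!r}")
```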

Gary Rubin · October 13, 2009

So, that leads me to a question: do reentrant VIs that are not explicitly set to use the caller's thread involve context switches? Put another way, if a reentrant VI is set to use the caller's thread, does it avoid a context switch?

Gary

ShaunR · October 13, 2009

Have a read of this....

http://books.google....itching&f=false

Read section 9.2.3 in particular (replacing the word "Process" with "Execution System"), then ask me again.

Gary Rubin · October 13, 2009

Thanks,

Looks like I ought to get a copy of that book.

I still don't think that addresses the topic of reentrant VIs, at least not that I saw. I think the crux of my question was whether making a VI reentrant somehow overrides the execution system setting.

We're already told to use reentrant VIs when reusing the same VI in two parallel processes. If the answer to the previous question is "no", then do we need to avoid calling reentrant VIs from different threads? Or am I overthinking it?

ShaunR · October 13, 2009

Not at all.

Marking a VI as reentrant means that a full copy of the "executing" code is instantiated in the calling process. This applies to both types of reentrant VI ("clone" and "same copy"). Since copies of the code exist in the calling processes' address space, they can run in parallel. The difference between "clone" and "same copy" is the dataspace. A reentrant VI marked as "clone" has its own dataspace for every instance you lay down on a diagram. If it is marked as "same", all the instances share a single dataspace. Take the case of an LV2 global: if you mark it as "clone", calling it from one location will not give you the same data as calling it from another, so it will only function as required when marked as "same". However, in doing this you may cross execution system (ES) boundaries, either because the calling VIs are in separate ESs (if it is set to "same as caller") or because you give it its own ES. And crossing an ES WILL cause a context switch.

By the way, this is all a bit moot if it doesn't resolve your problem.
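The dataspace distinction described above can be illustrated with a loose Python analogy in which each object stands for one instance of the VI laid down on a diagram. Class-level state plays the shared dataspace, and instance-level state plays a clone's own dataspace. The class names are hypothetical. This models only the behaviour described in the post, not how LabVIEW allocates clones or threads internally.

```python
class SharedDataspaceGlobal:
    """Hypothetical LV2 global whose state lives in ONE shared dataspace:
    every call site sees the same stored value."""
    _value = None   # class attribute: shared by every instance

    def set(self, v):
        type(self)._value = v

    def get(self):
        return type(self)._value

class ClonedGlobal:
    """The same VI marked "clone": each call site gets its own dataspace."""
    def __init__(self):
        self._value = None   # instance attribute: one per clone

    def set(self, v):
        self._value = v

    def get(self):
        return self._value

# Two call sites ("instances on the diagram") of each kind.
site1, site2 = SharedDataspaceGlobal(), SharedDataspaceGlobal()
site1.set(42)
assert site2.get() == 42      # shared dataspace: the write is visible everywhere

clone1, clone2 = ClonedGlobal(), ClonedGlobal()
clone1.set(42)
assert clone2.get() is None   # separate dataspaces: clone2 never saw the write
```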

Edited October 13, 2009 by ShaunR

Gary Rubin · October 13, 2009

True, but it's good stuff to know.

So, just to beat a dead horse:

If I have the reentrant subVI set to Same as Caller, then each instance will be in the thread of the caller and will therefore not require a context switch when called?

BTW, just bought the book.

ShaunR · October 13, 2009

Only if set to clone! Oh, and unless LabVIEW decides to do a context switch because it has run out of threads in that execution system. :rolleyes: The downside is all the dataspace allocated for each clone.

You've bought it? No printer then ...lol.

Edited October 13, 2009 by ShaunR

Gary Rubin · October 13, 2009

Only if set to clone!

Right. I had that in my head - just forgot to type it.

ShaunR · October 13, 2009

So does it work better now?

Gary Rubin · October 13, 2009

Don't know - we have to share time on that system and my time is in the morning. I'll give the queue a try tomorrow AM.

Grampa_of_Oliva_n_Eden · October 13, 2009

Queues kick-arse, particularly if you don't fork the wire feeding them. If you have to fork, let a downstream thread do the forking.

Watch your CPU load while testing. If you have a processor or two left over, you may be able to use a "divide and conquer" approach; if all of the CPU is used, then we should turn our attention to making the consumers eat faster.

Ben

Gary Rubin · October 14, 2009

Queues kick-arse, particularly if you don't fork the wire feeding them.

Should I read into that statement that forking a queue reference makes a copy of the queue?

Grampa_of_Oliva_n_Eden · October 14, 2009

I was referring to the wire that presents the data to the Enqueue node. As I understand it (please correct anything I get wrong!):

A queue can transfer data "in place": the buffer holding the data when it is enqueued is the same buffer that holds the data when it is dequeued. This is only possible if the buffer is not subject to change. That is not the case when the wire feeding the queue is also forked to something else that LV thinks could change the data. In that case the fork creates a new buffer, so the queue can still transfer in place and the other code can do whatever it likes. So forking in the enqueueing code moves the work of copying the buffer to that thread.

If you do not fork the wire, but instead queue the data, then dequeue it and fork to two additional queues (now I am getting carried away), the work of copying the buffer is done in the receiving thread.

Just sharing how I understand it.

Feel free to correct me.

Ben
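Python passes references rather than following LabVIEW's copy-on-fork dataflow rules, so the copies have to be written out explicitly. Even so, the sketch below shows where the copy happens in each case described above. `enqueue_without_fork` and `enqueue_with_fork` are hypothetical helpers, and a `bytearray` stands in for the data buffer.

```python
import queue

def enqueue_without_fork(q, buf):
    # Only one consumer of the wire: hand the buffer itself to the queue.
    q.put(buf)

def enqueue_with_fork(q, buf):
    # The wire is forked in the producer and another branch may modify buf,
    # so the producer pays for a copy before enqueueing.
    q.put(bytearray(buf))

q = queue.Queue()
data = bytearray(b"samples")

enqueue_without_fork(q, data)
assert q.get() is data              # same object: no copy was made

enqueue_with_fork(q, data)
received = q.get()
assert received == data and received is not data   # copy made by the producer

# Alternative: enqueue without forking, then fork after dequeueing,
# so the copy is made on the receiving (consumer) side instead.
q.put(data)
item = q.get()
branch_copy = bytearray(item)
assert branch_copy == item and branch_copy is not item
```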

Gary Rubin · October 14, 2009

Oh, sorry - I misunderstood your comment. I thought you were referring to the queue refnum wire.

Edited October 14, 2009 by Gary Rubin
