
Queues Vs Ring Buffers



ShaunR: I'd like you to try a test to see how it impacts performance. I did some digging and interviewed the compiler team. In the Read and the Write VIs, there are While Loops. Please add a Wait Ms primitive in each of those While Loops and wire it with a zero constant. Add it in a Flat Sequence Structure AFTER all the contents of the loop have executed, and do it unconditionally. See if that has any impact on throughput.

Done. MJE mentioned this previously and, until now, I have resisted since I was more concerned with absolute speed. But it has nasty side effects on thread-constrained systems (dual cores etc.) unless they can yield. You lose the performance of using subroutine priority on constrained CPUs, and spread obviously increases as we give LabVIEW the chance to interfere, but nowhere near as much as with the queue or event primitives. So you see an increase in mean and standard deviation, but the median and quartiles (which I think are better indicators, as they are robust to spurii) remain pretty much unchanged.
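For reference, the zero-millisecond Wait is effectively a scheduler yield inside a polling loop. A minimal C sketch of the equivalent idea, assuming a polling reader spinning on a writer's cursor (the names `poll_until_ready` and `write_cursor` are illustrative, not from the actual VIs):

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical polling read: spin until the writer's cursor passes ours,
 * yielding the timeslice on every iteration -- the C analogue of putting
 * a "Wait Ms" of 0 at the end of the While Loop. */
long poll_until_ready(const atomic_long *write_cursor, long my_cursor)
{
    while (atomic_load(write_cursor) <= my_cursor) {
        sched_yield();  /* give other threads a chance on constrained CPUs */
    }
    return my_cursor + 1;  /* next position this reader will consume */
}
```

The yield costs almost nothing when the data is already available (the loop body never runs), which matches the observation that the median barely moves while the mean and spread grow.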

 

I've also now modified the index reader in the Write to cope with an arbitrary number of readers. We just need to think about how we store and manage registration (a reader jumps on board when the writer is halfway through the buffer, for example).

 

Also added the event version of the test harnesses.

 

Version 3 is here. After this version I will be switching to a normal versioning number system (x.x.x style) as I think we are near the end of prototyping.

I've also now modified the index reader in the Write to cope with an arbitrary number of readers. We just need to think about how we store and manage registration (a reader jumps on board when the writer is halfway through the buffer, for example).

 

Well, dynamic registration, unless you forbid a once-registered reader from unregistering, makes everything quite a bit more complex. You can then get holes in the index array that would block the writer at some point, or you have to add an intermediate refnum/index translator that maps the static refnum index a reader gets when registering onto the correct index into the potentially changing index array. I'm not sure this is worth the hassle, as it may well destroy any time benefits you have achieved with the other ingenious code. :D

Well, dynamic registration, unless you forbid a once-registered reader from unregistering, makes everything quite a bit more complex. You can then get holes in the index array that would block the writer at some point, or you have to add an intermediate refnum/index translator that maps the static refnum index a reader gets when registering onto the correct index into the potentially changing index array. I'm not sure this is worth the hassle, as it may well destroy any time benefits you have achieved with the other ingenious code. :D

 

Indeed. I know the issues and have some ideas (the easiest of which is to have a boolean in the index array cluster alongside the R and Cnt). Do you have a suggestion/method/example?


Still need to propose a solution for handling abort. The DLL nodes have cleanup callbacks that you can register, but that's not going to help us here. If you're going to do it in pure G, I think you have to launch some sort of async VI that monitors for the first VI finishing its execution and then cleans up the references if and only if that async VI was not already terminated by the uninit VI. Either that, or the whole system has to move to a C DLL, which means recompiling the code for each target. Undesirable. Anyone have a better proposal for not leaking memory when the user hits the abort button?



Indeed. I know the issues and have some ideas (the easiest of which is to have a boolean in the index array cluster alongside the R and Cnt). Do you have a suggestion/method/example?

I do.

 

When you do Init, wire in an integer of the number of readers. Have the output be a single writer refnum and N reader refnums. If you make the modification to get rid of the First Call node and the feedback nodes and carry that state in the reader refnums, then you can't have an increase in the number of readers.

When you do Init, wire in an integer of the number of readers. Have the output be a single writer refnum and N reader refnums. If you make the modification to get rid of the First Call node and the feedback nodes and carry that state in the reader refnums, then you can't have an increase in the number of readers.

 

I have a half-implemented version that does something similar. It works by pre-allocating an array of reader cursors representing the positions of each reader. The array size represents the maximum number of registered readers. All the cursors start negative, indicating that nothing is registered for that index. The writer init returns with no registered readers. Reader initialization/registration involves finding an available (negative) cursor and setting it to a valid starting value (either zero or some offset, depending on where in the lifecycle the writer is). This has forced the init/release mechanisms to use a lock to ensure atomic operations on the cursor array, avoiding race conditions during the read-write sequence, but looping over the cursors during enqueue/dequeue operations can happen without a lock, as in Shaun's examples. Releasing a reader involves setting its cursor negative, allowing that cursor index to be reused if necessary.
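A rough C sketch of that cursor-array scheme, assuming a fixed maximum number of readers and using a negative cursor as the "free slot" marker. All names (`reader_table`, `rt_register`, etc.) are illustrative, not taken from the actual VIs:

```c
#include <pthread.h>

#define MAX_READERS 8

/* Pre-allocated reader cursors: a negative value marks a free slot.
 * Register/release take a lock; the hot enqueue/dequeue path only
 * ever reads the array, so it needs no lock. */
typedef struct {
    long            cursor[MAX_READERS];  /* < 0 means "unregistered" */
    pthread_mutex_t reg_lock;             /* guards register/release only */
} reader_table;

void rt_init(reader_table *t)
{
    pthread_mutex_init(&t->reg_lock, NULL);
    for (int i = 0; i < MAX_READERS; i++)
        t->cursor[i] = -1;
}

/* Find a free slot and claim it at start_pos; returns the slot index,
 * or -1 if the table is full. */
int rt_register(reader_table *t, long start_pos)
{
    int slot = -1;
    pthread_mutex_lock(&t->reg_lock);
    for (int i = 0; i < MAX_READERS; i++) {
        if (t->cursor[i] < 0) {           /* free slot found */
            t->cursor[i] = start_pos;
            slot = i;
            break;
        }
    }
    pthread_mutex_unlock(&t->reg_lock);
    return slot;
}

void rt_release(reader_table *t, int slot)
{
    pthread_mutex_lock(&t->reg_lock);
    t->cursor[slot] = -1;                 /* slot can now be reused */
    pthread_mutex_unlock(&t->reg_lock);
}
```

The lock only serialises the rare register/release path; steady-state reads and writes never touch it, which is the property MJE describes.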

Still need to propose a solution for handling abort. The DLL nodes have cleanup callbacks that you can register, but that's not going to help us here. If you're going to do it in pure G, I think you have to launch some sort of async VI that monitors for the first VI finishing its execution and then cleans up the references if and only if that async VI was not already terminated by the uninit VI. Either that, or the whole system has to move to a C DLL, which means recompiling the code for each target. Undesirable. Anyone have a better proposal for not leaking memory when the user hits the abort button?

 

Yeah. Don't want to go to a DLL. Been there, done that, torn holes in all my T-shirts. Is there no way we can utilise the callbacks? (I thought that was what the InstanceDataPointer was for.)

 

I do.

 

When you do Init, wire in an integer of the number of readers. Have the output be a single writer refnum and N reader refnums. If you make the modification to get rid of the First Call node and the feedback nodes and carry that state in the reader refnums, then you can't have an increase in the number of readers.

 

That wouldn't work well for my use cases, since I don't know how many I need up front (could be 10, could be 30, and it may change during the program lifetime as modules are loaded and unloaded: plugins).

The rest (about first call etc.) is valid. It really needs to be in the next layer up, but as that doesn't exist yet, it's in the current reads and writes. What we (or more specifically, I) don't want is for them to be reliant on shift registers in the user's program to maintain the state info. How that is handled will depend on whether the next layer is class based or classic LabVIEW based, which is why I haven't broached it yet. At that layer, I don't envisage a user-selectable "Init". More that the init is invoked on a read or write depending on what gets there first (self-initialising). I ultimately want to get to a point where you just slap a write VI down and slap read VIs wherever you need them (even in different diagrams) without having to connect wires (see my queue wrapper VI for how this works, although I'm not sure how it will in this case, at the moment, since the queue wrapper uses names for multiple instances).

I have a half-implemented version that does something similar. It works by pre-allocating an array of reader cursors representing the positions of each reader. The array size represents the maximum number of registered readers. All the cursors start negative, indicating that nothing is registered for that index. The writer init returns with no registered readers. Reader initialization/registration involves finding an available (negative) cursor and setting it to a valid starting value (either zero or some offset, depending on where in the lifecycle the writer is). This has forced the init/release mechanisms to use a lock to ensure atomic operations on the cursor array, avoiding race conditions during the read-write sequence, but looping over the cursors during enqueue/dequeue operations can happen without a lock, as in Shaun's examples. Releasing a reader involves setting its cursor negative, allowing that cursor index to be reused if necessary.

You can't use a negative number with this because the indexes can be negative (strange, I know, but it is to do with wrap-around).

 

I'm thinking that we could have a boolean with the R and Cnt (a block is converted to a cluster for processing; see the CB GetIndexes3.vi) which tells the index iterator (in the write) whether that index should be considered in the "lowest" check. This would have very little impact on performance (read a few more bytes in the block at a time, plus two AND operators).

 

Then you would need a "registration manager". That would be responsible for finding out how many are currently registered, updating the registered count, and choosing a slot which has an F flag to give to the reader that wants to register (not dissimilar to what you are currently doing). The additional difference would be that it is also capable of resizing the array if there are not enough index slots. It could even shrink the array and reorganise, if we wanted to be really smart, so that the index iterator doesn't process any inactive slots, doing away with the choosing of an inactive slot altogether. This only ever needs to happen when something registers/unregisters so even if this is locking, it will not have much of an impact during the general running (now we are starting to get to the Disruptor pattern ;) )
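The "should this index be considered" boolean could look something like this in C. The `index_slot` layout (R, Cnt, plus the proposed active flag) and the function name are assumptions for illustration, not the actual block layout:

```c
#include <stdbool.h>
#include <limits.h>

/* Illustrative per-reader index block: the R index, the Cnt, and the
 * proposed "active" boolean that only the registration manager flips. */
typedef struct {
    long r;       /* reader's index */
    long cnt;     /* reader's count */
    bool active;  /* F = slot free/ignored, T = reader registered */
} index_slot;

/* The writer's "lowest index" scan, considering only active slots,
 * so an unregistered reader can never block the ring. */
long lowest_active_index(const index_slot *slots, int n)
{
    long lowest = LONG_MAX;
    for (int i = 0; i < n; i++) {
        if (slots[i].active && slots[i].r < lowest)
            lowest = slots[i].r;
    }
    return lowest;  /* LONG_MAX if nothing is registered */
}
```

The cost over the current scan is exactly what the post estimates: a few more bytes read per block and one extra boolean test per slot.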

Edited by ShaunR

Hm... the Uninit is going to cause a race condition problem. If we just deallocate the pointers, everything crashes if there's a read still in progress. Either we add a mutex guard for delete -- which blows the whole point of this exercise -- or delete is something that can only execute when Write has written some "I'm done" sentinel value and the Reads have all acknowledged it.

 

And we have a second problem of guarding against two writes on the same refnum. They can't be allowed to proceed in parallel, but we have no guards against that.

 

 

Any proposals for how to prevent two writes from happening at the same time on the same refnum? (Or two reads on the same refnum?)



You can't use a negative number with this because the indexes can be negative (strange, I know, but it is to do with wrap-around).

Is there a maximum amount by which the indexes can go negative, so he could use even more negative numbers? Or some mathematical game like that?


 

With Version 3 my dual-core machine gives the following result when the Buffer size is smaller than the number of iterations:

[attached image: test results]

 

There is still a long wait at an interval determined by the buffer size, but it doesn't usually occur from the start of the run.  Both the iteration time and its variability seem to reduce once the long waits start.

This only ever needs to happen when something registers/unregisters so even if this is locking, it will not have much of an impact during the general running (now we are starting to get to the Disruptor pattern ;) )

But you would need to acquire the mutex in every read and every write just in case a resize tried to happen.

 

The only time when you know only one operation is proceeding is when the buffer is empty (all current readers have read all enqueued items) and the write operation is called (assuming we figure out a way to prevent multiple parallel writers on the same refnum, which is solvable if we can find some way to expose the atomic compare-and-swap instruction, which might require changes to the LV 2014 runtime engine). That's the only bottleneck operation in this system. In that moment, the write operation could resize the read buffer and then top-swap a new handle for the read indexes into the block of pointers with a single atomic instruction. So if you have the Write VI able to have a mode for Resize (and likely for Uninit, as I mentioned in my previous post), then you could resize.

 

I don't see any other way to handle the resize of the number of readers. Anyone else attempting to swap out the pointer blocks faces the race condition of someone trying to increment a block and then having their increment overwritten by a parallel operation.
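One way to picture the quiescent-point resize: build the larger index block off to the side, then publish it with a single atomic pointer store, so readers see either the old block or the complete new one, never a half-built block. A hedged C sketch under that assumption (the `index_block` layout and names are invented for illustration):

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* A resizable block of reader indexes, published through one atomic
 * pointer. Readers always load g_block before use, so a resize is a
 * single-instruction "top-swap" of the handle. */
typedef struct {
    int  capacity;
    long indexes[];   /* flexible array member */
} index_block;

static _Atomic(index_block *) g_block;

/* Called only at the quiescent point, when the writer knows no other
 * operation is in flight. Copies the old slots, marks new slots free,
 * then publishes the new block atomically. */
index_block *grow_block(index_block *old, int new_cap)
{
    index_block *nb = malloc(sizeof *nb + new_cap * sizeof(long));
    nb->capacity = new_cap;
    memcpy(nb->indexes, old->indexes, old->capacity * sizeof(long));
    for (int i = old->capacity; i < new_cap; i++)
        nb->indexes[i] = -1;              /* new slots start unregistered */
    atomic_store(&g_block, nb);           /* single-instruction publish */
    return nb;
}
```

Note this sketch deliberately leaks the old block; reclaiming it safely is exactly the "when is it safe to deallocate" problem discussed in the rest of the thread.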

With Version 3 my dual-core machine gives the following result when the Buffer size is smaller than the number of iterations:

[attached image: CB Test BUFFER_FP.png]

 

There is still a long wait at an interval determined by the buffer size, but it doesn't usually occur from the start of the run.  Both the iteration time and its variability seem to reduce once the long waits start.

 

In the writer there is a disable structure. If you disable the currently enabled case (and enable the other), it will use the previous index-reading method. Does this affect the result you see?

Hm... the Uninit is going to cause a race condition problem. If we just deallocate the pointers, everything crashes if there's a read still in progress. Either we add a mutex guard for delete -- which blows the whole point of this exercise -- or delete is something that can only execute when Write has written some "I'm done" sentinel value and the Reads have all acknowledged it.

 

And we have a second problem of guarding against two writes on the same refnum. They can't be allowed to proceed in parallel, but we have no guards against that.

 

 

Any proposals for how to prevent two writes from happening at the same time on the same refnum? (Or two reads on the same refnum?)

I don't understand what you are getting at here. Refnums?

The disruptor pattern can cope with multiple writers (ours currently can't). They use a two stage process where a writer "claims" a slot before writing to it. What's the problem with having multiple readers? (isn't that the point!)

 

We effectively have a reference counter (the number of readers), and we have a method of skipping in the writer (the bool field of the pointer indexes, which the readers could also use so they don't go ahead and read data; the manager only manipulates that field), so we would only deallocate the pointer when the last reader is unregistered. The deinit would effectively iterate through, unregistering all the readers (it becomes an "unregister all readers, then kill pointer" rather than just a pointer killer). I think we just need a NOP for the readers (they need to not read until the registration manager has given them an index slot and they know the current cursor position). Don't know. Just thinking on my feet at the moment, but it "seems" doable, and I think it might give us the same sort of feature as the queues when the handle is destroyed (error out).

 

 

Is there a maximum amount by which the indexes can go negative, so he could use even more negative numbers? Or some mathematical game like that?

 

Not really. I have visions of fixing the i64 cursor wrap-around problem mathematically by using the fact that it goes negative but still increases (gets less negative, minus + minus = plus, sort of thing). As I haven't fixed it yet, you could use a negative number as you will hit the same problem at the same point and everything starts at zero.
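The "goes negative but still increases" behaviour is ordinary two's-complement wrap-around, and the standard mathematical game for it is serial-number arithmetic: compare cursors by signed difference rather than by magnitude, which stays correct across the wrap. A sketch (the function name is illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* Wrap-safe cursor comparison: a is "ahead of" b if the signed
 * difference is positive. This remains true even after the cursor
 * wraps from INT64_MAX into negative territory, provided the two
 * cursors are within half the i64 range of each other -- trivially
 * true for any realistic buffer size. */
bool cursor_ahead(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a - (uint64_t)b) > 0;
}
```

With comparisons done this way, a dedicated negative sentinel for "unregistered" would indeed collide with legitimate wrapped cursors, which is the objection raised above; a separate flag (the proposed boolean) avoids that.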

 

 

But you would need to acquire the mutex in every read and every write just in case a resize tried to happen.

 

The only time when you know only one operation is proceeding is when the buffer is empty (all current readers have read all enqueued items) and the write operation is called (assuming we figure out a way to prevent multiple parallel writers on the same refnum, which is solvable if we can find some way to expose the atomic compare-and-swap instruction, which might require changes to the LV 2014 runtime engine). That's the only bottleneck operation in this system. In that moment, the write operation could resize the read buffer and then top-swap a new handle for the read indexes into the block of pointers with a single atomic instruction. So if you have the Write VI able to have a mode for Resize (and likely for Uninit, as I mentioned in my previous post), then you could resize.

 

I don't see any other way to handle the resize of the number of readers. Anyone else attempting to swap out the pointer blocks faces the race condition of someone trying to increment a block and then having their increment overwritten by a parallel operation.

 

Well. I think we need to look at the Disruptor code again to see how they handle it (they probably have the same problem). If we can't think of a way to reorganise, we can at least dynamically increase, as that doesn't overwrite existing indexes; it just adds new ones and disables old ones (I haven't seen a realloc, but have seen functions to resize handles... somewhere). So we could create, say, 5 at a time and increase in blocks. We are back to the MJE kind of approach which I first outlined with the flags, then. (I did say IF we were smart :D )

 

I think this is where we need to revisit the disruptor pattern details. They talk about consumer barriers, claiming slots and getting the "highest" index rather than just seeing if it is greater (as we currently do). I think we have the circular buffer nailed. It's the management and interfaces we are beginning to brainstorm now and they must have solved these already.

What's the problem with having multiple readers? (isn't that the point!)

I mean two calls to Read that both use the same number for the ID input.

 

I don't understand what you are getting at here. Refnums?

I assume that this block of pointers that we're creating would be encapsulated as a single reference datatype that, from the point of view of a user of the API, would look like a single refnum. Put a class wrapper around that cluster of pointers, make the wire 1 pixel cyan. Refnum is simply a shorthand for "cluster of pointer values".

 

But then there's the question of adding an actual refnum lookup layer, because of this...

 

We effectively have a reference counter (the number of readers), and we have a method of skipping in the writer (the bool field of the pointer indexes, which the readers could also use so they don't go ahead and read data; the manager only manipulates that field), so we would only deallocate the pointer when the last reader is unregistered. The deinit would effectively iterate through, unregistering all the readers (it becomes an "unregister all readers, then kill pointer" rather than just a pointer killer). I think we just need a NOP for the readers (they need to not read until the registration manager has given them an index slot and they know the current cursor position). Don't know. Just thinking on my feet at the moment, but it "seems" doable, and I think it might give us the same sort of feature as the queues when the handle is destroyed (error out).

How do you keep this VI from crashing?

[attached image]

How do you know that a Read is not happening at the same time that Deinit is called? Deinit can't unregister the readers if a read operation is in progress because if it destroys any of the pointer blocks, you'll crash the Read. How does a Read know to return an error if it starts working on a refnum that has been unregistered? If all you do is pass in a block of pointers, those pointers could all be deallocated and, again, you'll crash when you try to use them. This is why LV structures like this go through a refnum layer where we guarantee that a refnum that has been destroyed is not going to come back into use so that a read knows to return an error on a deallocated refnum. I don't see any way to have a separate Deinit function without an actual refnum layer guarding your allocated pointer blocks.

 

Nothing that I have read about the Disruptor guards against such abusive usages. They assume that you just wouldn't code a call to Deinit while a Read is still running. That's not something that a general API can rationally assume. Now, in some languages, crashing might be a perfectly acceptable answer, with documentation that if you get this crash, you coded something wrong, but we try to avoid that in LabVIEW.

I mean two calls to Read that both use the same number for the ID input.
 

Ah. Yes. IC.

Here's what I think we could do.......

The registration manager knows how many and which ones are allocated. It will pass back an available ID index to the reader as part of the registration process (the ID is just an offset from the base pointer in # blocks). When there is a registration request, it scans the booleans in the reader index array (no locks required) and passes back the first F it comes across. If all are in use, it would increase the size of the array (setting all new values to F and indexes to the current cursor position) and then give the next ID to the reader. The reader then uses that ID and starts its counter at the set cursor position. At this point the manager sets the boolean field for the newly registered reader to T and increments the reader count. The next scan by the write will now iterate through the newly increased size and pick up that the new reader's bool is T.

 

 

I assume that this block of pointers that we're creating would be encapsulated as a single reference datatype that, from the point of view of a user of the API, would look like a single refnum. Put a class wrapper around that cluster of pointers, make the wire 1 pixel cyan. Refnum is simply a shorthand for "cluster of pointer values".

 

But then there's the question of adding an actual refnum lookup layer, because of this...

Why can't the pointer cluster actually be the private data of a class? I don't see any reason for refnums if it's going in a class. Instantiating the class creates all the pointers (well, there is no constructor in LV, so until we get my atomic reads and writes, they will probably call an init). If the block of pointers you are talking about is to do with the ID, then we don't need any, since the ID number is sufficient and the user will have no idea which IDs are being used, as that's internal (as described previously).

 

How do you keep this VI from crashing?

[attached image: Screen shot 2013-07-12 at 8.22.08 AM.png]

How do you know that a Read is not happening at the same time that Deinit is called? Deinit can't unregister the readers if a read operation is in progress because if it destroys any of the pointer blocks, you'll crash the Read. How does a Read know to return an error if it starts working on a refnum that has been unregistered? If all you do is pass in a block of pointers, those pointers could all be deallocated and, again, you'll crash when you try to use them. This is why LV structures like this go through a refnum layer where we guarantee that a refnum that has been destroyed is not going to come back into use so that a read knows to return an error on a deallocated refnum. I don't see any way to have a separate Deinit function without an actual refnum layer guarding your allocated pointer blocks.

 

Nothing that I have read about the Disruptor guards against such abusive usages. They assume that you just wouldn't code a call to Deinit while a Read is still running. That's not something that a general API can rationally assume. Now, in some languages, crashing might be a perfectly acceptable answer, with documentation that if you get this crash, you coded something wrong, but we try to avoid that in LabVIEW.

Well. I'm not part of the "Crashing is an acceptable behaviour" club. They are idiots (although I think you have a couple over there at NI when it comes to CLFNs :D )

At worst, we could unregister all the readers (set all the booleans to false), then wait 2 weeks to make sure everything has had a chance to read the booleans and exit before we finally crowbar the memory. But I'm sure we can come up with something better than that ;) You're thinking a bit ahead of me at the moment. I tend to think in chunks with a bit of prototyping for feasibility (iterative development). I'm only just starting to formulate details about the registration, let alone about the API itself.

What do you suggest?


A more complete rendition of what's in my head...

 

You currently have a cluster of pointers. We can move those into a class. We can then make the class look like a reference type (because that's exactly what it is). That's good. And that's all we need to do IF we can solve the "references are not valid any more" problem. The ONLY way I know to do that is with some sort of scheme to check a list to see if the pointers are still valid coupled with a way to prevent a recently deallocated number from coming back into use. That's what LabVIEW's refnum scheme provides. Without such a scheme, a deallocated memory pointer comes right back into play -- often immediately because the allocation blocks are the same size, so those are given preference by memory allocation systems. Thus any scheme like this in my view has to add a refnum layer between it and the actual pointer blocks. The list of currently valid pointers is guarded with a mutex -- every read and write operation at its start locks the list, increments the op count if it is still on the list, releases the mutex, does its work, then acquires the mutex again to lower the op count. The delete acquires the mutex, sets a flag that says "no one can raise the op count again", then releases the mutex and waits for the opcount to hit zero then throws away the pointer block.

 

 

 

That's my argument why we need a refnum layer. There might be a way to implement this without that layer, but I do not know what that would be.
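The mutex-guarded op-count scheme described above might be sketched like this in C with pthreads. This is a sketch of the idea only, not LabVIEW's actual refnum implementation; all names are invented:

```c
#include <pthread.h>
#include <stdbool.h>

/* Guard layer for a pointer block: each Read/Write raises the op count
 * under a mutex if the block is still valid; delete bars new entries,
 * then waits for the count to drain before freeing. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  drained;
    int             op_count;
    bool            closing;   /* "no one can raise the op count again" */
} guard;

/* Called at the start of every Read/Write. Returns false if the
 * reference is stale, in which case the caller returns an error. */
bool guard_enter(guard *g)
{
    bool ok;
    pthread_mutex_lock(&g->lock);
    ok = !g->closing;
    if (ok) g->op_count++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* Called when the Read/Write finishes its work. */
void guard_exit(guard *g)
{
    pthread_mutex_lock(&g->lock);
    if (--g->op_count == 0)
        pthread_cond_signal(&g->drained);
    pthread_mutex_unlock(&g->lock);
}

/* Called by delete/Uninit: bar new operations, wait for in-flight
 * ones to finish, then it is safe to throw away the pointer block. */
void guard_close(guard *g)
{
    pthread_mutex_lock(&g->lock);
    g->closing = true;
    while (g->op_count > 0)
        pthread_cond_wait(&g->drained, &g->lock);
    pthread_mutex_unlock(&g->lock);
    /* ...deallocate the pointer block here... */
}
```

As the thread notes, the cost of this safety is that every operation now takes a mutex twice, which is exactly the overhead the ring buffer was built to avoid; hence the debate.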



PS: Even if we don't need a refnum layer and we find a way to do this with just the pointers stored in a class' private data, when the wire is cyan and single pixel, many people will still refer to it as a refnum (often myself included) because references are refnums in LabVIEW. The substitution is easy to make. Just sayin'. :-)



They are idiots (although I think you have a couple over there at NI when it comes to CLFNs :D )

We offer you the option of crashing because, if your code is well written, it is faster to execute than for us to guard against the crash. You're free to choose the high-performance route. ;-)

A more complete rendition of what's in my head...

 

You currently have a cluster of pointers. We can move those into a class. We can then make the class look like a reference type (because that's exactly what it is). That's good. And that's all we need to do IF we can solve the "references are not valid any more" problem. The ONLY way I know to do that is with some sort of scheme to check a list to see if the pointers are still valid coupled with a way to prevent a recently deallocated number from coming back into use. That's what LabVIEW's refnum scheme provides. Without such a scheme, a deallocated memory pointer comes right back into play -- often immediately because the allocation blocks are the same size, so those are given preference by memory allocation systems. Thus any scheme like this in my view has to add a refnum layer between it and the actual pointer blocks. The list of currently valid pointers is guarded with a mutex -- every read and write operation at its start locks the list, increments the op count if it is still on the list, releases the mutex, does its work, then acquires the mutex again to lower the op count. The delete acquires the mutex, sets a flag that says "no one can raise the op count again", then releases the mutex and waits for the opcount to hit zero then throws away the pointer block.

 

 

 

That's my argument why we need a refnum layer. There might be a way to implement this without that layer, but I do not know what that would be.

 

As you know. Any mutexes and we will be back to square one.

 

I'm still not quite getting it. Why do we need an "op count"? All pointers are considered valid until we unregister all readers and writers. Unregistering a single reader amongst multiple readers doesn't mean we need to free any pointers, as there are only three (the buffer, the reader index array, and the cursor). The only time we deallocate anything is when there are no longer any readers or writers, at which point we deallocate all pointers.

 

Now, we already have a list of readers (the booleans set to true in the index array) and we know how many (the reader count). The adding and removing of readers, i.e. the manipulation of the booleans and the count, is "locked" by the registration manager (non-reentrant VI boundary). So I see the "issue" as: how do we know that any asynchronous readers on the block diagram have read whatever flags and exited, so that the writer can exit and the pointers can be deallocated? (The writer only exits when there are no readers. Race condition on startup? Will have to play...)

 

The writer won't read any reader booleans, since by this time the registration manager has set the reader count to zero (this order will change with my proposal below, since it will exit before it is zero). So it goes into a NOP state and exits. It can set a boolean that says "I have no readers and have exited". The registration manager already knows there are no readers, so it just needs to know the writer has exited before deallocating pointers.

 

The readers only need to watch their flag to see if the registration manager has changed it, then go into a NOP state and exit. At this point I can see there might be a scenario whereby the reader has read its flag (which is still OK) and, by the time it gets to reading buffers and writing indexes, the writer has said "I have no readers and have exited". Well, let's put another flag in the reader index array that says "I'm active", which is just a copy of the registration manager's boolean but only written once all memory reads have completed, and which causes an exit. Now we have a situation where deallocation can only occur if the booleans controlled by the registration manager are all false (or true, depending on which sense is safer) AND the "I am active" booleans controlled by the individual readers are all false (ditto sense) AND the writer has said "I have no readers and have exited". This decomposes into just the writer saying "I have no readers and have exited", as the writer can read both fields whilst it is doing its thing (it has to read the entire block anyway), AND them together, and exit when they are all false (sense again) and set the "I have no readers and have exited".

 

So in the end-game. The registration manager unsets all the registered booleans, waits for the "I have no readers and have exited" boolean then deallocates the pointers. Does this seem reasonable?
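The end-game handshake could be modelled with the three sets of flags described above: manager-owned "registered" flags, reader-owned "active" flags, and the writer's "I have no readers and have exited" flag. A sketch of the proposal only, using C11 atomics for the booleans; all names are illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define N_READERS 4

/* The three flag sets from the proposal: the manager clears "registered",
 * each reader mirrors that into its own "active" flag only after it has
 * finished touching memory, and the writer raises writer_done once every
 * flag it can see is low. */
typedef struct {
    atomic_bool registered[N_READERS];  /* manager-owned */
    atomic_bool active[N_READERS];      /* reader-owned  */
    atomic_bool writer_done;            /* "I have no readers and have exited" */
} shutdown_flags;

/* Writer-side check, run once per pass over the index block: exit (and
 * announce it) only when no reader is registered or still active. */
bool writer_should_exit(shutdown_flags *f)
{
    for (int i = 0; i < N_READERS; i++) {
        if (atomic_load(&f->registered[i]) || atomic_load(&f->active[i]))
            return false;               /* somebody may still touch memory */
    }
    atomic_store(&f->writer_done, true);
    return true;
}

/* Manager-side: deallocation is legal only after the writer confirms. */
bool safe_to_deallocate(shutdown_flags *f)
{
    return atomic_load(&f->writer_done);
}
```

This captures the decomposition in the post: the manager never inspects readers directly at the end; it only waits on the writer's single summary flag.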

 

PS: Even if we don't need a refnum layer and we find a way to do this with just the pointers stored in a class' private data, when the wire is cyan and single pixel, many people will still refer to it as a refnum (often myself included) because references are refnums in LabVIEW. The substitution is easy to make. Just sayin'. :-)

 

Being colour-blind (colour-confused is a better term), I really have no opinion on this :)

 

We offer you the option of crashing because, if your code is well written, it is faster to execute than for us to guard against the crash. You're free to choose the high-performance route. ;-)

 

Well. Get rid of the "Check For Errors" page then :D (I've never had an error out via this route since about LV 7.1)


You Init. Then you Uninit. Then you take that same already-uninitialized pointer block and wire it into a Read or a Write. How does the Read or Write know that the pointer block has been deallocated and it should return an error?

[Attached image: post-5877-0-98702800-1373686686.png]

 

So in the end-game. The registration manager unsets all the registered booleans, waits for the "I have no readers and have exited" boolean then deallocates the pointers. Does this seem reasonable?

 

Nope... there's no registered Booleans to check if the pointers have been deallocated. So to implement this solution, we would have to say that there's an Init but once allocated, the pointer at least to the Booleans needs to stay allocated until the program finishes running. Otherwise the first operation that tries to check the Booleans will crash. Are you ok with a "once allocated always allocated" approach?

 

There's still the problem of setting those Booleans. We'll need a test-and-set atomic instruction for "reader is active" -- I don't know of any way to implement that with the current APIs that LabVIEW exposes.
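For reference, this is what the missing primitive looks like in C11, where `<stdatomic.h>` provides it directly (illustrative only; nothing here is callable from G code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A test-and-set reads the old value and raises the flag in ONE
 * indivisible step, so two contenders can never both "win". */
static bool acquire(atomic_flag *reader_is_active)
{
    /* atomic_flag_test_and_set returns the PREVIOUS state:
     * false means we just claimed it; true means someone holds it. */
    return !atomic_flag_test_and_set(reader_is_active);
}

static void release(atomic_flag *reader_is_active)
{
    atomic_flag_clear(reader_is_active);
}
```

The indivisibility is the whole point: a separate "read the Boolean, then write it" done with two calls leaves a window in which a second caller sees the flag still low.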


Let's see...

 

 

We have Read 0, Read 1, Writer, and Deinit all executing at the same time.

 

Here's one possible execution order...

 

Deinit sets the "NowShuttingDown" bit to high. Then Deinit does an atomic compare-and-swap for the WriteIsActive bit (in other words, "if the WriteIsActive bit is false, set it to true, and return whether or not the set succeeded"). It succeeds. Writer now tries to do an atomic compare-and-swap for WriteIsActive, discovers the bit is already high and so returns an error -- it decides between "WriterAlreadyInUse" and "ReferenceIsStale" by checking the "NowShuttingDown" bit. Then Read1 does an atomic compare-and-swap on Read1IsActive. DeInit then tries to do the same, discovers the bit is already high, and so goes into a polling loop, waiting for the bit to go low. Read1 finishes its work and lowers the bit. Possibly this repeats several times because LV might be having a bad day and we might get around to several calls to Read1 before the timeslice comes down right to let the DeInit proceed (a possible starvation case; unlikely, since Writer has already been stopped, but it highlights that had it been the Writer that got ahead of DeInit, we might keep writing indefinitely waiting for the slices to work out). But let's say eventually DeInit proceeds and sets Read1IsActive high. The next Read1 that comes through errors out. Having now blocked all the readers and the writer, DeInit deallocates the buffer blocks and index blocks, but not the block of Boolean values. Any other attempts to read/write will check the Booleans, find that the ops are already in use, and then check the NowShuttingDown bit to return the right error. (Note that they can't check NowShuttingDown at the outset because they do not set that bit, which means there'd be an open race condition and a reader might crash because DeInit would throw away its buffer while it is reading.)

 


The situation above is pretty standard ops control, provided you know that the Boolean set will remain valid. If you're ok with leaving that allocation in play for as long as the application runs (without reusing it the next time Init gets called -- once it goes stale it has to stay stale, or you risk contaminating pointers that should have errored out with the next iteration's buffers) then I think this will work.
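The DeInit ordering described above can be sketched in C11 (an illustrative sketch with invented names like `try_claim` and `control_block`; this is not LabVIEW's actual mechanism, and a real version would handle N readers, not one):

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool now_shutting_down;
    atomic_bool write_is_active;
    atomic_bool read1_is_active;
} control_block;

/* The CAS in question: "if the bit is false, set it to true, and
 * report whether the set succeeded", as one indivisible step. */
static bool try_claim(atomic_bool *bit)
{
    bool expected = false;
    return atomic_compare_exchange_strong(bit, &expected, true);
}

static void deinit(control_block *cb)
{
    /* 1. Raise NowShuttingDown so blocked ops return the right error. */
    atomic_store(&cb->now_shutting_down, true);

    /* 2. Claim the writer's bit; the real Writer now errors out. */
    while (!try_claim(&cb->write_is_active)) { /* spin / yield */ }

    /* 3. Claim each reader's bit, polling while a read is in flight. */
    while (!try_claim(&cb->read1_is_active)) { /* spin / yield */ }

    /* 4. All ops blocked: free the buffer and index blocks here,
     * but NOT the Boolean block, which must stay allocated. */
}
```

After `deinit` returns, any later Read/Write finds its bit already high, checks `now_shutting_down`, and errors out instead of touching freed memory.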

You Init. Then you Uninit. Then you take that same already-uninitialized pointer block and wire it into a Read or a Write. How does the Read or Write know that the pointer block has been deallocated and it should return an error?

[Attached image: Screen shot 2013-07-12 at 10.37.50 PM.png]

As it stands. Yes. The pointer cluster is "on-the-wire" and when we deinit we just leave the cluster alone. But it doesn't have to be (it just makes it easier to debug). If I were to start on a "Classic LabVIEW" API, I would probably also shove the pointer cluster into memory and you wouldn't need a "reference" wire at all; hell, even a global variable could contain them (no "not allocated" issues then). With the former, I might use the "CheckPointer" call, if it isn't too expensive and it does what I think it does, as belt and braces alongside a check for a null pointer on the cluster (if the cluster doesn't exist then neither do the others, and vice versa). But if you are looking at classes I thought you would want to manage all that in the class... somehow. If I put everything into memory and handle all the scenarios, there isn't much point in a class at all, apart from making people feel warm and fuzzy about OOP.

 

I think the issue you are probably running into is that the best means to achieve what you need in a class is a DVR. But you can't use them because of the locking overhead. If you find you are reaching for a DVR, then the structure needs to be via the MM functions, as they are basically just a method to provide the same function but without the DVR's locking overhead. Anything else can "probably" be in the class.

 

 

Nope... there's no registered Booleans to check if the pointers have been deallocated. So to implement this solution, we would have to say that there's an Init but once allocated, the pointer at least to the Booleans needs to stay allocated until the program finishes running. Otherwise the first operation that tries to check the Booleans will crash. Are you ok with a "once allocated always allocated" approach?

 

I would be "happy for now". I don't think the unallocated pointers are an insurmountable issue and at worst we just need some state flags. I would come back to it on another iteration to see what needs to change and what the performance impact is of a million flags all over the place.

 

There's still the problem of setting those Booleans. We'll need a test-and-set atomic instruction for "reader is active" -- I don't know of any way to implement that with the current APIs that LabVIEW exposes.

 

I don't think we do (need a test 'n set). The premise of the pattern is mutual exclusion through memory barriers only. As long as you only have one writer to any location then no test and set is necessary. As we have solved the issue of accessing individual elements in arrays without affecting others or locking the rest of the array, all we need to ensure is that write responsibility is well defined (only one writer to a single location or block). The only time a test 'n set would be required is if we couldn't guarantee atomic reads and writes of the individual bits (PPC?).

As an aside. Anecdotally, it seems writing all contents to a cluster is an atomic operation in LabVIEW and incurs a marginal overhead compared with accessing the memory locations independently. The overhead is minuscule in comparison to the extra library calls required to achieve the latter. If this can be confirmed, then it is the fastest way to add atomicity to entire blocks of locations when needed. This is one reason why I use a cluster for the elements in the index array.
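If the claim holds, it matches what C code gets by packing the cluster's fields into one aligned machine word and storing the whole thing with a single atomic operation (a sketch; the field layout and names are illustrative, and the anecdotal LabVIEW behaviour itself is unverified):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Pack a small "cluster" into one aligned 64-bit word so the whole
 * thing is written in a single atomic store. A reader can then never
 * see a cursor from one write paired with a count from another. */
typedef union {
    struct { uint32_t cursor; uint32_t count; } f;  /* the cluster  */
    uint64_t word;                                  /* as one store */
} index_entry;

static void entry_store(_Atomic uint64_t *slot, uint32_t cursor, uint32_t count)
{
    index_entry e = { .f = { cursor, count } };
    atomic_store_explicit(slot, e.word, memory_order_release);
}

static index_entry entry_load(_Atomic uint64_t *slot)
{
    index_entry e;
    e.word = atomic_load_explicit(slot, memory_order_acquire);
    return e;  /* both fields always come from the same write */
}
```

The trick only works while the cluster fits in a natively atomic width (64 bits on current desktop CPUs); anything bigger is back to per-field writes or a lock.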

Edited by ShaunR
Well. I'm not part of the "Crashing is an acceptable behaviour" club. They are idiots (although I think you have a couple over there at NI when it comes to CLFNs :D )

 

While I'm in the club of trying to avoid crashing whenever possible, I find a catch-all exception handler that simply catches and throws away exceptions an even less acceptable solution. An exception handler is ok for normal errors where the exception cause itself gives you enough information to do something about it, such as retrying a failed operation. But it is ABSOLUTELY and DEFINITELY unacceptable for exceptions like invalid pointer accesses. If they happen, I want to know about them as soon as possible and have done as little as possible afterwards. As such, I find the option in the CLN to just continue after such exceptions highly dangerous. There are many people out there who believe that writing less-than-perfect external code is just fine: just let LabVIEW catch the exception and happily go on. An invalid pointer access, or any other error that is caused by this later on (writing beyond a buffer often doesn't cause an immediate error, since the memory location is already used by something else in the app and as such is completely valid as far as the CPU and OS are concerned), needs to stop the program immediately, finito! There is no excuse for trying to continue anyway. Blame whoever wrote that crashing code, but you do not want LabVIEW to eat your hard drive or worse!

 

I don't think we do (need a test 'n set). The premise of the pattern is mutual exclusion through memory barriers only. As long as you only have one writer to any location then no test and set is necessary. As we have solved the issue of accessing individual elements in arrays without affecting others or locking the rest of the array, all we need to ensure is that write responsibility is well defined (only one writer to a single location or block). The only time a test 'n set would be required is if we couldn't guarantee atomic reads and writes of the individual bits (PPC?).

 

If you talk about bits in any form of integer, then write access to them is not atomic on any CPU architecture I know of. Even bytes and shorts are highly suspicious, even on the x86 architecture, since the memory transfer traditionally always happened in 32-bit quantities (and nowadays in 64- or even 128-bit quantities). Some reading material I consulted suggests that write access to anything but aligned integers (and on 64-bit architectures, aligned 64-bit integers) is not guaranteed to be atomic on just about any CPU architecture out there. I can't be sure, and am in fact too lazy to make a long research project out of this, so in all my C code I employ cmpxchg() operations when writing bits and bytes to structure elements that are not integer-aligned, or that are not guaranteed to avoid sharing bytes inside the aligned integer address with other variables (unless, of course, I can prove positively that no one else will ever try to write to the same integer in any form, and in C that means the routine writing to that address also has to be at least protected or single-threaded).
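In C11 terms, the cmpxchg() approach described above looks like this (a sketch; `atomic_fetch_or` is the shorthand where a read-modify-write primitive is available, and the explicit loop is the portable fallback):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Set one flag bit inside a shared, aligned 32-bit word without
 * disturbing its neighbours. A plain "load, OR, store" would lose
 * concurrent updates to the other bits; these two variants don't. */
static void set_bit_rmw(_Atomic uint32_t *word, int bit)
{
    atomic_fetch_or(word, UINT32_C(1) << bit);  /* one atomic RMW */
}

/* The explicit cmpxchg loop: retry until no one else changed the
 * word between our read and our write. */
static void set_bit_cas(_Atomic uint32_t *word, int bit)
{
    uint32_t old = atomic_load(word);
    while (!atomic_compare_exchange_weak(word, &old,
                                         old | (UINT32_C(1) << bit)))
        ;  /* on failure, old is reloaded with the current value */
}
```

Either form sidesteps the alignment question entirely, because the hardware-level operation is always on the whole aligned word.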


Ooooh. I missed this bit.

I'm not sure if this is in answer to my modified reader-list suggestion (with the manager's bit AND the reader's bit) or not, so I'll plough ahead with the assumption that it is (apologies if that's not the case).

 

Let's see...

 

 

We have Read 0, Read 1, Writer, and Deinit all executing at the same time.

 

Here's one possible execution order...

 

Deinit sets the "NowShuttingDown" bit to high. Then Deinit does an atomic compare-and-swap for the WriteIsActive bit (in other words, "if the WriteIsActive bit is false, set it to true, and return whether or not the set succeeded"). It succeeds. Writer now tries to do an atomic compare-and-swap for WriteIsActive, discovers the bit is already high and so returns an error -- it decides between "WriterAlreadyInUse" and "ReferenceIsStale" by checking the "NowShuttingDown" bit. Then Read1 does an atomic compare-and-swap on Read1IsActive. DeInit then tries to do the same, discovers the bit is already high, and so goes into a polling loop, waiting for the bit to go low. Read1 finishes its work and lowers the bit. Possibly this repeats several times because LV might be having a bad day and we might get around to several calls to Read1 before the timeslice comes down right to let the DeInit proceed (a possible starvation case; unlikely, since Writer has already been stopped, but it highlights that had it been the Writer that got ahead of DeInit, we might keep writing indefinitely waiting for the slices to work out). But let's say eventually DeInit proceeds and sets Read1IsActive high. The next Read1 that comes through errors out. Having now blocked all the readers and the writer, DeInit deallocates the buffer blocks and index blocks, but not the block of Boolean values. Any other attempts to read/write will check the Booleans, find that the ops are already in use, and then check the NowShuttingDown bit to return the right error. (Note that they can't check NowShuttingDown at the outset because they do not set that bit, which means there'd be an open race condition and a reader might crash because DeInit would throw away its buffer while it is reading.)

 


The situation above is pretty standard ops control, provided you know that the Boolean set will remain valid. If you're ok with leaving that allocation in play for as long as the application runs (without reusing it the next time Init gets called -- once it goes stale it has to stay stale, or you risk contaminating pointers that should have errored out with the next iteration's buffers) then I think this will work.

 

No. The Read 1 and DeInit are not operating on the same memory location. Therefore there is no requirement for CAS and, as only one writer can write to any bit, there is no race condition. Each writes to its own bit associated with each reader (the regman writes to several locations in the list, but it is still the only writer to those locations; the readers each handle their own "active" bit, so they each are still the only writer for their bit in the list). The writer only reads both bits for all readers to determine:

 

a) Is this reader to be included in the lowest check? (Only the registration manager's bit is important for this.)

b) Are there any readers at all? (All the regman's bits are false AND all the readers' bits are false.) -> exit and set "Finished".

 

In this scenario, it is the writer that signals everything is OK to kill the pointers (it is monitoring all the bits), not deinit. Deinit unregisters all the readers, then waits for the "finished" bit to go high, then deallocates. The "active" bit in each reader is a proxy for the regman's bit, in that the regman's bit is read and the value then written to the "active" bit for that reader on exit (no other reads/writes happen after). When the "finished" bit goes high, all readers and the writer have already exited and will not be reading/writing to any pointers.

 

I'm hoping to have a knock-up of the registration over the weekend to test the idea and see what I run into (depends how drunk I get on Saturday night; hangovers don't get worse as you get older, they just get longer :) ).

Edited by ShaunR
No. The Read 1 and DeInit  are not operating on the same memory location.

Um... yes, they do. DeInit deallocates all of the pointer blocks allocated by Init. In the Version 2 implementation, Read uses Cursor and Array. I haven't looked at Version 3, but I would expect the number of allocations to increase (per our other discussions here) . The Version 2 implementation will crash. It isn't crashing right now in the code I posted above because LV happily goes ahead and uses deallocated memory in those DLL calls and no other part of LV does an allocate during that window. As soon as you have a program where the program proceeds to allocate data for something else, the whole system falls down and goes boom.

 

Yes, we need a CAS to allow DeInit to work.


[Later] Version 3 is still using the First Call and the Feedback Nodes. This is causing your program to hang when you call Init a second time. Just throw a For Loop around your test program and wire a 2 to the N terminal. Run the VI. It won't finish. The First Call value and the Feedback Node value have to be part of the pointer block. End of story.

 

Also, you crash if a pointer block that has never been initialized is passed in... need to check for NULL on every call. Test this by just hitting Run on your Read or your Write VI. LV goes away. You've got guards on DeInit... we need the same guards on Read/Write.
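The guard itself is cheap: in C terms it is just an early-out on every entry point (hypothetical struct and error names, for illustration). Note that a NULL check only catches the never-initialized case; a block freed by DeInit still needs the Boolean scheme discussed above:

```c
#include <stddef.h>

/* Hypothetical pointer-block layout and error codes, just to show
 * the shape of the guard that Read/Write need on every call. */
typedef struct {
    void *buffer;
    void *indexes;
} pointer_block;

enum { ERR_NONE = 0, ERR_NOT_A_REFNUM = 1 };

static int buffer_read(pointer_block *pb /*, ... */)
{
    /* Bail out before touching any pointer that may never have
     * been allocated (the same guard DeInit already carries). */
    if (pb == NULL || pb->buffer == NULL || pb->indexes == NULL)
        return ERR_NOT_A_REFNUM;
    /* ... the actual read ... */
    return ERR_NONE;
}
```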

 

I fixed both the hang and the crash on never initialized.

 

Found another crash scenario -- DeInit called twice on the same pointer block. I can't do anything about that or the crash caused by DeInit being called while Read or Write are still running unless/until we figure out how to generate a CAS instruction or some other equivalent atomic instruction.

Circular_BufferV4_AQ.zip
