Mark Zvilius Posted July 4, 2010

I have a producer-consumer app that uses a queue. The data items are rather large matrices, on the order of 300 kB. Because of the size of the matrices, I've become interested in optimizing the number of data copies that are performed. I wrote some test code and simplified it down to the following, which uses a 10x10 matrix as the data item. I then used the "Show Buffer Allocations" tool to look at data copies (note the black dots on Initialize Array and Dequeue Element).

Questions:

1. Is it really true that a copy is made on the Dequeue operation? That surprises me, since there can be only one destination for the queue element.

2. Would I be able to eliminate the copy by wrapping the matrix in a DVR and passing the DVR on the queue? Would the performance hit of using the reference negate the savings? (I'm not sure I would actually do this; it's more of a learning exercise at the moment.)

I appreciate your input,

Mark Zvilius
jdunham Posted July 5, 2010

1. Is it really true that a copy is made on the Dequeue operation? That surprises me, since there can be only one destination for the queue element.

Oh yeah, you are certain to get a copy there. You have that matrix on two totally separate wires on your diagram, so a copy is needed for each wire. Now in your trivial case, those two wires will never execute at the same moment (probably!), but if you make the diagram a little more complicated then they could execute at the same time and would absolutely need to have separate copies in memory of whatever might be on the two wires.

2. Would I be able to eliminate the copy by wrapping the matrix in a DVR and passing the DVR on the queue? Would the performance hit of using the reference negate the savings? (I'm not sure I would actually do this; it's more of a learning exercise at the moment.)

This seems like a textbook case for DVRs, assuming you really need the producer-consumer pattern. I don't know too much about the internal implementation of DVRs, but it seems likely that dereferencing the DVR would be a very fast operation, and not something to worry about if your app is also throwing around 300 kB arrays. Note that if you branch the wire of the matrix after dereferencing it, LV may still need to copy your data.

Jason
Mark Smith Posted July 6, 2010

Oh yeah, you are certain to get a copy there. You have that matrix on two totally separate wires on your diagram, so a copy is needed for each wire. Now in your trivial case, those two wires will never execute at the same moment (probably!), but if you make the diagram a little more complicated then they could execute at the same time and would absolutely need to have separate copies in memory of whatever might be on the two wires.

I don't think this is correct - the only wire I see branched is the queue ref, so this ensures you get a copy of the queue ref, not the queue element. I'm not privy to the internals of the queue implementation, but I do recall reading that it is a protected implementation of shared memory. And I don't think the queue operations could execute at exactly the same time, since a mutex or some such mechanism protects them so that a queue read or write is always an atomic operation - that's the only way one could guarantee the FIFO behavior. Also, I built a simple example following the posted diagram - I get a buffer allocation on the dequeue if it wires to anything at all, even an array size function (which I'm pretty sure doesn't require a copy of the data to operate). And this is without even including the enqueue in the producer loop. So the buffer allocation seems to be part of the implementation of the queue, and a DVR around the queue wouldn't help.

This seems like a textbook case for DVRs, assuming you really need the producer-consumer pattern. I don't know too much about the internal implementation of DVRs, but it seems likely that dereferencing the DVR would be a very fast operation, and not something to worry about if your app is also throwing around 300 kB arrays. Note that if you branch the wire of the matrix after dereferencing it, LV may still need to copy your data.

Jason

I agree that the DVR would most likely be the most efficient way to go, since as I understand it, it could be thought of as a "safe" pointer. So you could write data to the DVR location in the producer and read it in the consumer, and since the DVR is "safe" in the sense that you don't have to manage access and all operations are still atomic (I think), what you lose is the ability to use a non-polling process in the consumer. Now you'll have to poll the DVR and have some mechanism to determine if the data is fresh. Maybe just include an "updated" flag in a data structure that includes your array. Then you could operate on the array with an in-place structure and avoid making unnecessary copies.

Mark
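A rough C analogy of the polling scheme described above may help make it concrete. This is purely illustrative - it is not LabVIEW code, the struct layout and function names are invented, and the explicit mutex only stands in for the access protection a DVR gives you automatically:

```c
/* Conceptual sketch of the "DVR + updated flag" polling scheme.
 * Hypothetical names; not LabVIEW's actual implementation. */
#include <pthread.h>
#include <string.h>

#define ROWS 10
#define COLS 10

typedef struct {
    pthread_mutex_t lock;       /* stands in for the DVR's built-in access protection */
    double matrix[ROWS][COLS];  /* the shared data the producer overwrites in place   */
    int    updated;             /* "fresh data" flag the consumer polls               */
} SharedMatrix;

/* Producer: overwrite the shared buffer in place and mark it fresh. */
void producer_write(SharedMatrix *s, const double src[ROWS][COLS])
{
    pthread_mutex_lock(&s->lock);
    memcpy(s->matrix, src, sizeof s->matrix);
    s->updated = 1;
    pthread_mutex_unlock(&s->lock);
}

/* Consumer: poll; if fresh, read (or better, process in place) and clear the flag.
 * Returns 1 if new data was consumed, 0 otherwise. */
int consumer_poll(SharedMatrix *s, double dst[ROWS][COLS])
{
    int got = 0;
    pthread_mutex_lock(&s->lock);
    if (s->updated) {
        memcpy(dst, s->matrix, sizeof s->matrix);  /* processing in place would avoid even this copy */
        s->updated = 0;
        got = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return got;
}
```

The trade-off is exactly the one noted above: the consumer now polls instead of blocking the way a Dequeue Element does.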
Grampa_of_Oliva_n_Eden Posted July 6, 2010

I don't think this is correct - the only wire I see branched is the queue ref, so this ensures you get a copy of the queue ref, not the queue element. I'm not privy to the internals of the queue implementation, but I do recall reading that it is a protected implementation of shared memory. And I don't think the queue operations could execute at exactly the same time, since a mutex or some such mechanism protects them so that a queue read or write is always an atomic operation - that's the only way one could guarantee the FIFO behavior. Also, I built a simple example following the posted diagram - I get a buffer allocation on the dequeue if it wires to anything at all, even an array size function (which I'm pretty sure doesn't require a copy of the data to operate). And this is without even including the enqueue in the producer loop. So the buffer allocation seems to be part of the implementation of the queue, and a DVR around the queue wouldn't help.

I agree that the DVR would most likely be the most efficient way to go, since as I understand it, it could be thought of as a "safe" pointer. So you could write data to the DVR location in the producer and read it in the consumer, and since the DVR is "safe" in the sense that you don't have to manage access and all operations are still atomic (I think), what you lose is the ability to use a non-polling process in the consumer. Now you'll have to poll the DVR and have some mechanism to determine if the data is fresh. Maybe just include an "updated" flag in a data structure that includes your array. Then you could operate on the array with an in-place structure and avoid making unnecessary copies.

Mark

Since the output of the init array is used for multiple iterations, that buffer cannot be re-used, because the consumer loop's modification of that buffer could goof up what gets queued next time. I'd expect that an "always copy" on the wire after the init but before the enqueue would let LV transfer the data in-place and eliminate the buffer on the dequeue.

Just my two cents,

Ben
Mark Smith Posted July 6, 2010

Since the output of the init array is used for multiple iterations, that buffer cannot be re-used, because the consumer loop's modification of that buffer could goof up what gets queued next time. I'd expect that an "always copy" on the wire after the init but before the enqueue would let LV transfer the data in-place and eliminate the buffer on the dequeue.

Just my two cents,

Ben

Here's what I'm seeing that makes me think you just can't avoid a buffer allocation on a dequeue - I don't know exactly how you could modify this to eliminate the allocation. But maybe I misunderstood the original question.

Mark
Grampa_of_Oliva_n_Eden Posted July 6, 2010

Here's what I'm seeing that makes me think you just can't avoid a buffer allocation on a dequeue - I don't know exactly how you could modify this to eliminate the allocation. But maybe I misunderstood the original question.

Mark

To avoid the extra data copy, LV has to be able to "see" that the original data buffer is no longer required. In that example, the data buffer being queued up is a constant in the code. If LV passed that buffer in-place and the consumer inverted all the bits, IN THE ORIGINAL BUFFER (sorry for yelling), then the next time you ran that code it would pass the inverted version.

I use queues to get data from my DAQ devices. The output from DAQmx Read is passed to the enqueue, and since the wire is not branching, the data gets transferred in-place, so the buffer from the DAQmx Read is the same buffer holding the data after the dequeue operation.

Ben
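The distinction being drawn here can be sketched in C terms (again purely illustrative, with made-up names - LabVIEW does not expose its buffers this way): handing a freshly produced buffer to the queue is just a pointer transfer, while enqueuing a buffer the producer will reuse forces a copy.

```c
/* Why a freshly produced buffer can be handed off, while a reused/constant
 * source forces a copy.  Hypothetical names; not LabVIEW's actual mechanism. */
#include <stdlib.h>
#include <string.h>

/* The producer will never touch daq_buffer again (e.g. a fresh acquisition read),
 * so ownership of the pointer can simply move into the queue: no data copy. */
double *enqueue_by_handoff(double *daq_buffer)
{
    return daq_buffer;
}

/* The producer needs template_buffer again next iteration (it is the "constant"
 * in the diagram), so the queue must be given its own copy; otherwise the
 * consumer could modify what gets enqueued next time. */
double *enqueue_by_copy(const double *template_buffer, size_t n)
{
    double *copy = malloc(n * sizeof *copy);
    if (copy != NULL)
        memcpy(copy, template_buffer, n * sizeof *copy);
    return copy;
}
```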
jdunham Posted July 6, 2010

Since the output of the init array is used for multiple iterations, that buffer cannot be re-used, because the consumer loop's modification of that buffer could goof up what gets queued next time.

That's what I was trying to say, but Ben said it better. The data wire doesn't branch, but you've got two 2D array wires, one in each loop, and the upper one (dequeue) could be running at the same time as different data travels on the lower one (enqueue). So you are going to need separate buffers for those two datasets unless you use a DVR to tell the compiler to try to use the same data. In this diagram, the 300 ms wait and the private queue make it extremely unlikely that any other data could get into that 2D wire, but I wouldn't really expect the compiler to rely on that in its optimization.
Mark Smith Posted July 6, 2010

To avoid the extra data copy, LV has to be able to "see" that the original data buffer is no longer required. In that example, the data buffer being queued up is a constant in the code. If LV passed that buffer in-place and the consumer inverted all the bits, IN THE ORIGINAL BUFFER (sorry for yelling), then the next time you ran that code it would pass the inverted version.

I use queues to get data from my DAQ devices. The output from DAQmx Read is passed to the enqueue, and since the wire is not branching, the data gets transferred in-place, so the buffer from the DAQmx Read is the same buffer holding the data after the dequeue operation.

Ben

I still see a buffer allocation here. I don't doubt that you get exactly the performance you report, but I can't create a dequeue operation that doesn't show a buffer allocation. So in order for LabVIEW to "see" that a data buffer in a queue is no longer required, LabVIEW has to decide at runtime that the buffer is available? For instance, the first code snippet is perfectly valid at compile time, but because no queue exists it fails at runtime. So this means Show Buffer Allocations can't accurately show the runtime allocations for a queue? I'm confused!
jdunham Posted July 6, 2010

I don't doubt that you get exactly the performance you report, but I can't create a dequeue operation that doesn't show a buffer allocation. So in order for LabVIEW to "see" that a data buffer in a queue is no longer required, LabVIEW has to decide at runtime that the buffer is available? For instance, the first code snippet is perfectly valid at compile time, but because no queue exists it fails at runtime. So this means Show Buffer Allocations can't accurately show the runtime allocations for a queue? I'm confused!

How does your first snippet "fail at runtime"? If you mean that it throws an error, well sure, but the correct data, an empty 2D array, will still be transferred onto the blue wire, dropped into the buffer allocated for that purpose.

Now here's where I wade into the deep end, exposing near-total ignorance and waiting for AQ to rescue me...

For any queue of variably-sized data, I would expect the queue itself to contain an array of handles, where a handle is some kind of LabVIEW data structure (it used to be defined in the CIN manual, if that's still around). The handle is going to point to your 2D array of data, and the LabVIEW memory manager is going to allocate memory from the heap and store pointers to those dynamic memory objects in the handle. So when you dequeue an item, you have to allocate a new buffer to store the handle, but it's possible that the handle could point to the same memory in the heap. The queue itself is done with that handle because it's been dequeued, so it needs to live somewhere else, which is in the newly allocated buffer. So just because you have a new buffer to store the handle (which is small and fixed-size) doesn't necessarily mean you have made a copy of the heap object (which is dynamic and might be huge). However, if there is any chance that the 2D array in the heap is needed in some other wire or branch of the same wire, LV will have to make a copy of that part of the heap so that dataflow integrity is assured. So just because you see an allocation dot on the diagram, the question is whether it is 100% required that the heap memory object is also copied. I think using a DVR is a way to try to make sure that the same memory object is the one in use, but if no other wires are laying claim to the dynamic memory object, LV could reuse the buffer at runtime even though your buffer allocations (which are 100% compile-time information) might show a copy.

Waiting to be corrected,

Jason
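For reference, the kind of handle layout the old CIN manual described for a 2D array of doubles looked roughly like this in C. The declarations below are approximate and the names are invented for illustration; the real ones lived in extcode.h:

```c
/* Approximate picture of a LabVIEW-style 2D double-array handle.
 * Field and type names are illustrative, not the exact extcode.h declarations. */
#include <stdint.h>

typedef struct {
    int32_t dimSizes[2];   /* number of rows, number of columns */
    double  elt[1];        /* row-major element data follows the sizes in the same block */
} Array2D;

typedef Array2D    *Array2DPtr;
typedef Array2DPtr *Array2DHdl;   /* a "handle": pointer to a pointer to the block */

/* Copying a handle duplicates one pointer-sized value, not the (possibly huge)
 * block it refers to.  That is the distinction made above: the dequeue needs a
 * new place to hold the handle, but the heap block may or may not be copied. */
```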
Mark Smith Posted July 6, 2010

How does your first snippet "fail at runtime"? If you mean that it throws an error, well sure, but the correct data, an empty 2D array, will still be transferred onto the blue wire, dropped into the buffer allocated for that purpose.

Now here's where I wade into the deep end, exposing near-total ignorance and waiting for AQ to rescue me...

For any queue of variably-sized data, I would expect the queue itself to contain an array of handles, where a handle is some kind of LabVIEW data structure (it used to be defined in the CIN manual, if that's still around). The handle is going to point to your 2D array of data, and the LabVIEW memory manager is going to allocate memory from the heap and store pointers to those dynamic memory objects in the handle. So when you dequeue an item, you have to allocate a new buffer to store the handle, but it's possible that the handle could point to the same memory in the heap. The queue itself is done with that handle because it's been dequeued, so it needs to live somewhere else, which is in the newly allocated buffer. So just because you have a new buffer to store the handle (which is small and fixed-size) doesn't necessarily mean you have made a copy of the heap object (which is dynamic and might be huge). However, if there is any chance that the 2D array in the heap is needed in some other wire or branch of the same wire, LV will have to make a copy of that part of the heap so that dataflow integrity is assured. So just because you see an allocation dot on the diagram, the question is whether it is 100% required that the heap memory object is also copied. I think using a DVR is a way to try to make sure that the same memory object is the one in use, but if no other wires are laying claim to the dynamic memory object, LV could reuse the buffer at runtime even though your buffer allocations (which are 100% compile-time information) might show a copy.

Waiting to be corrected,

Jason

By "fail at runtime" I mean the dequeue will fail (set the error out true) if run without first creating a queue. That's just meant to show that the dequeue will compile without any knowledge of whether or not a queue has been created, so I was trying a mental exercise to see how a dequeue would know that a data buffer can be reused if it knows nothing about the queue size, or even whether the queue exists. That's all. I doubt the buffer ever gets created if the queue ref is bad, however.

And I do agree that it was accepted practice in the DBDVR (days before DVR) to use a single-element queue as a way to pass data by reference and avoid making copies, as you and Ben have pointed out. I'm just trying to make sense of this buffer-allocation-on-dequeue thing at this point - I really am feeling confused.

Also, your argument about creating buffers for queue element refs makes a lot of sense, except that the Show Buffer Allocations tool specifically shows an array buffer being allocated in my snippets above - if I do the same thing and replace the array with a string (and the array size with a string length), the Show Buffer Allocations tool now shows the buffer allocation on the dequeue as a string. So the tool thinks a buffer is getting allocated of whatever type of data the queue holds. I know this doesn't help explain anything, but I appreciate the discussion!

Mark
jdunham Posted July 6, 2010

Also, your argument about creating buffers for queue element refs makes a lot of sense, except that the Show Buffer Allocations tool specifically shows an array buffer being allocated in my snippets above - if I do the same thing and replace the array with a string (and the array size with a string length), the Show Buffer Allocations tool now shows the buffer allocation on the dequeue as a string. So the tool thinks a buffer is getting allocated of whatever type of data the queue holds.

Don't think of them as queue element refs. Any array or string wire on a diagram is going to have an allocation of a handle structure, and you will see at least one allocation dot somewhere at the source of that wire. These handles are totally hidden by the implementation of LabVIEW (unless you use CINs), so don't expect to be told "this allocation dot is just a handle, and may or may not become a dynamic object (array or string) on the heap."

Regardless of any queues, pretend instead that your array wire is in a case structure. If that case never executes, nothing should ever happen with the heap; no array should ever be created, but your compile-time handle will still exist, and you'll still see a buffer allocation dot. Unless you have very simple code and there is some constant folding going on, the dynamic heap stuff can only happen at runtime, so there's no way to show that behavior before your VI runs.

So I believe it's not correct to assume that when a buffer allocation dot is shown, a fresh copy of dynamic data will be generated as that node executes. The dot makes it more likely that your data will be copied, but if the run-time realizes that no one else needs the data, it should be able to point the handle at the existing heap object. There is no way to know that at compile time (in most cases), which is when the dots are drawn, so you will see a dot if a copy might be made.

Jason
Mark Smith Posted July 6, 2010

Don't think of them as queue element refs. Any array or string wire on a diagram is going to have an allocation of a handle structure, and you will see at least one allocation dot somewhere at the source of that wire. These handles are totally hidden by the implementation of LabVIEW (unless you use CINs), so don't expect to be told "this allocation dot is just a handle, and may or may not become a dynamic object (array or string) on the heap."

Regardless of any queues, pretend instead that your array wire is in a case structure. If that case never executes, nothing should ever happen with the heap; no array should ever be created, but your compile-time handle will still exist, and you'll still see a buffer allocation dot. Unless you have very simple code and there is some constant folding going on, the dynamic heap stuff can only happen at runtime, so there's no way to show that behavior before your VI runs.

So I believe it's not correct to assume that when a buffer allocation dot is shown, a fresh copy of dynamic data will be generated as that node executes. The dot makes it more likely that your data will be copied, but if the run-time realizes that no one else needs the data, it should be able to point the handle at the existing heap object. There is no way to know that at compile time (in most cases), which is when the dots are drawn, so you will see a dot if a copy might be made.

Jason

I think this explanation makes a lot of sense - the array (or string) handle must exist before the data can exist (before the heap gets allocated), and that's what we see as the allocation dot. I do think the LabVIEW tool for showing allocations is a little misleading, in that the buffer allocation dot that shows LabVIEW just grabbed a 32-bit handle for a possible array looks just like the allocation dot that says LabVIEW is going to make a copy of 200 MB of data from an existing array. No wonder it all seems a little confusing.

Thanks,

Mark
Mark Zvilius Posted July 7, 2010

Thanks for the discussion. My takeaway is that exactly how queues behave with respect to element copies is still rather opaque, and I would have to run some performance tests to determine when copies are being made. If I wanted to be absolutely sure, I'd have to wrap my elements (the 2D arrays) in DVRs and pass the DVRs on the queue.

Going back to my original diagram, I would argue that the queue could be thought of as logically connecting the wire in the Producer loop to the wire in the Consumer loop. Each time through the Producer loop, Initialize Array creates a new buffer to hold the array, and the Consumer could use that same buffer without copying, because there are no branches in either the Producer or the Consumer. That's my naive, relatively-new-to-LabVIEW idea of what could be, and I understand that there are probably very good reasons for what is.

Thanks again,

Mark Z.
jdunham Posted July 8, 2010

Going back to my original diagram, I would argue that the queue could be thought of as logically connecting the wire in the Producer loop to the wire in the Consumer loop. Each time through the Producer loop, Initialize Array creates a new buffer to hold the array, and the Consumer could use that same buffer without copying, because there are no branches in either the Producer or the Consumer.

I think it is like that, but because queues are flexible, that decision to reuse the buffer has to be made at runtime. In fact, I believe it could come out differently from one iteration to the next (especially if your program were more complicated). Therefore the dots aren't able to show you the whole reality.