Rolf Kalbermatter Posted July 9, 2013

"Bingo! No. We don't need protected read/modify/store IF we are using the LV memory manager functions. The reason is that we can pick and choose individual elements of arrays to update. You cannot do this in LabVIEW without reading the whole array, modifying and then writing the whole array."

Ahh, I think you got that wrong. Ever used the inplace structure node? There you have an in-place replace array element without having to read and write the entire array.

"cmpxchg (CAS) is an optimised CPU instruction so the processor needs to support it (if it's Intel, it's a given; PPC, not so sure). I was hoping this was what SwapBlock used, as it could potentially be more efficient than MoveBlock. But the only reference I can find to it was on a German forum, and you provided an answer. I can't see any reason for SwapBlock other than to wrap cmpxchg. The only people that can answer that question are NI, really. For my purposes, I don't really need the compare aspect, but an in-place atomic exchange would be useful for the indexes."

PPC doesn't have CAS directly, but it has synced loads and stores, and with a little assembly a cas() can be fairly easily implemented, as I have found out in the meantime. As to SwapBlock(), I can't remember the details but believe it is more or less an in-place swap of the memory buffer contents, without locking of any sort. As such it does more than MoveBlock(), which only copies from one buffer to the other, but not fundamentally more. That API comes from the LabVIEW stone age, when concurrent access was not an issue since all the multithreading in LabVIEW was handled through its own cooperative multithreading execution system, so there was in fact no chance that two competing LabVIEW modules could attempt to work on the same memory location without the LabVIEW programmer knowing it, if he cared. With preemptive multitasking you can never have this guarantee, as your SwapBlock() call could be preempted anywhere in its execution. One thing about SwapBlock() could be interesting: it operates on the buffers as 4-byte integers if both buffers are 4-byte aligned and the length to operate on is a multiple of 4 bytes.
mje Posted July 9, 2013

"Bingo! No. We don't need protected read/modify/store IF we are using the LV memory manager functions. The reason is that we can pick and choose individual elements of arrays to update. You cannot do this in LabVIEW without reading the whole array, modifying and then writing the whole array."

Correction: you can't do this in LabVIEW without locking the whole array or some shared reference which contains the array. Reading a single element from an array or updating a single element can easily be done with the existing primitives, but to be able to operate in place would require some sort of synchronization mechanism. DVR, LV2, SEQ can easily replace only a single element, but each requires the lockdown of the whole containing construct.
Rolf Kalbermatter Posted July 9, 2013

"Correction: you can't do this in LabVIEW without locking the whole array or some shared reference which contains the array. Reading a single element from an array or updating a single element can easily be done with the existing primitives, but to be able to operate in place would require some sort of synchronization mechanism. DVR, LV2, SEQ can easily replace only a single element, but each requires the lockdown of the whole containing construct."

Well, the array has to be locked somehow, yes. Even in C you would have to do something like this, unless you can limit the possible accesses significantly in terms of who can write or read to a single element. If that can be done, then you could get away with referenced access to the array elements, with maybe a cas() mechanism to make really sure. But that can also have significant performance losses. The x86/64 cmpxchg is fairly optimized and ideal for this, but the PPC has a potential for performance loss, as the syncing is done through storing the address of the protected memory location into a special register. If someone else wants to protect another address, his attempt will overwrite that register and you lose the reservation; you will see this after the compare operation and have to start all over again by reserving the address again. The potential to lose this reservation is fairly small, as there are only about two assembly instructions between setting the reservation and checking that it is still valid, but it does exist nevertheless. The advantage of the PPC implementation is that the reservation does not lock out bus traffic at all, unless it tries to access the reserved address, while the x86 implementation locks out any memory access for the duration of the cmpxchg operation.
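The reserve/compare/store-and-retry pattern described in the post above can be modelled in a few lines. This is only a toy Python sketch, not PPC assembly and not anything LabVIEW exposes: `ReservedMemory`, `load_reserved`, `store_conditional` and `compare_and_swap` are hypothetical names, and a single shared reservation register is assumed so that the "someone else overwrites your reservation" case is easy to see.

```python
class ReservedMemory:
    """Toy model of memory guarded by one reservation register (assumption)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.reserved = None  # address currently reserved, if any

    def load_reserved(self, addr):
        # Reserving an address overwrites whatever reservation existed before.
        self.reserved = addr
        return self.cells[addr]

    def store_conditional(self, addr, value):
        # The store only succeeds if our reservation is still in place.
        if self.reserved != addr:
            return False
        self.cells[addr] = value
        self.reserved = None
        return True


def compare_and_swap(mem, addr, expected, new):
    """Build CAS from reserve + conditional store, retrying on a lost reservation."""
    while True:
        current = mem.load_reserved(addr)
        if current != expected:
            return False  # value differs: a genuine CAS failure, no retry
        if mem.store_conditional(addr, new):
            return True   # reservation survived, store went through
        # Reservation was lost between load and store: start all over again.


mem = ReservedMemory(4)
print(compare_and_swap(mem, 0, 0, 7))  # succeeds, cell 0 becomes 7
print(compare_and_swap(mem, 0, 0, 9))  # fails, cell 0 no longer holds 0

# Losing a reservation: a second reserve replaces the first one.
mem.load_reserved(1)
mem.load_reserved(2)
print(mem.store_conditional(1, 5))     # False, reservation on 1 was overwritten
```

In the single-threaded demo the retry branch is never taken; it is there to show where the "start all over again" step would sit if another party stole the reservation between the load and the store.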
mje Posted July 9, 2013

"Because the writer only reads the readers' indexes and the readers only write to them AND the only important value is the lowest; if it is possible to update a single location in the array without affecting others, then no locking is required at all (memory barriers are sufficient) and no read/modify/write issues arise. MoveBlock enables this but any native LabVIEW array manipulation will not (AFAIK)."

Absolutely. I'm curious though, how realistic of a use case is it where the bottleneck would be the actual copying of an element from the buffer, such that allowing independent tasks to simultaneously operate on different buffer elements will actually give a measurable return relative to the execution time of whatever is making these calls? I just don't know the answer to this.

I completely get how in other languages you can get gains by using pointers and such and avoiding copies for large data structures by operating directly in the buffer's memory space, but in LabVIEW there's just no way to do that and you'll always need to copy an element into a scope local to each reader.

I mean, if the index logic is separated from the buffer access logic, does it really matter if only one task can actually access the buffer at a time when it comes time to actually read/write from the buffer? I completely understand that yes, you may get a few hundred nanoseconds out by having one task modify an element while another task reads a different element, but if any real implementation that uses this fast access consumes a hundred times more time, I'd argue there's no real point. What are the relative gains that can be had here? How do those gains scale with the frequency of hitting that buffer, which depends on the number of readers and how fast the cycle time of each reader is?

All in, fun stuff. Please don't mind my poking about, the theory of this discussion fascinates me. I doubt I can offer any real insight as far as implementation goes which any of you haven't already considered.
ShaunR (Author) Posted July 9, 2013

"unless you can limit the possible accesses significantly in terms of who can write or read to a single element. If that can be done, then you could get away with referenced access to the array elements"

This is exactly what is being achieved with the LV MM functions, although we cannot get reference access, only a "copy from", since to get back into LabVIEW we need "by value".
ShaunR (Author) Posted July 9, 2013

"Absolutely. I'm curious though, how realistic of a use case is it where the bottleneck would be the actual copying of an element from the buffer, such that allowing independent tasks to simultaneously operate on different buffer elements will actually give a measurable return relative to the execution time of whatever is making these calls? I just don't know the answer to this."

Well. You answered that in a previous post. If you remember, there was a significant change in times for different sizes of buffer: the bigger it got, the worse it was. As you pointed out, that was because the global variable array forced a copy of the entire array to access a single element. So although you are correct in that with native LabVIEW code you can "get" a single element, in reality a copy of an entire array is required to get it (the only scenario where this isn't the case is where LabVIEW uses the "sub-array" wire). The latest version allows a buffer size as large as you want with no impact. So if you want a practical visualisation, load up the first example and set the buffer to 10,000 and compare it to the second example with a buffer of 10,000.

"I completely get how in other languages you can get gains by using pointers and such and avoiding copies for large data structures by operating directly in the buffer's memory space, but in LabVIEW there's just no way to do that and you'll always need to copy an element into a scope local to each reader."

Agreed. Shame though.

"I mean, if the index logic is separated from the buffer access logic, does it really matter if only one task can actually access the buffer at a time when it comes time to actually read/write from the buffer? I completely understand that yes, you may get a few hundred nanoseconds out by having one task modify an element while another task reads a different element, but if any real implementation that uses this fast access consumes a hundred times more time, I'd argue there's no real point. What are the relative gains that can be had here? How do those gains scale with the frequency of hitting that buffer, which depends on the number of readers and how fast the cycle time of each reader is?"

You can try this yourself. Set all the read and write polymorphic instances to non-reentrant (the write/read double, write/read index etc. inside the read and write VIs). You've hit an important point here though: scalability. This scales really, really well. Make the number of readers, say, 10 and there is a marginal increase in processing time. For queues, it is a fairly linear increase.

"All in, fun stuff. Please don't mind my poking about, the theory of this discussion fascinates me. I doubt I can offer any real insight as far as implementation goes which any of you haven't already considered."

Oh, I don't know. Now that you have the buffer that doesn't need DVRs, globals, LV2 globals or singletons, it can be put into a class. Basically the incrementing counter (the feedback node in the reads) just needs to be in your private data cluster. Then you would be in a good position to figure out how to manage the reader registration (the ID parameter). That sort of stuff is only just on the edge of my radar ATM. But there is nothing to stop you coming up with the class hierarchy for an API, since you don't have to worry about locking/contention at that point.
Aristos Queue Posted July 10, 2013

"However, the locking overhead around the private cluster coupled with atomicity of the private data cluster"

The what? There are no locks around the private data of a class. It unbundles just the same as a cluster. I've got no idea what you're referring to here.

"(be it an FGV or a LVOOP private data)"

*blink* These are not equatable in any mental construct I have in my head. I cannot think of any architecture where a storage mechanism can be substituted for a data type or vice versa. I am really and truly lost reading these posts.

"I'm curious though, how realistic of a use case is it where the bottleneck would be the actual copying of an element from the buffer, such that allowing independent tasks to simultaneously operate on different buffer elements will actually give a measurable return relative to the execution time of whatever is making these calls? I just don't know the answer to this."

It's fairly common, but that's why forking the wire to two read-only operations generally suffices to cover the use cases. That's why I posted my revised benchmark of a single queue where the output is "forked" to two separate operations (in the benchmark, the two operations -- both Wait Ms calls -- don't actually use the values, but they represent operations that need them and do not modify the values). LabVIEW allows two parallel operations to operate on the same by-value data element as long as both forks are read-only operations. That's how you get the pointer sharing in LabVIEW that is so commonly needed in other programming environments.

ShaunR is looking for ways to achieve this in LabVIEW by cheating values onto the wires that LV thinks are separate data instances but are not actually. It's going to be unstable in LabVIEW for any non-flat data type without some serious re-education of the inplaceness algorithms to be aware of such unorthodox sharing. Strings, paths, waveforms, arrays, objects -- these are all going to run into problems if you try to avoid duplicating the underlying memory when putting it onto two separate processes.
ShaunR (Author) Posted July 10, 2013

"Ahh, I think you got that wrong. Ever used the inplace structure node? There you have an in-place replace array element without having to read and write the entire array."

Indeed. But it seems to lock the entire array while it does it. When I experimented, it was one of the slowest methods.

"As to SwapBlock(), I can't remember the details but believe it is more or less an in-place swap of the memory buffer contents, without locking of any sort. As such it does more than MoveBlock(), which only copies from one buffer to the other, but not fundamentally more. That API comes from the LabVIEW stone age, when concurrent access was not an issue since all the multithreading in LabVIEW was handled through its own cooperative multithreading execution system, so there was in fact no chance that two competing LabVIEW modules could attempt to work on the same memory location without the LabVIEW programmer knowing it, if he cared. With preemptive multitasking you can never have this guarantee, as your SwapBlock() call could be preempted anywhere in its execution. One thing about SwapBlock() could be interesting: it operates on the buffers as 4-byte integers if both buffers are 4-byte aligned and the length to operate on is a multiple of 4 bytes."

Ahhhh. Those were the days [misty, wavy lines]. When GPFs were what happened to C programmers and memory management was remembering what meetings to avoid. Where is this documented about SwapBlock (or anything, for that matter)? I couldn't even find what arguments to use. I also found a load of variant functions (e.g. LVVariantCopy) but have no idea how to call them.
ShaunR (Author) Posted July 10, 2013

"The what? There are no locks around the private data of a class. It unbundles just the same as a cluster. I've got no idea what you're referring to here. *blink* These are not equatable in any mental construct I have in my head. I cannot think of any architecture where a storage mechanism can be substituted for a data type or vice versa. I am really and truly lost reading these posts."

I look at it very simplistically. The private data cluster is just a variable that is locked so you can only access it via public methods. An FGV is also a variable that is locked so you can only access it with the typedef methods. In my world, there is no difference between a class and an FGV as a storage mechanism.

"ShaunR is looking for ways to achieve this in LabVIEW by cheating values onto the wires that LV thinks are separate data instances but are not actually."

I've got no idea what LabVIEW thinks (that's why your input is invaluable). I just want it to do what I need. If LabVIEW requires me to make a copy of an element so it is a data copy rather than shared, then I'm OK with that (LVCopyVariant, LVVariantGetContents, LVVariantSetContents et al. docs would be useful). But I'm not OK with a copy of an entire array for one element that causes buffer-size dependent performance. The only reason I have gone down this path is because a global copies an entire array so I can get to one element. If you have another way without using the LVMM functions then I'm all ears. I just don't see any though (and I've tried quite a few). Throw me a bone here, eh?
Aristos Queue Posted July 10, 2013

"I look at it very simplistically. The private data cluster is just a variable that is locked so you can only access it via public methods. An FGV is also a variable that is locked so you can only access it with the typedef methods. In my world, there is no difference between a class and an FGV as a storage mechanism."

Completely wrong. I'm going to state this about five different ways because I'm not sure which metaphor will be most helpful for you here. I'm going to keep offering variations on this theme until I can help you see why, because understanding this matters massively to your ability to architect within LabVIEW and because you tend to teach others.

Your statement would mean that timestamps are equivalent to an FGV because you can only get to the cluster of individual fields (like "hours") by using a particular function. In fact, it would mean that all data types are equivalent to an FGV because you can only use certain functions to access their data -- right down to only certain functions can access the bits of a numeric.

A class does not define a data location. It defines the rules for data *consistency* -- in exactly the same way that the functions that manipulate Path keep the internal data structure consistent.

If it helps you better understand why these have nothing in common, go make every method of your classes that has an "obj in" and "obj out" Reentrant. In 99.99% of cases, this will not break the class functionality (the 0.01% are in classes that contain reference types or methods with side-effects). But that same reentrancy change completely destroys the FGV. One is defining data storage. The other defines data behavior.

To put it another way, the class and the FGV have *nothing* in common. One is a data type. The other is a storage mechanism. The FGV might store a class or a numeric or any other data type.

A bit of data on a wire is local to the executing VI. A bit of data stored in an FGV is global and shared. The object on a wire is local to the execution and is FULLY ACCESSIBLE by the local calling environment via the functions provided by the class. In the 99.99% case, that means they do not share any state with any other call to that same function.

There is no lock on a class' private data any more than there is a lock on the bits inside a numeric control or the fields of a cluster.

A class in LabVIEW is NOT a reference data type. There is no single object that every wire points back to. There is just a blob of data that, when the wire forks, becomes two distinct and independent blobs of data. The methods of the class do nothing more than define the rules for how that blob mutates as it passes through those functions. Users can abuse classes with any number of embedded reference types or methods with side-effects that play havoc with this basic definition, but that isn't particularly common (it may be common these days to wrap references in classes, but it isn't common for most classes to be wrapping references -- most classes are indeed by-value types, and most code that I see from customers these days would benefit greatly in terms of code clarity if every single cluster and many of the lone elements (particularly strings) were replaced with a class, with no loss or change of functionality).

A timestamp is not equivalent to an FGV. A path is not equivalent to an FGV. A cluster is not equivalent to an FGV. In *exactly* the same way, a class is not equivalent to an FGV. An FGV is equivalent to a DVR. It is equivalent to a single-element queue. It is equatable to a notifier. None of these are equatable to an array, or a cluster, or a class.

Did any of that make sense? I've stated it as many ways as I can think to state it.

"I've got no idea what LabVIEW thinks (that's why your input is invaluable). I just want it to do what I need. If LabVIEW requires me to make a copy of an element so it is a data copy rather than shared, then I'm OK with that (LVCopyVariant, LVVariantGetContents, LVVariantSetContents et al. docs would be useful). But I'm not OK with a copy of an entire array for one element that causes buffer-size dependent performance. The only reason I have gone down this path is because a global copies an entire array so I can get to one element. If you have another way without using the LVMM functions then I'm all ears. I just don't see any though (and I've tried quite a few)."

I think that's my point -- I have a hard time believing that "a disruptor implementation that copies an element out of a buffer N times for each of N processes running in parallel" can ever beat "a single queue that dequeues an element and hands it to N processes running in parallel without making a copy" for any data type beyond basic numerics, and for basic numerics I doubt that the gains are particularly measurable. The data copy is just that expensive.
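The distinction drawn in this post between a data type and a storage mechanism can be loosely illustrated in text form. The sketch below is an analogy in Python, not LabVIEW semantics: `Counter` and `FunctionalGlobal` are hypothetical names, a frozen dataclass stands in for a by-value class whose methods take "obj in" and return "obj out", and a lock-guarded holder stands in for an FGV-style shared location.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Counter:
    """By-value data: a method returns a new value and never touches the old one."""
    count: int = 0

    def increment(self):
        return replace(self, count=self.count + 1)


class FunctionalGlobal:
    """Shared storage: one location, with callers serialised by a lock."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial

    def update(self, fn):
        # Every caller sees and mutates the same stored value.
        with self._lock:
            self._value = fn(self._value)
            return self._value


# "Forking the wire": both names start from the same value, then diverge.
a = Counter()
b = a.increment()
print(a.count, b.count)  # a is unchanged, b holds the incremented copy

# The storage mechanism can hold any data type, including the class above.
store = FunctionalGlobal(Counter())
store.update(Counter.increment)
print(store.update(Counter.increment).count)  # state accumulated across calls
```

The lock lives in `FunctionalGlobal`, not in `Counter`: the type only defines how a value changes, while the holder decides where the value lives and who may touch it at a time.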
ShaunR (Author) Posted July 10, 2013

"'a disruptor implementation that copies an element out of a buffer N times for each of N processes running in parallel' can ever beat 'a single queue that dequeues an element and hands it to N processes running in parallel without making a copy' for any data type beyond basic numerics, and for basic numerics I doubt that the gains are particularly measurable. The data copy is just that expensive."

OK. Words aren't working. What is your "single queue that dequeues an element and hands it to N processes running in parallel" implementation of this?

AQ1.vi
Aristos Queue Posted July 10, 2013

Here you go:

AQ1_AsSingleQueue.vi
ShaunR (Author) Posted July 10, 2013

"Here you go: AQ1_AsSingleQueue.vi"

Not quite. The execution time is 500+100 ms (they aren't parallel processes), unlike the buffer, which is 500 ms (the greater of the two). Try again please.

Edited July 10, 2013 by ShaunR
mje Posted July 10, 2013

I think two issues are being conflated here. Everything AQ posted with respect to objects is of course correct. However, regardless of the nature of an object's structure, the data needs to be brought into some scope for it to be operated on. It's that scope that's causing the problem. Using a global variable will demand copying the whole object structure before you can operate to get your local copy of the single buffer element. Using a DVR or an FGV demands an implicit lock, either via the IPE required to operate on the DVR or the nature of a non-reentrant VI for the FGV. So while a class does not have any built-in reference or exclusion mechanics, pulling that data into some useful scope such that it can be operated on does.

The same issue is what has prevented me from posting an example of how to pull a single element from a buffer without demanding a copy of the entire buffer or running through some sort of mutual exclusion lock. Short of using the memory manager functions as Shaun has already demonstrated, I don't see how to do it. I know there are flaws with the memory manager method, I just don't see an alternative without inducing copies or locks.
ShaunR (Author) Posted July 10, 2013

"I think two issues are being conflated here. Everything AQ posted with respect to objects is of course correct. However, regardless of the nature of an object's structure, the data needs to be brought into some scope for it to be operated on. It's that scope that's causing the problem. Using a global variable will demand copying the whole object structure before you can operate to get your local copy of the single buffer element. Using a DVR or an FGV demands an implicit lock, either via the IPE required to operate on the DVR or the nature of a non-reentrant VI for the FGV. So while a class does not have any built-in reference or exclusion mechanics, pulling that data into some useful scope such that it can be operated on does. The same issue is what has prevented me from posting an example of how to pull a single element from a buffer without demanding a copy of the entire buffer or running through some sort of mutual exclusion lock. Short of using the memory manager functions as Shaun has already demonstrated, I don't see how to do it. I know there are flaws with the memory manager method, I just don't see an alternative without inducing copies or locks."

Well. You have managed to concisely put into a paragraph or two what I have been unsuccessfully trying to get across in about 3 pages of posts. (+1)

Edited July 10, 2013 by ShaunR
Aristos Queue Posted July 10, 2013

"Not quite. The execution time is 500+100 ms (they aren't parallel processes), unlike the buffer, which is 500 ms (the greater of the two). Try again please."

The processes are parallel -- the computation done on each value is done in parallel with the computation done on the other value. The only reason you can make the argument that they aren't parallel is because you have contrived that one operates on the odd values and the other operates on the even values. If you have a situation like that, just stick the odd values in one queue and the even values in another queue at the point of Enqueue. Now, I realize you can make the system "both loops execute on 2/3rds of the items, with some overlap" or any other fractional division... as you move that closer to 100%, the balance moves toward the copy being the overhead. My argument has been -- and remains -- that for any real system that is really acting on every value, the single queue wins.

If you have two processes that have dissimilar execution, then just arrange the first one first in a cascade, like this (which does run at 500 ms):

AQ1_CascadeQueue.vi
ShaunR (Author) Posted July 10, 2013

"The processes are parallel -- the computation done on each value is done in parallel with the computation done on the other value. The only reason you can make the argument that they aren't parallel is because you have contrived that one operates on the odd values and the other operates on the even values. If you have a situation like that, just stick the odd values in one queue and the even values in another queue at the point of Enqueue. Now, I realize you can make the system 'both loops execute on 2/3rds of the items, with some overlap' or any other fractional division... as you move that closer to 100%, the balance moves toward the copy being the overhead. My argument has been -- and remains -- that for any real system that is really acting on every value, the single queue wins. If you have two processes that have dissimilar execution, then just arrange the first one first in a cascade, like this (which does run at 500 ms): AQ1_CascadeQueue.vi"

I disagree. The processes are most certainly not parallel, even though your computations are (as is seen from the time). In your second example you are attempting to "pipeline" and are now using 2 queues (I say attempting because it doesn't quite work in this instance). You are:

a) only processing 100 values instead of all 200 (the true case in the bottom loop never gets executed),
b) lucky there is nothing in the other cases, because pipelining is "hybrid" serial (they have something to say about that in the video),
c) lucky that the shortest is last (try swapping the 5 with the -1 so -1 is top; with a) fixed, it is back to 600 ms),
d) no different to my test harness (effectively) if you place the second enqueue straight after the bottom loop's dequeue instead of after the case statement (which fixes c).

Edited July 10, 2013 by ShaunR
GregSands Posted July 10, 2013

I was going to suggest using a User Event, which I'd always thought of as a One-to-Many framework. Just tried it, and this "works" with a time of 500 ms, and with proper parallel execution, but Profiling shows that the data is copied on one of the event structure readers. It looks as though multiple registrations set up separate event "queues", and Generate User Event then sends the data independently to each one. I had never realised it behaved like this, though thinking about it now, I guess it's not surprising.
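The one-queue-per-registration behaviour observed above can be modelled roughly as follows. This is a Python sketch of the observation, not of how LabVIEW implements User Events: `Broadcaster`, `register` and `generate` are hypothetical names, and copying the data for every registration is an assumption of the model (the profile in the post only showed a copy on one of the readers).

```python
import copy
import queue


class Broadcaster:
    """Toy one-to-many event source with a private queue per registration."""

    def __init__(self):
        self._subscribers = []

    def register(self):
        # Each registration gets its own queue, independent of the others.
        q = queue.Queue()
        self._subscribers.append(q)
        return q

    def generate(self, data):
        # The data is delivered separately to every registered queue.
        for q in self._subscribers:
            q.put(copy.deepcopy(data))


events = Broadcaster()
reader_a = events.register()
reader_b = events.register()
events.generate([1, 2, 3])

got_a = reader_a.get_nowait()
got_b = reader_b.get_nowait()
print(got_a == got_b)  # same contents
print(got_a is got_b)  # but separate copies, so readers cannot interfere
```

Because each reader drains its own queue, a slow reader does not hold up a fast one, which fits the parallel timing reported in the post; the price is the per-reader copy.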
Aristos Queue Posted July 11, 2013

"a) only processing 100 values instead of all 200 (the true case in the bottom loop never gets executed),
b) lucky there is nothing in the other cases, because pipelining is 'hybrid' serial (they have something to say about that in the video),
c) lucky that the shortest is last (try swapping the 5 with the -1 so -1 is top; with a) fixed, it is back to 600 ms),
d) no different to my test harness (effectively) if you place the second enqueue straight after the bottom loop's dequeue instead of after the case statement (which fixes c)."

Ok. That makes sense. I think I see where this fits in now.
ShaunR (Author) Posted July 11, 2013

"I was going to suggest using a User Event, which I'd always thought of as a One-to-Many framework. Just tried it, and this 'works' with a time of 500 ms, and with proper parallel execution, but Profiling shows that the data is copied on one of the event structure readers. It looks as though multiple registrations set up separate event 'queues', and Generate User Event then sends the data independently to each one. I had never realised it behaved like this, though thinking about it now, I guess it's not surprising."

Indeed. Events are a more natural topological fit (as I think I mentioned many posts ago). Initially, I was afraid that the OS would interfere more with events than queues (not knowing exactly how they work under the hood, but I did know they each had their own queue).

For completeness, I'll add an event test harness so we can explore all the different options.
Aristos Queue Posted July 11, 2013

Ok... the hang when restarting your code is coming from the First Call? and the Feedback Nodes. I'm adding allocation to your cluster to store those values. It needs a "First Call?" boolean and N "last read index" values, where N is the number of reader processes.
ShaunR (Author) Posted July 11, 2013

"Ok... the hang when restarting your code is coming from the First Call? and the Feedback Nodes. I'm adding allocation to your cluster to store those values. It needs a 'First Call?' boolean and N 'last read index' values, where N is the number of reader processes."

The first call (in the write) is there because when everything starts up all indexes are zero. The readers only continue when the write index is greater, so they sit there until the cursor index is 1. The writer has to check that the lowest reader index isn't the same as its current write index. At start-up, when everything is zero, it is the same; therefore, if you don't have the First Call, it will hang on start. Once the writer and reader indexes get out of sync, everything is fine (until the I64 wraps around, of course; that needs to be addressed).

If you have a situation where you reset all the indexes and the cursor back to zero AND it is not the first call, it will hang, as neither the readers nor the writer can proceed.

Edited July 11, 2013 by ShaunR
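The index bookkeeping discussed in the thread (a writer that only reads the readers' indexes and cares about the lowest one, and readers that each advance only their own index) can be sketched in text form. This is a simplified single-threaded Python model, not the posted VIs: `RingBuffer`, `try_write` and `try_read` are hypothetical names, and it assumes ever-increasing counters compared by difference, which is one conventional way to tell "empty" from "full" without a first-call flag. The posted implementation may well use a different test.

```python
class RingBuffer:
    """Single-writer, multi-reader ring buffer driven purely by indexes."""

    def __init__(self, size, reader_count):
        self.size = size
        self.slots = [None] * size
        self.cursor = 0                          # elements written so far
        self.reader_index = [0] * reader_count   # elements read, per reader

    def try_write(self, value):
        # The writer only reads the reader indexes; the slowest reader decides.
        lowest = min(self.reader_index)
        if self.cursor - lowest >= self.size:
            return False                         # full: would overwrite unread data
        self.slots[self.cursor % self.size] = value
        self.cursor += 1                         # publish after the slot is filled
        return True

    def try_read(self, reader_id):
        index = self.reader_index[reader_id]
        if index >= self.cursor:
            return False, None                   # nothing new for this reader
        value = self.slots[index % self.size]
        self.reader_index[reader_id] = index + 1  # each reader writes only its own index
        return True, value


buf = RingBuffer(size=2, reader_count=2)
print(buf.try_read(0))     # all indexes are zero: the reader has to wait
print(buf.try_write("a"))  # the writer can start without a first-call flag
print(buf.try_write("b"))
print(buf.try_write("c"))  # buffer full until the slowest reader advances
```

With difference-based tests, all-zero indexes mean "empty" to the readers and "room available" to the writer, so a reset to zero does not stall either side in this model. Counter wrap-around, mentioned in the post, is not handled here because Python integers do not overflow.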
mje Posted July 11, 2013

I spent an hour or so last night starting to code my own solution and got as far as muddling with pointers for indexing, then stopped for the night. It occurred to me while I was mulling the problem over: are the memory manager calls re-entrant?

If I'm going about working directly with pointers using functions such as DSNewPtr, DSNewPClr, MoveBlock, and DSDisposePtr, much as Shaun did, in an effort to circumvent the lock mechanisms behind DVRs and FGVs, is it even possible to have two MoveBlock calls executing at the same time? Obviously this demands that each call be made from a different thread and the CINs aren't configured to use the UI thread, but the documentation is pretty much headers only as far as I can tell and doesn't really indicate either way.

I'm hoping to quantify whether there are any gains to be made by leaving behind the DVR in favor of lower-level memory management of the reader/writer indices. I'm still going to be keeping the buffer proper as a native LabVIEW implementation (DVR), since I see no other way to be able to store non-flat structures by poking about memory directly without invoking expensive operations like flattening/unflattening. My hypothesis is that any gains that may be had will be modest enough to not warrant the increased CPU load of polling the indexing loops.

Just after posting this I realized the re-entrant nature of MoveBlock is probably irrelevant if it is only being used for indexes. These bits of data are so small there's likely no measurable difference in practice if the calls were forced serial or not. Might be relevant if playing with larger structures, but as I said, I plan on keeping the buffer in native LabVIEW. It will still be interesting to test my hypothesis though, to see if dancing around the reference locking mechanism saves anything. Yes, I'm a scientist, hypothesis testing is what I do...
Aristos Queue Posted July 11, 2013

mje: Yes, those functions are reentrant.

ShaunR: I'd like you to try a test to see how it impacts performance. I did some digging and interviewed the compiler team. In the Read and the Write VIs, there are While Loops. Please add a Wait Ms primitive in each of those While Loops and wire it with a zero constant. Add it in a Flat Sequence Structure AFTER all the contents of the loop have executed, and do it unconditionally. See if that has any impact on throughput.

Details: With While Loops, LabVIEW uses some heuristics to decide how much time to give to one loop before yielding to allow other parallel code to run. The most common heuristic is a thread yield every 55 microseconds, give or take, when nothing else within the While Loop offers a better recommendation. You are writing a polling loop that I don't think LV will recognize as a polling loop, because the sharing of data is happening "behind the scenes" where LV can't see it. A Wait Ms wired with a zero forces a clump yield in LabVIEW; in other words, it forces the LV compiler to give other clumps a chance to run. If there are no other clumps pending, it just keeps going with the current VI. Because this is a polling loop, if the loop fails to terminate on one iteration, it does no good to iterate again unless/until the other VI has at least had a chance to update. The yield may help improve the throughput a bit.

mje said: "These bits of data are so small there's likely no measurable difference in practice if the calls were forced serial or not. Might be relevant if playing with larger structures, but as I said, I plan on keeping the buffer in native LabVIEW."

If these were not reentrant, the impact would be huge. The size of the data doesn't matter at all: you would see a substantial performance hit even if those functions were no-ops and non-reentrant. Just the overhead of the mutex lock and thread swapping would be substantial.
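For readers without LabVIEW to hand, the idea of a polling loop that yields on every iteration can be sketched in Python. This is an analogy under stated assumptions, not the VIs from this thread: `time.sleep(0)` stands in for a Wait Ms wired with zero, the function name `run_demo` and the doubled test values are invented for the example, and the unlocked sharing of `cursor` relies on CPython executing one bytecode at a time.

```python
import threading
import time


def run_demo(items: int = 1000) -> list:
    buffer = [None] * items      # stand-in for the shared buffer
    state = {"cursor": 0}        # written only by the writer thread
    received = []

    def writer() -> None:
        for i in range(items):
            buffer[i] = i * 2
            state["cursor"] = i + 1   # publish only after the slot is filled

    def reader() -> None:
        for last_read in range(items):
            # Polling loop: do the check, then yield unconditionally,
            # mirroring a zero-length wait placed after the loop contents.
            while True:
                ready = state["cursor"] > last_read
                time.sleep(0)         # give other threads a chance to run
                if ready:
                    break
            received.append(buffer[last_read])

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Calling `run_demo()` returns the values in the order the writer produced them. Whether the yield changes throughput depends on the scheduler, so the sketch makes no timing claim.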
mje Posted July 11, 2013

Aristos Queue said: "mje: Yes, those functions are reentrant. ShaunR: I'd like you to try a test to see how it impacts performance. I did some digging and interviews of compiler team. In the Read and the Write VIs, there are While Loops. Please add a Wait Ms primitive in each of those While Loops and wire it with a zero constant. Add it in a Flat Sequence Structure AFTER all the contents of the loop have executed, and do it unconditionally. See if that has any impact on throughput."

Thanks for the confirmation. Regarding the yielding, I saw no measurable difference when I made a similar modification earlier, though the tests were in a virtual environment.