Everything posted by ShaunR

  1. Done. MJE mentioned this previously and, until now, I have resisted since I was more concerned with absolute speed. But it has nasty side effects on thread-constrained systems (dual cores etc.) unless the loops can yield. You lose the performance of subroutine priority on constrained CPUs, and the spread obviously increases as we give LabVIEW the chance to interfere, but nowhere near as much as with the queue or event primitives. So you see an increase in the mean and STD, but the median and quartiles (which I think are better indicators, as they are robust to spurious outliers) remain pretty much unchanged (a sketch of the yielding wait is below). I've also now modified the index reader in the write to cope with an arbitrary number of readers. We just need to think about how we store and manage registration (a reader jumps on board when the writer is halfway through the buffer, for example). Also added the event version of the test harnesses. Version 3 is here. After this version I will be switching to a normal versioning scheme (x.x.x style), as I think we are near the end of prototyping.
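     A minimal sketch of the yielding poll described above, assuming a POSIX host; sched_yield() stands in for a 0 ms wait, and the names are illustrative rather than the actual VI internals:

         #include <sched.h>
         #include <stdint.h>

         /* Reader wait loop: instead of spinning flat out while waiting for
            the cursor to advance, give up the timeslice so a dual-core
            machine can schedule the writer. */
         void wait_for_cursor(volatile int64_t *cursor, int64_t my_index)
         {
             while (*cursor <= my_index)   /* nothing new to read yet    */
                 sched_yield();            /* yield instead of busy-spin */
         }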
  2. Take a look at Dispatcher in the CR. It may provide most of your TCP/IP architecture, as it is publisher/subscriber and can cope with many clients/servers.
  3. The first call (in the write) is because when everything starts up, all indexes are zero. The readers only continue when the write index is greater, so they sit there until the cursor index is 1. The writer, meanwhile, has to check that the lowest reader index isn't the same as its current write index. At start-up, when everything is zero, it is the same; therefore, if you don't have the First Call, it will hang on start. Once the writer and reader indexes get out of sync, everything is fine (until the I64 wraps around, of course, which needs to be addressed). If you have a situation where you reset all the indexes and the cursor back to zero AND it is not the first call, it will hang, as neither the readers nor the writer can proceed. The sketch below illustrates the guard.
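     For illustration only, here is the start-up logic from this post as a C sketch; NUM_READERS, the struct and the function names are hypothetical, not the actual buffer internals:

         #include <stdbool.h>
         #include <stdint.h>

         #define NUM_READERS 2   /* assumption for the sketch */

         typedef struct {
             volatile int64_t write_index;               /* the cursor */
             volatile int64_t read_index[NUM_READERS];
         } RingIndexes;

         /* Readers only continue when the write index is greater. */
         bool reader_can_proceed(const RingIndexes *ix, int reader)
         {
             return ix->write_index > ix->read_index[reader];
         }

         /* The writer must not catch up with the slowest reader. At
            start-up everything is zero, so lowest == write_index and,
            without the first-call exception, it would hang forever. */
         bool writer_can_proceed(const RingIndexes *ix, bool first_call)
         {
             int64_t lowest = ix->read_index[0];
             for (int i = 1; i < NUM_READERS; i++)
                 if (ix->read_index[i] < lowest)
                     lowest = ix->read_index[i];
             return first_call || (lowest != ix->write_index);
         }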
  4. Indeed. Events are a more natural topological fit (as I think I mentioned many posts ago). Initially, I was afraid that the OS would interfere more with events than queues (not knowing exactly how they work under the hood, but I did know they each had their own queue). For completeness, I'll add an event test harness so we can explore all the different options.
  5. I disagree. The processes are most certainly not parallel, even though your computations are (as is seen from the time). In your second example you are attempting to "pipeline" and are now using 2 queues (I say attempting because it doesn't quite work in this instance). You are: a) only processing 100 values instead of all 200 (the true case in the bottom loop never gets executed); b) lucky there is nothing in the other cases, because pipelining is "hybrid" serial (they have something to say about that in the video); c) lucky that the shortest is last (try swapping the 5 with the -1 so the -1 is on top; with a) fixed, you're back to 600 ms); d) no different to my test harness (effectively) if you place the second enqueue straight after the bottom loop's dequeue instead of after the case statement (which fixes c).
  6. Will Smith beat up some aliens, apparently.
  7. Well. You have managed to put concisely into a paragraph or two what I have been unsuccessfully trying to get across in about 3 pages of posts. (+1)
  8. I have faith in you!
  9. Indeed. However a variant is constructed (e.g. a constant on the diagram), from what AQ was saying about descriptors a simple copy is not adequate, since the "void" variant has the wrong ones. I'm now wondering about LVVariantGetContents and LVVariantSetContents, which may be the correct way to modify a variant's data and cause the descriptor to be updated appropriately.
  10. Hmmmm. Not sure what this "top swap" actually is. Is that just swapping handles/pointers, so basically the pointer in the buffer then points to the empty variant for a read? That would be fine for a write, but for a read the variant needs to stay in the buffer. Can you demonstrate with my original example?
  11. Not quite. The execution time is 500+100 (they aren't parallel processes), unlike the buffer, which is 500 (the greater of the two). Try again, please. The arithmetic below spells it out.
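     To spell out the timing claim (the 500 and 100 are the loop times from the example under discussion):

         #include <stdio.h>

         int main(void)
         {
             int serial_ms   = 500 + 100; /* queue hand-off: the stages add up */
             int parallel_ms = 500;       /* buffer: max(500, 100), since the
                                             readers run concurrently          */
             printf("serial: %d ms, parallel: %d ms\n", serial_ms, parallel_ms);
             return 0;
         }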
  12. OK. Words aren't working. What is your "single queue that dequeues an element and hands it to N processes running in parallel" implementation of this? AQ1.vi
  13. +1. Plenty for me to play with here. Ta very muchly. (Did you mean 4 or 8 bytes for the pointer size rather than 8 or 16?) If the variant claims right of destruction, what happens to the copies in the buffer? Is this the reason why it crashes on deallocation?
  14. I look at it very simplistically. The private data cluster is just a variable that is locked so you can only access it via public methods. An FGV is also a variable that is locked so you can only access it with the typedef'd methods. In my world, there is no difference between a class and an FGV as a storage mechanism. I've got no idea what LabVIEW thinks (that's why your input is invaluable); I just want it to do what I need. If LabVIEW requires me to make a copy of an element so it is a data copy rather than shared, then I'm OK with that (LVCopyVariant, LVVariantGetContents, LVVariantSetContents et al. docs would be useful). But I'm not OK with a copy of an entire array for one element, which causes buffer-size-dependent performance. The only reason I have gone down this path is because a global copies an entire array so I can get to one element (the sketch below shows the alternative). If you have another way without using the LV MM functions then I'm all ears. I just don't see any, though (and I've tried quite a few). Throw me a bone here, eh?
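     A hedged sketch of the "one element without the whole-array copy" idea, using MoveBlock from the LabVIEW manager (declared in extcode.h); the pointer handling and the function name read_element are illustrative, not the actual API of the buffer:

         #include <stdint.h>
         #include "extcode.h"   /* MoveBlock(src, dest, size) */

         /* Copy out element i of a buffer of doubles starting at base_ptr,
            leaving the rest of the array untouched. */
         double read_element(uintptr_t base_ptr, int32_t i)
         {
             double elem;
             MoveBlock((const void *)(base_ptr + (uintptr_t)i * sizeof(double)),
                       &elem, sizeof(double));
             return elem;
         }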
  15. Indeed. But it seems to lock the entire array while it does it. When I experimented, it was one of the slowest methods. Ahhhh. Those were the days [misty, wavy lines]. When GPFs were what happened to C programmers and memory management was remembering what meetings to avoid. Where is SwapBlock documented (or anything, for that matter)? I couldn't even find what arguments to use. I also found a load of variant functions (e.g. LVVariantCopy) but have no idea how to call them.
  16. Now that you have switched to byte arrays, you don't need a terminating string. But we still need to get rid of the flatten and unflatten, which are too slow.
  17. Well, you answered that in a previous post. If you remember, there was a significant change in times for different sizes of buffer: the bigger it got, the worse it was. As you pointed out, that was because the global variable array forced a copy of the entire array to access a single element. So although you are correct that with native LabVIEW code you can "get" a single element, in reality a copy of an entire array is required to get it (the only scenario where this isn't the case is where LabVIEW uses the "sub-array" wire). The latest version allows a buffer size as large as you want with no impact. So if you want a practical visualisation, load up the first example and set the buffer to 10,000, and compare it to the second example with a buffer of 10,000. Agreed. Shame though. You can try this yourself: set all the read and write polymorphic instances to non-reentrant (the write/read double, write/read index etc. inside the read and write VIs). You've hit an important point here, though: scalability. This scales really, really well. Make the number of readers, say, 10 and there is a marginal increase in processing time. For queues, it is a fairly linear increase. Oh, I don't know. Now that you have a buffer that doesn't need DVRs, globals, LV2 globals or singletons, it can be put into a class. Basically, the incrementing counter (the feedback node in the reads) just needs to be in your private data cluster (see the sketch below). Then you would be in a good position to figure out how to manage the reader registration (the ID parameter). That sort of stuff is only just on the edge of my radar ATM, but there is nothing to stop you coming up with the class hierarchy for an API, since you don't have to worry about locking/contention at that point.
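     A sketch of the "counter in the private data" idea if the buffer were wrapped in a class: each reader instance carries only its own cursor (the feedback node's job today) plus its registration ID, so instances never contend. The names are hypothetical:

         #include <stdint.h>

         typedef struct {
             int32_t reader_id;  /* the ID parameter handed out at registration */
             int64_t cursor;     /* replaces the feedback node in the read VI   */
         } ReaderHandle;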
  18. This is exactly what is being achieved with the LV MM functions, although we cannot get reference access, only a "copy from", since to get back into LabVIEW we need "by value".
  19. Indeed it is. But it is what makes it usable. Bingo! No, we don't need a protected read/modify/store IF we are using the LV memory manager functions. The reason is that we can pick and choose individual elements of arrays to update; you cannot do this in LabVIEW without reading the whole array, modifying it and then writing the whole array back. This is why I'm not worried about registration. Originally I had an array for the indexes in the global variable, but the obvious race conditions (read/modify/write) caused me to revert to single indexes fixed to the number of readers I was testing, so I could get on and benchmark. It is/was an interim solution as, at the time, it wasn't clear whether it was worth spending the time working out a registration scheme if the performance was atrocious. Now that I'm not using a global, this is a no-brainer and doesn't require additional logic, just a pointer to an array of indexes. The writer only needs to know how many are registered and can then do an array min on all of them. I'm thinking of 2 arrays, one for the readers' counts and one for the readers' R indexes, as the two calls are probably more efficient than one call to a 2D flat array with all the gubbins required to extract the dimensions (that's just a gut feeling). This part could, of course, also be done in native LabVIEW. It's the readers that are the problem... Because the writer only reads the readers' indexes and the readers only write to them, AND the only important value is the lowest, if it is possible to update a single location in the array without affecting the others, then no locking is required at all (memory barriers are sufficient) and no read/modify/write issues arise (a sketch is below). MoveBlock enables this, but any native LabVIEW array manipulation will not (AFAIK). cmpexch (CAS) is an optimised CPU instruction, so the processor needs to support it (if it's Intel, it's a given; PPC, not so sure). I was hoping this was what SwapBlock used, as it could potentially be more efficient than MoveBlock, but the only reference I can find to it was on a German forum and you provided the answer. I can't see any other reason for SwapBlock other than to wrap cmpexch. The only people who can answer that question are NI, really. For my purposes, I don't really need the compare aspect, but an in-place atomic exchange would be useful for the indexes.
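     A sketch of the single-slot update argument, under the assumptions of this post (each reader owns exactly one slot, the writer only reads the slots, and an aligned 8-byte store is not torn); memcpy stands in for the MoveBlock call, and the function names are illustrative:

         #include <stdint.h>
         #include <string.h>

         /* Reader i publishes its new index: one 8-byte store into its own
            slot, so no other reader's slot is touched and no lock is needed. */
         void publish_read_index(int64_t *slots, int i, int64_t new_index)
         {
             memcpy(&slots[i], &new_index, sizeof new_index); /* MoveBlock analogue */
         }

         /* Writer scans for the minimum. A slightly stale value is harmless;
            it only makes the writer more conservative. */
         int64_t lowest_reader_index(const int64_t *slots, int n)
         {
             int64_t lo = slots[0];
             for (int i = 1; i < n; i++)
                 if (slots[i] < lo)
                     lo = slots[i];
             return lo;
         }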
  20. Registering etc. isn't a problem. The problem is access to the indexes and the buffer without locking (mutexes etc.), which is where the pattern derives its performance gains. My first choice was a class, since the buffer and/or the reader/writer indexes can be held in each instance. However, the locking overhead around the private cluster, coupled with the atomicity of the private data cluster, means the performance degrades significantly (we are talking ms rather than us). You need a way of having the indexes and buffer controls accessible independently (e.g. you cannot have them all in one cluster) and of breaking access restrictions so that a reader can simultaneously read data while a writer is writing, without waiting (which you cannot do if you wrap anything in a non-reentrant VI). You end up in a kind of no-man's land where you need global storage (i.e. you need something like a DVR or LV2 global) with no locking mechanisms (requires reentrant-like access behaviour). The closest native LabVIEW object that fulfils that is the global variable. If someone else has an alternative, I'm all ears.
  21. It's just too slow to serialise (~10x slower).
  22. PXI comes in Windows or RT flavours. RT (aka Pharlap ETS) is a cut-down Windows kernel, so for your purpose it makes no difference. VxWorks is for PowerPC platforms, so you will only see it in [some] cRIOs or FieldPoint units. I wouldn't worry about it. I was just pointing out that byte-alignment padding is not the same across platforms, and assuming clusters are byte aligned can yield unexpected results on them.
  23. Only on Windows. Mac and Linux use 4-byte boundaries, and VxWorks uses 8-byte boundaries. The sketch below shows the effect.
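     To see why assuming byte alignment bites, compare the same logical cluster under 1-byte packing (as on Windows) and the compiler's natural alignment; the struct and its fields are illustrative only:

         #include <stdint.h>
         #include <stdio.h>

         #pragma pack(push, 1)
         typedef struct { uint8_t flag; int32_t value; } PackedCluster;  /* 5 bytes */
         #pragma pack(pop)

         typedef struct { uint8_t flag; int32_t value; } PaddedCluster;  /* typically 8 */

         int main(void)
         {
             printf("packed: %zu bytes, padded: %zu bytes\n",
                    sizeof(PackedCluster), sizeof(PaddedCluster));
             return 0;
         }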
  24. Nope. The first release used a global variable and an array index. That was after looking at classes, DVRs and LV2 globals. Most work OK until you get to the writer needing to know the index of all the readers. Do you have something specific in mind?